Data Types in pandas


This entry is part 1 of 2 in the series Pandas Data Types

Are you performing data analysis on a dataset? In other words, are you performing exploratory data analysis (EDA)? One of the very first things you will want to know about the dataset is the data types. Are you working with a pandas DataFrame? Let’s look at a very simple example in Python. It’s important to ensure you are using the correct data types, otherwise you may get unexpected results or errors.

Let’s create a DataFrame manually in Python. I’m using Anaconda on Windows 11.

import pandas as pd    # import the pandas library into Python
data = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
       'dollars': [12, 67, 33, 41.50],
       'age': [45, 48, 29, 52],
       'group': ['B', 'A', 'A', 'B']}
df = pd.DataFrame(data)
df

In my Jupyter Notebook of Anaconda, here is what the DataFrame looks like.

You can use dtypes to see the data types of each of the columns. Here’s the input and output.

df.dtypes
firstname     object
dollars      float64
age            int64
group         object
mixed         object
dtype: object

Let’s use info().

df.info()

Below is the output.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   firstname  4 non-null      object 
 1   dollars    4 non-null      float64
 2   age        4 non-null      int64  
 3   group      4 non-null      object 
 4   mixed      4 non-null      object 
dtypes: float64(1), int64(1), object(3)
memory usage: 292.0+ bytes


Object

The object data type can contain multiple different types. For instance, the a column could include integers, floats and strings which collectively are labeled as an object . Therefore, you may need some additional techniques to handle mixed data types in object columns.

Series NavigationExploring Data Types in pandas >>

Leave a Reply