Filtering a pandas DataFrame 2


This post is the second part of filtering a pandas DataFrame. The first part is in the post called Filtering a pandas DataFrame. In this second part I look at multiple conditions.

Let’s use a simple example by creating a pandas DataFrame manually.

import pandas as pd
data1 = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
       'amount': [12, 67, 33, 41]}
df1 = pd.DataFrame(data1)
df1

Filtering a pandas DataFrame

df2 = df.loc[df["amount"] >= 35]
df2

The above query returns the Sally and Rohan rows.

# Filter the DataFrame to only include rows where the firstname contains the substring "S" 
df2 = df.loc[df["firstname"].str.contains("S")] 
df2

The above query returns Sally and Suzie. The contains function is case sensitive. If I had uses ‘s’ instead, it would not have returned any rows.

Using query method

# Filter the DataFrame to only include rows where the firstname contains the substring "a" 
# not case sensitive, so I can use s
df3 = df.query("firstname.str.contains('a', case=False)") 
df3

The above query returns the Sally and Rohan rows.

# firstname has an a or a z character
df4 = df.query("firstname.str.contains('a', case=False) or firstname.str.contains('z')") 
df4

The above returns the Sally, Suzie and Rohan rows, which is what we expected.

# the firstname has an 'a' or the amount is 12.
df5 = df.query("firstname.str.contains('a', case=False) or amount == 12") 
df5

The above query returns the Bob, Sally and Rohan rows, which is what we expected. You’ll notice that the quotes need to be different.

# the firstname has an 'a' AND the amount is 12.
df6 = df.query("firstname.str.contains('a', case=False) and amount == 67") 
df6

The above query returns the Sally row.

Leave a Reply