This post is the second part of filtering a pandas DataFrame. The first part is in the post called Filtering a pandas DataFrame. In this second part I look at multiple conditions.
Let’s use a simple example by creating a pandas DataFrame manually.
import pandas as pd
data1 = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
'amount': [12, 67, 33, 41]}
df1 = pd.DataFrame(data1)
df1
df2 = df.loc[df["amount"] >= 35] df2
The above query returns the Sally and Rohan rows.
# Filter the DataFrame to only include rows where the firstname contains the substring "S"
df2 = df.loc[df["firstname"].str.contains("S")]
df2
The above query returns Sally and Suzie. The contains function is case sensitive. If I had uses ‘s’ instead, it would not have returned any rows.
Using query method
# Filter the DataFrame to only include rows where the firstname contains the substring "a"
# not case sensitive, so I can use s
df3 = df.query("firstname.str.contains('a', case=False)")
df3
The above query returns the Sally and Rohan rows.
# firstname has an a or a z character
df4 = df.query("firstname.str.contains('a', case=False) or firstname.str.contains('z')")
df4
The above returns the Sally, Suzie and Rohan rows, which is what we expected.
# the firstname has an 'a' or the amount is 12.
df5 = df.query("firstname.str.contains('a', case=False) or amount == 12")
df5
The above query returns the Bob, Sally and Rohan rows, which is what we expected. You’ll notice that the quotes need to be different.
# the firstname has an 'a' AND the amount is 12.
df6 = df.query("firstname.str.contains('a', case=False) and amount == 67")
df6
The above query returns the Sally row.
