This post is the second part of filtering a pandas DataFrame. The first part is in the post called Filtering a pandas DataFrame. In this second part I look at multiple conditions.
Let’s use a simple example by creating a pandas DataFrame manually.
import pandas as pd data1 = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'], 'amount': [12, 67, 33, 41]} df1 = pd.DataFrame(data1) df1
df2 = df.loc[df["amount"] >= 35] df2
The above query returns the Sally and Rohan rows.
# Filter the DataFrame to only include rows where the firstname contains the substring "S" df2 = df.loc[df["firstname"].str.contains("S")] df2
The above query returns Sally and Suzie. The contains function is case sensitive. If I had uses ‘s’ instead, it would not have returned any rows.
Using query method
# Filter the DataFrame to only include rows where the firstname contains the substring "a" # not case sensitive, so I can use s df3 = df.query("firstname.str.contains('a', case=False)") df3
The above query returns the Sally and Rohan rows.
# firstname has an a or a z character df4 = df.query("firstname.str.contains('a', case=False) or firstname.str.contains('z')") df4
The above returns the Sally, Suzie and Rohan rows, which is what we expected.
# the firstname has an 'a' or the amount is 12. df5 = df.query("firstname.str.contains('a', case=False) or amount == 12") df5
The above query returns the Bob, Sally and Rohan rows, which is what we expected. You’ll notice that the quotes need to be different.
# the firstname has an 'a' AND the amount is 12. df6 = df.query("firstname.str.contains('a', case=False) and amount == 67") df6
The above query returns the Sally row.