Filtering a pandas DataFrame 2


This post is the second part of filtering a pandas DataFrame. The first part is in the post called Filtering a pandas DataFrame. In this second part I look at multiple conditions.

Let’s use a simple example by creating a pandas DataFrame manually.

1import pandas as pd
2data1 = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
3       'amount': [12, 67, 33, 41]}
4df1 = pd.DataFrame(data1)
5df1

Filtering a pandas DataFrame

1df2 = df.loc[df["amount"] >= 35]
2df2

The above query returns the Sally and Rohan rows.

1# Filter the DataFrame to only include rows where the firstname contains the substring "S"
2df2 = df.loc[df["firstname"].str.contains("S")]
3df2

The above query returns Sally and Suzie. The contains function is case sensitive. If I had uses ‘s’ instead, it would not have returned any rows.

Using query method

1# Filter the DataFrame to only include rows where the firstname contains the substring "a"
2# not case sensitive, so I can use s
3df3 = df.query("firstname.str.contains('a', case=False)")
4df3

The above query returns the Sally and Rohan rows.

1# firstname has an a or a z character
2df4 = df.query("firstname.str.contains('a', case=False) or firstname.str.contains('z')")
3df4

The above returns the Sally, Suzie and Rohan rows, which is what we expected.

1# the firstname has an 'a' or the amount is 12.
2df5 = df.query("firstname.str.contains('a', case=False) or amount == 12")
3df5

The above query returns the Bob, Sally and Rohan rows, which is what we expected. You’ll notice that the quotes need to be different.

1# the firstname has an 'a' AND the amount is 12.
2df6 = df.query("firstname.str.contains('a', case=False) and amount == 67")
3df6

The above query returns the Sally row.

Leave a Reply