Filtering a pandas DataFrame


How do you filter rows in a pandas DataFrame so that the result is itself a DataFrame? This post covers the basics, and there is a second part to it, Filtering a pandas DataFrame 2.

Let’s use a simple example by creating a pandas DataFrame manually.

import pandas as pd

# Build a small DataFrame by hand from a dict of column name -> list of values
data1 = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
         'amount': [12, 67, 33, 41]}
df1 = pd.DataFrame(data1)
df1


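To filter, put a condition inside the square brackets. The condition is evaluated for every row, and only the rows where it is True are kept.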

# Rows where amount equals 67
df2 = df1[df1['amount'] == 67]
df2

# Rows where firstname is 'Bob'
df3 = df1[df1['firstname'] == 'Bob']
df3
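
To see why this works, it can help to look at the condition on its own. The sketch below rebuilds the same DataFrame and pulls out the intermediate boolean Series; the variable names mask and kept are only illustrative choices, not something pandas requires.

```python
import pandas as pd

data1 = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
         'amount': [12, 67, 33, 41]}
df1 = pd.DataFrame(data1)

# The comparison alone returns a boolean Series with one value per row
mask = df1['amount'] == 67
print(mask.tolist())  # [False, True, False, False]

# Indexing with that Series keeps only the rows where it is True
kept = df1[mask]
print(kept)
```

Note that the filtered DataFrame keeps the original row labels, so the single row here still has index 1.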

How do we use the OR operator in pandas?

# | means OR and & means AND; each condition goes in its own parentheses
df4 = df1[(df1['firstname'] == 'Bob') | (df1['firstname'] == 'Suzie')]
df4

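Use | for OR and & for AND. Python's own or and and keywords do not work on a Series, and each condition needs its own parentheses because | and & bind more tightly than comparisons such as ==.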
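
The post shows | in action but not &, so here is a minimal sketch of an AND filter on the same data. The name df_and and the choice of thresholds are assumptions made for illustration.

```python
import pandas as pd

data1 = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
         'amount': [12, 67, 33, 41]}
df1 = pd.DataFrame(data1)

# Keep rows where amount is greater than 20 AND less than 50.
# Both conditions are wrapped in parentheses before being combined with &.
df_and = df1[(df1['amount'] > 20) & (df1['amount'] < 50)]
print(df_and)
```

With this data, the filter should keep Suzie (33) and Rohan (41) and drop Bob (12) and Sally (67).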

df6 = df1[df1['firstname'].str.contains("S")]
df6

Filtering with str.contains() opens up a lot of power. Each column of a DataFrame is a Series, and Series.str.contains() tests whether a pattern or regular expression is found within each string of a Series or Index. It returns a boolean Series or Index, which can be used as a filter just like the comparisons earlier. Here "S" matches Sally and Suzie; the match is case-sensitive by default.
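
As a rough sketch of what the pattern matching allows, the example below uses the case and na parameters of str.contains() along with a small regular expression. The DataFrame df_names, with its missing first name, is hypothetical data made up for this illustration.

```python
import pandas as pd

# Hypothetical data: same names as before, plus one missing value
df_names = pd.DataFrame({'firstname': ['Bob', 'Sally', 'Suzie', None, 'Rohan']})

# case=False ignores upper/lower case; na=False counts missing values as non-matches
has_s = df_names['firstname'].str.contains('s', case=False, na=False)
print(has_s.tolist())  # [False, True, True, False, False]

# The pattern is treated as a regular expression by default:
# ^[SR] matches names that start with a capital S or R
starts_s_or_r = df_names['firstname'].str.contains(r'^[SR]', na=False)
print(df_names[starts_s_or_r])
```

Passing na=False is a cautious choice here: without it, the missing name would produce a missing value in the result rather than True or False, and such a mask generally cannot be used to filter rows.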

For more, have a look at the pandas documentation page "How do I select a subset of a DataFrame?" and scroll down to the question "How do I filter specific rows from a DataFrame?". That page uses the Titanic data.

The website Geeks for Geeks has a page on Series.str.contains().
