How do you filter, or select, rows from a pandas DataFrame so that the result is itself a DataFrame? There is a second part to this post: Filtering a pandas DataFrame 2.
Let’s use a simple example by creating a pandas DataFrame manually.
import pandas as pd

data1 = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
         'amount': [12, 67, 33, 41]}
df1 = pd.DataFrame(data1)
df1
Filter the rows by comparing a column with a value.
df2 = df1[df1['amount'] == 67]
df2

df3 = df1[df1['firstname'] == 'Bob']
df3
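It may help to see what the expression inside the brackets evaluates to. The following is a small sketch that reuses the data from above; the names `mask`, `filtered` and `filtered_loc` are introduced here for illustration only. The comparison produces a boolean Series, and indexing the DataFrame with it keeps the rows where the value is True.

```python
import pandas as pd

data1 = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
         'amount': [12, 67, 33, 41]}
df1 = pd.DataFrame(data1)

# The comparison is evaluated element-wise and returns a boolean Series
mask = df1['amount'] > 30

# Indexing with the boolean Series keeps only the rows marked True
filtered = df1[mask]

# df1.loc[mask] selects the same rows
filtered_loc = df1.loc[mask]
```

In both cases the result is a DataFrame that keeps the original index labels of the selected rows.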
How do we use the OR operator in pandas?
# or is the | symbol; and is the & symbol
df4 = df1[(df1['firstname'] == 'Bob') | (df1['firstname'] == 'Suzie')]
df4
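For or, use |. For and, use &. Wrap each condition in parentheses, because | and & bind more tightly than comparison operators such as ==.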
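The post shows | but not &, so here is a brief sketch of the and case using the same data. The variable names `df_and`, `df_not` and `df_isin` are made up for this example. It also shows ~ for negation and isin(), which can stand in for several == conditions joined by |.

```python
import pandas as pd

df1 = pd.DataFrame({'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
                    'amount': [12, 67, 33, 41]})

# and: both conditions must hold; each condition gets its own parentheses
df_and = df1[(df1['amount'] > 30) & (df1['firstname'].str.startswith('S'))]

# not: ~ inverts a boolean Series
df_not = df1[~(df1['firstname'] == 'Bob')]

# isin() is a shorter way to write several == conditions joined by |
df_isin = df1[df1['firstname'].isin(['Bob', 'Suzie'])]
```

Python's own `and`, `or` and `not` keywords do not work here: they try to reduce the whole Series to a single True or False, which raises a ValueError.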
df6 = df1[df1['firstname'].str.contains("S")]
df6
This opens up a lot of power. A column in a DataFrame is a Series, and Series.str.contains() tests whether a pattern or regular expression occurs in each string of a Series or Index. It returns a boolean Series or Index, which can then be used to filter the DataFrame. By default the match is case-sensitive and the pattern is interpreted as a regular expression.
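As a rough illustration of the options str.contains() accepts, the sketch below uses the same first names plus one missing value, which is added here only to show the `na` parameter. The variable names are hypothetical.

```python
import pandas as pd

# Same names as above, plus one missing value added for illustration
names = pd.Series(['Bob', 'Sally', 'Suzie', 'Rohan', None])

# case=False ignores case; na=False treats missing values as "no match"
has_s = names.str.contains('s', case=False, na=False)

# The pattern is a regular expression by default, so anchors such as $ work
ends_with_n = names.str.contains('n$', na=False)

# regex=False matches the text literally; here '.' means a real full stop
has_dot = names.str.contains('.', regex=False, na=False)

# has_s can be used as a filter, just like any other boolean Series
matches = names[has_s]
```

Passing na=False is worth considering whenever the column may contain missing values, since a result that is not purely boolean cannot be used directly as a filter.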
Have a look at the pandas documentation page "How do I select a subset of a DataFrame?" and scroll down to the answered question "How do I filter specific rows from a DataFrame?". That page uses the Titanic data.
The website Geeks for Geeks has a page on Series.str.contains().