How do we apply a custom Python function to a column in a pandas DataFrame? Here we’ll use two very small datasets and two very simple functions to show how. Being able to write a function and apply it to every value in a column is powerful. We’ll also illustrate a few alternatives, such as lambda functions, vectorized column arithmetic, and assign(). Feel free to copy this code and modify it for your own projects.
Let’s start with an example that uses made-up data.
```python
import pandas as pd

data = {'company': ['ABC Inc.', 'XYZ Corp.', 'Acme Ltd', 'Widget LLC'],
        'sales': ['$1.286M', '$6.722M', '$3.320M', '$4.197M'],
        'industry': ['Technology', 'Foods', 'Foods', 'Technology'],
        'date_founded': ['2/25/2006', '5/17/2003', '3/7/2011', '11/2/2012'],
        'number': [23, 45, 69, 19]}
df = pd.DataFrame(data)
df
```
We’ll define a custom function that removes the letters “e” and “o” from a string. In practice you would write a more useful function; this one is for illustration only.
```python
def customfunct(a_string):
    # remove the letters "e" and "o"
    a_string = a_string.replace('e', '')
    a_string = a_string.replace('o', '')
    return a_string
```
```python
# apply() calls customfunct once for each value in the column
df['industry'] = df['industry'].apply(customfunct)
df
```
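To see what apply() produces, here is a self-contained sketch that rebuilds only the industry column and applies the same customfunct. For comparison it also uses the vectorized str.replace() with a regular expression, a built-in alternative we’re adding here (it is not part of the original example) that removes both letters in one pass.

```python
import pandas as pd

def customfunct(a_string):
    # remove the letters "e" and "o"
    a_string = a_string.replace('e', '')
    a_string = a_string.replace('o', '')
    return a_string

industry = pd.Series(['Technology', 'Foods', 'Foods', 'Technology'])

# apply() calls customfunct on each element and returns a new Series
via_apply = industry.apply(customfunct)

# vectorized alternative: the regex character class [eo] matches either letter
via_str = industry.str.replace('[eo]', '', regex=True)

print(via_apply.tolist())         # ['Tchnlgy', 'Fds', 'Fds', 'Tchnlgy']
print(via_str.equals(via_apply))  # True
```

For plain string edits like this one, the str accessor usually reads more clearly; apply() earns its place when the logic does not map onto a built-in method.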
Built-In Functions
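Often a built-in method does the job without a custom function. Here the vectorized string method str.strip() removes the characters “$” and “M” from both ends of each value, and astype(float) converts the result to numbers (sales in millions).

```python
df['sales'] = df['sales'].str.strip('$M').astype(float)
df
```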
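One detail worth knowing: str.strip() treats its argument as a set of characters, not as a prefix or suffix string, so '$M' and 'M$' behave the same and any run of those characters is removed from both ends. A minimal sketch with the sales strings from the first example:

```python
import pandas as pd

sales = pd.Series(['$1.286M', '$6.722M', '$3.320M', '$4.197M'])

# strip any '$' or 'M' characters from both ends, then convert to float
millions = sales.str.strip('$M').astype(float)

# the order of characters in the argument does not matter
same = sales.str.strip('M$').astype(float)

print(millions.tolist())      # [1.286, 6.722, 3.32, 4.197]
print(millions.equals(same))  # True
```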
Example Two – DataFrame
```python
# note: this replaces the DataFrame from the first example
data = {'city': ["New York", "Paris", "Thornbury", "London"],
        'number': [285, 320, 201, 482]}
df = pd.DataFrame(data)
df
```
Lambda Function & Two Other Ways
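If the function we want to apply to a column (which is a pandas Series) is fairly simple, we can write it inline as a lambda function. Two other options are to write the formula directly against the column, which pandas evaluates element-wise, or to use assign(), which returns a new DataFrame with the extra column added.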
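```python
# three ways to add a column that doubles 'number'
# 1. apply() with a lambda function
df['twice_number'] = df['number'].apply(lambda x: x * 2)
# 2. a vectorized formula on the column itself
df['amount_doubled'] = df['number'] * 2
# 3. assign() returns a new DataFrame; df is unchanged unless you reassign it
df.assign(twotimes=df['number'] * 2)
```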
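A quick self-contained check, using the second example’s data, that the lambda and the vectorized formula yield the same values, and that assign() leaves the original DataFrame untouched unless its result is reassigned. The names with_lambda, vectorized and new_df are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'city': ["New York", "Paris", "Thornbury", "London"],
                   'number': [285, 320, 201, 482]})

# a single column is a pandas Series
print(type(df['number']).__name__)  # Series

with_lambda = df['number'].apply(lambda x: x * 2)
vectorized = df['number'] * 2
print(with_lambda.equals(vectorized))  # True

# assign() returns a new DataFrame; df keeps only its original columns
new_df = df.assign(twotimes=df['number'] * 2)
print(list(df.columns))      # ['city', 'number']
print(list(new_df.columns))  # ['city', 'number', 'twotimes']
```

The vectorized form is generally the faster choice for simple arithmetic, since apply() calls the Python function once per value.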
Custom Functions
```python
def doublethenumber(a_number):
    return a_number * 2

def doublelargenumbers(a_number):
    if a_number > 300:
        x = a_number * 2
    else:
        x = a_number
    return x
```
We then pass each function, without parentheses, to apply(), which calls it once for every value in the column.
```python
df['2X'] = df['number'].apply(doublethenumber)
df['2_X'] = df['number'].apply(doublelargenumbers)
```
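With the second example’s numbers, the two functions give the results below. Because doublelargenumbers contains an if/else, apply() is a natural fit, but the same logic can also be expressed without a Python-level loop using Series.where(), which keeps values where the condition is True and substitutes the second argument elsewhere. The where() version is an alternative we’re adding for comparison, not part of the original example.

```python
import pandas as pd

def doublethenumber(a_number):
    return a_number * 2

def doublelargenumbers(a_number):
    if a_number > 300:
        x = a_number * 2
    else:
        x = a_number
    return x

df = pd.DataFrame({'city': ["New York", "Paris", "Thornbury", "London"],
                   'number': [285, 320, 201, 482]})

df['2X'] = df['number'].apply(doublethenumber)
df['2_X'] = df['number'].apply(doublelargenumbers)

# vectorized equivalent of doublelargenumbers:
# keep values <= 300, double the rest
vectorized = df['number'].where(df['number'] <= 300, df['number'] * 2)

print(df['2X'].tolist())   # [570, 640, 402, 964]
print(df['2_X'].tolist())  # [285, 640, 201, 964]
print(vectorized.tolist()) # [285, 640, 201, 964]
```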
Add a Progress Bar
By default, apply() gives no indication of progress or of the estimated time remaining. For large datasets or expensive functions, that feedback is helpful. The tqdm library provides it: once tqdm is registered with pandas, it adds a progress_apply() method that works like apply() but displays a progress bar.
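```python
# register tqdm with pandas, which adds progress_apply()
# tqdm.notebook renders the bar in Jupyter; in a plain script, use `from tqdm import tqdm`
from tqdm.notebook import tqdm
tqdm.pandas()
```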
Below is an example of how to use it.
```python
df['2X'] = df['number'].progress_apply(doublethenumber)
```
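For running outside Jupyter, here is a minimal script-based sketch, assuming tqdm is installed (pip install tqdm). It imports tqdm from the top-level package, so the bar prints to the terminal, and passes desc to tqdm.pandas() to label the bar. The 100,000-row DataFrame is made up purely so the bar has something to count.

```python
import pandas as pd
from tqdm import tqdm

# register tqdm with pandas; keyword arguments such as desc are passed to the bar
tqdm.pandas(desc="doubling")

def doublethenumber(a_number):
    return a_number * 2

df = pd.DataFrame({'number': range(100_000)})

# same result as apply(), plus a progress bar written to stderr
with_bar = df['number'].progress_apply(doublethenumber)
plain = df['number'].apply(doublethenumber)
print(with_bar.equals(plain))  # True
```

progress_apply() changes only what you see while the function runs; the returned Series is the same as the one apply() would give.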