Apply a Custom Function to a DataFrame


How do we apply a custom Python function to a column in a pandas DataFrame? Here we’ll use two very small datasets and a few very simple functions to illustrate how. Being able to write a function and apply it to every value in a column is powerful. We’ll also look at some alternatives, such as built-in string methods, lambda functions, and assign(). Feel free to copy this code and modify it for your own projects.

Let’s start with an example that uses made-up data.

import pandas as pd
data = {'company': ['ABC Inc.', 'XYZ Corp.', 'Acme Ltd', 'Widget LLC'],
        'sales': ['$1.286M', '$6.722M', '$3.320M', '$4.197M'],
        'industry': ['Technology', 'Foods', 'Foods', 'Technology'],
        'date_founded': ['2/25/2006', '5/17/2003', '3/7/2011', '11/2/2012'],
        'number': [23, 45, 69, 19]}
df = pd.DataFrame(data)
df

We’ll define a custom function that removes the letters “e” and “o”. You would of course write a more useful function; this one is for illustration only.

def customfunct(a_string):
    # remove the letters "e" and "o"
    a_string = a_string.replace('e','')
    a_string = a_string.replace('o','')
    return a_string

df['industry'] = df['industry'].apply(customfunct)
df
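The same function can be applied to more than one column at a time. Below is a minimal sketch of one way to do it, assuming the columns of interest all hold strings. The names `df_demo` and `text_columns` are hypothetical and used only for illustration.

```python
import pandas as pd

def customfunct(a_string):
    # remove the letters "e" and "o"
    return a_string.replace('e', '').replace('o', '')

df_demo = pd.DataFrame({
    'company': ['ABC Inc.', 'XYZ Corp.', 'Acme Ltd', 'Widget LLC'],
    'industry': ['Technology', 'Foods', 'Foods', 'Technology'],
    'number': [23, 45, 69, 19],
})

# hypothetical list of the columns that hold strings
text_columns = ['company', 'industry']

# DataFrame.apply passes each column (a Series) to the lambda;
# Series.map then calls customfunct on every value in that column
df_demo[text_columns] = df_demo[text_columns].apply(lambda col: col.map(customfunct))
print(df_demo)
```

The numeric column is left out of `text_columns` on purpose, because `customfunct` calls `replace()` and would fail on an integer.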

Built-In Functions

# strip the '$' and 'M' characters from both ends, then convert to float
df['sales'] = df['sales'].str.strip('$M').astype(float)
df
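Built-in pandas methods can often stand in for a custom function. The sketch below repeats the sales conversion on a fresh copy of the data and, as an assumption that goes beyond the original example, also parses the date_founded column with `pd.to_datetime`, assuming the dates are in month/day/year order. The name `df_demo` is hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'sales': ['$1.286M', '$6.722M', '$3.320M', '$4.197M'],
    'date_founded': ['2/25/2006', '5/17/2003', '3/7/2011', '11/2/2012'],
})

# str.strip('$M') removes any leading or trailing '$' and 'M' characters
df_demo['sales'] = df_demo['sales'].str.strip('$M').astype(float)

# parse month/day/year strings into datetime values (format is an assumption)
df_demo['date_founded'] = pd.to_datetime(df_demo['date_founded'], format='%m/%d/%Y')

print(df_demo.dtypes)
```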

Example Two – DataFrame

data = { 'city':  ["New York", "Paris", "Thornbury", "London"],
          'number': [285, 320, 201, 482]}
df = pd.DataFrame(data)
df

Lambda Function & Two Other Ways

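If the function we want to apply to a column (a single DataFrame column is a pandas Series) is fairly simple, we can write it as a lambda function and pass it to apply(). For plain arithmetic we can skip the function entirely and write a formula on the column. A third option is assign(), which returns a new DataFrame with the added column rather than modifying df in place.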

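# three ways to add a column that doubles the number
# 1. a lambda function passed to apply()
df['twice_number'] = df['number'].apply(lambda x: x * 2)
# 2. a formula on the column
df['amount_doubled'] = df['number'] * 2
# 3. assign(), which returns a new DataFrame and leaves df unchanged
df.assign(twotimes = df['number'] * 2)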
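To check that the three approaches agree, here is a small self-contained sketch. The names `df_demo`, `via_lambda`, `via_formula`, and `via_assign` are hypothetical and exist only for this comparison.

```python
import pandas as pd

df_demo = pd.DataFrame({'city': ['New York', 'Paris', 'Thornbury', 'London'],
                        'number': [285, 320, 201, 482]})

# lambda passed to apply(): called once per value
via_lambda = df_demo['number'].apply(lambda x: x * 2)

# formula on the column: computed on the whole Series at once
via_formula = df_demo['number'] * 2

# assign() returns a new DataFrame; df_demo itself is not modified
via_assign = df_demo.assign(twotimes=df_demo['number'] * 2)

print(via_lambda.equals(via_formula))
print('twotimes' in df_demo.columns)
```

To keep the column that assign() creates, store the returned DataFrame, for example by assigning it back to the original variable.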

Custom Functions

def doublethenumber(a_number):
    return a_number * 2

def doublelargenumbers(a_number):
    if a_number > 300:
        x = a_number * 2
    else:
        x = a_number
    return x

We pass each function to apply(), which calls it once for every value in the column.

df['2X'] = df['number'].apply(doublethenumber)
df['2_X'] = df['number'].apply(doublelargenumbers)
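For a simple condition like this one, a vectorized expression can give the same result without calling a Python function for every value. The sketch below uses `Series.where`, which is an alternative technique and not part of the original example. The names `df_demo`, `applied`, and `vectorized` are hypothetical.

```python
import pandas as pd

def doublelargenumbers(a_number):
    # double only the values greater than 300
    if a_number > 300:
        return a_number * 2
    return a_number

df_demo = pd.DataFrame({'number': [285, 320, 201, 482]})
numbers = df_demo['number']

# element-by-element version using apply()
applied = numbers.apply(doublelargenumbers)

# where() keeps a value when the condition is True
# and takes the replacement (the doubled value) when it is False
vectorized = numbers.where(numbers <= 300, numbers * 2)

print(applied.tolist())
print(vectorized.tolist())
```

Both lists should match. On large DataFrames the vectorized form is usually faster, while apply() remains the more general tool when the logic does not reduce to a simple condition.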

Lambda Double It Function.ipynb

Add a Progress Bar

By default, apply() shows no progress and no estimate of the remaining time. With large datasets or slow functions, a progress bar can be helpful. The tqdm library provides one: after calling tqdm.pandas(), use progress_apply() in place of apply().

# to use it, integrate it with pandas
from tqdm.notebook import tqdm
tqdm.pandas()

Below is an example of how to use it.

df['2X'] = df['number'].progress_apply(doublethenumber)
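Outside a notebook, the plain `tqdm` import can be used in place of `tqdm.notebook`. The following is a minimal sketch under two assumptions: tqdm may not be installed, so the code falls back to a regular apply(), and the `desc` label is only illustrative. The names `df_demo` and `HAVE_TQDM` are hypothetical.

```python
import pandas as pd

try:
    from tqdm import tqdm  # console progress bar; tqdm.notebook is the Jupyter variant
    tqdm.pandas(desc="doubling")  # registers progress_apply() on pandas objects
    HAVE_TQDM = True
except ImportError:
    HAVE_TQDM = False  # tqdm is a third-party package and may be missing

def doublethenumber(a_number):
    return a_number * 2

df_demo = pd.DataFrame({'number': [285, 320, 201, 482]})

if HAVE_TQDM:
    # same result as apply(), with a progress bar printed while it runs
    df_demo['2X'] = df_demo['number'].progress_apply(doublethenumber)
else:
    df_demo['2X'] = df_demo['number'].apply(doublethenumber)

print(df_demo)
```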

