Clean Phone Numbers in Pandas


Do you have a pandas DataFrame with a column of phone numbers that need to be cleaned? Are the phone numbers in different formats? I created a project called Loop Through pandas DataFrame to illustrate this. It doesn't have to be phone numbers, though: the same approach works for any string you need to clean, for example by removing delimiters.

import numpy as np
import pandas as pd
import re
data = {'CustomerId': [1001, 1002, 1003, 1004, 1005],
        'First_name': ['Bob', 'Sally', 'Linda', 'Stan', 'Ahmed'],
        'Phone': ['123.555-4320', '(997) 555-3434', '123|555-0839', '1235553205', '123555-8765'],
        'Do_Not_Contact': ['N', 'N', 'N', 'Y', 'N']
        }
df = pd.DataFrame(data)
df

# drop rows that have 'Y' in Do_Not_Contact
for x in df.index:
    if df.loc[x, "Do_Not_Contact"] == "Y":
        df.drop(x, inplace=True)
df

df = df.reset_index(drop=True)
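If you prefer to avoid the loop, a boolean mask does the same filtering in a single step. This is a minimal sketch that rebuilds the sample data so it runs on its own; it keeps every row where Do_Not_Contact is not 'Y' and then renumbers the index, matching the loop plus reset_index above.

```python
import pandas as pd

data = {'CustomerId': [1001, 1002, 1003, 1004, 1005],
        'First_name': ['Bob', 'Sally', 'Linda', 'Stan', 'Ahmed'],
        'Phone': ['123.555-4320', '(997) 555-3434', '123|555-0839', '1235553205', '123555-8765'],
        'Do_Not_Contact': ['N', 'N', 'N', 'Y', 'N']}
df = pd.DataFrame(data)

# Keep only rows where Do_Not_Contact is not 'Y', then renumber the index 0..n-1
df = df[df['Do_Not_Contact'] != 'Y'].reset_index(drop=True)
print(df)
```

The mask version scales better on large DataFrames because it avoids calling drop once per row.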

Create a Function

# create a function that removes hyphens, parentheses, pipes, spaces and periods
def clean_ph(phn):
    new_ph = re.sub(r'[-()| .]', '', phn)
    return new_ph
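The character class in clean_ph only removes the delimiters listed inside the brackets, so a value such as '123/555_0839' would keep its slash and underscore. A sketch of a variant, assuming you want to keep digits only, uses the regex \D (any non-digit character). The name clean_ph_digits is hypothetical.

```python
import re

def clean_ph_digits(phn):
    # Hypothetical variant of clean_ph: strip every non-digit character
    # instead of listing specific delimiters
    return re.sub(r'\D', '', phn)

print(clean_ph_digits('(997) 555-3434'))  # 9975553434
print(clean_ph_digits('123/555_0839'))    # 1235550839
```

The trade-off is that \D also removes characters you might want to keep, such as a leading + on international numbers.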

We could clean each row with the following for loop, although apply is the better choice.

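for x in df.index:
    # use .loc to write back into df; chained indexing (df['Phone'][x] = ...)
    # raises SettingWithCopyWarning and does not update df under Copy-on-Write
    df.loc[x, 'Phone'] = clean_ph(df.loc[x, 'Phone'])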

df

Instead, we can “apply” our custom function to the whole column.

df['Phone'] = df['Phone'].apply(clean_ph)

There is a similar post called Apply a Custom Function to a DataFrame.
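pandas also provides vectorized string methods, so the same cleanup can be done without a custom function. A minimal sketch using Series.str.replace with the same character class as clean_ph; regex=True is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import pandas as pd

phones = pd.Series(['123.555-4320', '(997) 555-3434', '123|555-0839',
                    '1235553205', '123555-8765'])

# Remove hyphens, parentheses, pipes, spaces and periods from every value at once
cleaned = phones.str.replace(r'[-()| .]', '', regex=True)
print(cleaned.tolist())
# ['1235554320', '9975553434', '1235550839', '1235553205', '1235558765']
```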

Format the Phone Number with Dashes

Now that we’ve removed the unwanted characters and have only digits, we can insert the dashes. First we make sure each phone number is a string. Then we apply an anonymous (lambda) function to the column that slices the string into three parts and joins them with dashes.

df['Phone'] = df['Phone'].apply(lambda x: str(x))
df['Phone'] = df['Phone'].apply(lambda x: x[0:3] + '-' + x[3:6] + '-' + x[6:10])
df
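The slicing lambda assumes every value has exactly 10 digits; a shorter value such as '12345' would come out as '123-45-'. A sketch of an alternative, assuming you would rather leave malformed numbers untouched, uses a regular expression with three capture groups that only matches a full 10-digit string.

```python
import pandas as pd

digits = pd.Series(['1235554320', '9975553434', '12345'])

# Insert dashes only when the value is exactly 10 digits;
# anything else (like '12345') is left unchanged
formatted = digits.str.replace(r'^(\d{3})(\d{3})(\d{4})$', r'\1-\2-\3', regex=True)
print(formatted.tolist())  # ['123-555-4320', '997-555-3434', '12345']
```

Leaving bad values unchanged makes them easy to spot afterwards, whereas slicing silently produces a wrong-looking number.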

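As a final step, you might have some missing values. Because we are now working with strings, we can replace them with empty strings using the str.replace method. The pattern 'Na--' is what the slicing step produces from a two-character placeholder such as 'Na'; adjust it to match how missing values appear in your data.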

df['Phone'] = df['Phone'].str.replace('Na--','')
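If the column contains real missing values (NaN) rather than placeholder text, clean_ph would fail, because re.sub raises a TypeError when given a float. A minimal sketch, assuming NaN marks a missing phone number, that relies on the pandas str methods passing NaN through unchanged and fills the gaps with empty strings only at the end:

```python
import numpy as np
import pandas as pd

phones = pd.Series(['123.555-4320', np.nan, '(997) 555-3434'])

# .str methods skip missing values, so NaN stays NaN instead of raising an error
cleaned = phones.str.replace(r'\D', '', regex=True)
formatted = cleaned.str.replace(r'^(\d{3})(\d{3})(\d{4})$', r'\1-\2-\3', regex=True)

# Replace the remaining missing values with empty strings
formatted = formatted.fillna('')
print(formatted.tolist())  # ['123-555-4320', '', '997-555-3434']
```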
