Clean Phone Numbers in Pandas


Do you have a pandas DataFrame with a column of phone numbers that arrive in several different formats and need to be cleaned? I created a project called Loop Through pandas DataFrame to illustrate this. The values don’t have to be phone numbers: the same approach works for any string you need to clean, for example by removing delimiters.

import numpy as np
import pandas as pd
import re
data = {'CustomerId': [1001,1002,1003,1004,1005],
        'First_name': ['Bob','Sally','Linda','Stan','Ahmed'],
        'Phone': ['123.555-4320', '(997) 555-3434', '123|555-0839', '1235553205', '123555-8765'],
        'Do_Not_Contact': ['N','N','N','Y','N']
        }
df = pd.DataFrame(data)
df

# drop rows that have 'Y' in Do_Not_Contact
for x in df.index:
    if df.loc[x, "Do_Not_Contact"] == "Y":
        df.drop(x, inplace=True)
df

df = df.reset_index(drop=True)
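
The drop loop and index reset above can also be written as a single boolean filter. This is a minimal sketch, assuming the same column names as the example; the `contacts` and `allowed` names are illustrative and not part of the original project.

```python
import pandas as pd

# Small illustrative frame with the same Do_Not_Contact flags as the example.
contacts = pd.DataFrame({
    'CustomerId': [1001, 1002, 1003, 1004, 1005],
    'Do_Not_Contact': ['N', 'N', 'N', 'Y', 'N'],
})

# Keep only the rows whose flag is not 'Y', then renumber the index from 0.
allowed = contacts[contacts['Do_Not_Contact'] != 'Y'].reset_index(drop=True)
print(allowed)
```

The filter builds a new DataFrame instead of modifying the original while iterating over it.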

Create a Function

# create a function that strips the delimiters from a phone number
def clean_ph(phn):
    new_ph = re.sub(r'[-()| .]', '', phn)
    return new_ph
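
To see what the function does, it helps to run it on the sample values directly. The sketch below repeats `clean_ph` so it is self-contained and adds `digits_only`, a hypothetical variant (not in the original project) that removes every non-digit character instead of a fixed list of delimiters.

```python
import re

def clean_ph(phn):
    # Same character class as the function above: remove - ( ) | space and .
    return re.sub(r'[-()| .]', '', phn)

def digits_only(phn):
    # Hypothetical alternative: remove anything that is not a digit.
    return re.sub(r'\D', '', phn)

samples = ['123.555-4320', '(997) 555-3434', '123|555-0839',
           '1235553205', '123555-8765']

for s in samples:
    print(s, '->', clean_ph(s), '|', digits_only(s))
```

Both functions give the same result on these samples. They differ only when a value contains a delimiter that is missing from the character class, such as a slash.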

We could clean the column with the following for loop, although using apply is the better choice.

for x in df.index:
    # use .loc to assign; chained indexing like df['Phone'][x] may not update df
    df.loc[x, 'Phone'] = clean_ph(df.loc[x, 'Phone'])

df

Instead of looping, we can “apply” our custom function, which calls it on every value in the column.

df['Phone'] = df['Phone'].apply(clean_ph)
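
For a plain regular-expression substitution like this one, pandas can also do the work without a custom function through `Series.str.replace`. This is a minimal sketch on a standalone Series; the `phones` and `cleaned` names are illustrative.

```python
import pandas as pd

phones = pd.Series(['123.555-4320', '(997) 555-3434',
                    '123|555-0839', '123555-8765'])

# regex=True is passed explicitly so the pattern is treated as a regular
# expression regardless of the pandas version's default.
cleaned = phones.str.replace(r'[-()| .]', '', regex=True)
print(cleaned)
```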

A related post, Apply a Custom Function to a DataFrame, covers a similar topic.

Format the Phone Number with Dashes

Now that we’ve removed the delimiters and only digits remain, we can insert the dashes. First we make sure that each phone number is actually a string. Then we write a lambda, which is a function that remains nameless, and apply it to the column of the DataFrame.

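# convert every value to a string; the result must be assigned back to the column
df['Phone'] = df['Phone'].apply(lambda x: str(x))
# slice the 10 digits into 3-3-4 groups joined by dashes
df['Phone'] = df['Phone'].apply(lambda x: x[0:3] + '-' + x[3:6] + '-' + x[6:10])
df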
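
The slicing lambda assumes every value is exactly 10 digits long. As a hedged sketch, a named helper can check that first and leave anything else untouched; `format_ph` is a hypothetical name, not part of the original project.

```python
def format_ph(digits):
    # Hypothetical helper: only format values made of exactly 10 digits.
    digits = str(digits)
    if len(digits) == 10 and digits.isdigit():
        return f'{digits[0:3]}-{digits[3:6]}-{digits[6:10]}'
    # Anything else is returned unchanged so it can be reviewed later.
    return digits

print(format_ph('1235554320'))
print(format_ph('12345'))
```

It can be used in the same way as the lambda, for example `df['Phone'].apply(format_ph)`.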

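As a final step, you might have some NaN (not a number) values or nulls. Converting NaN to a string gives 'nan', and inserting the dashes turns that into 'nan--'. We are now working with strings, so we can replace these values with empty strings by using the str.replace method.

df['Phone'] = df['Phone'].str.replace('nan--', '', regex=False)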
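
Another option is to handle missing values while formatting, so that no placeholder text has to be cleaned up afterwards. The sketch below assumes an object-dtype column that may contain `NaN` or `None`; `dash_or_blank` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def dash_or_blank(x):
    # Hypothetical helper: missing values become '', others get dashes.
    if pd.isna(x):
        return ''
    x = str(x)
    return x[0:3] + '-' + x[3:6] + '-' + x[6:10]

# dtype=object keeps NaN and None as they are in the Series.
phones = pd.Series(['1235554320', np.nan, None], dtype=object)
formatted = phones.apply(dash_or_blank)
print(formatted.tolist())
```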
