Do you have a pandas DataFrame with a column of phone numbers that need to be cleaned? Are the phone numbers in different formats? I created a project called Loop Through pandas DataFrame to illustrate this. Actually, it doesn’t need to be phone numbers. It could be any string that you need to clean, for example by getting rid of delimiters.
import numpy as np
import pandas as pd
import re

data = {'CustomerId': [1001, 1002, 1003, 1004, 1005],
        'First_name': ['Bob', 'Sally', 'Linda', 'Stan', 'Ahmed'],
        'Phone': ['123.555-4320', '(997) 555-3434', '123|555-0839', '1235553205', '123555-8765'],
        'Do_Not_Contact': ['N', 'N', 'N', 'Y', 'N']}
df = pd.DataFrame(data)
df
# drop rows that have 'Y' in Do_Not_Contact
for x in df.index:
    if df.loc[x, "Do_Not_Contact"] == "Y":
        df.drop(x, inplace=True)
df
df = df.reset_index(drop=True)
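As an alternative to looping over the index, the same rows can be filtered out with a boolean mask. This is only a sketch: the names contacts and contactable are made up for the example, and it uses a cut-down copy of the data so that it runs on its own.

```python
import pandas as pd

# Cut-down copy of the example data (illustrative only).
contacts = pd.DataFrame({
    "CustomerId": [1001, 1002, 1003, 1004, 1005],
    "Do_Not_Contact": ["N", "N", "N", "Y", "N"],
})

# Keep only the rows whose flag is not 'Y', then renumber the index from 0.
contactable = contacts[contacts["Do_Not_Contact"] != "Y"].reset_index(drop=True)
print(contactable)
```

The mask does the drop and the reset_index step in one expression and leaves the original DataFrame untouched.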
Create a Function
# create a function
def clean_ph(phn):
    new_ph = re.sub('[-()| .]', '', phn)
    return new_ph
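We could use the following for loop, which updates the column one cell at a time. It is better to use apply, which is shorter and handles the whole column in one call.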
for x in df.index:
    # use .loc rather than df['Phone'][x] = ... to avoid chained assignment
    df.loc[x, 'Phone'] = clean_ph(df.loc[x, 'Phone'])
df
We can “apply” our custom function.
df['Phone'] = df['Phone'].apply(clean_ph)
See also a similar post called Apply a Custom Function to a DataFrame.
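For a simple substitution like this one, the pandas string accessor can also do the cleaning without a custom function. The sketch below reuses the same character class on a standalone Series; the names phones and digits_only are made up for the example.

```python
import pandas as pd

# The same sample phone numbers, as a standalone Series (illustrative only).
phones = pd.Series(['123.555-4320', '(997) 555-3434', '123|555-0839',
                    '1235553205', '123555-8765'])

# Remove dashes, parentheses, pipes, spaces and periods in one vectorized call.
digits_only = phones.str.replace(r'[-()| .]', '', regex=True)
print(digits_only)
```

Passing regex=True explicitly matters here because the pattern is a regular expression, and the default for that parameter has differed between pandas versions.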
Format the Phone Number with Dashes
Now that we’ve removed all of the bad characters and just have digits, we can insert the dashes after we ensure that each phone number is actually a string. Instead of defining a named function, we write an anonymous (lambda) function and apply it to the column of the DataFrame.
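# make sure every value is a string (assign the result back to the column)
df['Phone'] = df['Phone'].apply(lambda x: str(x))
# insert dashes: first 3 digits, next 3 digits, last 4 digits
df['Phone'] = df['Phone'].apply(lambda x: x[0:3] + '-' + x[3:6] + '-' + x[6:10])
df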
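The slicing above assumes that every value has exactly ten digits. If that is not guaranteed, a named function can check first. The following is a minimal sketch, and format_phone is a hypothetical helper that is not part of the original project; it leaves anything that is not a ten-digit string unchanged.

```python
import pandas as pd

def format_phone(digits):
    """Return NNN-NNN-NNNN for a 10-digit string; otherwise return the text unchanged."""
    text = str(digits)
    if len(text) == 10 and text.isdigit():
        return f"{text[0:3]}-{text[3:6]}-{text[6:10]}"
    return text

# Two valid numbers and one value that is too short (illustrative data).
cleaned = pd.Series(['1235554320', '9975553434', '12345'])
formatted = cleaned.apply(format_phone)
print(formatted)
```

Returning the input unchanged is just one possible choice; depending on the data, you might prefer to return an empty string or flag the row for review.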
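As a final step, you might have some NaN (not a number) values or nulls. We are now working with strings, so a missing value that went through str shows up as the text 'nan', which the formatting step turns into 'nan--'. We can replace these values with empty strings by using the replace function.
df['Phone'] = df['Phone'].str.replace('nan--', '', regex=False)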
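Another option is to deal with missing values before formatting, so that no placeholder text ever needs to be cleaned up afterwards. This is a sketch under the assumption that missing phone numbers are stored as NaN; add_dashes and the sample Series are made up for the example.

```python
import numpy as np
import pandas as pd

def add_dashes(digits):
    """Insert dashes into a string of digits: NNN-NNN-NNNN."""
    return f"{digits[0:3]}-{digits[3:6]}-{digits[6:10]}"

# One missing value between two cleaned numbers (illustrative data).
phones = pd.Series(['1235554320', np.nan, '9975553434'])

# Format the values that are present and use an empty string for missing ones.
result = phones.apply(lambda p: add_dashes(p) if pd.notna(p) else '')
print(result)
```

Checking with pd.notna inside the lambda means the slicing never runs on a value that is not a string.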