Split a Delimited String in a DataFrame


Do you need to split a delimited string stored in a column of a pandas DataFrame, so that each part of the string ends up in its own column? This is a very easy way to get it done. Parsing is the process of analyzing a string of characters: breaking it into its parts so that it becomes understandable.

To illustrate how string splitting works, I created a simple DataFrame manually with the code below. I use Jupyter Notebook.

import pandas as pd
data = {'artist': ['Beatles', 'Michael Jackson', 'Eagles', 'empty', 'nothing'],
       'genre': ['rock,pop', 'pop', 'country,  pop , rock', '', None]}
df = pd.DataFrame(data)
df

Here is what that looks like.

# Add three new columns to the existing DataFrame.
# The split is done on commas; whitespace around each piece is kept as-is.
df[['genre1', 'genre2', 'genre3']] = df['genre'].str.split(',', expand=True)
df
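The 'Eagles' row has spaces around its commas, and a plain comma split keeps them in the resulting pieces. Here is a minimal sketch of one way to clean that up, assuming you want each piece trimmed. The `parts` variable and the generated `genre1`, `genre2`, ... names are just illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'artist': ['Beatles', 'Michael Jackson', 'Eagles', 'empty', 'nothing'],
    'genre': ['rock,pop', 'pop', 'country,  pop , rock', '', None],
})

# Split on commas; expand=True returns one column per piece.
parts = df['genre'].str.split(',', expand=True)

# Strip leading/trailing whitespace from every piece, column by column.
parts = parts.apply(lambda col: col.str.strip())

# Illustrative column names: genre1, genre2, ... (one per piece found).
parts.columns = [f'genre{i + 1}' for i in range(parts.shape[1])]

# Attach the new columns to the original DataFrame.
df = df.join(parts)
df
```

Naming the columns from `parts.shape[1]` avoids hard-coding three column names, so the same lines should still work if another row later contains more genres.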

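Note that the empty string is NOT null, but the “None” in the DataFrame IS null. After the split, the empty-string row still has an empty string in genre1, while the None row has missing values in all three new columns.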
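To check the difference yourself, here is a small sketch using the same genre values. The variable names (`is_null`, `is_empty`, `cleaned`) are only for illustration, and converting empty strings to missing values is an optional step you may or may not want.

```python
import pandas as pd

genre = pd.Series(['rock,pop', 'pop', 'country,  pop , rock', '', None],
                  name='genre')

# isna() flags only the None entry, not the empty string.
is_null = genre.isna()

# eq('') flags only the empty string, not the None entry.
is_empty = genre.eq('')

# Optional: treat empty strings as missing too, by masking them out.
cleaned = genre.mask(is_empty)

print(pd.DataFrame({'genre': genre, 'is_null': is_null, 'is_empty': is_empty}))
```

If you want both cases handled the same way, running the split on `cleaned` instead of `genre` is one option.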
