Do you need to parse a delimited string in Python? Is your string comma-delimited? Are there several such strings in a column of a pandas DataFrame? As a data professional, you may encounter a column that needs to be split into multiple columns for easier processing. For example, a column may classify each row by listing several categories or types separated by commas.
Suppose you have a pandas DataFrame of music CDs that has a column called genre. Some CDs fit neatly into one genre, while others are a mix of two or three genres.
A Simple Example Using Split
First, let’s look at the string method split(). I created a simple Python example called Split a String. It splits a string into three strings using the comma as the delimiter. The delimiter does not have to be a comma; any string can be passed to split() as the separator.
value = "a,d, separate "
type(value)
str
value.split(",") # specify the delimiter in the split function
['a', 'd', ' separate ']
type(value)
str
parts = [x.strip() for x in value.split(",")]  # strip whitespace from each piece
parts
['a', 'd', 'separate']
type(parts)
list
parts[0]
'a'
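The same pattern works with other delimiters. The sketch below wraps the split-and-strip step in a small helper; the function name split_and_strip and the semicolon-separated sample string are made up for illustration and are not part of the original example. It also shows the optional maxsplit argument of split(), which limits how many splits are made.

```python
def split_and_strip(text, delimiter=","):
    """Split text on delimiter and trim whitespace from each piece."""
    return [piece.strip() for piece in text.split(delimiter)]


# Hypothetical sample value using a semicolon instead of a comma
genres = "rock; blues ;folk"
genre_parts = split_and_strip(genres, ";")
print(genre_parts)  # ['rock', 'blues', 'folk']

# maxsplit=1 splits only at the first comma and leaves the rest intact
first_split = "a,d, separate ".split(",", 1)
print(first_split)  # ['a', 'd, separate ']
```

Note that split() always returns a new list and leaves the original string unchanged, which is why type(value) above still reports str.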
More Examples
tweet = 'Some fitness tips for you #fitness #exercise'
tweet.split() # splits on white space
['Some', 'fitness', 'tips', 'for', 'you', '#fitness', '#exercise']
print(tweet.split('#')[0])
print(tweet.split('#')[1])
print(tweet.split('#')[2])
Some fitness tips for you
fitness
exercise
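Indexing each piece by hand works for a tweet with exactly two hashtags. As a rough sketch of how the hashtags could be collected into a list regardless of how many there are, the two options below reuse the same tweet string; the variable names hashtags and hashtags_ws are illustrative only.

```python
tweet = 'Some fitness tips for you #fitness #exercise'

# Option 1: split on '#', skip the text before the first hashtag,
# and strip the whitespace left around each piece
hashtags = [tag.strip() for tag in tweet.split('#')[1:]]
print(hashtags)  # ['fitness', 'exercise']

# Option 2: split on whitespace and keep only words that start with '#'
hashtags_ws = [word[1:] for word in tweet.split() if word.startswith('#')]
print(hashtags_ws)  # ['fitness', 'exercise']
```

Option 1 assumes all hashtags come at the end of the tweet; if ordinary words follow a hashtag, they end up in the same piece. Option 2 avoids that because it inspects each word separately.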