Parse a String in Python


Do you need to parse a delimited string in Python? Is your string comma-delimited? Are there several values packed into a single column of a pandas DataFrame? As a data professional, you may encounter a column of data that needs to be split into multiple columns for easier processing. For example, a column may classify each row by listing a few categories separated by commas.

Suppose you have a pandas DataFrame of music CDs that has a column called genre. Some music fits neatly into one genre while other CDs are a mix of two or three genres.
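One way to handle such a column is sketched below. The DataFrame contents, the variable names, and the genre_1, genre_2, ... column names are all made up for illustration; the sketch assumes pandas is installed and that the genres are separated by commas.

```python
import pandas as pd

# Hypothetical sample data; the titles and genres are invented for illustration.
cds = pd.DataFrame({
    "title": ["Album A", "Album B", "Album C"],
    "genre": ["rock", "jazz, blues", "folk, country, pop"],
})

# Split on the comma; expand=True spreads the pieces across new columns.
genres = cds["genre"].str.split(",", expand=True)

# Strip stray spaces from each piece; missing values stay missing.
genres = genres.apply(lambda col: col.str.strip())

# Name the new columns genre_1, genre_2, ... and attach them to the original frame.
genres.columns = [f"genre_{i + 1}" for i in range(genres.shape[1])]
cds_split = cds.join(genres)
print(cds_split)
```

Rows with fewer genres than the longest row end up with missing values in the extra columns, so later code should be prepared to handle them.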

A Simple Example Using Split

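First, let’s look at the string method split(). I created a simple Python example called Split a String. It splits a string into three strings by using the comma as the delimiter. The delimiter does not have to be a comma. Note that split() returns a new list and leaves the original string unchanged, and that strip() removes the leading and trailing whitespace from each piece.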

value = "a,d, separate  "
type(value)
str
value.split(",")    # specify the delimiter in the split function
['a', 'd', ' separate  ']
type(value)    # split() did not change the original string
str
parts = [x.strip() for x in value.split(",")]    # strip whitespace from each piece
parts
['a', 'd', 'separate']
type(parts)
list
parts[0]
'a'
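As noted above, the delimiter does not have to be a comma. Here is a short sketch using other delimiters, along with the optional maxsplit argument of str.split(). The sample strings and variable names are made up for illustration.

```python
# Hypothetical sample strings, chosen only to illustrate other delimiters.
record = "rock|jazz|blues"
by_pipe = record.split("|")
print(by_pipe)

# A delimiter can be longer than one character.
path = "artist :: album :: track"
by_colons = path.split(" :: ")
print(by_colons)

# maxsplit limits the number of splits; the remainder stays in one piece.
first_only = record.split("|", 1)
print(first_only)
```

The last call is handy when only the first field matters and the rest of the string should be kept intact.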

More Examples

tweet = 'Some fitness tips for you #fitness #exercise'
tweet.split()    # with no argument, splits on runs of whitespace
['Some', 'fitness', 'tips', 'for', 'you', '#fitness', '#exercise']
print(tweet.split('#')[0])
print(tweet.split('#')[1])
print(tweet.split('#')[2])
Some fitness tips for you 
fitness 
exercise
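Splitting on '#' leaves trailing spaces in the pieces, as the output above shows. If the goal is to collect the hashtags themselves, one possible approach is to split on whitespace first and keep only the words that start with '#'. This is a minimal sketch; the variable names hashtags and text are assumptions for illustration.

```python
tweet = 'Some fitness tips for you #fitness #exercise'

# Keep only the whitespace-separated words that start with '#', then drop the marker.
hashtags = [word[1:] for word in tweet.split() if word.startswith('#')]
print(hashtags)

# Everything before the first '#' is the text; strip() drops the trailing space.
text = tweet.split('#')[0].strip()
print(text)
```

This simple approach treats any word beginning with '#' as a hashtag and does not handle punctuation attached to a tag.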
