A set is a built-in Python data structure: an unordered collection of unique elements. There can be no duplicates. You can think of a set as a dictionary with keys only and no values. Each element must be immutable (more precisely, hashable), but the set itself is mutable, so you can add and remove elements.
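Here is a small sketch of these properties in action. Duplicates disappear when the set is built, the set can grow with add(), and a tuple can be an element while a list cannot. A list is mutable and therefore unhashable. The set name and values are made up for illustration.

```python
# Duplicates are dropped when the set is built
colors = {'red', 'green', 'red', 'blue'}
print(len(colors))  # 3

# The set itself is mutable: add() inserts a new element
colors.add('yellow')
print('yellow' in colors)  # True

# Elements must be hashable: a tuple works, a list does not
colors.add(('dark', 'red'))
try:
    colors.add(['light', 'blue'])
except TypeError as err:
    # The list is rejected, so the set is unchanged
    print('TypeError:', err)
```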
Python's basic built-in data structures are lists, sets, tuples, and dictionaries. Each one is unique in its own way. Data structures are “containers” that organize and group data. I’ve created a small example program in my Jupyter Notebook (Anaconda).
Sets support mathematical set operations like union, intersection, difference, and symmetric difference.
Creating Sets
There are two ways to create a set. The first is the set() function, which takes an iterable as an argument and returns a new set object. The second is a set literal written with curly braces { }. Use the set() function when you start from an existing iterable or need an empty set. Empty curly braces {} do not create an empty set. Python interprets them as an empty dictionary.
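You can check the difference between empty braces and set() with type(). This short sketch also shows that both creation styles produce equal sets when they have the same elements. The variable names are illustrative only.

```python
empty_braces = {}
empty_set = set()
print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# For non-empty sets, a literal and set(iterable) give equal results
from_literal = {'a', 'b', 'c'}
from_iterable = set(['a', 'b', 'c'])
print(from_literal == from_iterable)  # True
```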
Below is a bit of code to practice with.
my_list = ['foo', 'hello', 'foo']
my_set = set(my_list)
# can do in one line: my_set = set(['foo', 'hello', 'foo'])
print(my_set)

my_set_2 = {2, 2, 5, 19}
print(my_set_2)

my_set_3 = {2, 'hello', 21}
print(my_set_3)

# pass a tuple to the set() function using two sets of parentheses
# the inner parentheses create the tuple
# the outer parentheses belong to the set() call, which takes a single iterable argument
set_4 = set(('foo', 'baz', 'bar', 'baz', 7))
print(set_4)

x = set('hello world')
print(x)
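Sets are unordered, so print() may show the elements in a different order from one run to the next. String hashing is randomized per Python process. If you need a predictable display, one common approach is to sort the elements into a list. Here is a sketch using two of the sets from the practice code:

```python
set_4 = set(('foo', 'baz', 'bar', 'baz', 7))
x = set('hello world')

# sorted() returns a new list; the set itself stays unordered
print(sorted(x))  # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']

# set_4 mixes str and int, which cannot be compared with <,
# so sort by each element's string form instead
print(sorted(set_4, key=str))  # [7, 'bar', 'baz', 'foo']
print(len(set_4))  # 4: the duplicate 'baz' was dropped
```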
Python gives us built-in methods for working with sets, such as intersection() and union().
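Sets also have methods for changing their contents. This sketch covers add(), remove(), and discard(). The key difference is that remove() raises a KeyError when the element is missing, while discard() does nothing. The fruit names are illustrative only.

```python
fruits = {'apple', 'banana'}

fruits.add('cherry')      # insert one element
fruits.add('apple')       # already present, so nothing changes
fruits.discard('mango')   # missing element: silently ignored
fruits.remove('banana')   # present, so it is removed

try:
    fruits.remove('mango')  # missing element: raises KeyError
except KeyError:
    print('mango is not in the set')

print(sorted(fruits))  # ['apple', 'cherry']
```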
Creating a Set from a pandas DataFrame Column
Do you have a Series in pandas and want to know what its unique values are? Perhaps you are a data analyst exploring a DataFrame, and you have questions about a particular column. It’s supposed to contain only North, South, East, and West, so you wonder whether there are any typos. What if you could get the unique values from that column?
import pandas as pd  # import the pandas library into Python

data = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
        'amount': [12, 67, 33, 41],
        'group': ['B', 'A', 'A', 'B']}
df = pd.DataFrame(data)
df
Create a Series called ser from the firstname column of the DataFrame.
ser = df['firstname']
ser
Pass the Series ser to the set() function to get its unique values. By definition, a set cannot contain duplicates.
# find unique values
set(ser)
The output is a set containing the four names 'Bob', 'Sally', 'Suzie', and 'Rohan'. The order in which they are displayed may vary.
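Every first name in the example DataFrame is different, so set(ser) doesn't actually remove anything. The group column shows deduplication more clearly. You can handle the typo check described earlier with the same idea: subtract the set of allowed values from the set of values in the column. The region column and its values are hypothetical, made up for illustration. pandas also offers Series.unique(), which returns the distinct values in order of first appearance rather than as an unordered set.

```python
import pandas as pd

data = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
        'amount': [12, 67, 33, 41],
        'group': ['B', 'A', 'A', 'B'],
        # hypothetical column with one typo, for illustration only
        'region': ['North', 'South', 'Esat', 'West']}
df = pd.DataFrame(data)

# Unique values of a column as a set (display order not guaranteed)
groups = set(df['group'])
print(groups)

# pandas alternative: distinct values in order of first appearance
print(list(df['group'].unique()))  # ['B', 'A']

# Typo check: values in the column that are not in the allowed set
allowed = {'North', 'South', 'East', 'West'}
unexpected = set(df['region']) - allowed
print(unexpected)  # {'Esat'}
```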
To try out these set operations, start with two sets that partly overlap.
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}
The union of a and b contains every element that is in a, in b, or in both. An element that appears in both sets is kept only once.
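a.union(b)

The output is {1, 2, 3, 4, 5, 6, 7, 8}. The intersection of a and b keeps only the elements that appear in both sets.

a.intersection(b)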
The output is {3, 4, 5}.
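The other two operations, difference and symmetric difference, follow the same pattern. Each of the four methods also has an operator form (|, &, -, ^). Here is a sketch with the same a and b:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# sorted() is used only to make the printed order predictable
print(sorted(a.difference(b)))            # [1, 2]: in a but not in b
print(sorted(b.difference(a)))            # [6, 7, 8]: operand order matters
print(sorted(a.symmetric_difference(b)))  # [1, 2, 6, 7, 8]: in exactly one set

# Operator forms of the four operations
print(a | b == a.union(b))                 # True
print(a & b == a.intersection(b))          # True
print(a - b == a.difference(b))            # True
print(a ^ b == a.symmetric_difference(b))  # True
```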