Groupby Multiple Columns


This entry is part 4 of 3 in the series Pandas EDA Structuring

How do you take a pandas DataFrame of data (dataset) and produce a multi-column bar chart? You can manipulate the DataFrame with groupby first to aggregate the data. Then you can use matplotlib and seaborn to plot the data.

To do this we’ll work with an example. We have a list of companies that belong to two different industries and two different regions. Here I create a DataFrame manually and call it df. When we group the data, we’ll create a new DataFrame and call it df_2. We’ll use that DataFrame to plot the two bar charts.

1import pandas as pd
2data = {'company': ['ABC Inc.', 'XYZ Corp.', 'Acme Ltd', 'Widget LLC', 'Smith J', 'Johnson & Co.', 'Jackson, B',
3                    'ABC Corp.', 'Acme LLC', 'Doohickey Co.'],
4       'sales': [1286, 6722, 4320, 4197, 7228, 1980, 9801, 887, 2208, 7794],
5       'industry': ['Technology', 'Foods', 'Foods', 'Technology', 'Technology', 'Foods', 'Technology', 'Foods',
6                   'Technology', 'Foods'],
7       'date_founded': ['2/25/2022', '5/17/2022', '3/7/2022', '11/2/2022', '7/12/2022', '2/15/2022', '6/15/2022',
8                        '5/15/2022', '9/2/20222', '11/14/2022'],
9       'geo_region': ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'B', 'A', 'B']}
10df = pd.DataFrame(data)
11df

Click to Enlarge

Groupby Two Columns and Sum Another Column

1# Groupby industry and geo_region, and sum the sales column
2df_2 = df.groupby(['industry','geo_region'])['sales'].sum().reset_index()
3df_2
4# in the above code, ['geo_region', 'industry'] would also work fine.

1import matplotlib.pyplot as plt
2import seaborn as sns
3plt.figure(figsize=(6,3))
4sns.barplot(x=df_2['industry'],
5           y=df_2['sales'],
6           hue=df_2['geo_region'])
7plt.title('Sales by Industry and Geographic Region')
8plt.xlabel('Industry')
9plt.ylabel('Sales in USD')
10plt.legend(title='Region',loc='upper left')
11plt.plot()
12plt.show()

Once you have the DataFrame (df2) prepared, you can build two types of graphs. Below is the second one. Notice that we’ve switched the legend and the x-axis.

1plt.figure(figsize=(7,3))
2sns.barplot(x=df_2['geo_region'],
3           y=df_2['sales'],
4           hue=df_2['industry'])
5plt.title('Sales by Geographic Region and Industry')
6plt.xlabel('Region')
7plt.ylabel('Sales in USD')
8plt.legend(title='Industry',loc='upper left')
9plt.plot()
10plt.show()

Warning

Here is an article called pandas groupby warning.

Series Navigation<< Data Type Conversion in pandas

Leave a Reply