Histograms in Matplotlib


This entry is part 3 of 8 in the series Matplotlib

Matplotlib is a plotting library for Python. To write this blog post, I created a new Jupyter Notebook project called Histogram.

A histogram is a graph of a frequency distribution: it shows the number of observations that fall within each interval (bin).
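To make the idea of counting observations per interval concrete, here is a minimal sketch that computes the counts without drawing anything. It assumes NumPy's `np.histogram` and uses a small list of sample values; the variable names `amounts`, `counts` and `edges` are just illustrative.

```python
import numpy as np

# Sample observations (illustrative values)
amounts = [5, 4, 6, 6, 5, 7, 3, 3, 5, 5]

# Split the range of the data into 5 equal-width intervals
# and count how many observations fall into each one
counts, edges = np.histogram(amounts, bins=5)

# Print each interval with its count
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count}")
```

The heights of the bars in a histogram are these counts, and the bar boundaries are the interval edges.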

Here is some code that you can copy and paste into a Jupyter notebook and run.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Create a pandas DataFrame.

# manually create a dataframe
data = {"type": ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
        "amount": [5, 4, 6, 6, 5, 7, 3, 3, 5, 5]}
df = pd.DataFrame(data)

Set up and plot the histogram.

# create a histogram
fig = plt.figure(figsize=(3, 2))  # width and height of graph
plt.hist(df['amount'], bins=5, color='green')
plt.show()

# Sort Counts in Ascending Order
df['amount'].value_counts().sort_values()
amount
4    1
7    1
6    2
3    2
5    4
Name: count, dtype: int64

Group by Type and Display with Color

We still want a histogram, but now we want to show the A and B types in separate colors. This is easy when there are only a few types, and this example has just two. Let's get to the Python code.

# Filter for each type
df_A = df.loc[df['type'] == 'A']
df_B = df.loc[df['type'] == 'B']

Now we can plot both A and B on the same figure, giving each type its own color.

# use the same bin edges for both types so the bars line up
bin_edges = np.histogram_bin_edges(df['amount'], bins=5)

fig = plt.figure(figsize=(3, 3))  # width and height of graph
plt.hist(df_A['amount'], bins=bin_edges, color='red', alpha=0.45)
plt.hist(df_B['amount'], bins=bin_edges, color='blue', alpha=0.45)
plt.legend(['A', 'B'], loc='upper left')
plt.xlabel("Amount")
plt.ylabel("Number of Observations")
plt.title('Histogram')
plt.show()

You can see where A and B overlap because the color is darker there. Remember that we set an alpha (transparency) value so that the overlap shows through.
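Filtering by hand gets tedious as the number of types grows. As a rough sketch of one alternative, the loop below uses pandas `groupby` to plot one histogram per type, with shared bin edges from `np.histogram_bin_edges` so that every type is counted over the same intervals. The dictionary `counts_by_type` is a hypothetical helper added only to inspect the bar heights, and the colors here come from Matplotlib's default color cycle rather than the red and blue used above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same sample data as earlier in this post
df = pd.DataFrame({
    "type": ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    "amount": [5, 4, 6, 6, 5, 7, 3, 3, 5, 5],
})

# Shared bin edges, computed from all amounts
bin_edges = np.histogram_bin_edges(df['amount'], bins=5)

fig, ax = plt.subplots(figsize=(3, 3))
counts_by_type = {}  # illustrative: bar heights per type

# One semi-transparent histogram per type
for type_name, group in df.groupby('type'):
    counts, _, _ = ax.hist(group['amount'], bins=bin_edges,
                           alpha=0.45, label=type_name)
    counts_by_type[type_name] = counts.astype(int).tolist()

ax.legend(loc='upper left')
ax.set_xlabel("Amount")
ax.set_ylabel("Number of Observations")
ax.set_title('Histogram')
plt.show()
```

Because the legend labels are passed through `label=`, adding a third or fourth type to the data needs no change to the plotting code.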

Frequency of Types

# create a histogram of the frequency of types
plt.figure(figsize=(1.5, 2))  # width and height of graph
plt.hist(df['type'], color='pink')
plt.xlabel("Types")
plt.ylabel("Number of Observations")
plt.suptitle('Histogram')
plt.show() 
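Since the types are categories rather than numbers, the same picture can also be built by counting each type explicitly and drawing the counts as bars. This is only a sketch of that alternative, assuming pandas `value_counts` and Matplotlib's `bar`; the name `type_counts` is illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same sample data as earlier in this post
df = pd.DataFrame({
    "type": ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    "amount": [5, 4, 6, 6, 5, 7, 3, 3, 5, 5],
})

# Count how many rows belong to each type, ordered by type name
type_counts = df['type'].value_counts().sort_index()

# Draw one bar per type, with height equal to its count
fig, ax = plt.subplots(figsize=(1.5, 2))
ax.bar(type_counts.index, type_counts.values, color='pink')
ax.set_xlabel("Types")
ax.set_ylabel("Number of Observations")
fig.suptitle('Histogram')
plt.show()
```

Computing the counts first also gives you the exact numbers behind each bar, which is handy for checking the plot.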
