Matplotlib is a plotting library for Python. To write this blog post, I created a new Jupyter Notebook project called Histogram.
A histogram is a graph of a frequency distribution: it shows the number of observations that fall within each interval (bin).
Here is some code that you can copy and paste into a Jupyter notebook and run.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Create a pandas DataFrame.
# manually create a dataframe
data = {"type": ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
        "amount": [5, 4, 6, 6, 5, 7, 3, 3, 5, 5]}
df = pd.DataFrame(data)
Set up and plot the histogram.
# create a histogram
fig = plt.figure(figsize=(3, 2))  # width and height of graph
plt.hist(df['amount'], bins=5, color='green')
plt.show()
# Sort Counts in Ascending Order
df['amount'].value_counts().sort_values()
amount
4    1
7    1
6    2
3    2
5    4
Name: count, dtype: int64
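To see the numbers behind the bars, you can compute the bins yourself. This is a minimal sketch that assumes np.histogram bins the data the same way plt.hist does when given the same bins=5 argument. The variable names (amounts, counts, edges) are my own, and the list simply repeats the "amount" column from above.

```python
import numpy as np

# Same values as the "amount" column in the DataFrame above
amounts = [5, 4, 6, 6, 5, 7, 3, 3, 5, 5]

# counts has one entry per bin; edges has one more entry than counts
counts, edges = np.histogram(amounts, bins=5)

# Print each interval with the number of observations that fall in it
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count}")
```

Each printed row corresponds to one bar in the green histogram, so this is a quick way to check that the plot shows what you expect.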
Group by Type and Display with Color
We still want to plot a histogram, but now we need to show the A and B types in separate colors. This is easy when you only have a few different types, and this example has just two. Let’s get to the Python code.
# Filter for each type
df_A = df.loc[df['type'] == 'A']
df_B = df.loc[df['type'] == 'B']
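If you have more than a couple of types, writing one filter line per type gets tedious. Here is one possible alternative, sketched with pandas groupby. The dictionary name subsets is my own choice for illustration; the data is the same DataFrame as above.

```python
import pandas as pd

data = {"type": ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
        "amount": [5, 4, 6, 6, 5, 7, 3, 3, 5, 5]}
df = pd.DataFrame(data)

# Build one sub-DataFrame per distinct value in the 'type' column
subsets = {name: group for name, group in df.groupby('type')}

# These hold the same rows as the filtered df_A and df_B above
df_A = subsets['A']
df_B = subsets['B']
print(df_A['amount'].tolist())
print(df_B['amount'].tolist())
```

For two types the explicit filters are arguably easier to read, so treat this as an option rather than a replacement.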
Now we can plot both A and B on the same figure, giving each type its own color.
fig = plt.figure(figsize=(3, 3))  # width and height of graph
plt.hist(df_A['amount'], bins=5, color='red', alpha=0.45)
plt.hist(df_B['amount'], bins=5, color='blue', alpha=0.45)
plt.legend(['A', 'B'], loc='upper left')
plt.xlabel("Amount")
plt.ylabel("Number of Observations")
plt.title('Histogram')
plt.show()
You can see where A and B overlap because the color is darker there. Remember that we set an alpha (transparency) value so that the overlap shows through.
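One thing to watch for: with bins=5 passed to each call separately, the bin edges are computed from each subset's own range. Here the A amounts run from 3 to 5 and the B amounts from 3 to 7, so the red and blue bars end up with different widths. A possible fix, sketched below, is to compute one set of edges from the full column and pass it to both calls. The name shared_edges is my own, and np.histogram_bin_edges is just one way to get the edges.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = {"type": ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
        "amount": [5, 4, 6, 6, 5, 7, 3, 3, 5, 5]}
df = pd.DataFrame(data)
df_A = df.loc[df['type'] == 'A']
df_B = df.loc[df['type'] == 'B']

# One set of bin edges, computed from all amounts, shared by both types
shared_edges = np.histogram_bin_edges(df['amount'], bins=5)

fig = plt.figure(figsize=(3, 3))  # width and height of graph
plt.hist(df_A['amount'], bins=shared_edges, color='red', alpha=0.45, label='A')
plt.hist(df_B['amount'], bins=shared_edges, color='blue', alpha=0.45, label='B')
plt.legend(loc='upper left')
plt.xlabel("Amount")
plt.ylabel("Number of Observations")
plt.title('Histogram')
plt.show()
```

With shared edges, a bar for A and a bar for B at the same position cover the same interval, which makes the overlap easier to compare.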
Frequency of Types
# create a histogram of the frequency of types
plt.figure(figsize=(1.5, 2))  # width and height of graph
plt.hist(df['type'], color='pink')
plt.xlabel("Types")
plt.ylabel("Number of Observations")
plt.suptitle('Histogram')
plt.show()
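Because 'type' is a category rather than a number, you may prefer to count the values first and draw a bar chart. This is only a sketch of that alternative, not the approach used above. The name type_counts is my own.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {"type": ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
        "amount": [5, 4, 6, 6, 5, 7, 3, 3, 5, 5]}
df = pd.DataFrame(data)

# Count how many rows belong to each type, ordered by type name
type_counts = df['type'].value_counts().sort_index()

plt.figure(figsize=(1.5, 2))  # width and height of graph
plt.bar(type_counts.index, type_counts.values, color='pink')
plt.xlabel("Types")
plt.ylabel("Number of Observations")
plt.suptitle('Frequency of Types')
plt.show()
```

This draws one bar per type, centered on its label, and the counts are available in type_counts if you want to print them as well.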