Correlation Heatmap in Python

A heatmap is a type of data visualization that uses color to show the magnitude of each value in a grid, typically on a gradient between two colors. It is a very useful chart for seeing at a glance how strongly pairs of variables are related.

I created a local project called Heatmap with Seaborn to test this.

import random

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Get some random data: six integers from 5 to 7 (randrange excludes the upper bound)
lstnum = []
for i in range(6):
    lstnum.append(random.randrange(5, 8))
lstnum

[6, 5, 6, 6, 5, 7]

# Manually create the dataframe from a dictionary of lists.
data = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rowan', 'Sandra', 'Shirley'],
        'tenure': [2, 4, 5, 7, 7, 8],
        'salary': [45000, 55000, 60000, 89000, 91000, 99000],
        'random': lstnum}
df = pd.DataFrame(data)
df

Let’s create a heatmap of correlations.

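# Drop the non-numeric column so that corr() only sees numbers
df1 = df.drop('firstname', axis=1)

# Fix the color scale to the range -1 to 1 and print each coefficient in its cell
plt.figure(figsize=(2.5, 2.5))
heatmap = sns.heatmap(df1.corr(), vmin=-1, vmax=1, annot=True, cmap=sns.color_palette("vlag", as_cmap=True))
heatmap.set_title('Correlation Heatmap', fontdict={'fontsize': 12}, pad=10)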

In the above code, instead of dropping the firstname column, we could simply list the columns we want and continue to use df. The syntax would look like this: df[['my_col', 'my_col2', 'my_col3']].corr(). Also, another way to set the title is to call plt.title('The Title') and then plt.show(), instead of heatmap.set_title().
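Here is a minimal sketch of that alternative. It rebuilds a small version of the dataframe so it runs on its own, and it leaves out the random column for brevity. The names cols, corr and ax are just illustrative choices, not anything required by pandas or Seaborn.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Same sample data as above, minus the random column (left out for brevity)
df = pd.DataFrame({
    'firstname': ['Bob', 'Sally', 'Suzie', 'Rowan', 'Sandra', 'Shirley'],
    'tenure': [2, 4, 5, 7, 7, 8],
    'salary': [45000, 55000, 60000, 89000, 91000, 99000],
})

# Select the numeric columns by name instead of dropping 'firstname'
cols = ['tenure', 'salary']
corr = df[cols].corr()

plt.figure(figsize=(2.5, 2.5))
ax = sns.heatmap(corr, vmin=-1, vmax=1, annot=True,
                 cmap=sns.color_palette("vlag", as_cmap=True))

# Set the title through pyplot rather than through the returned Axes object
plt.title('Correlation Heatmap', fontsize=12, pad=10)
plt.show()
```

Both approaches should give the same chart. Listing the columns is handy when the dataframe has many non-numeric columns and only a few are of interest.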

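So we see a very high correlation (0.97) between tenure (the number of years worked at the company) and salary. The random numbers show only a low correlation with both salary and tenure. Because the random column is regenerated on every run, its coefficients will differ from run to run, while the tenure and salary value stays the same.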
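To see where the 0.97 comes from, here is a small sketch that computes the coefficient for tenure and salary directly. It assumes the default Pearson method, which is what corr() uses when no method is given. The names r_manual and r_pandas are only for illustration.

```python
import pandas as pd

tenure = pd.Series([2, 4, 5, 7, 7, 8])
salary = pd.Series([45000, 55000, 60000, 89000, 91000, 99000])

# Pearson r is the covariance divided by the product of the standard deviations
r_manual = tenure.cov(salary) / (tenure.std() * salary.std())

# Series.corr uses the Pearson method by default
r_pandas = tenure.corr(salary)

print(round(r_manual, 2), round(r_pandas, 2))  # prints 0.97 0.97
```

Both values agree with the number annotated in the heatmap cell for tenure and salary.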

BeginCodingNow.com

for data analysts & software developers
