A heatmap is a type of data visualization that represents the magnitude of each value in a grid as a color, typically on a gradient between two colors. It is a very useful chart for showing where values concentrate across two dimensions.
I created a local project called Heatmap with Seaborn to test this.
import random

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# get some random data
lstnum = []
for i in range(6):
    lstnum.append(random.randrange(5, 8))
lstnum
[6, 5, 6, 6, 5, 7]
# manually create the dataframe with a dictionary of lists.
data = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rowan', 'Sandra', 'Shirley'],
        'tenure': [2, 4, 5, 7, 7, 8],
        'salary': [45000, 55000, 60000, 89000, 91000, 99000],
        'random': lstnum}
df = pd.DataFrame(data)
df
Let’s create a heatmap of correlations.
df1 = df.drop('firstname', axis=1)
plt.figure(figsize=(2.5, 2.5))
heatmap = sns.heatmap(df1.corr(), vmin=-1, vmax=1, annot=True,
                      cmap=sns.color_palette("vlag", as_cmap=True))
heatmap.set_title('Correlation Heatmap', fontdict={'fontsize': 12}, pad=10)
In the above code, instead of dropping the firstname column, we could simply list the columns we want and continue to use df. The syntax would look like this: df[['my_col', 'my_col2', 'my_col3']].corr()… Another way to set the title is to call plt.title('The Title') followed by plt.show(), instead of heatmap.set_title().
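Here is a minimal sketch of that alternative, under a few assumptions: the random column is hard-coded to the values printed earlier so the result is reproducible, numeric_cols is just an illustrative variable name, and the Agg backend line is only there so the snippet runs without a display (leave it out in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Same data as above, with the random column fixed to the values shown earlier.
df = pd.DataFrame({
    'firstname': ['Bob', 'Sally', 'Suzie', 'Rowan', 'Sandra', 'Shirley'],
    'tenure': [2, 4, 5, 7, 7, 8],
    'salary': [45000, 55000, 60000, 89000, 91000, 99000],
    'random': [6, 5, 6, 6, 5, 7],
})

# Select the numeric columns by name instead of dropping 'firstname'.
numeric_cols = ['tenure', 'salary', 'random']  # illustrative name
corr = df[numeric_cols].corr()

plt.figure(figsize=(2.5, 2.5))
ax = sns.heatmap(corr, vmin=-1, vmax=1, annot=True,
                 cmap=sns.color_palette("vlag", as_cmap=True))

# Set the title through pyplot rather than through the returned Axes.
plt.title('Correlation Heatmap', fontsize=12, pad=10)
plt.show()
```

Listing the columns explicitly leaves df untouched, which is handy if firstname is needed again later.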
So we see a very high correlation (0.97) between tenure (the number of years worked at the company) and salary. The random numbers show only a low correlation with both salary and tenure.
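To sanity-check the 0.97 figure, the Pearson correlation (the default method used by pandas' corr()) can be worked out by hand with the standard library only. This is just a sketch; pearson_r is a hypothetical helper name, not something from pandas or seaborn.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

tenure = [2, 4, 5, 7, 7, 8]
salary = [45000, 55000, 60000, 89000, 91000, 99000]

r = pearson_r(tenure, salary)
print(round(r, 2))  # 0.97
```

The normalization by n (or n - 1) cancels between numerator and denominator, so it is left out of the helper.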