Pythonic way to generate seaborn heatmap subplots
I have a dataframe that contains 7 columns. The Regressor column has 3 different regressors (DT, DT-2, and DT-4).
I wanted to generate a correlation heatmap plot.
df_dt = df[(df["Regressor"]=="DT")]
df_dt_corr = df_dt.drop(["Regressor"], axis=1).corr()
df_dt2 = df[(df["Regressor"]=="DT-2")]
df_dt2_corr = df_dt2.drop(["Regressor"], axis=1).corr()
df_dt4 = df[(df["Regressor"]=="DT-4")]
df_dt4_corr = df_dt4.drop(["Regressor"], axis=1).corr()
# SUBPLOTS
fig = plt.figure(figsize=(12,6))
plt.subplot(221)
plt.title('Regressor: DT')
sns.heatmap(df_dt_corr, annot=True, fmt='.2f', square=True, cmap = 'Reds_r')
plt.subplot(222)
plt.title('Regressor: DT-2')
sns.heatmap(df_dt2_corr, annot=True, fmt='.2f', square=True, cmap = 'Blues_r')
plt.subplot(223)
plt.title('Regressor: DT-4')
sns.heatmap(df_dt4_corr, annot=True, fmt='.2f', square=True, cmap = 'BuGn_r')
plt.show()
Now, the problem is that with 10 regressors I would have to repeat this block 10 times, which is neither Pythonic nor good programming practice.
Is there a way to do the same job more idiomatically (e.g., with a loop)?
Please note: the demo dataframe has 3 regressors, but my main dataframe could have more, so I need a dynamic way to generate the plots based on the regressors present.
Demo Data:
{'Regressor': {0: 'DT', 1: 'DT', 2: 'DT', 3: 'DT', 4: 'DT', 19: 'DT-2', 20: 'DT-2', 21: 'DT-2', 22: 'DT-2', 23: 'DT-2', 39: 'DT-4', 40: 'DT-4', 41: 'DT-4', 42: 'DT-4', 43: 'DT-4'}, 'Method': {0: 'method_1', 1: 'method_1', 2: 'method_1', 3: 'method_1', 4: 'method_1', 19: 'method_1', 20: 'method_1', 21: 'method_1', 22: 'method_1', 23: 'method_1', 39: 'method_1', 40: 'method_1', 41: 'method_1', 42: 'method_1', 43: 'method_1'}, 'CE': {0: 0.002874032327519, 1: 0.005745640214479, 2: 0.004661679592489, 3: 0.002846754581854, 4: 0.004576990206546, 19: 0.105364819313149, 20: 0.085976562255755, 21: 0.095881176731004, 22: 0.097398912201617, 23: 0.100491941499165, 39: 0.018162548523961, 40: 0.018954401200213, 41: 0.01788125083107, 42: 0.019784900032633, 43: 0.020438103824639}, 'MAE': {0: 0.737423646017325, 1: 2.00787732271062, 2: 2.86926125864208, 3: 3.32855382663718, 4: 3.77490323897613, 19: 13.345092685398, 20: 12.8063543324171, 21: 13.1292091661974, 22: 13.1451455897874, 23: 13.6537246486947, 39: 3.2667181947348, 40: 4.29467676417246, 41: 5.34081768096088, 42: 5.50421114390641, 43: 7.46988963588581}, 'MSqE': {0: 0.847829904338757, 1: 6.68342912741117, 2: 12.5560681493523, 3: 17.2772893168584, 4: 22.02275890951, 19: 232.978432669064, 20: 237.820275013751, 21: 244.5869111788, 22: 247.73962294989, 23: 266.451945948429, 39: 15.6880657226101, 40: 28.2245308508171, 41: 44.7562607712654, 42: 46.5234139459763, 43: 87.2324237935045}, 'R2': {0: 0.999729801060669, 1: 0.998038240639634, 2: 0.996528815654117, 3: 0.995203737109921, 4: 0.993477444422499, 19: 0.926657847114707, 20: 0.93726355821839, 21: 0.932221279553296, 22: 0.91924882453144, 23: 0.925514811021512, 39: 0.995151906119729, 40: 0.991723226976753, 41: 0.986284593333255, 42: 0.982615342502863, 43: 0.97292435121805}}
Solution 1:[1]
The existing answers use a loop, but I wanted to see whether a faceted grid could handle this. The code below is adapted from another answer to fit your data. FacetGrid splits the single dataframe into one panel per value of the category column (col="Regressor"), and col_wrap limits the number of panels per row. map_dataframe then draws a heatmap from each subset. I could not find a way to assign a different colormap to each panel, but I think a single shared colormap works well for analysis.
import pandas as pd
import seaborn as sns
data = {'Regressor': {0: 'DT', 1: 'DT', 2: 'DT', 3: 'DT', 4: 'DT', 19: 'DT-2', 20: 'DT-2', 21: 'DT-2', 22: 'DT-2', 23: 'DT-2', 39: 'DT-4', 40: 'DT-4', 41: 'DT-4', 42: 'DT-4', 43: 'DT-4'}, 'Method': {0: 'method_1', 1: 'method_1', 2: 'method_1', 3: 'method_1', 4: 'method_1', 19: 'method_1', 20: 'method_1', 21: 'method_1', 22: 'method_1', 23: 'method_1', 39: 'method_1', 40: 'method_1', 41: 'method_1', 42: 'method_1', 43: 'method_1'}, 'CE': {0: 0.002874032327519, 1: 0.005745640214479, 2: 0.004661679592489, 3: 0.002846754581854, 4: 0.004576990206546, 19: 0.105364819313149, 20: 0.085976562255755, 21: 0.095881176731004, 22: 0.097398912201617, 23: 0.100491941499165, 39: 0.018162548523961, 40: 0.018954401200213, 41: 0.01788125083107, 42: 0.019784900032633, 43: 0.020438103824639}, 'MAE': {0: 0.737423646017325, 1: 2.00787732271062, 2: 2.86926125864208, 3: 3.32855382663718, 4: 3.77490323897613, 19: 13.345092685398, 20: 12.8063543324171, 21: 13.1292091661974, 22: 13.1451455897874, 23: 13.6537246486947, 39: 3.2667181947348, 40: 4.29467676417246, 41: 5.34081768096088, 42: 5.50421114390641, 43: 7.46988963588581}, 'MSqE': {0: 0.847829904338757, 1: 6.68342912741117, 2: 12.5560681493523, 3: 17.2772893168584, 4: 22.02275890951, 19: 232.978432669064, 20: 237.820275013751, 21: 244.5869111788, 22: 247.73962294989, 23: 266.451945948429, 39: 15.6880657226101, 40: 28.2245308508171, 41: 44.7562607712654, 42: 46.5234139459763, 43: 87.2324237935045}, 'R2': {0: 0.999729801060669, 1: 0.998038240639634, 2: 0.996528815654117, 3: 0.995203737109921, 4: 0.993477444422499, 19: 0.926657847114707, 20: 0.93726355821839, 21: 0.932221279553296, 22: 0.91924882453144, 23: 0.925514811021512, 39: 0.995151906119729, 40: 0.991723226976753, 41: 0.986284593333255, 42: 0.982615342502863, 43: 0.97292435121805}}
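df = pd.DataFrame(data)
# one panel per regressor, at most two panels per row
g = sns.FacetGrid(df, col="Regressor", col_wrap=2)
# select_dtypes("number") drops the string columns; pandas >= 2.0 raises if corr() sees them
g.map_dataframe(lambda data, color: sns.heatmap(data.select_dtypes("number").corr(), annot=True, fmt='.2f', square=True))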
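One possible way around the colormap limitation is to pick the colormap inside the plotting callback, since each subset that map_dataframe passes in still carries its Regressor value. The following is only a sketch under that assumption: the helper name draw_heatmap, the CMAP_BY_GROUP mapping, and the small demo frame are hypothetical and not part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical mapping from regressor name to colormap; adjust as needed.
CMAP_BY_GROUP = {"DT": "Reds_r", "DT-2": "Blues_r", "DT-4": "BuGn_r"}


def draw_heatmap(data, color=None, cmap_by_group=None, default_cmap="viridis"):
    """Callback for FacetGrid.map_dataframe: one correlation heatmap per facet."""
    # Every row in a facet's subset shares the same Regressor value.
    group = data["Regressor"].iloc[0]
    cmap = (cmap_by_group or {}).get(group, default_cmap)
    # Keep numeric columns only so corr() never sees the string columns.
    corr = data.select_dtypes("number").corr()
    sns.heatmap(corr, annot=True, fmt=".2f", square=True, cmap=cmap, ax=plt.gca())


# Small made-up frame with the same column layout as the question's data.
demo = pd.DataFrame({
    "Regressor": ["DT"] * 4 + ["DT-2"] * 4 + ["DT-4"] * 4,
    "Method": ["method_1"] * 12,
    "CE": [0.1, 0.2, 0.4, 0.3, 0.5, 0.1, 0.2, 0.9, 0.3, 0.6, 0.2, 0.8],
    "MAE": [1.0, 2.0, 3.5, 3.0, 4.0, 2.0, 1.0, 5.0, 2.5, 2.0, 4.0, 1.0],
})

g = sns.FacetGrid(demo, col="Regressor", col_wrap=2)
g.map_dataframe(draw_heatmap, cmap_by_group=CMAP_BY_GROUP)
```

Regressors missing from the mapping fall back to default_cmap, so the grid still renders when new regressors appear in the data.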
Solution 2:[2]
This is simply a case of putting everything inside a loop. First, the program finds the regressors to plot by collecting the unique values in df['Regressor'].values.
axes, the subplot grid shape, is computed from the number of regressors: it is the smallest square grid that holds them all (for example, 2x2 for 3 regressors).
The candidate colormaps are listed in colors; change this list if you want different colors. The program uses the first colormap for the first regressor, the second for the next, and so on. If there are more regressors than colormaps, it wraps around to the start of the list.
import math
regressors = set(df['Regressor'].values)  # note: a set has no guaranteed order
fig = plt.figure(figsize=(12,6))
# smallest square grid (rows, cols) that fits every regressor
axes = (math.ceil(math.sqrt(len(regressors))),) * 2
colors = [
    'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds',
    'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu',
    'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']
for index, regressor in enumerate(regressors):
    df_dt = df[(df['Regressor']==regressor)]
    # keep numeric columns only; pandas >= 2.0 raises if corr() sees string columns
    df_dt_corr = df_dt.drop(["Regressor"], axis=1).select_dtypes("number").corr()
    plt.subplot(*axes, index + 1)
    plt.title('Regressor: ' + regressor)
    sns.heatmap(df_dt_corr, annot=True, fmt='.2f', square=True, cmap=colors[index%len(colors)])
plt.show()
I changed the way you call plt.subplot, because the three-digit shorthand you were using (e.g. plt.subplot(221)) only supports up to 9 plots, and passing the grid shape as separate arguments makes it easier to compute the layout automatically.
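A square grid from ceil(sqrt(n)) can leave a whole row empty (5 regressors get a 3x3 grid). As a possible variant, not part of the original answer, the sketch below fixes the column count the same way but uses only as many rows as needed, and pairs each regressor with a colormap that wraps around when the list runs out. The helper names grid_shape and assign_colormaps and the example labels are hypothetical.

```python
import math
from itertools import cycle


def grid_shape(n_plots):
    """Return (rows, cols) for a near-square grid that holds n_plots panels."""
    if n_plots < 1:
        raise ValueError("n_plots must be at least 1")
    cols = math.ceil(math.sqrt(n_plots))
    rows = math.ceil(n_plots / cols)  # only as many rows as are actually needed
    return rows, cols


def assign_colormaps(names, colormaps):
    """Pair each name with a colormap, cycling when names outnumber colormaps."""
    return dict(zip(names, cycle(colormaps)))


# Hypothetical regressor labels, used only to illustrate the helpers.
regressors = ["DT", "DT-2", "DT-4", "DT-8", "DT-16"]
print(grid_shape(len(regressors)))  # (2, 3)
print(assign_colormaps(regressors, ["Reds", "Blues"]))
```

The resulting tuple can be unpacked into plt.subplot(rows, cols, index + 1) in the same way as axes in the loop above.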
Solution 3:[3]
Select the unique values first
I stored the unique values of the Regressor column in the vals variable, then looped over them. See the solution below:
# get the unique values in the "Regressor" column
vals=df['Regressor'].unique()
plt.figure(figsize=[10,10],dpi=200)
plt.suptitle("Correlation Map") # Super Title
# start the loop for selecting data and plotting
for idx, value in enumerate(vals):
    # get the rows for this unique value and drop the unwanted columns using "iloc"
    data=df[df['Regressor']==value].iloc[:,2:] # 2: selects the third column onwards
    # plot the correlation map
    plt.subplot(len(vals),2,idx+1)
    plt.title(f"Regressor={value}")
    sns.heatmap(data.corr(), annot=True, fmt='.2f', square=True)
plt.show()
All you need to choose here is the number of columns in the subplot grid and the super title.
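The same loop can also be written with groupby and plt.subplots, which returns the axes up front so that unused panels can be hidden. This is a minimal sketch rather than the answer's own method: the function name plot_group_heatmaps, its parameters, and the small demo frame are assumptions made for illustration.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_group_heatmaps(df, group_col="Regressor", ncols=2):
    """Draw one correlation heatmap per unique value of group_col."""
    # sort=False keeps the groups in order of first appearance.
    groups = list(df.groupby(group_col, sort=False))
    nrows = math.ceil(len(groups) / ncols)
    fig, axes = plt.subplots(
        nrows, ncols, figsize=(6 * ncols, 5 * nrows), squeeze=False
    )
    flat_axes = axes.ravel()
    for ax, (name, group) in zip(flat_axes, groups):
        # Numeric columns only, so string columns never reach corr().
        corr = group.select_dtypes("number").corr()
        sns.heatmap(corr, annot=True, fmt=".2f", square=True, ax=ax)
        ax.set_title(f"{group_col}: {name}")
    # Hide any grid cells left over when the groups do not fill the grid.
    for ax in flat_axes[len(groups):]:
        ax.set_visible(False)
    fig.suptitle("Correlation Map")
    fig.tight_layout()
    return fig


# Small made-up frame with the same column layout as the question's data.
demo = pd.DataFrame({
    "Regressor": ["DT"] * 4 + ["DT-2"] * 4 + ["DT-4"] * 4,
    "Method": ["method_1"] * 12,
    "CE": [0.1, 0.2, 0.4, 0.3, 0.5, 0.1, 0.2, 0.9, 0.3, 0.6, 0.2, 0.8],
    "MAE": [1.0, 2.0, 3.5, 3.0, 4.0, 2.0, 1.0, 5.0, 2.5, 2.0, 4.0, 1.0],
})
fig = plot_group_heatmaps(demo, ncols=2)
```

Call plt.show() afterwards to display the figure. Passing ax=ax to sns.heatmap avoids relying on the current-axes state that plt.subplot sets implicitly.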
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | r-beginners |
| Solution 2 | sommervold |
| Solution 3 |


