Displaying a stacked bar graph with nested lists
I am trying to display a stacked bar graph. I have three lists, shown below:
totalpointperxaxis = [6, 9, 13, 5, 14, 382, 26, 2, 45, 2]
clusternamesList = [['Cluster1', 'Cluster2'], ['Cluster1', 'Cluster3'], ['Cluster2', 'Cluster4'], ['Cluster1', 'Cluster3'], ['Cluster2', 'Cluster5'], ['Cluster3', 'Cluster6', 'Cluster7'], ['Cluster2', 'Cluster4', 'Cluster6', 'Cluster7'], ['Cluster1', 'Cluster3'], ['Cluster1', 'Cluster2', 'Cluster4', 'Cluster5', 'Cluster6'], ['Cluster1', 'Cluster3']]
ppclusterList = [[1, 5], [4, 5], [12, 1], [1, 4], [13, 1], [6, 173, 203], [21, 2, 1, 2], [1, 1], [2, 34, 2, 6, 1], [1, 1]]
Here, "totalpointperxaxis" defines the total height of each bar, and the values in "ppclusterList" (points per cluster) are the segments that make up each bar, colour-coded by cluster name. The number of clusters is not known beforehand, and the lists may change when I add more data points.
As you can see, each list has 10 entries, one per bar. The idea is to display a stacked bar graph like the one in this example (image not shown).
Solution 1:[1]
One idea is to first build a long-form dataframe that collects all the values, and then convert it to a pivot table for plotting.
import matplotlib.pyplot as plt
import pandas as pd

clusternamesList = [['Cluster1', 'Cluster2'], ['Cluster1', 'Cluster3'], ['Cluster2', 'Cluster4'], ['Cluster1', 'Cluster3'], ['Cluster2', 'Cluster5'], ['Cluster3', 'Cluster6', 'Cluster7'], ['Cluster2', 'Cluster4', 'Cluster6', 'Cluster7'], ['Cluster1', 'Cluster3'], ['Cluster1', 'Cluster2', 'Cluster4', 'Cluster5', 'Cluster6'], ['Cluster1', 'Cluster3']]
ppclusterList = [[1, 5], [4, 5], [12, 1], [1, 4], [13, 1], [6, 173, 203], [21, 2, 1, 2], [1, 1], [2, 34, 2, 6, 1], [1, 1]]

# Long form: one row per (bar, cluster) pair
df = pd.DataFrame([{'id': point_id, 'cluster': cluster, 'point': point}
                   for point_id, (clusternames, ppcluster) in enumerate(zip(clusternamesList, ppclusterList))
                   for cluster, point in zip(clusternames, ppcluster)])
# Wide form: one row per bar, one column per cluster; absent clusters become 0.
# aggfunc='sum' is set explicitly (the default is the mean), so a repeated
# (id, cluster) pair would be added up rather than averaged.
df_table = df.pivot_table(values='point', index='id', columns='cluster', aggfunc='sum', fill_value=0)
df_table.plot.bar(stacked=True, rot=0)
plt.show()
The dataframe looks like:
    id   cluster  point
0    0  Cluster1      1
1    0  Cluster2      5
2    1  Cluster1      4
3    1  Cluster3      5
4    2  Cluster2     12
5    2  Cluster4      1
6    3  Cluster1      1
7    3  Cluster3      4
8    4  Cluster2     13
9    4  Cluster5      1
10   5  Cluster3      6
11   5  Cluster6    173
12   5  Cluster7    203
13   6  Cluster2     21
14   6  Cluster4      2
15   6  Cluster6      1
16   6  Cluster7      2
17   7  Cluster1      1
18   7  Cluster3      1
19   8  Cluster1      2
20   8  Cluster2     34
21   8  Cluster4      2
22   8  Cluster5      6
23   8  Cluster6      1
24   9  Cluster1      1
25   9  Cluster3      1
It could be handy to store the data directly in this form instead of as nested lists.
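As a rough sketch of why the long form is convenient: adding a new bar, even one with a cluster that has not been seen before, is just a matter of appending rows and pivoting again. The first two bars below are taken from the data above; the rows in new_rows (bar 2, 'Cluster8') are hypothetical values invented for illustration.

```python
import pandas as pd

# Long-form data for the first two bars of the example above
df = pd.DataFrame({
    'id': [0, 0, 1, 1],
    'cluster': ['Cluster1', 'Cluster2', 'Cluster1', 'Cluster3'],
    'point': [1, 5, 4, 5],
})

# Hypothetical new observation: a bar that contains a previously unseen cluster
new_rows = pd.DataFrame({
    'id': [2, 2],
    'cluster': ['Cluster2', 'Cluster8'],
    'point': [7, 3],
})
df = pd.concat([df, new_rows], ignore_index=True)

# Re-pivoting picks up the new cluster as an extra column automatically
df_table = df.pivot_table(values='point', index='id', columns='cluster',
                          aggfunc='sum', fill_value=0)
print(df_table)
```

No code has to change when the set of clusters grows, which matches the requirement that the number of clusters is not known beforehand.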
The pivot table then looks like:
cluster  Cluster1  Cluster2  Cluster3  Cluster4  Cluster5  Cluster6  Cluster7
id
0               1         5         0         0         0         0         0
1               4         0         5         0         0         0         0
2               0        12         0         1         0         0         0
3               1         0         4         0         0         0         0
4               0        13         0         0         1         0         0
5               0         0         6         0         0       173       203
6               0        21         0         2         0         1         2
7               1         0         1         0         0         0         0
8               2        34         0         2         6         1         0
9               1         0         1         0         0         0         0
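A quick sanity check, sketched here with the lists from the question: each row of the pivot table should add up to the matching entry of totalpointperxaxis, since that list is meant to give the full height of each bar. The name row_totals is introduced only for this illustration.

```python
import pandas as pd

totalpointperxaxis = [6, 9, 13, 5, 14, 382, 26, 2, 45, 2]
clusternamesList = [['Cluster1', 'Cluster2'], ['Cluster1', 'Cluster3'],
                    ['Cluster2', 'Cluster4'], ['Cluster1', 'Cluster3'],
                    ['Cluster2', 'Cluster5'], ['Cluster3', 'Cluster6', 'Cluster7'],
                    ['Cluster2', 'Cluster4', 'Cluster6', 'Cluster7'],
                    ['Cluster1', 'Cluster3'],
                    ['Cluster1', 'Cluster2', 'Cluster4', 'Cluster5', 'Cluster6'],
                    ['Cluster1', 'Cluster3']]
ppclusterList = [[1, 5], [4, 5], [12, 1], [1, 4], [13, 1], [6, 173, 203],
                 [21, 2, 1, 2], [1, 1], [2, 34, 2, 6, 1], [1, 1]]

# Same long-form and pivot steps as in the answer
df = pd.DataFrame([{'id': i, 'cluster': c, 'point': p}
                   for i, (names, points) in enumerate(zip(clusternamesList, ppclusterList))
                   for c, p in zip(names, points)])
df_table = df.pivot_table(values='point', index='id', columns='cluster',
                          aggfunc='sum', fill_value=0)

# Summing across the cluster columns gives the stacked height of each bar
row_totals = df_table.sum(axis=1).tolist()
print(row_totals == totalpointperxaxis)
```

If the comparison fails, the cluster names and point counts are probably misaligned in one of the nested lists.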
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | JohanC |

