'create a pivot table from a dataset with duplicates
I am trying to create a pivot table and a heat map from this dataset (gender pay)
My code is:
df = df.pivot('Seniority', 'TotalPay', 'Gender')
ax = sns.heatmap(df)
But then I get an error:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self,
key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
6 frames
pandas/_libs/hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Gender'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self,
key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 'Gender'
Can anyone help me with this? I tried to removed duplicates using drop_duplicates(), but still does not work. Many thanks in advance.
Solution 1:[1]
My guess is that I created the code with the understanding that I want to create a heatmap of monthly and bonus totals by year and gender. I am posting this as the questioner has given me an opportunity to respond. The grouping of the SENIORITY to be indexed, which is the cause of the error in the pivot, is eliminated. Pivot and transform the data for the heatmap.
import pandas as pd
import seaborn as sns
df = pd.read_csv('/content/Glassdoor Gender Pay Gap.csv', sep=';')
df['TotalPay'] = df['BasePay'] + df['Bonus']
df.head()
JobTitle Gender Age PerfEval Education Dept Seniority BasePay Bonus TotalPay
0 Graphic Designer Female 18 5 College Operations 2 42363 9938 52301
1 Software Engineer Male 21 5 College Management 5 108476 11128 119604
2 Warehouse Associate Female 19 4 PhD Administration 5 90208 9268 99476
3 Software Engineer Male 20 5 Masters Sales 4 108080 10154 118234
4 Graphic Designer Male 26 5 Masters Engineering 5 99464 9319 108783
df = df.groupby(['Seniority', 'Gender'])['TotalPay'].sum().to_frame('TotalPay').reset_index()
df.head()
Seniority Gender TotalPay
0 1 Female 6267525
1 1 Male 9892202
2 2 Female 8738918
3 2 Male 10224439
4 3 Female 10359925
df = df.pivot('Seniority', 'Gender', 'TotalPay')
df
Gender Female Male
Seniority
1 6267525 9892202
2 8738918 10224439
3 10359925 11775112
4 8483675 11932083
5 11273034 11992901
sns.heatmap(df, annot=True, fmt=',')
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | r-beginners |

