'Pivot wider to longer with value and unit (currency) columns
My data is in wide format as such:
data = [{
id: '1',
'timestamp': '2021-10-01',
'product_type' : 'Quarterly',
'applicable_tariff_per_local_currency_kwh_d_value' : 1.11,
'applicable_tariff_per_local_currency_kwh_d_unit' : 'CZK/(kWh/d)/q',
'applicable_tariff_per_local_currency_kwh_h_value' : 11.11,
'applicable_tariff_per_local_currency_kwh_h_unit' : 'CZK/(kWh/h)/q',
'applicable_tariff_per_eur_kwh_h_value' : 0.1111,
'applicable_tariff_per_eur_kwh_h_unit' : 'Euro/(kWh/h)/q',
'applicable_tariff_per_eur_kwh_d_value' : 0.0011,
'applicable_tariff_per_eur_kwh_d_unit' : 'Euro/(kWh/d)/q',
'applicable_tariff_in_common_unit_value' : 0.0001111,
'applicable_tariff_in_common_unit_unit' : 'Euro/(kWh/h)/d'
},
{
'id': '2',
'timestamp': '2021-10-01',
'product_type' : 'Quarterly',
'applicable_tariff_per_local_currency_kwh_d_value' : 2.22,
'applicable_tariff_per_local_currency_kwh_d_unit' : 'CZK/(kWh/d)/q',
'applicable_tariff_per_local_currency_kwh_h_value' : 22.22,
'applicable_tariff_per_local_currency_kwh_h_unit' : 'CZK/(kWh/h)/q',
'applicable_tariff_per_eur_kwh_h_value' : 0.2222,
'applicable_tariff_per_eur_kwh_h_unit' : 'Euro/(kWh/h)/q',
'applicable_tariff_per_eur_kwh_d_value' : 0.0022,
'applicable_tariff_per_eur_kwh_d_unit' : 'Euro/(kWh/d)/q',
'applicable_tariff_in_common_unit_value' : 0.0002222,
'applicable_tariff_in_common_unit_unit' : 'Euro/(kWh/h)/d'
}
]
df = pd.DataFrame(data)
I would like to pivot longer this data into the following specified format:
pivot_data = [{
'id': '1',
'timestamp': '2021-10-01',
'product_type': 'Quarterly',
'currency': 'CZK', # Can be obtained through regex on code
'unit': 'kWh/h', # Can be obtained through regex on code
'product_code': 'q', # Can be obtained through regex on code
'code': 'CZK/(kWh/d)/q',
'value': 1.11
},
'id': '1',
'timestamp': '2021-10-01',
'product_type': 'Quarterly',
'currency': 'CZK',
'unit': 'kWh/h',
'product_code': 'q',
'code': 'CZK/(kWh/h)/q',
'value': 11.11
},
'id': '1',
'timestamp': '2021-10-01',
'product_type': 'Quarterly',
'currency': 'Euro',
'unit': 'kWh/h',
'product_code': 'q',
'code': 'Euro/(kWh/h)/q',
'value': 0.1111
},
'id': '1',
'timestamp': '2021-10-01',
'product_type': 'Quarterly',
'currency': 'Euro',
'unit': 'kWh/d',
'product_code': 'q',
'code': 'Euro/(kWh/d)/q',
'value': 0.001111
},
# This one is extra and not super neccesary...
'id': '1',
'timestamp': '2021-10-01',
'product_type': 'common_unit',
'currency': 'Euro',
'unit': 'kWh/h',
'product_code': 'd',
'code': 'Euro/(kWh/d)/d',
'value': 0.001111
}]
df_pivoted = pd.DataFrame(pivot_data)
I know I can do this with pd.pivot_wider_to_longer() or pandas.merge, however I cannot seem to get it to work. Is anyone able to shed more light on this? I've added this example as it was the data I am struggling with, however if you have a more simple dataset you could show me an example of this as well!
I did not add id 2 as it would be too much data in the pivoted_longer dataset.
Thanks in advance.
Solution 1:[1]
Because you require two related columns (value, unit) in the output, you need to run melt twice:
In : d1 = pd.melt(
...: df,
...: id_vars=["id"],
...: value_vars=[
...: "applicable_tariff_per_local_currency_kwh_d_value",
...: "applicable_tariff_per_local_currency_kwh_h_value",
...: "applicable_tariff_per_eur_kwh_h_value",
...: "applicable_tariff_per_eur_kwh_d_value",
...: "applicable_tariff_in_common_unit_value",
...: ],
...: )
In : d1.head(n=2)
Out :
id variable value
0 1 applicable_tariff_per_local_currency_kwh_d_value 1.11
1 2 applicable_tariff_per_local_currency_kwh_d_value 2.22
and same thing for for the unit columns:
In : d2 = pd.melt(
...: df,
...: id_vars=["id"],
...: value_vars=[
...: "applicable_tariff_per_local_currency_kwh_d_unit",
...: "applicable_tariff_per_local_currency_kwh_h_unit",
...: "applicable_tariff_per_eur_kwh_h_unit",
...: "applicable_tariff_per_eur_kwh_d_unit",
...: "applicable_tariff_in_common_unit_unit",
...: ],
...: )
You can now merge these two dataframes to get to your desired result:
d1["variable"]= d1["variable"].str.replace("_value", "")
d2["variable"] = d2["variable"].str.replace("_unit$", "", regex=True)
d2 = d2.rename(columns={"value": "code"})
df_pivoted = d1.merge(d2, on=["id", "variable"]).drop(columns=["variable"])
For the rest you can use pandas string operations to split out the code columns and you can merge with the original dataframe to get the timestamp and product_type info back but this should be clear enough.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
