'Getting alternating results with pandas melt
I was trying to convert the first image in this album into the second with pandas but all I got was the third one...
- Original
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1981 453.1 126.3 5.8 47.1 25.3 16.8 0 1.1 4.4 17.8 52.5 72.4
1982 211.4 23.1 231.2 0.8 0.2 0 0 0 15.3 0.9 8.6 59.9
1983 45.2 22.1 537.7 22.8 29.9 0 0 0.1 0.7 1.2 47 20.9
1984 390.2 514.2 140.3 7.3 0 0 2.8 0.1 0 18.3 23.2 91.7
- Wanted result:
Year Month Value
1981 Jan 453.1
1981 Feb 126.3
1981 Mar 5.8
1981 Apr 47.1
...
- What I have:
Year Month Value
1981 Jan 453.1
1982 Jan 211.4
1983 Jan 45.2
1984 Jan 390.2
...
My code is just this:
data = pd.read_csv("Burketown.csv")
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
fixed_data = pd.melt(data, id_vars=['Year'], value_vars=months)
print(fixed_data)
How can I alternate the months for each year instead of what I got? I want to have the first year with every month, then the second year with every month... Not the first month with every year.
Solution 1:[1]
Use ordered catagorical for columns names, so possible DataFrame.sort_values
by both columns for correct ordering:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
data = pd.read_csv("Burketown.csv", index_col=['Year'])[months]
data.columns = pd.CategoricalIndex(data.columns, ordered=True, categories=months)
df = data.reset_index()[months]
fixed_data = (pd.melt(data, id_vars=['Year'], value_vars=months)
.sort_values(['Year', 'variable']))
Or convert Series to ordered categorical
:
data = pd.read_csv("Burketown.csv")
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
fixed_data = pd.melt(data, id_vars=['Year'], value_vars=months)
fixed_data['variable'] = (pd.CategoricalIndex(fixed_data['variable'],
ordered=True,
categories=months))
fixed_data = fixed_data.sort_values(['Year', 'variable'])
Solution 2:[2]
One option where you do not need to convert to categorical dtype, and maintain the order is with pivot_longer from pyjanitor
# pip install pyjanitor
import pandas as pd
import janitor
result = df.pivot_longer('Year', sort_by_appearance=True)
result.head(10)
Year variable value
0 1981 Jan 453.1
1 1981 Feb 126.3
2 1981 Mar 5.8
3 1981 Apr 47.1
4 1981 May 25.3
5 1981 Jun 16.8
6 1981 Jul 0.0
7 1981 Aug 1.1
8 1981 Sep 4.4
9 1981 Oct 17.8
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | sammywemmy |