'How to set categorical row variables as columns [duplicate]
I have a dataframe which looks like this:
date day normalized_returns
0 2020-01-02 Thursday
1 2020-01-03 Friday 0.4707137200215769
2 2020-01-06 Monday 0.23570968223068864
3 2020-01-07 Tuesday -0.001668590491460948
4 2020-01-08 Wednesday 0.22958625059099985
5 2020-01-09 Thursday -0.08590873731461762
6 2020-01-10 Friday -0.022114285207011284
7 2020-01-13 Monday 0.051517831335272615
8 2020-01-14 Tuesday 0.0750307910411988
9 2020-01-15 Wednesday 0.07460670647513437
10 2020-01-16 Thursday -0.04093466297777635
11 2020-01-17 Friday -0.020633373620607427
12 2020-01-21 Tuesday 0.21581535779928024
13 2020-01-22 Wednesday -0.07193613192906866
14 2020-01-23 Thursday 0.06265156014426786
15 2020-01-24 Friday -0.040948868344091706
16 2020-01-27 Monday 0.4539355576206815
17 2020-01-28 Tuesday -0.1635280764724063
18 2020-01-29 Wednesday -0.0851700137388777
19 2020-01-30 Thursday 0.20162612444937758
With that, I'm trying to create a dataframe where the columns are day and the values under each day are the normalized returns associated with each day, such as in this format:
Monday Tuesday Wednesday Thursday Friday
returns 0.034 0.099 0.02 0.007 -0.007
returns 0.283 0.115 0.025 0.009 -0.04
returns 0.253 0.044 0.019 0.011 0.045
... ... ... ... ... ...
... ... ... ... ... ...
I've tried using pivot table, but that returns a mean aggregation of the data in the form of:
Monday Tuesday Wednesday Thursday Friday
value 0.066 0.09 0.04 0.009 -0.015
However, this doesn't obtain the correct result.
I've looked at the question linked here:, but the dataframe in question has categorical row values to set as the index, where here I just have sequential dates.
I could attempt to make unique identifiers of each day as a count, but unsure how to do that.
Any help or advice to obtain the answer is greatly appreciated; thank you!
Solution 1:[1]
Does this work:
years = df.date.dt.year.rename('year')
weeks = df.date.dt.isocalendar().week.rename('week')
df.set_index([years, weeks, 'day']).normalized_returns.unstack()
day Friday Monday Thursday Tuesday Wednesday
year week
2020 1 0.470714 NaN NaN NaN NaN
2 -0.022114 0.235710 -0.085909 -0.001669 0.229586
3 -0.020633 0.051518 -0.040935 0.075031 0.074607
4 -0.040949 NaN 0.062652 0.215815 -0.071936
5 NaN 0.453936 0.201626 -0.163528 -0.085170
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | piRSquared |
