'pandas - explanation of unstack method description
Please explain what the unstack function description DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels means.
pandas.DataFrame.unstack(level=- 1, fill_value=None)
Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.
Parameters
level: int, str, or list of these, default -1 (last level)
Level(s) of index to unstack, can pass level name.fill_value: int, str or dict
Replace NaN with this value if the unstack produces missing values.
Suppose there is a long format dataframe.
def make_long_aapl():
day_1 = dt.date(2015, 12, 29)
day_2 = dt.date(2015, 12, 30)
col_close = 'close'
col_open = 'open'
cols = ['date', 'ticker', 'attribute', 'value']
rv = pd.DataFrame([
{'ticker': 'AAPL', 'date': day_1, 'attribute': col_open, 'value': 106.96},
{'ticker': 'AAPL', 'date': day_1, 'attribute': col_close, 'value': 108.74},
{'ticker': 'AAPL', 'date': day_2, 'attribute': col_open, 'value': 108.58},
{'ticker': 'AAPL', 'date': day_2, 'attribute': col_close, 'value': 107.32},
{'ticker': 'MSFT', 'date': day_1, 'attribute': col_open, 'value': 106.96},
{'ticker': 'MSFT', 'date': day_1, 'attribute': col_close, 'value': 108.74},
{'ticker': 'MSFT', 'date': day_2, 'attribute': col_open, 'value': 108.58},
{'ticker': 'MSFT', 'date': day_2, 'attribute': col_close, 'value': 107.32},
], columns=cols)
return rv
aapl_long_format = make_long_aapl()
aapl_long_format
| index | date | ticker | attribute | value |
|---|---|---|---|---|
| 0 | 2015-12-29 | AAPL | open | 106.96 |
| 1 | 2015-12-29 | AAPL | close | 108.74 |
| 2 | 2015-12-30 | AAPL | open | 108.58 |
| 3 | 2015-12-30 | AAPL | close | 107.32 |
| 4 | 2015-12-29 | MSFT | open | 106.96 |
| 5 | 2015-12-29 | MSFT | close | 108.74 |
| 6 | 2015-12-30 | MSFT | open | 108.58 |
| 7 | 2015-12-30 | MSFT | close | 107.32 |
Is it possible to create a wide format dataframe below using the unstack method?
| index | date | ticker | close | open |
|---|---|---|---|---|
| 0 | 2015-12-29 | AAPL | 108.74 | 106.96 |
| 1 | 2015-12-29 | MSFT | 108.74 | 106.96 |
| 2 | 2015-12-30 | AAPL | 107.32 | 108.58 |
| 3 | 2015-12-30 | MSFT | 107.32 | 108.58 |
If possible, what to specify to the level parameter and what are a new level of column label, inner-most level and pivoted index labels in those dataframes?
Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.
Solution 1:[1]
Yes, it is possible, but first is necessary create MultiIndex for possible use unstack, because:
Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.
Here inner-most level of MultiIndex is attribute, so is used for new columns:
print (aapl_long_format.set_index(['date','ticker','attribute'])['value'])
date ticker attribute
2015-12-29 AAPL open 106.96
close 108.74
2015-12-30 AAPL open 108.58
close 107.32
2015-12-29 MSFT open 106.96
close 108.74
2015-12-30 MSFT open 108.58
close 107.32
Name: value, dtype: float64
print (aapl_long_format.set_index(['date','ticker','attribute'])['value'].index)
MultiIndex([('2015-12-29', 'AAPL', 'open'),
('2015-12-29', 'AAPL', 'close'),
('2015-12-30', 'AAPL', 'open'),
('2015-12-30', 'AAPL', 'close'),
('2015-12-29', 'MSFT', 'open'),
('2015-12-29', 'MSFT', 'close'),
('2015-12-30', 'MSFT', 'open'),
('2015-12-30', 'MSFT', 'close')],
names=['date', 'ticker', 'attribute'])
print (aapl_long_format.set_index(['date','ticker','attribute'])['value'].index.levels)
[[2015-12-29 00:00:00, 2015-12-30 00:00:00], ['AAPL', 'MSFT'], ['close', 'open']]
print (aapl_long_format.set_index(['date','ticker','attribute'])['value'].index.names)
['date', 'ticker', 'attribute']
df = aapl_long_format.set_index(['date','ticker','attribute'])['value'].unstack().reset_index()
print (df)
attribute date ticker close open
0 2015-12-29 AAPL 108.74 106.96
1 2015-12-29 MSFT 108.74 106.96
2 2015-12-30 AAPL 107.32 108.58
3 2015-12-30 MSFT 107.32 108.58
For this 2 function exist in pandas pivot:
df = aapl_long_format.pivot(index=['date','ticker'], columns='attribute', values='value').reset_index()
print (df)
attribute date ticker close open
0 2015-12-29 AAPL 108.74 106.96
1 2015-12-29 MSFT 108.74 106.96
2 2015-12-30 AAPL 107.32 108.58
3 2015-12-30 MSFT 107.32 108.58
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
