'pandas - explanation of unstack method description

Please explain what the unstack function description DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels means.

pandas.DataFrame.unstack(level=- 1, fill_value=None)

Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.

Parameters

  • level: int, str, or list of these, default -1 (last level)
    Level(s) of index to unstack, can pass level name.

  • fill_value: int, str or dict
    Replace NaN with this value if the unstack produces missing values.

Suppose there is a long format dataframe.

def make_long_aapl():
    day_1 = dt.date(2015, 12, 29)
    day_2 = dt.date(2015, 12, 30)
    
    col_close = 'close'
    col_open = 'open'
    
    cols = ['date', 'ticker', 'attribute', 'value']
    
    rv = pd.DataFrame([
      {'ticker': 'AAPL', 'date': day_1, 'attribute': col_open,  'value': 106.96},
      {'ticker': 'AAPL', 'date': day_1, 'attribute': col_close, 'value': 108.74},
      {'ticker': 'AAPL', 'date': day_2, 'attribute': col_open,  'value': 108.58},
      {'ticker': 'AAPL', 'date': day_2, 'attribute': col_close, 'value': 107.32},
      {'ticker': 'MSFT', 'date': day_1, 'attribute': col_open,  'value': 106.96},
      {'ticker': 'MSFT', 'date': day_1, 'attribute': col_close, 'value': 108.74},
      {'ticker': 'MSFT', 'date': day_2, 'attribute': col_open,  'value': 108.58},
      {'ticker': 'MSFT', 'date': day_2, 'attribute': col_close, 'value': 107.32},
    ], columns=cols)
    
    return rv

aapl_long_format = make_long_aapl()
aapl_long_format
index date ticker attribute value
0 2015-12-29 AAPL open 106.96
1 2015-12-29 AAPL close 108.74
2 2015-12-30 AAPL open 108.58
3 2015-12-30 AAPL close 107.32
4 2015-12-29 MSFT open 106.96
5 2015-12-29 MSFT close 108.74
6 2015-12-30 MSFT open 108.58
7 2015-12-30 MSFT close 107.32

Is it possible to create a wide format dataframe below using the unstack method?

index date ticker close open
0 2015-12-29 AAPL 108.74 106.96
1 2015-12-29 MSFT 108.74 106.96
2 2015-12-30 AAPL 107.32 108.58
3 2015-12-30 MSFT 107.32 108.58

If possible, what to specify to the level parameter and what are a new level of column label, inner-most level and pivoted index labels in those dataframes?

Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.



Solution 1:[1]

Yes, it is possible, but first is necessary create MultiIndex for possible use unstack, because:

Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.

Here inner-most level of MultiIndex is attribute, so is used for new columns:

print (aapl_long_format.set_index(['date','ticker','attribute'])['value'])
date        ticker  attribute
2015-12-29  AAPL    open         106.96
                    close        108.74
2015-12-30  AAPL    open         108.58
                    close        107.32
2015-12-29  MSFT    open         106.96
                    close        108.74
2015-12-30  MSFT    open         108.58
                    close        107.32
Name: value, dtype: float64


print (aapl_long_format.set_index(['date','ticker','attribute'])['value'].index)
MultiIndex([('2015-12-29', 'AAPL',  'open'),
            ('2015-12-29', 'AAPL', 'close'),
            ('2015-12-30', 'AAPL',  'open'),
            ('2015-12-30', 'AAPL', 'close'),
            ('2015-12-29', 'MSFT',  'open'),
            ('2015-12-29', 'MSFT', 'close'),
            ('2015-12-30', 'MSFT',  'open'),
            ('2015-12-30', 'MSFT', 'close')],
           names=['date', 'ticker', 'attribute'])


print (aapl_long_format.set_index(['date','ticker','attribute'])['value'].index.levels)
[[2015-12-29 00:00:00, 2015-12-30 00:00:00], ['AAPL', 'MSFT'], ['close', 'open']]

print (aapl_long_format.set_index(['date','ticker','attribute'])['value'].index.names)
['date', 'ticker', 'attribute']

df = aapl_long_format.set_index(['date','ticker','attribute'])['value'].unstack().reset_index()
print (df)
attribute       date ticker   close    open
0         2015-12-29   AAPL  108.74  106.96
1         2015-12-29   MSFT  108.74  106.96
2         2015-12-30   AAPL  107.32  108.58
3         2015-12-30   MSFT  107.32  108.58

For this 2 function exist in pandas pivot:

df = aapl_long_format.pivot(index=['date','ticker'], columns='attribute', values='value').reset_index()
print (df)
attribute        date ticker   close    open
0          2015-12-29   AAPL  108.74  106.96
1          2015-12-29   MSFT  108.74  106.96
2          2015-12-30   AAPL  107.32  108.58
3          2015-12-30   MSFT  107.32  108.58

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1