'two header rows to one header row pandas
Given a dataframe that looks like this:
min max mean std wav
symbol
1000038 0.0110 0.0022 0.003296 -0.002611 0.005514
1000043 0.0090 0.0102 0.004124 -0.000507 0.004847
1000055 0.0100 0.0030 0.001358 -0.001719 0.004363
1000059 0.0050 0.0017 0.003372 -0.001089 0.008595
1000060 0.0012 0.0062 0.002367 0.000407 0.006935
The rows headers are on two different rows, but are not multi-index as far as i am aware, so when i try to columns.droplevel(0) this fails with the error: Cannot remove 1 levels from an index with 1 levels: at least one level must be left.
The same csv file looks flat, like the below (but wish to avoid writing and reading back a csv) to correct the format.
symbol min max mean std wav
1000038 0.0110 0.0022 0.003296 -0.002611 0.005514
1000043 0.0090 0.0102 0.004124 -0.000507 0.004847
1000055 0.0100 0.0030 0.001358 -0.001719 0.004363
1000059 0.0050 0.0017 0.003372 -0.001089 0.008595
1000060 0.0012 0.0062 0.002367 0.000407 0.006935
so my question is how to turn the above into a tidy (square) shape like the above with just one header row ?
Solution 1:[1]
You can convert index with level symbol to column by DataFrame.reset_index:
print (df.index.name)
symbol
print (df.columns.name)
None
df1 = df.reset_index()
print (df1)
symbol min max mean std wav
0 1000038 0.0110 0.0022 0.003296 -0.002611 0.005514
1 1000043 0.0090 0.0102 0.004124 -0.000507 0.004847
2 1000055 0.0100 0.0030 0.001358 -0.001719 0.004363
3 1000059 0.0050 0.0017 0.003372 -0.001089 0.008595
4 1000060 0.0012 0.0062 0.002367 0.000407 0.006935
but then get default index, RangeIndex.
print (df1.index)
RangeIndex(start=0, stop=5, step=1)
Then if write to file is possible omit indices:
df1.to_csv(file, index=False)
If need only display like square by DataFrame.rename_axis - set index name to column name:
df2 = df.rename_axis(index=None, columns=df.index.name)
print (df2)
symbol min max mean std wav
1000038 0.0110 0.0022 0.003296 -0.002611 0.005514
1000043 0.0090 0.0102 0.004124 -0.000507 0.004847
1000055 0.0100 0.0030 0.001358 -0.001719 0.004363
1000059 0.0050 0.0017 0.003372 -0.001089 0.008595
1000060 0.0012 0.0062 0.002367 0.000407 0.006935
but now first column is not column, it is index, only looks like column ;)
print (df2.index)
Int64Index([1000038, 1000043, 1000055, 1000059, 1000060], dtype='int64')
print (df2.columns)
Index(['min', 'max', 'mean', 'std', 'wav'], dtype='object', name='symbol')
print (df2.columns.name)
symbol
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
