Is there a way to handle dtypes of pandas.DataFrame in rows and not columns?
Big data file formats like Parquet, Feather and HDF5 can work with column-oriented tables, which speeds up reading individual columns.
In my use case I would like to switch from netCDF4 files to the Feather file format, because I can read some columns 10 times faster than with netCDF4. Unfortunately, I lose the dtype specification, which increases the size of the file.
So my idea is to define dtypes per row, but pandas only accepts dtypes per column.
Is there a way to handle DataFrames more like a column-oriented table and specify a dtype for each row?
Solution 1:[1]
A pandas DataFrame is a collection of Series objects, so each column can have only one dtype. For example, a column with [2, 'dog', 3] will have the dtype object because of the string; likewise, [2, 2.5, 3] can't be type int because of the 2.5.
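The one-dtype-per-column rule can be checked directly. This is a minimal sketch; the names mixed, numeric and frame are made up for illustration and are not from the question.

```python
import pandas as pd

# Integers mixed with a string: pandas falls back to the generic object dtype.
mixed = pd.Series([2, "dog", 3])

# Integers plus one float: the whole Series is upcast to float64.
numeric = pd.Series([2, 2.5, 3])

# A DataFrame reports exactly one dtype per column, never one per row.
frame = pd.DataFrame({"mixed": mixed, "numeric": numeric})
print(frame.dtypes)
```

The object column stores each value as a generic Python object, which is generally the least compact way to hold numbers.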
If you want to work row-based, you'll need to transpose your DataFrame using df.transpose() (or the shorthand df.T), which turns your rows into columns and your columns into rows. If you're importing your data, you can transpose the DataFrame and then cast each column to the dtype you want; if you're preparing data for export, transpose as the last step before exporting.
E.g.:
>>> import pandas as pd
>>> df = pd.DataFrame({'col_1': [1, 'cat', 3],
...                    'col_2': [4, 'dog', 6]},
...                   index=['row_1', 'row_2', 'row_3'])
>>> df
       col_1  col_2
row_1      1      4
row_2    cat    dog
row_3      3      6
# Due to the strings, both columns are dtype object
>>> df.dtypes
col_1    object
col_2    object
# Transpose the df
>>> df.T
       row_1  row_2  row_3
col_1      1    cat      3
col_2      4    dog      6
# Now each original row is a column, but still dtype object
>>> df.T.dtypes
row_1    object
row_2    object
row_3    object
# We can cast our columns (originally rows) to new dtypes now
>>> df.T.astype({'row_1': 'int', 'row_2': str, 'row_3': 'int'})
       row_1  row_2  row_3
col_1      1    cat      3
col_2      4    dog      6
>>> df.T.astype({'row_1': 'int', 'row_2': str, 'row_3': 'int'}).dtypes
row_1    int64
row_2    object
row_3    int64
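The transpose-then-cast step can be wrapped in a small helper. This is only a sketch: cast_rows is a hypothetical name, not a pandas function, and it assumes every value in a row can actually be converted to the requested dtype. It also shows a caveat: transposing back to the original orientation mixes the types within each column again, so the per-row dtypes do not survive.

```python
import pandas as pd

def cast_rows(df, row_dtypes):
    """Hypothetical helper: transpose so rows become columns, then cast each."""
    return df.T.astype(row_dtypes)

df = pd.DataFrame(
    {"col_1": [1, "cat", 3], "col_2": [4, "dog", 6]},
    index=["row_1", "row_2", "row_3"],
)

# Each original row is now a column with its own dtype.
typed = cast_rows(df, {"row_1": "int64", "row_2": str, "row_3": "int64"})
print(typed.dtypes)

# Transposing back puts integers and strings into the same columns again,
# so every column falls back to object.
restored = typed.T
print(restored.dtypes)
```

So the typed, transposed frame is the one to export; the original orientation can't carry per-row dtypes.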
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jason |
