How to read a C struct (or NumPy record array) into a Polars DataFrame?
I have a binary file containing records from a C struct, and I would like to read that file into a Polars DataFrame.
I can do this as shown below, but is there a more direct path?
My current solution involves:
- Reading the file into a NumPy record array (see below) using np.fromfile()
- Converting that into a Pandas DataFrame
- Converting that to a Polars DataFrame
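For reference, the first step can be reproduced end to end with only the standard library and NumPy. This is a minimal sketch that assumes the C program wrote each record packed and little-endian; the file is simulated with `struct.pack`, and the format string `"<iHHhf"` and the file name are illustrative assumptions, not taken from the question.

```python
import os
import struct
import tempfile

import numpy as np

# Assumed C-side layout, written packed and little-endian:
#   struct record { int32_t id; uint16_t yr; uint16_t sex; int16_t val1; float val2; };
# "<iHHhf" packs those five fields with no padding: 4 + 2 + 2 + 2 + 4 = 14 bytes.
RECORD_FORMAT = "<iHHhf"
rows = [(1, 2002, 2, 13, 0.3), (2, 2005, 1, -10, 1.5), (3, 2004, 2, 54, -0.12)]

# Matching NumPy dtype: same field order, sizes and byte order.
dt = np.dtype([("id", "<i4"), ("yr", "<u2"), ("sex", "<u2"), ("val1", "<i2"), ("val2", "<f4")])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.bin")  # hypothetical file name
    # Simulate the C program's output file, one packed record after another.
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack(RECORD_FORMAT, *row))
    # One structured-array element per C record.
    data = np.fromfile(path, dtype=dt)

print(data.dtype.itemsize, struct.calcsize(RECORD_FORMAT))  # 14 14
print(data["id"], data["val1"])
```

If `dt.itemsize` and the C `sizeof` of the struct disagree, the records will be misread, so that comparison is a cheap sanity check.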
import numpy as np
import pandas as pd
import polars as pl

# Data read in from file using np.fromfile()
data = np.array([(1, 2002, 2, 13, 0.3),
                 (2, 2005, 1, -10, 1.5),
                 (3, 2004, 2, 54, -0.12)],
                dtype=[("id", "<i4"), ("yr", "<u2"), ("sex", "<u2"), ("val1", "<i2"), ("val2", "<f4")]
)
df = pl.from_pandas(pd.DataFrame(data))
df
id yr sex val1 val2
i32 u16 u16 i16 f32
1 2002 2 13 0.3
2 2005 1 -10 1.5
3 2004 2 54 -0.12
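One detail worth checking when the file comes from a C program: the dtype above is packed (14 bytes per record), whereas a typical compiler pads a non-packed struct with these members so that the `float` starts on a 4-byte boundary. Whether the real file contains padding depends on how the struct was declared and written (for example `#pragma pack` or `__attribute__((packed))`), so treat this as an assumption to verify. NumPy can mimic C alignment with `align=True`:

```python
import numpy as np

fields = [("id", "<i4"), ("yr", "<u2"), ("sex", "<u2"), ("val1", "<i2"), ("val2", "<f4")]

packed = np.dtype(fields)               # no padding: what the question's dtype assumes
aligned = np.dtype(fields, align=True)  # padded following C alignment rules

# Byte offset of each field within one record.
print({name: packed.fields[name][1] for name in packed.names})
print({name: aligned.fields[name][1] for name in aligned.names})
print(packed.itemsize, aligned.itemsize)  # 14 16
```

In the aligned layout `val2` moves from offset 10 to offset 12, so reading a padded file with the packed dtype would silently shift every field after the first record.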
I've tried reading the data directly into Polars from NumPy using pl.DataFrame(data) or pl.from_records(data), but in both cases I get a single-column DataFrame of dtype "object", which I can't work out how to split into separate columns or convert to a struct.
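A structured array is one-dimensional: each element is a whole record, which is consistent with a constructor seeing a single column of record objects. Indexing by field name, however, returns an ordinary typed 1-D array per field, which is the property the answer below relies on. A small NumPy-only illustration:

```python
import numpy as np

data = np.array(
    [(1, 2002, 2, 13, 0.3), (2, 2005, 1, -10, 1.5), (3, 2004, 2, 54, -0.12)],
    dtype=[("id", "<i4"), ("yr", "<u2"), ("sex", "<u2"), ("val1", "<i2"), ("val2", "<f4")],
)

# One element per record, not one column per field.
print(data.shape)        # (3,)
print(data.dtype.names)  # ('id', 'yr', 'sex', 'val1', 'val2')

# Field access yields a plain 1-D array with that field's own dtype.
for name in data.dtype.names:
    print(name, data[name].dtype, data[name].tolist())
```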
Solution 1:[1]
import numpy as np
import polars as pl

data = np.array([(1, 2002, 2, 13, 0.3),
                 (2, 2005, 1, -10, 1.5),
                 (3, 2004, 2, 54, -0.12)],
                dtype=[("id", "<i4"), ("yr", "<u2"), ("sex", "<u2"), ("val1", "<i2"), ("val2", "<f4")]
)
# Build one Polars column per struct field; each field keeps its NumPy dtype.
pl.DataFrame(
    {
        field_name: data[field_name]
        for field_name in data.dtype.fields
    }
)
┌─────┬──────┬─────┬──────┬───────┐
│ id  ┆ yr   ┆ sex ┆ val1 ┆ val2  │
│ --- ┆ ---  ┆ --- ┆ ---  ┆ ---   │
│ i32 ┆ u16  ┆ u16 ┆ i16  ┆ f32   │
╞═════╪══════╪═════╪══════╪═══════╡
│ 1   ┆ 2002 ┆ 2   ┆ 13   ┆ 0.3   │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2   ┆ 2005 ┆ 1   ┆ -10  ┆ 1.5   │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3   ┆ 2004 ┆ 2   ┆ 54   ┆ -0.12 │
└─────┴──────┴─────┴──────┴───────┘
To convert back to a NumPy structured array, create an empty array with the original dtype and assign one NumPy array per field:
# Create numpy struct array of the correct size.
numpy_struct_array = np.empty(df.height, data.dtype)
# Fill in the correct values, one field (column) at a time.
for field, col in zip(data.dtype.fields, df.columns):
    numpy_struct_array[field] = df.get_column(col).to_numpy()
numpy_struct_array
array([(1, 2002, 2, 13, 0.3 ), (2, 2005, 1, -10, 1.5 ),
(3, 2004, 2, 54, -0.12)],
dtype=[('id', '<i4'), ('yr', '<u2'), ('sex', '<u2'), ('val1', '<i2'), ('val2', '<f4')])
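As an alternative to filling an np.empty() buffer field by field, np.rec.fromarrays() builds the structured array from a list of column arrays in one call. In this sketch the columns are plain NumPy arrays standing in for `df.get_column(col).to_numpy()`, so it runs without Polars; note that the result is an `np.recarray`, which also allows attribute access such as `records.yr`.

```python
import numpy as np

dtype = np.dtype([("id", "<i4"), ("yr", "<u2"), ("sex", "<u2"), ("val1", "<i2"), ("val2", "<f4")])

# Stand-ins for df.get_column(col).to_numpy(), one array per column, in field order.
columns = [
    np.array([1, 2, 3], dtype=np.int32),
    np.array([2002, 2005, 2004], dtype=np.uint16),
    np.array([2, 1, 2], dtype=np.uint16),
    np.array([13, -10, 54], dtype=np.int16),
    np.array([0.3, 1.5, -0.12], dtype=np.float32),
]

# Build the structured array in one call instead of assigning each field separately.
records = np.rec.fromarrays(columns, dtype=dtype)

# The raw bytes use the same packed layout that np.fromfile() reads;
# records.tofile(path) would write them back to disk.
raw = records.tobytes()
print(len(raw))  # 42 (3 records x 14 bytes)
```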
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
