Using a reduction ufunc in agg
How do I use a ufunc that reduces to a scalar in the context of aggregation? For example, summarizing a table using numpy.trapz:
import polars as pl
import numpy as np
df = pl.DataFrame(dict(id=[0, 0, 0, 1, 1, 1], t=[2, 4, 5, 10, 11, 14], y=[0, 1, 1, 2, 3, 4]))
df.groupby('id').agg(pl.map(['t', 'y'], np.trapz))
# Segmentation fault (core dumped)
Solution 1:[1]
Edit: as of Polars 0.13.18, the apply method converts NumPy data types to Polars data types automatically, so the NumPy item method is no longer required.
Use apply in a groupby context (rather than map).
In this case, NumPy's trapz function has only one required parameter (y); x, dx, and axis are optional and have defaults:
numpy.trapz(y, x=None, dx=1.0, axis=-1)
So, we'll need to specify the x keyword parameter explicitly in our call. (I also assumed that you meant for your y column to be mapped as the y parameter, and your t column to be mapped as the x parameter in the call to numpy.)
The Series 'y' and 't' will be passed as a list of Series to the lambda function, so we'll use indices to indicate which column maps to which numpy parameter.
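To see why the x keyword matters, here is a minimal pure-Python sketch of the trapezoidal rule. The helper trapezoid_rule is a hypothetical stand-in written for illustration (it is not NumPy's implementation), and plain lists stand in for the Polars Series that the lambda receives.

```python
def trapezoid_rule(y, x=None, dx=1.0):
    """Hypothetical stand-in; mirrors the y, x, dx parameters of numpy.trapz."""
    if x is None:
        # No x given: assume evenly spaced samples, dx apart.
        widths = [dx] * (len(y) - 1)
    else:
        # Spacing comes from the differences between consecutive x values.
        widths = [x1 - x0 for x0, x1 in zip(x, x[1:])]
    return sum(w * (y0 + y1) / 2 for w, y0, y1 in zip(widths, y, y[1:]))

# Columns for id == 0, in the same order as ["y", "t"] in the apply call.
lst = [[0, 1, 1], [2, 4, 5]]

without_x = trapezoid_rule(lst[0])           # spacing defaults to dx=1.0
with_x = trapezoid_rule(y=lst[0], x=lst[1])  # spacing taken from t

print(without_x)  # 1.5
print(with_x)     # 2.0
```

Omitting x silently integrates against unit spacing, which is why the t column has to be passed as x explicitly.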
One additional wrinkle: NumPy returns a value of type numpy.float64 rather than a Python float.
type(np.trapz([0, 1, 1], x=[2, 4, 5]))
<class 'numpy.float64'>
Before Polars 0.13.18, the apply function did not automatically convert a numpy.float64 to polars.Float64. To remedy this on older versions, we use the NumPy item method, which returns a Python float rather than a numpy.float64.
type(np.trapz([0, 1, 1], x=[2, 4, 5]).item())
<class 'float'>
With this in hand, we can now write our apply statement.
df.groupby("id").agg(
    pl.apply(
        ["y", "t"],
        lambda lst: np.trapz(y=lst[0], x=lst[1]).item()
    )
)
shape: (2, 2)
┌─────┬──────┐
│ id  ┆ y    │
│ --- ┆ ---  │
│ i64 ┆ f64  │
╞═════╪══════╡
│ 1   ┆ 13.0 │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 0   ┆ 2.0  │
└─────┴──────┘
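As a sanity check, the two values in the result can be reproduced by hand without Polars or NumPy. The sketch below groups the rows by id and applies the trapezoidal rule to each group; the area helper is a hypothetical name introduced here for illustration.

```python
from collections import defaultdict

ids = [0, 0, 0, 1, 1, 1]
t = [2, 4, 5, 10, 11, 14]
y = [0, 1, 1, 2, 3, 4]

# Collect (t, y) pairs per id, mimicking what the groupby does.
groups = defaultdict(list)
for i, ti, yi in zip(ids, t, y):
    groups[i].append((ti, yi))

def area(points):
    """Trapezoidal rule over consecutive (t, y) pairs (hypothetical helper)."""
    return sum((t1 - t0) * (y0 + y1) / 2
               for (t0, y0), (t1, y1) in zip(points, points[1:]))

result = {i: area(points) for i, points in groups.items()}
print(result)  # {0: 2.0, 1: 13.0}
```

Both values agree with the y column in the Polars output.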
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
