Using a reduction ufunc in agg

How do I use a ufunc that reduces to a scalar in the context of aggregation? For example, summarizing a table using numpy.trapz:

import polars as pl
import numpy as np

df = pl.DataFrame(dict(id=[0, 0, 0, 1, 1, 1], t=[2, 4, 5, 10, 11, 14], y=[0, 1, 1, 2, 3, 4]))
df.groupby('id').agg(pl.map(['t', 'y'], np.trapz))
# Segmentation fault (core dumped)


Solution 1:[1]

Edit: as of Polars 0.13.18, the apply method converts NumPy datatypes to Polars datatypes automatically, so the NumPy item method is no longer required.

Use apply in a groupby context (rather than map).

In this case, the numpy trapz function requires only one positional parameter (y); x is optional and defaults to None:

numpy.trapz(y, x=None, dx=1.0, axis=-1)

So, we'll need to specify the x keyword parameter explicitly in our call. (I also assumed that you meant for your y column to be mapped as the y parameter, and your t column to be mapped as the x parameter in the call to numpy.)
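To see why the argument mapping matters, here is a minimal sketch using the id 0 rows from the question. The `trapz` alias is my own addition: it assumes a NumPy version where the function is exposed either as `np.trapz` or under its NumPy 2.0 name `np.trapezoid`.

```python
import numpy as np

# np.trapz was renamed np.trapezoid in NumPy 2.0; use whichever name exists.
trapz = getattr(np, "trapezoid", None) or np.trapz

t = [2, 4, 5]  # sample points for id 0 (the x parameter)
y = [0, 1, 1]  # values for id 0 (the y parameter)

area = trapz(y, x=t)     # integrates y over t
swapped = trapz(t, x=y)  # integrates t over y: a different quantity

# The trapezoid rule written out by hand, for comparison with `area`.
by_hand = sum((t[i + 1] - t[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(t) - 1))

print(area, swapped, by_hand)
```

The first and last values agree, while the swapped call gives a different number, which is why the column order has to be spelled out.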

The Series 'y' and 't' will be passed as a list of Series to the lambda function, so we'll use indices to indicate which column maps to which numpy parameter.
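The index mapping can be tried outside Polars. This is only a sketch: plain NumPy arrays stand in for the Series that Polars would pass, and `integrate`, `groups`, and the `trapz` alias are hypothetical names introduced for illustration.

```python
import numpy as np

# np.trapz was renamed np.trapezoid in NumPy 2.0; use whichever name exists.
trapz = getattr(np, "trapezoid", None) or np.trapz


def integrate(lst):
    # lst[0] is the first requested column ("y"), lst[1] the second ("t").
    return trapz(y=lst[0], x=lst[1])


# One [y, t] pair per id, mimicking what each group would receive.
groups = {
    0: [np.array([0, 1, 1]), np.array([2, 4, 5])],
    1: [np.array([2, 3, 4]), np.array([10, 11, 14])],
}

results = {key: float(integrate(cols)) for key, cols in groups.items()}
print(results)
```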

One additional wrinkle: numpy returns a value of type numpy.float64 rather than a Python float.

type(np.trapz([0, 1, 1], x=[2, 4, 5]))
<class 'numpy.float64'>

Before Polars 0.13.18, the apply function does not automatically convert a numpy.float64 to polars.Float64. To work around this, we call the numpy item method so that numpy returns a Python float rather than a numpy.float64.

type(np.trapz([0, 1, 1], x=[2, 4, 5]).item())
<class 'float'>
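As a small aside, the conversion can be checked in isolation. In this sketch `value` is a stand-in for whatever the NumPy reduction returns; calling `float()` is shown only as an alternative that also yields a plain Python float.

```python
import numpy as np

value = np.float64(2.0)   # stand-in for the result of the NumPy reduction
as_item = value.item()    # NumPy scalar -> Python float
as_float = float(value)   # the built-in conversion gives the same type

print(type(value).__name__, type(as_item).__name__, type(as_float).__name__)
```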

With this in hand, we can now write our apply statement.

df.groupby("id").agg(
    pl.apply(
        ["y", "t"],
        lambda lst: np.trapz(y=lst[0], x=lst[1]).item()
    )
)
shape: (2, 2)
┌─────┬──────┐
│ id  ┆ y    │
│ --- ┆ ---  │
│ i64 ┆ f64  │
╞═════╪══════╡
│ 1   ┆ 13.0 │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 0   ┆ 2.0  │
└─────┴──────┘

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
