Questions about Polars groupby

Q1: In polars-rust, when you do groupby.agg, you can use head(10) to get the first 10 elements of a column. But if the groups have different lengths and I need the first 20% of the elements in each group (for example, the first 24 elements of a 120-element group), how can I make that work?
Q2: With a dataframe like the sample below, my goal is to loop over the dataframe. Because Polars is column-major, I downcast the df into several ChunkedArrays and iterated via iter().zip(). I found this is faster than doing the same thing after groupby(col("date")), which loops over list elements. Why is that? I expected the dataframe to be shorter after the groupby, which should mean a shorter loop.

Date	Stock	Price
2010-01-01	IBM	1000
2010-01-02	IBM	1001
2010-01-03	IBM	1002
2010-01-01	AAPL	2900
2010-01-02	AAPL	2901
2010-01-03	AAPL	2902

python-polars rust-polars

Solution 1:^[1]

I don't really understand your 2nd question. Maybe you can create another question with a small example.

I will answer the 1st question:

"You can use head(10) to get the first 10 elements of a column. But if the groups have different lengths and I need the first 20% of the elements in each group (for example, the first 24 elements of a 120-element group), how can I make that work?"

We can use expressions to take head(n), where n = 0.2 * group_size.

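import polars as pl

# Two groups of different sizes: "a" has 10 rows, "b" has 20.
df = pl.DataFrame({
    "groups": ["a"] * 10 + ["b"] * 20,
    "values": range(30)
})

# Per group, keep the first 20% of rows; pl.count() is the group size here.
# Newer Polars releases spell these group_by and pl.len().
(df.groupby("groups")
    .agg(pl.all().head(pl.count() * 0.2))
    .explode(pl.all().exclude("groups"))
)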

which outputs:

shape: (6, 2)
???????????????????
? groups ? values ?
? ---    ? ---    ?
? str    ? i64    ?
???????????????????
? a      ? 0      ?
???????????????????
? a      ? 1      ?
???????????????????
? b      ? 10     ?
???????????????????
? b      ? 11     ?
???????????????????
? b      ? 12     ?
???????????????????
? b      ? 13     ?
???????????????????
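
To double-check which rows a per-group "first 20%" selects, here is a minimal plain-Python sketch that does not use Polars at all. The helper name head_fraction_per_group is hypothetical, and truncating n toward zero is an assumption; the example above only shows group sizes where 20% is a whole number.

```python
from collections import defaultdict


def head_fraction_per_group(rows, fraction=0.2):
    """Return the first int(fraction * group_size) values of each group.

    `rows` is an iterable of (group, value) pairs. Hypothetical helper for
    illustration only; it is not part of Polars.
    """
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    # Assumption: n is truncated toward zero, so a 4-row group keeps 0 rows.
    return {g: vals[: int(len(vals) * fraction)] for g, vals in groups.items()}


# Same data as the DataFrame above: 10 rows in "a", 20 rows in "b".
rows = list(zip(["a"] * 10 + ["b"] * 20, range(30)))
result = head_fraction_per_group(rows)
print(result)  # {'a': [0, 1], 'b': [10, 11, 12, 13]}
```

Group "a" keeps 2 of its 10 values and group "b" keeps 4 of its 20, which matches the six rows in the output above.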

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	ritchie46
