How to sort a dask dataframe in descending order?

Apparently, sort_values in dask does not support the ascending keyword, which is surprising because dask is designed to resemble pandas. This does not work:

res = (ddf
    .groupby(['An important column'])
    .mean()
    .sort_values('Score', ascending=False)
    .compute()
)

What would be the best way to sort in descending order with dask?

> NotImplementedError: The ascending= keyword is not supported

dask version: 2021.4.0



Solution 1:[1]

If the result is small enough to fit in worker/client memory and its task graph does not involve a lot of data shuffling, it is usually OK to run .compute() first (turning the lazy dask result into a pandas DataFrame) and then apply the missing or not-yet-implemented function in pandas.

For example, this could be done as follows:

res = (ddf
    .groupby(['some_col'])
    .agg({'Score': 'mean'})
    .compute()
    .sort_values('Score', ascending=False)
)

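By contrast, sorting the full dask dataframe across partitions is an expensive operation, because rows have to be shuffled between partitions (and in dask 2021.4.0 the ascending keyword raises the NotImplementedError shown in the question):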

res = (ddf
    .sort_values('some_col', ascending=False)
)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1