How to sort a dask dataframe in descending order?
Apparently, the ascending keyword does not exist in dask, which is funny because dask is designed to resemble pandas. This does not work:
res = (ddf
       .groupby(['An important column'])
       .mean()
       .sort_values('Score', ascending=False)
       .compute()
       )
> NotImplementedError: The ascending= keyword is not supported
What would be the best way to sort in descending order with dask?
dask version: 2021.4.0
Solution 1:[1]
If the delayed result is small enough to fit in worker/client memory and its task graph does not involve a lot of data shuffling, then it is usually OK to call .compute() first (to turn the delayed value into a pandas DataFrame) and then apply the missing or unimplemented function in pandas.
For example, this could be done as follows:
res = (ddf
       .groupby(['some_col'])
       .agg({'Score': 'mean'})
       .compute()
       .sort_values('Score', ascending=False)
       )
Note that sorting the dask dataframe itself means sorting across partitions, which is an expensive operation:
res = (ddf
       .sort_values('some_col', ascending=False)
       )
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
