PySpark: orderBy top 10 by count descending, breaking ties?
I have a DataFrame that I group by ID1 and ID2 and then order by count in descending order. I want to break ties in favor of whichever group has the higher rate value. Here is my current code:
from pyspark.sql.functions import avg, count, desc
df.groupBy('ID1', 'ID2').agg((avg("amount")/avg("distance")).alias("rate"), count("*").alias("count")).orderBy(desc('count'), desc('rate')).limit(10).show()
and my output is:
Any help is greatly appreciated!
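To pin down the ordering I am after, here is a minimal plain-Python sketch (not Spark) of the same logic: group by (ID1, ID2), compute rate as avg(amount) / avg(distance), then sort by count descending and use rate descending as the tie-breaker. The sample rows and the helper name `top_groups` are made up for illustration, and the sketch assumes every group has a non-zero average distance.

```python
from collections import defaultdict

# Hypothetical sample rows: (ID1, ID2, amount, distance)
rows = [
    ("a", "x", 10.0, 2.0),
    ("a", "x", 20.0, 4.0),
    ("b", "y", 30.0, 3.0),
    ("b", "y", 50.0, 5.0),
    ("c", "z", 8.0, 4.0),
]


def top_groups(rows, n=10):
    """Return up to n (ID1, ID2, rate, count) tuples, highest count first."""
    # Collect the amounts and distances seen for each (ID1, ID2) pair.
    groups = defaultdict(lambda: {"amount": [], "distance": []})
    for id1, id2, amount, distance in rows:
        groups[(id1, id2)]["amount"].append(amount)
        groups[(id1, id2)]["distance"].append(distance)

    summary = []
    for (id1, id2), g in groups.items():
        count = len(g["amount"])
        # rate = avg(amount) / avg(distance); assumes avg(distance) != 0
        rate = (sum(g["amount"]) / count) / (sum(g["distance"]) / count)
        summary.append((id1, id2, rate, count))

    # Primary key: count descending. Tie-breaker: rate descending.
    summary.sort(key=lambda r: (-r[3], -r[2]))
    return summary[:n]


result = top_groups(rows)
for row in result:
    print(row)
```

In this sample, groups ("a", "x") and ("b", "y") both have a count of 2, so the one with the higher rate, ("b", "y"), should come first. That is the row order I would expect `orderBy(desc('count'), desc('rate'))` to give on equivalent data.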
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow

