'What sorting algorithm does dplyr's arrange function use?
I have a dataset where we are bringing in a dataframe of values, and assigning each of the values a number 1-5. We're then sorting these values according to this column using dplyr::arrange.
In looking through the resulting data, it's clear that, aside from being ordered by the numbers in the 1-5 column, the original order the rows came in impacts the final order. However, I can't figure out what's impacting the order of rows within each group.
To this end, I've been trying to find the sorting algorithm that dplyr's arrange function uses - however, I can't find it anywhere here or in the documentation. Any help would be appreciated
Solution 1:[1]
The documentation doesn't tell you, and doesn't guarantee that order is preserved within ties. That means you shouldn't assume anything about the behaviour within ties.
All you should assume is that things are ordered as the documentation says they will be. If that is violated, it's a bug. If the documentation doesn't say what will happen, then you should assume that whatever happens today may be different tomorrow.
It's easy enough to make any sort into a stable sort (that maintains the original ordering within ties). Just add an extra column containing the original position, and include it as the final column for breaking ties. For example,
dplyr::arrange(mtcars, carb)
doesn't say anything about the order within rows that have the same value of carb. But
dplyr::arrange(data.frame(i = 1:nrow(mtcars), mtcars), carb, i)[-1]
guarantees that the original order is kept within carb values.
Solution 2:[2]
The code shows it eventually calls base::order with the default method, so:
method: the method to be used: partial matches are allowed. The
default (‘"auto"’) implies ‘"radix"’ for short numeric
vectors, integer vectors, logical vectors and factors.
Otherwise, it implies ‘"shell"’. For details of methods
‘"shell"’, ‘"quick"’, and ‘"radix"’, see the help for ‘sort’.
Though, it does pass through vctrs::vec_proxy_order first - not sure it that matters.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | merv |
