'What sorting algorithm does dplyr's arrange function use?

I have a dataset where we are bringing in a dataframe of values, and assigning each of the values a number 1-5. We're then sorting these values according to this column using dplyr::arrange.

In looking through the resulting data, it's clear that, aside from being ordered by the numbers in the 1-5 column, the original order the rows came in impacts the final order. However, I can't figure out what's impacting the order of rows within each group.

To this end, I've been trying to find the sorting algorithm that dplyr's arrange function uses - however, I can't find it anywhere here or in the documentation. Any help would be appreciated



Solution 1:[1]

The documentation doesn't tell you, and doesn't guarantee that order is preserved within ties. That means you shouldn't assume anything about the behaviour within ties.

All you should assume is that things are ordered as the documentation says they will be. If that is violated, it's a bug. If the documentation doesn't say what will happen, then you should assume that whatever happens today may be different tomorrow.

It's easy enough to make any sort into a stable sort (that maintains the original ordering within ties). Just add an extra column containing the original position, and include it as the final column for breaking ties. For example,

dplyr::arrange(mtcars, carb)

doesn't say anything about the order within rows that have the same value of carb. But

dplyr::arrange(data.frame(i = 1:nrow(mtcars), mtcars), carb, i)[-1]

guarantees that the original order is kept within carb values.

Solution 2:[2]

The code shows it eventually calls base::order with the default method, so:

  method: the method to be used: partial matches are allowed.  The
          default (‘"auto"’) implies ‘"radix"’ for short numeric
          vectors, integer vectors, logical vectors and factors.
          Otherwise, it implies ‘"shell"’.  For details of methods
          ‘"shell"’, ‘"quick"’, and ‘"radix"’, see the help for ‘sort’.

Though, it does pass through vctrs::vec_proxy_order first - not sure it that matters.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 merv