'Using gtools::mixedsort or alternatives with dplyr::arrange
I am trying to order a dataframe by making use of dplyr::arrange. The issue is that the column I am trying to sort on contains both a fixed string followed by a number, as for instance generated by the dummycode below.
dummydf<-data.frame(values=rnorm(100),sortcol=paste0("ABC",sample(1:100,100,replace=FALSE)))
By default, using dummydf %>% arrange(sortcol) would generate a df which is sorted alphanumerically (?) but this is of course not the desired result:
values sortcol
0.708081720 ABC1
0.041348322 ABC10
1.730962886 ABC100
0.423480861 ABC11
-1.545837266 ABC12
-1.345539947 ABC13
-0.078998792 ABC14
0.088712174 ABC15
0.670583024 ABC16
1.238837680 ABC17
-1.459044293 ABC18
-2.028535223 ABC19
0.779514385 ABC2
1.360509910 ABC20
In this example, I would like to sort the column as gtools::mixedsort would do, making sure ABC2 follows ABC1 and is not preceed by ABC1-19 and ABC100 mixedsort(as.character(dummydf$sortcol)) would do that trick.
Now, I am aware I could do this by using sub in my arrange argument: dummydf %>% arrange(as.numeric(sub("ABC","",sortcol))) but that is mainly because my string is something fixed (although any regex could be used to capture the last digits following any string I suppose).
I am just wondering: is there a more "elegant" and generic way to get this done with dplyr::arrange, in the same fashion as gtools::mixedsort?
Kind regards,
FM
Solution 1:[1]
Here's a functional solution making use of the mysterious identity order(order(x)) == rank(x).
mixedrank = function(x) order(gtools::mixedorder(x))
dummydf %>% dplyr::arrange(mixedrank(sortcol))
Solution 2:[2]
I don't see this answer posted so I'll throw it out. You can use mixedorder with slice to arrange it.
dummydf %>%
slice(mixedorder(sortcol))
Solution 3:[3]
Using data.table
library(data.table)
dummydf = data.table(dummydf)
dummydf[gtools::mixedorder(as.character(sortcol))]
Honestly just copied your example and stuck it in as the select argument in the data.table syntax. You already did all the hard work :).
Solution 4:[4]
Credit to Akhil Nair for his data.table answer which is what the first code snippet derives from. If you like the data.table answer but still want magrittr piping, you can consider calculating a new column and using piping with data.table to get your output:
dummydf %>%
dplyr::mutate(row_lookup = gtools::mixedorder(as.character(sortcol))) %>%
data.table::data.table() %>%
.[.$row_lookup]
I think it's debatable whether that helps or detracts from the readability.
If you don't want to call data.table, you can go through some extra contortions to calculate a column you can use dplyr::arrange on. Here's one example:
library(dplyr)
bind_cols(dummydf,
dummydf %>%
tibble::rowid_to_column("order") %>%
mutate(rowname = gtools::mixedorder(as.character(sortcol))) %>%
arrange(rowname) %>%
select(order)) %>%
arrange(order)
I think this code is more confusing to read and isn't worth those extra contortions to avoid data.table.
Solution 5:[5]
Here is a solution that will allow for sorting if there are repeats and multiple conditions to sort. Most previous answers are not generic: they freeze the ordering at level 1.
df <- data.frame(values = rnorm(100),
sortcol1 = paste0("ASORT", sample(1:100, 100, replace = TRUE)),
sortcol2 = paste0("BSORT", sample(1:100, 100, replace = TRUE)),
stringsAsFactors = F)
df %>%
mutate(
`sortcol1` = factor(`sortcol1`, ordered = T, levels = unique(gtools::mixedsort(`sortcol1`))),
`sortcol2` = factor(`sortcol2`, ordered = T, levels = unique(gtools::mixedsort(`sortcol2`)))
) %>%
arrange(`sortcol1`, `sortcol2`)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | bischrob |
| Solution 3 | Akhil Nair |
| Solution 4 | |
| Solution 5 | Viktor Horváth |
