How can I sort a vector of bigz integers in R?

I am working in R using arbitrary precision arithmetic in the gmp package. This package creates and stores large integers in the bigz form. For example, you can create a vector of arbitrarily large integers as follows:

library(gmp)
X <- as.bigz(c("734876349856913169345", "610034193791098", "82348779011105371828395319",
               "810367198176345917234", "92573840155289", "729811850143511981", "51385",
               "358934723", "751938475", "72265018270590", "12838756105612376401932875"))

I would like to sort this vector of large integers (smallest to largest). The documentation for bigz objects notes that they can be compared with inequality operators, but the standard sort function does not work on them:

sort(X)
Error in rank(x, ties.method = "min", na.last = "keep") : 
  raw vectors cannot be sorted

Question: How can I take a bigz vector like the one above and sort it in ascending order?



Solution 1:[1]

Another option is mixedsort from gtools, applied after converting the values to character:

as.bigz(gtools::mixedsort(as.character(X)))
#Big Integer ('bigz') object of length 11:
# [1] 51385                      358934723                  751938475                 
# [4] 72265018270590             92573840155289             610034193791098           
# [7] 729811850143511981         734876349856913169345      810367198176345917234     
#[10] 12838756105612376401932875 82348779011105371828395319

The conversion works because the methods for class 'bigz' include as.character:

grep('as.character', methods(class = 'bigz'), fixed = TRUE, value = TRUE)
#[1] "as.character.bigz"
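
A quick way to sanity-check any sorted result is to confirm that each element is no smaller than its predecessor, using the comparison operators that bigz objects support. This is a minimal sketch, assuming gmp is installed and the vector contains no NA values; is_sorted_bigz is a hypothetical helper name, not part of gmp or gtools.

```r
library(gmp)

# Hypothetical helper (not part of gmp): TRUE if a bigz vector without NAs
# is in ascending order.
is_sorted_bigz <- function(x) {
  n <- length(x)
  if (n < 2L) return(TRUE)
  # Compare every element with the one before it.
  all(x[2:n] >= x[1:(n - 1L)])
}

s <- as.bigz(c("51385", "358934723", "72265018270590",
               "12838756105612376401932875"))
is_sorted_bigz(s)        # input already in ascending order
is_sorted_bigz(s[4:1])   # same values in reverse order
```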

Solution 2:[2]

I wrote a function to do this by first grouping the big integers by number of digits, then sorting each group as a character vector. It's not exactly elegant, but it works:

library(gmp)
X <- as.bigz(c("734876349856913169345", "610034193791098", "82348779011105371828395319",
               "810367198176345917234", "92573840155289", "729811850143511981", "51385",
               "358934723", "751938475", "72265018270590", "12838756105612376401932875"))

sortbigz <- function(N, decreasing = FALSE) { 
  stopifnot(is.bigz(N))
  # returns a list with the following:
  #  [[1]] a bigz vector, sorted as if NA represented infinity
  #  [[2]] the original argument, converted to a character vector, unsorted
  #  [[3]] integer vector showing the rank of each element of the original vector, in the sorted vector

  z <- is.na(N)
  Ch <- as.character(N)
  is.na(Ch) <- z
  negnumbers <- N < 0
  negnumbers[z] <- FALSE
  str.length <- nchar(Ch)
  n.digits <- ifelse(negnumbers, -(str.length - 1L), str.length)  # number of digits in each element, where for example -582 is deemed to have -3 digits
  r <- rank(n.digits, ties.method = "min")
  upr <- unique(r[!negnumbers]) # unique ranks of positive numbers in N
  unr <- unique(r[negnumbers])  # unique ranks of negative numbers in N
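  # Within each group of equal digit count, string order matches numeric order.
  for(s in upr) r[r == s] <- (s - 1L) + rank(Ch[r == s], ties.method = "min")
  # For negatives the order is reversed; "max" keeps tied values on a shared, reproducible rank.
  for(s in unr) r[r == s] <- (s + sum(r == s)) - rank(Ch[r == s], ties.method = "max")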
  if(decreasing) r <- (1L + length(N)) - r
  list(sorted.bigz   = N[order(r)], 
       unsorted.char = Ch,
       ranking       = r)
}

sortbigz(X)
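
The grouping idea above (digit count first, then string order within each group) can also be sketched more compactly with a single call to order. This is an illustrative sketch rather than a replacement for sortbigz: order_bigz is a hypothetical helper name, it assumes there are no NA values, and it returns only the ordering permutation.

```r
library(gmp)

# Hypothetical helper (not part of gmp): permutation that puts a bigz vector
# in ascending order. Assumes the vector contains no NA values.
order_bigz <- function(x) {
  stopifnot(is.bigz(x), !any(is.na(x)))
  ch  <- as.character(x)
  neg <- startsWith(ch, "-")
  mag <- sub("^-", "", ch)   # digits without the sign
  len <- nchar(mag)          # more digits means a larger magnitude
  # Ascending magnitude: shorter strings first, then plain string order,
  # which matches numeric order when the lengths are equal.
  by_mag <- order(len, mag, method = "radix")
  is_neg <- neg[by_mag]
  # Negatives come first, largest magnitude leading; non-negatives follow.
  c(rev(by_mag[is_neg]), by_mag[!is_neg])
}

Y <- as.bigz(c("10", "-3", "9", "-200", "0",
               "100000000000000000000", "-100000000000000000001"))
Y[order_bigz(Y)]
```

Indexing with rev(order_bigz(Y)) gives the descending order.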

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 akrun
Solution 2 Montgomery Clift