How to find the second-highest digit in a string in R

Question:

Write a function that accepts a string and returns the second highest numerical digit in the input as an integer.

The following rules should apply:

- Inputs with no numerical digits should return -1.
- Inputs with only one numerical digit should return -1.
- Non-numeric characters should be ignored.
- Each numerical input should be treated individually, meaning that in the event of a joint highest digit, the second-highest digit will also be the highest digit.

For example:

 "abc:1231234" returns 3
 "123123" returns 3

[execution time limit] 5 seconds (r)

[input] string input

The input string

[output] integer

The second-highest digit

I can convert the string into a numeric vector with strsplit and as.numeric and get rid of the NAs (letters), but I'm not sure where to go from here.

Clarification: ideally a base R solution.

I've got this code so far which, while messy, deals with all but the case where there are joint highest numbers:

solution <- function(input) {
  d <- as.integer(strsplit(input,"")[[1]])
  if (any(is.na(d))) {
    d <- d[-which(is.na(d))]
  }
  if(all(is.na(d))) {
    return(-1)
  }
  if (length(is.na(d)) == length(d)-1) {
    return(-1)
  }
  sort(d,TRUE)[2]
}
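
In the attempt above, the check `length(is.na(d)) == length(d) - 1` can never be TRUE, because `is.na(d)` always has the same length as `d`. An input with a single digit therefore falls through to `sort(d, TRUE)[2]`, which yields NA rather than -1. A minimal base R sketch that keeps the same strsplit-and-sort idea but checks the number of digits directly might look like this (the name `second_highest` is made up for illustration):

```r
# Hypothetical helper name; not from the original post.
second_highest <- function(input) {
  chars <- strsplit(input, "", fixed = TRUE)[[1]]
  # Keep only the characters "0" to "9", then convert them to integers
  digits <- as.integer(chars[chars %in% as.character(0:9)])
  # Fewer than two digits: there is no second-highest value
  if (length(digits) < 2L) return(-1L)
  # Duplicates are kept, so a joint highest digit is also the second highest
  sort(digits, decreasing = TRUE)[2]
}

second_highest("abc:1231234")
#> [1] 3
second_highest("123123")
#> [1] 3
second_highest("abcdefg4")
#> [1] -1
```

Filtering with `%in%` before the conversion avoids the coercion warning that as.integer would otherwise raise for letters.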


Solution 1:[1]

A stringr::str_count solution:

library(stringr)

secondHighest1 <- function(str) {
  ans <- 10L - match(TRUE, cumsum(str_count(str, paste0(9:0))) > 1L)
  if (is.na(ans)) -1L else ans
}

A base R solution:

secondHighest2 <- function(str) {
  suppressWarnings(ans <- 10L - match(TRUE, cumsum(tabulate(10L - as.integer(strsplit(str, "")[[1]]))) > 1L))
  if (is.na(ans)) -1L else ans
}
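
The counting trick in both functions is easier to follow when it is split into steps. The sketch below walks through the base R version for one input. The intermediate variable names are made up for illustration, and `nbins = 10L` is an addition that is not in the answer, used here only so that the bins always line up with the digits 9 down to 0.

```r
str <- "abc:1231234"

# Non-digit characters become NA (as.integer warns about the coercion)
digits <- suppressWarnings(as.integer(strsplit(str, "")[[1]]))

# Map digit d to bin 10 - d, so 9 lands in bin 1 and 0 in bin 10;
# tabulate() ignores the NA entries
counts <- tabulate(10L - digits, nbins = 10L)
counts
#> [1] 0 0 0 0 0 1 2 2 2 0

# Running total of digits seen while scanning from 9 downwards
running <- cumsum(counts)

# First bin at which at least two digits have been seen
bin <- match(TRUE, running > 1L)
ans <- 10L - bin
ans
#> [1] 3
```

If fewer than two digits are present, `running > 1L` is never TRUE, `match` returns NA, and the functions above turn that NA into -1.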

UPDATE: Borrowing Adam's idea of using utf8ToInt instead of strsplit gives a big speed boost:

secondHighest3 <- function(str) {
  nums <- utf8ToInt(str)
  nums <- nums[nums < 58L]
  if (length(nums) > 1L) max(-1L, max(nums[-which.max(nums)]) - 48L) else -1L
}

set.seed(94)
chrs <- c(paste0(9:0), letters, LETTERS)
str <- paste0(sample(chrs, 1e5, TRUE, (1:62)^4), collapse = "")
secondHighest1(str)
#> [1] 3
secondHighest2(str)
#> [1] 3
secondHighest3(str)
#> [1] 3

microbenchmark::microbenchmark(secondHighest1(str),
                               secondHighest2(str),
                               secondHighest3(str))
#> Unit: microseconds
#>                 expr     min       lq      mean   median      uq     max neval
#>  secondHighest1(str)  1193.8  1279.55  1524.712  1338.80  1525.2  5013.7   100
#>  secondHighest2(str) 16825.3 18049.65 21297.962 19339.75 24126.4 36652.6   100
#>  secondHighest3(str)   706.0   774.80  1371.196   867.40  1045.0 17780.4   100
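
Note that secondHighest3 keeps every code point below 58, which includes punctuation such as "$" or "#" (codes below 48), and relies on the outer `max(-1L, ...)` to clamp those to -1. A variant that filters to the digit range 48 to 57 explicitly may be easier to read. This is only a sketch, it has not been benchmarked against the functions above, and the name `second_highest_ascii` is made up:

```r
# Hypothetical variant; the name is not from the original answer.
second_highest_ascii <- function(str) {
  codes <- utf8ToInt(str)
  # Code points 48..57 are the characters "0".."9"
  digits <- codes[codes >= 48L & codes <= 57L] - 48L
  if (length(digits) < 2L) return(-1L)
  # Drop one occurrence of the maximum; the maximum of the rest is the answer
  max(digits[-which.max(digits)])
}

second_highest_ascii("abc:1231234")
#> [1] 3
second_highest_ascii("a$5")
#> [1] -1
```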

Solution 2:[2]

As a one-liner:

string <- "abc:1231234"

sort(unique(suppressWarnings(as.integer(strsplit(string, "", fixed = TRUE)[[1]]))), decreasing = TRUE)[2]
#> [1] 3

Or using the magrittr pipe:

library(magrittr)
suppressWarnings(
  strsplit(string, "", fixed = TRUE)[[1]] %>% 
    as.integer()  %>% 
    unique()  %>%  
    sort(decreasing = TRUE)  %>% 
    .[2]
)
#> [1] 3

Created on 2022-03-25 by the reprex package (v2.0.1)
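
One caveat: unique() collapses repeated digits, so for "123123" the one-liner gives 2, whereas the question expects 3. It also gives NA instead of -1 when fewer than two distinct digits are present. A possible adjustment, assuming the rules in the question are the target, is to drop unique() and map NA to -1 (the function name below is made up for illustration):

```r
# Hypothetical helper name; not part of the original answer.
second_highest_sorted <- function(string) {
  digits <- suppressWarnings(
    as.integer(strsplit(string, "", fixed = TRUE)[[1]])
  )
  # sort() drops NA by default; duplicates are kept on purpose
  ans <- sort(digits, decreasing = TRUE)[2]
  # Fewer than two digits leaves position 2 empty, which shows up as NA
  if (is.na(ans)) -1L else ans
}

second_highest_sorted("123123")
#> [1] 3
second_highest_sorted("abcdefg4")
#> [1] -1
```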

Solution 3:[3]

You can also do something like this to convert to ASCII integers.

solution <- function(input) {
  if (nchar(input) < 2L) return(-1L)
  
  # ASCII codes for 0:9 are 48:57
  int_input <- utf8ToInt(input) - 48L
  sort(replace(int_input, !(int_input %in% 0L:9L), -1L), decreasing = TRUE)[2]
}

Testing a few strings...

str1 <- "abc:1231234"
str2 <- "987654321"
str3 <- "abcdefg4"
str4 <- "abc$<>$#%fgdgLJJ"
str5 <- "123123"

solution(str1)
# [1] 3
solution(str2)
# [1] 8
solution(str3)
# [1] -1
solution(str4)
# [1] -1
solution(str5)
# [1] 3

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1
Solution 2    JBGruber
Solution 3