How to Find the Second Highest Digit in a String in R
Question:
Write a function that accepts a string and returns the second highest numerical digit in the input as an integer.
The following rules should apply:
- Inputs with no numerical digits should return -1
- Inputs with only one numerical digit should return -1
- Non-numeric characters should be ignored
- Each numerical input should be treated individually, meaning in the event of a joint highest digit then the second highest digit will also be the highest digit
For example:
"abc:1231234" returns 3
"123123" returns 3
[execution time limit] 5 seconds (r)
[input] string input
The input string
[output] integer
The second-highest digit
I can convert the string into a numeric vector with strsplit and as.numeric and drop the NAs (produced by letters and other non-digit characters), but I'm not sure where to go from here.
Clarification: ideally a base R solution.
I've got this code so far which, while messy, deals with all but the case where there are joint highest numbers:
solution <- function(input) {
  d <- as.integer(strsplit(input,"")[[1]])
  if (any(is.na(d))) {
    d <- d[-which(is.na(d))]
  }
  if(all(is.na(d))) {
    return(-1)
  }
  if (length(is.na(d)) == length(d)-1) {
    return(-1)
  }
  sort(d,TRUE)[2]
}
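As a hedged diagnostic of the attempt above: the joint-highest case appears to work already, because sort() keeps ties. The single-digit case seems to be what slips through, since `length(is.na(d))` always equals `length(d)`. A minimal sketch, using made-up variable names (`tied`, `single`) purely for illustration:

```r
# Joint highest digits: sort() keeps duplicates, so position 2 is the tied digit
tied <- c(1L, 2L, 3L, 1L, 2L, 3L)
sort(tied, TRUE)[2]                         # 3

# A single digit among letters
single <- suppressWarnings(as.integer(strsplit("abcdefg4", "")[[1]]))
single <- single[!is.na(single)]            # 4

length(is.na(single))                       # 1, always the same as length(single)
length(is.na(single)) == length(single) - 1 # FALSE, so this check never fires
sort(single, TRUE)[2]                       # NA rather than -1

# A check that does catch both the no-digit and the one-digit case
length(single) < 2L                         # TRUE
```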
Solution 1:[1]
A stringr::str_count solution:
library(stringr)
secondHighest1 <- function(str) {
  ans <- 10L - match(TRUE, cumsum(str_count(str, paste0(9:0))) > 1L)
  if (is.na(ans)) -1L else ans
}
A base R solution:
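secondHighest2 <- function(str) {
  # Non-digit characters coerce to NA (with a warning); tabulate() ignores NA
  digits <- suppressWarnings(as.integer(strsplit(str, "")[[1]]))
  # Digit d goes to bin 10 - d, so the cumulative count runs from 9 down to 0
  ans <- 10L - match(TRUE, cumsum(tabulate(10L - digits)) > 1L)
  if (is.na(ans)) -1L else ans
}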
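To make the tabulate/cumsum trick easier to follow, here is a sketch that walks through the intermediate values for the question's first example. The variable names (`example_str`, `counts`, `running`, `first_bin`) are illustrative and not part of the answer:

```r
# Walk through the intermediate values for one input; base R only
example_str <- "abc:1231234"
digits <- suppressWarnings(as.integer(strsplit(example_str, "")[[1]]))
digits <- digits[!is.na(digits)]        # 1 2 3 1 2 3 4

# Map digit d to bin 10 - d, so 9 lands in bin 1 and 1 lands in bin 9
counts <- tabulate(10L - digits)        # 0 0 0 0 0 1 2 2 2
running <- cumsum(counts)               # 0 0 0 0 0 1 3 5 7

# First bin, counting down from 9, by which at least two digits have been seen
first_bin <- match(TRUE, running > 1L)  # 7
10L - first_bin                         # 3
```

If fewer than two digits are present, `running` never exceeds 1, match() returns NA, and the function falls back to -1.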
UPDATE: Borrowing Adam's idea of using utf8ToInt instead of strsplit gives a big speed boost:
secondHighest3 <- function(str) {
  nums <- utf8ToInt(str)
  # Keep code points up to "9" (57); anything below "0" (48) is clamped to -1 below
  nums <- nums[nums < 58L]
  if (length(nums) > 1L) max(-1L, max(nums[-which.max(nums)]) - 48L) else -1L
}
set.seed(94)
chrs <- c(paste0(9:0), letters, LETTERS)
str <- paste0(sample(chrs, 1e5, TRUE, (1:62)^4), collapse = "")
secondHighest1(str)
#> [1] 3
secondHighest2(str)
#> [1] 3
secondHighest3(str)
#> [1] 3
microbenchmark::microbenchmark(secondHighest1(str),
                               secondHighest2(str),
                               secondHighest3(str))
#> Unit: microseconds
#>                 expr     min       lq      mean   median      uq     max neval
#>  secondHighest1(str)  1193.8  1279.55  1524.712  1338.80  1525.2  5013.7   100
#>  secondHighest2(str) 16825.3 18049.65 21297.962 19339.75 24126.4 36652.6   100
#>  secondHighest3(str)   706.0   774.80  1371.196   867.40  1045.0 17780.4   100
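The utf8ToInt version relies on the digit characters having code points 48 to 57. The sketch below, with illustrative variable names, shows why the `max(-1L, ...)` clamp matters: characters below "0", such as "$", pass the `nums < 58L` filter and would otherwise come back as a negative number.

```r
# Code points of the digit characters and two nearby non-digits
utf8ToInt("0")   # 48
utf8ToInt("9")   # 57
utf8ToInt(":")   # 58, removed by the nums < 58L filter
utf8ToInt("$")   # 36, kept by the filter

# One digit plus one "$": the runner-up is "$", and 36 - 48 = -12,
# which max(-1L, ...) clamps to -1
nums <- utf8ToInt("$7")
nums <- nums[nums < 58L]
runner_up <- max(nums[-which.max(nums)]) - 48L   # -12
max(-1L, runner_up)                              # -1
```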
Solution 2:[2]
As a one-liner:
string <- "abc:1231234"
sort(unique(suppressWarnings(as.integer(strsplit(string, "", fixed = TRUE)[[1]]))), decreasing = TRUE)[2]
#> [1] 3
Or using the magrittr pipe:
library(magrittr)
suppressWarnings(
  strsplit(string, "", fixed = TRUE)[[1]] %>%
    as.integer() %>%
    unique() %>%
    sort(decreasing = TRUE) %>%
    .[2]
)
#> [1] 3
Created on 2022-03-25 by the reprex package (v2.0.1)
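Note that unique() collapses repeated digits, so for "123123" the one-liner returns 2 where the question expects 3, and it returns NA rather than -1 when fewer than two digits are present. A minimal sketch of a variant that follows the stated rules, assuming the same strsplit approach (the function name `second_highest` is hypothetical):

```r
# Hypothetical variant of the one-liner: keeps duplicates and returns -1
# when there is no second digit
second_highest <- function(string) {
  digits <- suppressWarnings(as.integer(strsplit(string, "", fixed = TRUE)[[1]]))
  # sort() drops NA by default, so only real digits remain
  sorted <- sort(digits, decreasing = TRUE)
  if (length(sorted) < 2L) -1L else sorted[2]
}

second_highest("abc:1231234")  # 3
second_highest("123123")       # 3
second_highest("abcdefg4")     # -1
```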
Solution 3:[3]
You can also convert the characters to their ASCII codes with utf8ToInt and work on the integers directly:
solution <- function(input) {
  if (nchar(input) < 2L) return(-1L)
  # ASCII codes for 0:9 are 48:57
  int_input <- utf8ToInt(input) - 48L
  sort(replace(int_input, !(int_input %in% 0L:9L), -1L), decreasing = TRUE)[2]
}
Testing a few strings...
str1 <- "abc:1231234"
str2 <- "987654321"
str3 <- "abcdefg4"
str4 <- "abc$<>$#%fgdgLJJ"
str5 <- "123123"
solution(str1)
# [1] 3
solution(str2)
# [1] 8
solution(str3)
# [1] -1
solution(str4)
# [1] -1
solution(str5)
# [1] 3
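The manual checks above can also be written as assertions with stopifnot(), which stays silent when every condition holds. This is only a sketch: the function body is copied from the answer so the block runs on its own, and the two extra inputs ("" and "7") are added here as assumed edge cases.

```r
solution <- function(input) {
  if (nchar(input) < 2L) return(-1L)
  # ASCII codes for 0:9 are 48:57
  int_input <- utf8ToInt(input) - 48L
  sort(replace(int_input, !(int_input %in% 0L:9L), -1L), decreasing = TRUE)[2]
}

# Each condition must be TRUE, otherwise stopifnot() raises an error
stopifnot(
  solution("abc:1231234") == 3,
  solution("987654321") == 8,
  solution("abcdefg4") == -1,
  solution("abc$<>$#%fgdgLJJ") == -1,
  solution("123123") == 3,
  solution("") == -1,   # assumed edge case: empty string
  solution("7") == -1   # assumed edge case: a single character
)
```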
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | JBGruber |
| Solution 3 | |
