In R, how to extract major.minor version numbers?

long time reader, first time poster

I have a data frame that, among other things, contains the different version numbers for a given object, such as: 0.1 , 0.2 , 0.3 , 0.4 , 0.5 , 0.6 , 0.7 , 0.8 , 0.9 , 0.10 , 0.11 , 1.0 , 1.1

I need to separate it into one column of major version numbers and one column of minor version numbers. For the example it would be:
major: 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 1 , 1
minor: 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 0 , 1

But when I attempt to do it, it thinks that minor version 10 is just 1.

Is there any smart way to do this correctly?

-Thanks

r


Solution 1:[1]

library(tidyverse)

data <- tibble(name = c("foo", "bar"), version = c(0.1, 1.5))

data %>%
  mutate(across(version, as.character)) %>%
  separate(version, sep = "\\.", into = c("major", "minor"))
#> # A tibble: 2 × 3
#>   name  major minor
#>   <chr> <chr> <chr>
#> 1 foo   0     1    
#> 2 bar   1     5

Created on 2022-04-07 by the reprex package (v2.0.0)
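
Note that the call above only gives the expected result when the version can survive a round trip through numeric. A minimal sketch of the same separate() call, assuming the version column is already stored as character (the data frame `versions` and its values are made up for illustration), could look like this. convert = TRUE asks tidyr to turn the new columns into integers.

```r
library(tidyr)

# Hypothetical data: versions kept as character so "0.10" is not collapsed to 0.1.
versions <- data.frame(
  name = c("foo", "bar", "baz"),
  version = c("0.9", "0.10", "1.0"),
  stringsAsFactors = FALSE
)

# Split on the literal dot; convert = TRUE makes major and minor integer columns.
out <- separate(versions, version, into = c("major", "minor"),
                sep = "\\.", convert = TRUE)
out$minor
```

With character input, minor version 10 stays 10 and the trailing 0 of 1.0 is kept as minor version 0.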

Solution 2:[2]

The problem is confusing what print shows with what the elements of the numeric vector actually are.
When the data is read in as numeric, R sees 0.1 and 0.10 as the same value, which makes sense since both are the number 0.1.
When printed, however, the print method for numeric data formats all vector elements with the same number of decimal digits.
So if any vector element needs two decimal digits, then 0.1 and 0.10 (in fact the same number 0.1) will both print as 0.10.
The only way to get this right is to read the data in as character.
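
As a rough sketch of what reading in as character can look like for a file-based data frame, assuming the data comes from a CSV (the text `csv_text` and its column names are invented here), read.csv() accepts a colClasses argument that forces a column to stay character:

```r
# Hypothetical CSV content; the real file and column names may differ.
csv_text <- "name,version\nfoo,0.1\nbar,0.10\nbaz,1.0"

# Default parsing: version becomes numeric, so 0.1 and 0.10 are the same value.
as_num <- read.csv(text = csv_text)
as_num$version[1] == as_num$version[2]

# colClasses keeps the version column exactly as written in the file.
as_chr <- read.csv(text = csv_text, colClasses = c(version = "character"))
as_chr$version
```

Once the column has been parsed as numeric, the distinction between 0.1 and 0.10 is gone and cannot be recovered afterwards.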

Read in the question's test data as character and as numeric.

vec <- " 0.1 , 0.2 , 0.3 , 0.4 , 0.5 , 0.6 , 0.7 , 0.8 , 0.9 , 0.10 , 0.11 , 1.0 , 1.1"
x <- scan(text = vec, sep = ",", what = character())
x <- trimws(x)
y <- scan(text = vec, sep = ",")

Print the numbers

The first example prints the whole numeric vector y, then only its two problem elements (positions 1 and 10), and then those two together with element 11, which has two decimal digits. The second and third outputs show the same numbers differently: as soon as 0.11 is included, every element is printed with two decimals. The numbers 0.1 and 0.10 are nevertheless exactly equal.

y
#>  [1] 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.10 0.11 1.00 1.10
y[c(1, 10)]
#> [1] 0.1 0.1
y[c(1, 10, 11)]
#> [1] 0.10 0.10 0.11

Created on 2022-04-07 by the reprex package (v2.0.1)
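
The difference between the stored value and its printed form can also be checked directly. This is only a small sketch with a made-up three-element vector `y3`; format() is used because it produces the same common formatting that print applies to a numeric vector.

```r
# Hypothetical vector: 0.1 and 0.10 typed differently, plus 0.11.
y3 <- c(0.1, 0.10, 0.11)

# The first two elements are the very same double.
same_value <- identical(y3[1], y3[2])
same_value

# Formatted together with 0.11, every element gets two decimals.
all_three <- format(y3)
all_three

# Formatted on their own, one decimal is enough.
first_two <- format(y3[1:2])
first_two
```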

The same happens with 1.0. Printed as numeric, all vector elements get the same number of decimal digits, but once the numbers are converted to character the trailing zeros are dropped.

y[c(1, 12)]
#> [1] 0.1 1.0
as.character(y)[c(1, 12)]
#> [1] "0.1" "1"
y[c(11, 12)]
#> [1] 0.11 1.00
as.character(y)[c(11, 12)]
#> [1] "0.11" "1"

And if only 1.0 is printed, the minor version number doesn't show up.

y[12]
#> [1] 1
as.character(y)[12]
#> [1] "1"
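
This matters for splitting: a minimal sketch, with the illustrative names `v_num` and `v_chr`, of what strsplit() receives in each case.

```r
v_num <- 1.0    # version stored as numeric
v_chr <- "1.0"  # version stored as character

# The numeric value is converted to "1", so there is nothing after the dot to split off.
from_num <- strsplit(as.character(v_num), split = ".", fixed = TRUE)[[1]]
from_num

# The character value still carries both components.
from_chr <- strsplit(v_chr, split = ".", fixed = TRUE)[[1]]
from_chr
```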

A solution

The following function avoids this unwanted behavior: strsplit() throws an error if its input is not character, so numeric input is rejected instead of being silently split the wrong way.

major_minor <- function(x, split = "."){
  s <- strsplit(x, split = split, fixed = TRUE)
  y <- do.call(rbind.data.frame, s)
  setNames(y, c("major", "minor"))
}

major_minor(x)
#>    major minor
#> 1      0     1
#> 2      0     2
#> 3      0     3
#> 4      0     4
#> 5      0     5
#> 6      0     6
#> 7      0     7
#> 8      0     8
#> 9      0     9
#> 10     0    10
#> 11     0    11
#> 12     1     0
#> 13     1     1
major_minor(y)
#> Error in strsplit(x, split = split, fixed = TRUE): non-character argument
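
If integer columns are wanted, for example to sort versions correctly, one possible variant is sketched below. The name `major_minor_int` and the explicit checks are additions for illustration, not part of the function above, and the sketch assumes every version string has exactly one dot.

```r
# Hypothetical variant returning integer columns.
major_minor_int <- function(x) {
  # Fail early with a clear message instead of relying on strsplit().
  if (!is.character(x)) {
    stop("`x` must be character, e.g. \"0.10\", not numeric")
  }
  parts <- strsplit(x, split = ".", fixed = TRUE)
  # Assumption: each version has exactly a major and a minor component.
  stopifnot(all(lengths(parts) == 2L))
  data.frame(
    major = as.integer(vapply(parts, `[`, character(1), 1L)),
    minor = as.integer(vapply(parts, `[`, character(1), 2L))
  )
}

v <- major_minor_int(c("0.10", "1.0", "0.9", "0.11"))

# Integer columns sort numerically, so 0.9 comes before 0.10.
sorted <- v[order(v$major, v$minor), ]
sorted$minor
```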

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 danlooo
Solution 2 Rui Barradas