Selecting all but the first element of a vector in a data frame
I have some data that looks like this:
X1
A,B,C,D,E
A,B
A,B,C,D
A,B,C,D,E,F
I want to generate one column that holds the first element of each vector ("A"), and another column that holds all the rest of the values ("B","C" etc.):
X1 Col1 Col2
A,B,C,D,E A B,C,D,E
A,B A B
A,B,C,D A B,C,D
A,B,C,D,E,F A B,C,D,E,F
I have tried the following:
library(dplyr)
testdata <- data.frame(X1 = c("A,B,C,D,E",
                              "A,B",
                              "A,B,C,D",
                              "A,B,C,D,E,F")) %>%
  mutate(Col1 = sapply(strsplit(X1, ","), "[", 1),
         Col2 = sapply(strsplit(X1, ","), "[", -1))
However, I cannot seem to get rid of the pesky vector brackets around the values in Col2. Is there any way of doing this?
Solution 1:[1]
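A possible solution, using tidyr::separate. The lookbehind (?<=^.) makes the pattern match only the comma that directly follows the first character, so this assumes the first element is a single character: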
library(tidyverse)
df <- data.frame(
  stringsAsFactors = FALSE,
  X1 = c("A,B,C,D,E", "A,B", "A,B,C,D", "A,B,C,D,E,F")
)
df %>%
  separate(X1, into = str_c("col", 1:2), sep = "(?<=^.),", remove = FALSE)
#> X1 col1 col2
#> 1 A,B,C,D,E A B,C,D,E
#> 2 A,B A B
#> 3 A,B,C,D A B,C,D
#> 4 A,B,C,D,E,F A B,C,D,E,F
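If the first element may be longer than one character, the lookbehind above no longer matches. A minimal sketch of an alternative, assuming tidyr is installed: split on every comma but let `extra = "merge"` keep everything after the first split in the last column. The names `df` and `out` are just illustrative.

```r
library(tidyr)

df <- data.frame(
  X1 = c("A,B,C,D,E", "A,B", "A,B,C,D", "A,B,C,D,E,F"),
  stringsAsFactors = FALSE
)

# Split at the first comma only: with two target columns, extra = "merge"
# leaves the remaining commas untouched in the last column.
out <- separate(df, X1, into = c("col1", "col2"), sep = ",",
                extra = "merge", remove = FALSE)
out
```

This does not depend on the width of the first element, only on the position of the first comma.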
Solution 2:[2]
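Try the base R code below, using sub + read.table. sub() replaces only the first comma with a space, and read.table() then splits each line on whitespace: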
cbind(
df,
read.table(
text = sub(",", " ", df$X1)
)
)
which gives
X1 V1 V2
1 A,B,C,D,E A B,C,D,E
2 A,B A B
3 A,B,C,D A B,C,D
4 A,B,C,D,E,F A B,C,D,E,F
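Because read.table() splits on whitespace, the approach above relies on the values containing no spaces. A hedged base R sketch that avoids that assumption uses two sub() calls anchored on the first comma. The extra rows ("AA,B C,D" and "A") and the column names `first` and `rest` are hypothetical, added only to show the edge cases.

```r
# Hypothetical test rows: a multi-character first element containing a
# space later on, and a value with no comma at all.
df <- data.frame(
  X1 = c("A,B,C,D,E", "A,B", "AA,B C,D", "A"),
  stringsAsFactors = FALSE
)

# Everything before the first comma (the whole string if there is none).
df$first <- sub(",.*$", "", df$X1)

# Everything after the first comma; NA when there is no comma.
df$rest <- ifelse(grepl(",", df$X1, fixed = TRUE),
                  sub("^[^,]*,", "", df$X1),
                  NA_character_)
df
```

Returning NA for rows without a comma is a design choice in this sketch; an empty string would work equally well if that suits the downstream code better.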
Solution 3:[3]
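You can use the str_sub() function from stringr as follows. Note that this takes the first character and everything from the third character onward, so it assumes the first element is exactly one character long: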
> df
# A tibble: 4 x 1
X1
<chr>
1 A,B,C,D,E
2 A,B
3 A,B,C,D
4 A,B,C,D,E,F
> df %>% mutate(X2 = str_sub(X1, 1,1), X3 = str_sub(X1, 3))
# A tibble: 4 x 3
X1 X2 X3
<chr> <chr> <chr>
1 A,B,C,D,E A B,C,D,E
2 A,B A B
3 A,B,C,D A B,C,D
4 A,B,C,D,E,F A B,C,D,E,F
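The strsplit() attempt from the question can also be kept with a small change. Col2 ends up as a list column there because the pieces left after dropping the first element have different lengths, so sapply() cannot simplify them to a character vector. A minimal base R sketch that pastes the remaining pieces back together (using vapply() so the result type is guaranteed; `parts` is an illustrative name):

```r
testdata <- data.frame(
  X1 = c("A,B,C,D,E", "A,B", "A,B,C,D", "A,B,C,D,E,F"),
  stringsAsFactors = FALSE
)

# Split once and reuse the result for both columns.
parts <- strsplit(testdata$X1, ",", fixed = TRUE)

# First element of each split vector.
testdata$Col1 <- vapply(parts, function(p) p[1], character(1))

# All remaining elements, collapsed back into one comma-separated string.
testdata$Col2 <- vapply(parts, function(p) paste(p[-1], collapse = ","),
                        character(1))
testdata
```

Both new columns are plain character vectors, so they print without any list or vector notation.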
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | ThomasIsCoding |
| Solution 3 | Erfan Ghasemi |
