Remove blanks from strsplit in R

> dc1
  V1                V2
1 20140211-0100     |Box
2 20140211-1782     |Office|Ball
3 20140211-1783     |Office
4 20140211-1784     |Office
5 20140221-0756     |Box
6 20140203-0418     |Box
> strsplit(as.character(dc1[,2]),"^\\|")
[[1]]
[1] ""    "Box"


[[2]]
[1] ""             "Office" "Ball"


[[3]]
[1] ""             "Office"


[[4]]
[1] ""             "Office"


[[5]]
[1] ""    "Box"


[[6]]
[1] ""    "Box"  

How do I remove the blank ("") from the strsplit results? The result should look like:

[[1]]
[1] "Box"
[[2]]
[1] "Office"    "Ball"


Solution 1:[1]

You can use lapply on your list to drop the empty strings. I changed the pattern in your strsplit call (from "^\\|" to "\\|") to match your intended output.

dc1 <- read.table(text = 'V1                V2
1 20140211-0100     |Box
2 20140211-1782     |Office|Ball
3 20140211-1783     |Office
4 20140211-1784     |Office
5 20140221-0756     |Box
6 20140203-0418     |Box', header = TRUE)

out <- strsplit(as.character(dc1[,2]),"\\|")

> lapply(out, function(x){x[!x ==""]})
[[1]]
[1] "Box"

[[2]]
[1] "Office" "Ball"  

[[3]]
[1] "Office"

[[4]]
[1] "Office"

[[5]]
[1] "Box"

[[6]]
[1] "Box"

Solution 2:[2]

I do not have a general solution, but for your example you could try:

strsplit(sub("^\\|", "", as.character(dc1[,2])),"\\|")

It removes the leading | (this is what the regex "^\\|" matches) before performing the split. That leading | is the reason for the "" in your output.
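The strip-then-split idea can be sketched on a few values copied from the question. The variable names (v2, trimmed, parts) are made up for illustration, and fixed = TRUE is used here so that "|" is treated as a literal character rather than a regex, which is a small variation on the call above.

```r
# Sample values copied from the question's V2 column.
v2 <- c("|Box", "|Office|Ball", "|Office")

# Remove one leading "|" so the split never starts with an empty field.
trimmed <- sub("^\\|", "", v2)

# fixed = TRUE treats "|" literally, so no escaping is needed.
parts <- strsplit(trimmed, "|", fixed = TRUE)
print(parts)
```

Only the leading delimiter needs this treatment: strsplit() already discards an empty field produced by a delimiter at the very end of a string.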

Solution 3:[3]

You could use:

library(stringr)
str_extract_all(dc1[,2], "[[:alpha:]]+")
[[1]]
 [1] "Box"

[[2]]
 [1] "Office" "Ball"  

[[3]]
 [1] "Office"

[[4]]
 [1] "Office"

[[5]]
 [1] "Box"

[[6]]
 [1] "Box"
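Note that "[[:alpha:]]+" matches runs of letters only, so a field containing digits, spaces or hyphens would be broken into pieces. A possible alternative, sketched here in base R, is to extract every run of characters that is not a "|". The third sample value is hypothetical and is not part of the question's data.

```r
# Two values from the question plus one hypothetical value with
# a hyphen, a space and a digit inside a single field.
v2 <- c("|Box", "|Office|Ball", "|Home-Office 2")

# "[^|]+" matches one or more characters other than "|";
# gregexpr() finds all matches and regmatches() extracts them.
fields <- regmatches(v2, gregexpr("[^|]+", v2))
print(fields)
```

Since only non-empty runs can match, no empty strings appear in the result.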

Solution 4:[4]

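In this case, you can just remove the first element of each vector by calling "[" in sapply. This assumes every string starts with the delimiter. sapply returns a list here because the pieces have different lengths; lapply would always return a list.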

> sapply(strsplit(as.character(dc1[,2]), "\\|"), "[", -1)
# [[1]]
# [1] "Box"

# [[2]]
# [1] "Office" "Ball"  

# [[3]]
# [1] "Office"

# [[4]]
# [1] "Office"

# [[5]]
# [1] "Box"

# [[6]]
# [1] "Box"
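Dropping the first element unconditionally would discard real data if some string did not start with "|". A minimal sketch of a more defensive variant follows, assuming that only a leading empty field should go. The helper name drop_leading_empty and the sample values in mixed are hypothetical.

```r
# Hypothetical input: the second value has no leading "|",
# the third has an empty field in the middle.
mixed <- c("|Box", "Office|Ball", "|A||B")

pieces <- strsplit(mixed, "|", fixed = TRUE)

# Drop the first element only when it is an empty string.
drop_leading_empty <- function(x) {
  if (length(x) > 0 && !nzchar(x[1])) x[-1] else x
}

result <- lapply(pieces, drop_leading_empty)
print(result)
```

Unlike filtering out every empty string, this keeps the empty field in the middle of the third value, which may or may not be what you want.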

Solution 5:[5]

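Another method uses nzchar() after unlisting the result of strsplit(). Note that unlist() flattens everything into a single character vector, so the per-row grouping is lost: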

out <- unlist(strsplit(as.character(dc1[,2]),"\\|"))

out[nzchar(x=out)] # removes the extraneous "" marks
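If the grouping by row matters, nzchar() can also be applied to each list element instead of to the unlisted vector. The sketch below compares both forms on values copied from the question; the names v2, pieces, flat and grouped are chosen for illustration only.

```r
# Sample values copied from the question's V2 column.
v2 <- c("|Box", "|Office|Ball", "|Office")

pieces <- strsplit(v2, "|", fixed = TRUE)

# Flattened: one character vector, row boundaries are gone.
flat <- unlist(pieces)
flat <- flat[nzchar(flat)]

# Grouped: one character vector per input string.
grouped <- lapply(pieces, function(x) x[nzchar(x)])

print(flat)
print(grouped)
```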

Solution 6:[6]

library("stringr")

lapply(str_split(dc1$V2, "\\|"), function(x) x[-1])

[[1]]
[1] "Box"

[[2]]
[1] "Office" "Ball"  

[[3]]
[1] "Office"

[[4]]
[1] "Office"

[[5]]
[1] "Box"

[[6]]
[1] "Box"

Solution 7:[7]

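This post is old, but in case it helps someone: the following removes the leading | from each string. Note that it does not split on the remaining | characters, so "|Office|Ball" becomes "Office|Ball". The %>% pipe requires magrittr.

library(magrittr)  # provides %>%

strsplit(as.character(dc1[,2]),"^\\|") %>% 
               lapply(function(x){paste0(x, collapse="")})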

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jdharrison
Solution 2 Math
Solution 3 akrun
Solution 4
Solution 5 lawyeR
Solution 6 dondapati
Solution 7 Vasco Pereira