Extract a word and numbers from a string in R
I have a problem that I find too complicated to solve. Although I have experience in R, it has driven me crazy. The example data frame is the following:
ID <- c("A123", "B123")
observation <- c("This codes are LIQ 1234 3453 2342 for the date 01-03-2022","For this ID are LIQ 3249 23 290 23402 this are for the date 01-02-2022")
df <- data.frame(ID,observation)
The data frame has two columns, ID and observation, with one row per ID.

The problem I'm having is that I need to create a new row for each value that follows the word "LIQ". The result should have a "new_variable" column holding "LIQ" followed by one value per row, for example "LIQ 1234".
I've tried multiple approaches but I can't figure it out. If anyone can help me, I would be very grateful. Thank you very much for your attention.
Solution 1:[1]
Here's a simple approach with dplyr, stringr and tidyr:
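library(dplyr)
library(stringr)
library(tidyr)

# extract the numbers after "LIQ ", split them, and create one row per number
df %>%
  mutate(new_variable = str_extract(observation, "(?<=LIQ )[0-9 ]+(?= \\D)") %>%
           str_split(" ")) %>%
  unnest_longer(new_variable) %>%
  mutate(new_variable = str_c("LIQ ", new_variable))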
## A tibble: 7 × 3
# ID observation new_variable
# <chr> <chr> <chr>
#1 A123 This codes are LIQ 1234 3453 2342 for the date 01-03-2022 LIQ 1234
#2 A123 This codes are LIQ 1234 3453 2342 for the date 01-03-2022 LIQ 3453
#3 A123 This codes are LIQ 1234 3453 2342 for the date 01-03-2022 LIQ 2342
#4 B123 For this ID are LIQ 3249 23 290 23402 this are for the date 01-02-2022 LIQ 3249
#5 B123 For this ID are LIQ 3249 23 290 23402 this are for the date 01-02-2022 LIQ 23
#6 B123 For this ID are LIQ 3249 23 290 23402 this are for the date 01-02-2022 LIQ 290
#7 B123 For this ID are LIQ 3249 23 290 23402 this are for the date 01-02-2022 LIQ 23402
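To see why the pattern works, it can help to run the same steps on a single string. The sketch below is not part of the original answer; the names x, codes, parts and result are made up for illustration, and it assumes stringr is installed.

```r
library(stringr)

# one observation from the example data (hypothetical variable name)
x <- "This codes are LIQ 1234 3453 2342 for the date 01-03-2022"

# Step 1: take the run of digits and spaces that comes right after "LIQ "
# and ends just before a space followed by a non-digit character
codes <- str_extract(x, "(?<=LIQ )[0-9 ]+(?= \\D)")
codes
# "1234 3453 2342"

# Step 2: split that run on spaces to get one element per number
parts <- str_split(codes, " ")[[1]]
parts
# "1234" "3453" "2342"

# Step 3: put the "LIQ " prefix back on each number
result <- str_c("LIQ ", parts)
result
# "LIQ 1234" "LIQ 3453" "LIQ 2342"
```

Note that str_extract returns NA when a string contains no "LIQ " followed by numbers, so such rows would likely need separate handling.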
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ian Campbell |

