Extract a word and numbers from a string in R

I have experience in R, but this problem has me stuck. The example data is the following:

ID <- c("A123", "B123")
observation <- c("This codes are LIQ 1234 3453 2342 for the date 01-03-2022","For this ID are LIQ 3249 23 290 23402 this are for the date 01-02-2022")

df <- data.frame(ID,observation)

This data frame has one row per ID, with a free-text note in the "observation" column.

The problem is that I need to create a new row for each value that comes after the word "LIQ". The result should keep the original columns and add a "new_variable" column holding one value per row, for example "LIQ 1234", "LIQ 3453" and "LIQ 2342" for the first ID.

I've been trying multiple ways but I can't figure out this problem. If anyone can help me I would be completely grateful. Thank you very much for your attention.



Solution 1:[1]

Here's a simple approach with dplyr, stringr and tidyr:

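library(dplyr)
library(stringr)
library(tidyr)

df %>%
   # take the digits and spaces between "LIQ " and the next non-digit word, then split on spaces
   mutate(new_variable = str_extract(observation,"(?<=LIQ )[0-9 ]+(?= \\D)") %>%
                           str_split(" ")) %>%
   unnest_longer(new_variable) %>%   # one row per extracted number
   mutate(new_variable = str_c("LIQ ",new_variable))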
## A tibble: 7 × 3
#  ID    observation                                                            new_variable
#  <chr> <chr>                                                                  <chr>       
#1 A123  This codes are LIQ 1234 3453 2342 for the date 01-03-2022              LIQ 1234    
#2 A123  This codes are LIQ 1234 3453 2342 for the date 01-03-2022              LIQ 3453    
#3 A123  This codes are LIQ 1234 3453 2342 for the date 01-03-2022              LIQ 2342    
#4 B123  For this ID are LIQ 3249 23 290 23402 this are for the date 01-02-2022 LIQ 3249    
#5 B123  For this ID are LIQ 3249 23 290 23402 this are for the date 01-02-2022 LIQ 23      
#6 B123  For this ID are LIQ 3249 23 290 23402 this are for the date 01-02-2022 LIQ 290     
#7 B123  For this ID are LIQ 3249 23 290 23402 this are for the date 01-02-2022 LIQ 23402 
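
For comparison, the same idea can be sketched in base R without any packages. This is only a minimal sketch and not part of the original answer: the names `pattern`, `parts` and `out` are illustrative, and it assumes every observation contains "LIQ " followed by at least one number (rows without a match would be dropped by `regmatches()` and break the row alignment).

```r
# Example data from the question
ID <- c("A123", "B123")
observation <- c("This codes are LIQ 1234 3453 2342 for the date 01-03-2022",
                 "For this ID are LIQ 3249 23 290 23402 this are for the date 01-02-2022")
df <- data.frame(ID, observation)

# Same pattern as above; lookarounds need perl = TRUE in base R
pattern <- "(?<=LIQ )[0-9 ]+(?= \\D)"

# Assumption: every row matches, so m has one element per row of df
m <- regmatches(df$observation, regexpr(pattern, df$observation, perl = TRUE))

# Split each matched run of numbers on single spaces
parts <- strsplit(m, " ", fixed = TRUE)

# Repeat each original row once per extracted number and re-add the prefix
n <- lengths(parts)
out <- data.frame(ID = rep(df$ID, n),
                  observation = rep(df$observation, n),
                  new_variable = paste("LIQ", unlist(parts)))
print(out$new_variable)
```

With the example data this gives seven rows, three for A123 and four for B123.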

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ian Campbell