Search for a numeric value up to 3 words before a particular word
Is it possible to use a regex to find a numeric value up to 3 words before a particular word, let's say "years"? In the example below I extract the word immediately before "years". It works, but for the third element it returns "more", and I need 2 instead. The "XX or more years" pattern is not fixed, so I am trying to find a numeric value within the 3 words before "years".
Description <- c("Candidate having bachelor degree. Minimum 5 years in R", "Excellent academic background plus 3 years of experience in Python", "Analytics Professionals having minimum of 2 or more years of experience", "Candidate possessing credit risk experience plus 2+ years in Python", "Candidate possessing credit risk experience plus two or more years in Python")
[1] "Candidate having bachelor degree. Minimum 5 years in R"
[2] "Excellent academic background plus 3 years of experience in Python"
[3] "Analytics Professionals having minimum of 2 or more years of experience"
[4] "Candidate possessing credit risk experience plus 2+ years in Python"
[5] "Candidate possessing credit risk experience plus two or more years in Python"
Code
str_extract(Description, "\\w+(\\+)?(?= +years(\\s+of)?(\\s+programming|experience)?\\b)")
[1] "5"    "3"    "more" "2+"   "more"
Solution 1:[1]
We can use a named vector to replace the English number words with digits, and then do the extraction:
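library(stringr)
library(english)
# Named vector c("\\bone\\b" = "1", ...): str_replace_all() swaps each number word
# for its digit; \\b stops matches inside words such as "someone"
as.numeric(str_replace(str_replace_all(Description,
  setNames(as.character(1:9), paste0("\\b", as.character(english(1:9)), "\\b"))),
  # capture the last number followed, with no digits in between, by "years"
  ".*\\b([0-9]+)\\b[^0-9]+\\byears.*", "\\1"))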
Output
[1] 5 3 2 2 2
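The replacement above captures the last number before "years" but does not limit how many words sit between them, because `[^0-9]+` can span any distance. A minimal base-R sketch that builds the "up to 3 words" limit from the question into the pattern itself is below. It assumes "up to 3 words before" means the number is one of the three words immediately preceding "years", and it only recognizes spelled-out numbers from one to ten; `extract_years()` and `word_numbers` are hypothetical names, not part of stringr or english.

```r
# Sample data from the question
Description <- c("Candidate having bachelor degree. Minimum 5 years in R",
                 "Excellent academic background plus 3 years of experience in Python",
                 "Analytics Professionals having minimum of 2 or more years of experience",
                 "Candidate possessing credit risk experience plus 2+ years in Python",
                 "Candidate possessing credit risk experience plus two or more years in Python")

# Hypothetical lookup table for spelled-out numbers (assumed: one to ten only)
word_numbers <- c(one = 1, two = 2, three = 3, four = 4, five = 5,
                  six = 6, seven = 7, eight = 8, nine = 9, ten = 10)

# Hypothetical helper: return the number found among the `max_words` words
# that end right before "years", or NA when there is none.
extract_years <- function(text, max_words = 3L) {
  number <- paste(c("\\d+", names(word_numbers)), collapse = "|")
  # number, optional "+", then at most max_words - 1 other words, then "years"
  pattern <- sprintf("\\b(%s)\\+?(?:\\s+\\w+){0,%d}\\s+years\\b",
                     number, max_words - 1L)
  hits <- regmatches(text, regexec(pattern, text, ignore.case = TRUE, perl = TRUE))
  vapply(hits, function(m) {
    if (length(m) < 2) return(NA_real_)    # no match in this element
    token <- tolower(m[2])                 # capture group 1: the number itself
    if (grepl("^[0-9]+$", token)) as.numeric(token) else unname(word_numbers[token])
  }, numeric(1), USE.NAMES = FALSE)
}

extract_years(Description)
#> [1] 5 3 2 2 2
extract_years("5 projects delivered over several years")
#> [1] NA
```

The `{0,2}` quantifier is what encodes the window, so raising `max_words` allows more words between the number and "years". The leading `\\b` keeps number words embedded in longer words (such as "one" in "someone") from matching, and using `regexec()` with `perl = TRUE` keeps the sketch free of package dependencies.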
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
