Splitting data in a column based on a word
Solution 1:[1]
You can do something like this:
library(dplyr)   # provides %>% and mutate()
library(stringr) # provides str_extract()
data %>%
  mutate(speed = as.numeric(str_extract(Cpu, "\\d*[.]?\\d+(?=GHz$)")))
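To see what Solution 1's pattern accepts, here is a small sketch on made-up Cpu strings (hypothetical values, not from the question). It uses a plain character vector so only stringr is needed. The optional [.]? lets whole-number speeds such as 3GHz match, and the $ inside the lookahead means a string where GHz is not the last thing yields NA.

```r
library(stringr)

# Hypothetical Cpu strings, not taken from the question
cpu <- c("Intel Core i5 2.3GHz",
         "AMD A9-Series 9420 3GHz",
         "Intel Core i5 2.3GHz Quad")

# Same pattern as Solution 1; the $ inside the lookahead requires
# "GHz" to be at the very end of the string
speed <- as.numeric(str_extract(cpu, "\\d*[.]?\\d+(?=GHz$)"))

speed
# 2.3 3.0 NA
```

If the speed can appear in the middle of the string, dropping the $ from the lookahead removes that restriction.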
Solution 2:[2]
A slightly easier regex is this:
library(dplyr)
library(stringr)
df %>%
  mutate(CPU_new = str_extract(Cpu, "[0-9.]+(?=GHz)"))
Without the pipe (this still uses stringr's str_extract):
df$CPU_new <- str_extract(df$Cpu, "[0-9.]+(?=GHz)")
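A version with no add-on packages at all is possible with base R's regexpr() and regmatches(). The sketch below assumes hypothetical sample values; perl = TRUE is needed because R's default regex engine does not support lookaheads.

```r
# Hypothetical sample values mirroring the question's Cpu column
cpu <- c("Intel Core i5 2.3GHz",
         "Intel Core i5 72000U 2.3GHz",
         "Intel Celeron N3350")

# regexpr() returns the position of the first match per element (-1 if none);
# perl = TRUE enables the (?=GHz) lookahead
m <- regexpr("[0-9.]+(?=GHz)", cpu, perl = TRUE)

# regmatches() returns only the successful matches, so fill a
# full-length vector and leave NA where nothing matched
cpu_new <- rep(NA_character_, length(cpu))
cpu_new[m != -1] <- regmatches(cpu, m)

cpu_new
# "2.3" "2.3" NA
```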
How this works:
- [0-9.]+: a character class matching digits and the period, repeated one or more times (inside a character class the period is literal)
- (?=GHz): a positive lookahead asserting that the match to be extracted must be followed by the literal string GHz
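The effect of the lookahead is easiest to see by comparing it with a pattern that consumes GHz. A minimal sketch on one of the question's strings:

```r
library(stringr)

x <- "Intel Core i5 2.3GHz"

# Consuming "GHz" makes it part of the extracted text
with_unit <- str_extract(x, "[0-9.]+GHz")

# A lookahead only checks that "GHz" follows; it is not returned
without_unit <- str_extract(x, "[0-9.]+(?=GHz)")

with_unit    # "2.3GHz"
without_unit # "2.3"
```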
Solution 3:[3]
I think the other answers are better, but an alternative to a complicated regex is to use the stringr package to extract just the 3 characters immediately before "GHz" (this assumes the speed is always three characters wide, such as 2.3):
Data:
df <- data.frame(ScreenResolution = paste("Test",LETTERS[1:3]),
Cpu = c("Intel Core i5 2.3GHz","Intel Core i5 1.8GHz",
"Intel Core i5 72000U 2.3GHz"),
Ram = "8GB")
Code:
library(stringr)
# Start position of "GHz" in every row ([, "start"] keeps one value per row)
ghz_start <- str_locate(df$Cpu, pattern = "GHz")[, "start"]
df$Cpu_new <- str_sub(df$Cpu, ghz_start - 3, ghz_start - 1)
Output:
#   ScreenResolution                         Cpu Ram Cpu_new
# 1           Test A        Intel Core i5 2.3GHz 8GB     2.3
# 2           Test B        Intel Core i5 1.8GHz 8GB     1.8
# 3           Test C Intel Core i5 72000U 2.3GHz 8GB     2.3
If you want the result to be numeric, wrap it in as.numeric(str_sub(...)).
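Because the positional approach takes a fixed number of characters, it only works when every speed has the same width. This sketch uses one hypothetical four-character speed (2.80) to show how the positional result diverges from the regex result:

```r
library(stringr)

# Hypothetical values: the second one has a four-character speed
cpu <- c("Intel Core i5 2.3GHz", "Intel Core i7 2.80GHz")

# Start position of "GHz" in each string (the "start" column of the matrix)
loc <- str_locate(cpu, "GHz")[, "start"]

# Positional extraction: always the three characters before "GHz"
positional <- as.numeric(str_sub(cpu, loc - 3, loc - 1))

# Regex extraction for comparison: the whole number before "GHz"
by_regex <- as.numeric(str_extract(cpu, "[0-9.]+(?=GHz)"))

positional # 2.3 0.8  (".80" is parsed as 0.8)
by_regex   # 2.3 2.8
```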
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | langtang |
| Solution 2 | Chris Ruehlemann |
| Solution 3 | jpsmith |

