How do I get only a portion of this string to populate a column in R? [closed]
I have a column called col1 in a data frame df, with each value formatted something like the following:
{"option1":"option2","option3":4,"options":[0.1,0.9]}
How do I clean this up so that each value in this field contains only the first number inside the square brackets (i.e. 0.1)?
Solution 1:[1]
string <- '{"option1":"option2","option3":4,"options":[0.1,0.9]}'
(a <- jsonlite::fromJSON(string))
$option1
[1] "option2"
$option3
[1] 4
$options
[1] 0.1 0.9
If you want the first value:
a$options[1]
[1] 0.1
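To apply the same idea to every row of the column from the question, a minimal sketch might look like the following. It assumes every value in col1 is valid JSON with a numeric "options" array; the second row and the first_option column name are made up for illustration.

```r
library(jsonlite)

# Example data: the first row is from the question, the second is hypothetical.
df <- data.frame(
  col1 = c(
    '{"option1":"option2","option3":4,"options":[0.1,0.9]}',
    '{"option1":"other","option3":7,"options":[0.3,0.7]}'
  ),
  stringsAsFactors = FALSE
)

# Parse each JSON string and keep the first element of its "options" array.
df$first_option <- vapply(
  df$col1,
  function(s) as.numeric(fromJSON(s)$options[1]),
  numeric(1),
  USE.NAMES = FALSE
)

df$first_option
```

Parsing the JSON avoids depending on the order of the keys or on where the brackets appear in the string, at the cost of one fromJSON call per row.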
Solution 2:[2]
Using gsub twice: remove everything up to and including "[", then remove everything from the first "," onward, and convert to numeric:
x <- '{"option1":"option2","option3":4,"options":[0.1,0.9]}'
as.numeric(gsub(",.*", "", gsub(".*\\[", "", x)))
# [1] 0.1
(There must be a better single-pass regex solution.)
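One possible single-pass version, as a sketch rather than a definitive answer, uses sub with a capture group: skip everything up to the last "[", capture the characters up to the first "," or "]", and discard the rest. It assumes each string contains exactly one bracketed list; the variable name first_num is only for illustration.

```r
x <- '{"option1":"option2","option3":4,"options":[0.1,0.9]}'

# ".*\\["        : greedily skip everything up to and including the last "["
# "([^,\\]]+)"   : capture everything before the next "," or "]"
# ".*"           : consume the remainder so only the capture is kept
first_num <- as.numeric(sub('.*\\[([^,\\]]+).*', '\\1', x, perl = TRUE))
first_num
```

Because sub is vectorised, the same call works on a whole column such as df$col1. If a string has no "[", sub returns it unchanged and as.numeric yields NA with a warning.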
Solution 3:[3]
library(dplyr)

df <- data.frame(col1 = '{"option1":"option2","option3":4,"options":[0.1,0.9]}')
df %>%
  mutate(
    # Keep only the run of digits and dots that follows the last "["
    num1 = as.numeric(gsub('.*\\[([\\d.]+)(.*)', '\\1', col1, perl = TRUE))
  )
col1 num1
1 {"option1":"option2","option3":4,"options":[0.1,0.9]} 0.1
Solution 4:[4]
Here is an alternative solution using parse_number from the readr package combined with stringr's str_extract. The pattern \\[.*?\\] matches everything between the square brackets, and parse_number gets the first number:
library(readr)
library(stringr)
parse_number(str_extract(string, '\\[.*?\\]'))
[1] 0.1
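A variant worth considering, sketched here under the assumption that the list elements are separated by commas: since "," is the default grouping mark in readr's locale, extracting only the first element before converting keeps the result from depending on how the comma is handled. The lookbehind pattern and the name first_num are illustrative choices, not part of the original answer.

```r
library(stringr)

string <- '{"option1":"option2","option3":4,"options":[0.1,0.9]}'

# "(?<=\\[)"  : start right after a "[" without including it in the match
# "[^,\\]]+"  : take everything up to the first "," or "]"
first_num <- as.numeric(str_extract(string, '(?<=\\[)[^,\\]]+'))
first_num
```

str_extract is vectorised, so the same expression can be applied to df$col1 directly or inside mutate.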
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | onyambu |
Solution 2 | zx8754 |
Solution 3 | jdobres |
Solution 4 | TarJae |