R - Extracting numeric values from multiple txt files
I've been trying to extract certain values from multiple text files.
dataFiles<-lapply(Sys.glob("treedata*SAMPLE01*ID97*.txt"),read.csv,header=FALSE)
dataFiles
data<-data.frame(dataFiles)
data[grepl("^DBHqsm",data$V1),]
data2<-data[grepl("^DBHqsm*",data$V1),]
data2
So far this gives me the following data.frame of character strings. I now want to extract just the numbers from it, including the decimal point. I tried using regmatches and gregexpr, but that removes the decimal point.
V1 V1.1 V1.2 V1.3 V1.4
13 DBHqsm\t 0.05145 DBHqsm\t 0.05189 DBHqsm\t 0.05245 DBHqsm\t 0.05049 DBHqsm\t 0.05393
V1.5 V1.6 V1.7
13 DBHqsm\t 0.05126 DBHqsm\t 0.0506 DBHqsm\t 0.04977
Thanks for the help!
Solution 1:[1]
The following regex removes all non-numeric characters that precede a run of digits followed by a dot.
We use a lookahead assertion, (?=...). Note that this requires perl = TRUE.
It should work if your data always follows the pattern you have shown, for example:
gsub(x = "DBHqsm\t 0.05126", pattern = "\\D*(?=\\d*?\\.)", replacement = "", perl = TRUE)
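Note that gsub returns a character string, so the result still needs converting if you want a number. The following is a minimal sketch, assuming the cell contents look exactly like the sample value from the question. It also shows one possible regmatches pattern that keeps the decimal point; the variable names are only illustrative.

```r
# Sample value copied from the question's output (assumed to contain a real tab)
x <- "DBHqsm\t 0.05126"

# Strip the non-digit prefix with the lookahead regex, then convert to numeric
stripped <- gsub(x = x, pattern = "\\D*(?=\\d*?\\.)", replacement = "", perl = TRUE)
value <- as.numeric(stripped)

# Alternative: match the number itself, allowing an optional decimal part,
# so the dot is kept as part of the match
matched <- regmatches(x, regexpr("[0-9]+\\.?[0-9]*", x))
value2 <- as.numeric(matched)
```

Both approaches should give the same number here; the second one does not depend on what precedes the digits.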
And, of course, you can apply it to every column of a data frame:
data.frame(
  lapply(iris, function(x) {
    # Strip the non-digit prefix from every element of the column
    gsub(x = x, pattern = "\\D*(?=\\d*?\\.)", replacement = "", perl = TRUE)
  })
)
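Applied to data shaped like the question's, this could look roughly as follows. This is a sketch under assumptions: data2 is rebuilt by hand from the first three columns of the printed output, each cell is assumed to hold "DBHqsm", a tab, a space and the number, and the result is converted to numeric at the end.

```r
# Hand-built stand-in for the question's data2 (one row, character columns)
data2 <- data.frame(
  V1   = "DBHqsm\t 0.05145",
  V1.1 = "DBHqsm\t 0.05189",
  V1.2 = "DBHqsm\t 0.05245",
  stringsAsFactors = FALSE
)

# For each column, strip the non-digit prefix and convert the result to numeric
values <- vapply(
  data2,
  function(col) {
    as.numeric(gsub(x = col, pattern = "\\D*(?=\\d*?\\.)", replacement = "", perl = TRUE))
  },
  numeric(1)
)
```

Because each column holds a single value, vapply with numeric(1) returns a named numeric vector, one element per file.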
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Juan Bosco |
