How to read a text file line by line in R and search for a specific string?
I have thousands of text files, each with tens of thousands of lines of varying structure. The lines look like the following three:
DATE#2020-10-08#TIME#16:00:04#__JOBTYPE#ANFRAGE#__PATH#16 16 16 16 16#REFERENZ=23#REFERENZ*23°__PATH°16 16#
DATE#2020-10-08#__JOBTYPE#ANFRAGE#__PATH#16 16 16 16 16#REFERENZ*24°__PATH°16 16#
DATE#2020-10-08#TIME#16:00:04#__JOBTYPE#ANFRAGE#REFERENZ=25#__PATH#17 16 16 18 16
A # normally marks the break between a field name and its value. Sometimes there is a deeper level in which # changes to ° and = changes to *. The lines in the original data are about 10,000 characters long. In each line I am only looking for REFERENZ, which can appear multiple times, e.g. in line 1.
The result of the read function for these 3 lines should be a data.frame like this:
> Daten = data.frame(REFERENZ = c(23,24,25))
> str(Daten)
'data.frame': 3 obs. of 1 variable:
$ REFERENZ: num 23 24 25
Does anybody know a function in R that can search for this?
Solution 1:[1]
I use the read_lines() function from the readr package for problems like this.
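library(readr)
library(data.table)
# Read every line of the file into a character vector
t1 <- read_lines('textfile.txt')
# Split each line on '#'; fill = TRUE pads lines that have fewer fields
dt <- fread(text = t1, sep = '#', header = FALSE, fill = TRUE)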
EDIT: I misunderstood the question, my bad. I think you are looking for REGEX.
library(readr)
library(stringr)
t1 <- 'DATE#2020-10-08#TIME#16:00:04#__JOBTYPE#ANFRAGE#__PATH#16 16 16 16 16#REFERENZ=23#REFERENZ*23°__PATH°16 16#
DATE#2020-10-08#__JOBTYPE#ANFRAGE#__PATH#16 16 16 16 16#REFERENZ*24°__PATH°16 16#
DATE#2020-10-08#TIME#16:00:04#__JOBTYPE#ANFRAGE#REFERENZ=25#__PATH#17 16 16 18 16'
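# read_lines() treats a string that contains newlines as literal data
t1 <- read_lines(t1)
# Take the first REFERENZ of each line, keep its digits, and convert them to numeric
Daten = data.frame(REFERENZ = as.numeric(str_extract(str_extract(t1, 'REFERENZ\\W\\d*'), '[0-9]+')))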
str(Daten)
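Since the question mentions that REFERENZ can appear several times in one line, here is a minimal sketch that collects every occurrence instead of only the first. It swaps stringr for base R's gregexpr() and regmatches(), and it assumes the value is always a run of digits directly after `REFERENZ=` or `REFERENZ*`. The helper name extract_referenz and the extra `line` column are hypothetical and not part of the original answer.

```r
# Hypothetical helper: return every REFERENZ value found in a character vector of lines
extract_referenz <- function(lines) {
  # gregexpr() locates all matches per line; regmatches() pulls out the matched text
  hits <- regmatches(lines, gregexpr('REFERENZ[=*][0-9]+', lines))
  data.frame(
    # repeat each line number once per match found in that line
    line     = rep(seq_along(lines), lengths(hits)),
    # strip the 'REFERENZ=' / 'REFERENZ*' prefix and convert the digits to numeric
    REFERENZ = as.numeric(sub('REFERENZ[=*]', '', unlist(hits)))
  )
}

# The three sample lines from the question
t1 <- c(
  'DATE#2020-10-08#TIME#16:00:04#__JOBTYPE#ANFRAGE#__PATH#16 16 16 16 16#REFERENZ=23#REFERENZ*23°__PATH°16 16#',
  'DATE#2020-10-08#__JOBTYPE#ANFRAGE#__PATH#16 16 16 16 16#REFERENZ*24°__PATH°16 16#',
  'DATE#2020-10-08#TIME#16:00:04#__JOBTYPE#ANFRAGE#REFERENZ=25#__PATH#17 16 16 18 16'
)

all_refs <- extract_referenz(t1)   # one row per occurrence
one_per_line <- unique(all_refs)   # drop repeated values within the same line
```

Keeping the line number makes it possible to tell a value repeated within one line apart from the same value occurring in different lines. The same helper could be applied to the character vector returned by read_lines() for each file.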
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |