R - Merge two elements of a list in an iterative pdf task
I need help with a PDF mining task in R.
I want to mine 1061 multi-page PDF files, whose names are stored in pdf_filenames, and extract the content of the first two pages of each file as a single string.
So far, I have managed to get the content of all PDF files using the map function from the purrr package and the pdf_text function from the pdftools package.
> pdfs = pdf_filenames %>%
map(pdf_text)
This returns a list in which each element represents one PDF file and holds one string per page. The structure of the pdfs list is:
> str(pdfs)
List of 1061
$ : chr [1:3] "Content page 1_pdf1" "Content page 2_pdf1" "Content page 3_pdf1"
$ : chr [1:4] "Content page 1_pdf2" "Content page 2_pdf2" "Content page 3_pdf2" "Content page 4_pdf2"
$ : chr [1:2] "Content page 1_pdf3" "Content page 2_pdf3"
.
.
.
My desired output is:
List of 1061
$ : chr [1:2] "Content page 1_pdf1 Content page 2_pdf1" "Content page 3_pdf1"
$ : chr [1:3] "Content page 1_pdf2 Content page 2_pdf2" "Content page 3_pdf2" "Content page 4_pdf2"
$ : chr [1:1] "Content page 1_pdf3 Content page 2_pdf3"
.
.
.
I tried the following map call:
> pdfs = pdf_filenames %>%
map(pdf_text) %>%
map(c(1,2))
but that returned a list in which every element is NULL:
> pdfs
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
.
.
.
Appreciate your help very much! Thanks!
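One possible approach, sketched on toy data that mimics the str(pdfs) output above rather than real PDF files. The likely reason for the NULL results is that purrr treats a numeric vector passed to map() as a nested extractor (element 1, then element 2 inside it), not as a request for pages 1 and 2. The helper name merge_first_two is hypothetical, and joining the two pages with a single space is an assumption taken from the desired output.

```r
library(purrr)

# Toy stand-in for the result of map(pdf_filenames, pdf_text)
pdfs <- list(
  c("Content page 1_pdf1", "Content page 2_pdf1", "Content page 3_pdf1"),
  c("Content page 1_pdf2", "Content page 2_pdf2", "Content page 3_pdf2",
    "Content page 4_pdf2"),
  c("Content page 1_pdf3", "Content page 2_pdf3")
)

# A numeric vector given to map() acts as a nested extractor, like
# pluck(x, 1, 2): take element 1, then element 2 of that single string.
# A length-one string has no second element, so each result is NULL.
nulls <- map(pdfs, c(1, 2))

# Hypothetical helper (not part of purrr or pdftools): paste pages 1 and 2
# into one string and keep any remaining pages unchanged.
merge_first_two <- function(pages) {
  if (length(pages) < 2) return(pages)  # single-page file: nothing to merge
  c(paste(pages[1], pages[2]), pages[-(1:2)])
}

merged <- map(pdfs, merge_first_two)
str(merged)
```

With the real files, the same helper could presumably be chained after the existing step: pdf_filenames %>% map(pdf_text) %>% map(merge_first_two).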
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
