Is there a function to scrape the notes sections of PowerPoint slides?
I am attempting to read through ~ 100 powerpoint slides and read the notes sections of each slide. I will do some text wrangling and write to csv after the fact, but need to get the notes in a workable format first.
I am working with the officer package, read_pptx function right now, but am open to whatever packages needed. It doesn't seem to pull in notes, but I may just be looking at this wrong.
To show a bit of what I've tried -->
library(officer)
ppt_var <- read_pptx('test_presentation.pptx')
View(ppt_var)
Ideally, I could get the text of each notes slide added to individual variables to write to a csv. I am confident that I can handle the manipulation once I get the notes read in, but cannot seem to get that part down.
Thank you for any pointers or support!
Solution 1:[1]
How to do that is shown in the code here: https://github.com/davidgohel/officer/issues/117 .
The following is based on that code:
library(magrittr)
library(officer)
library(xml2)
p <- read_pptx("mypresentation.pptx")
notes_dir <- file.path(p$package_dir, "ppt", "notesSlides")
files <- list.files(pattern = "\\.xml$", path = notes_dir, full.names = TRUE)
Notes <- lapply(files,
  . %>%
    read_xml %>%
    xml_find_all("//a:t") %>%
    xml_text
)
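Since the goal in the question is one notes string per slide written to a csv, the list above can be collapsed a little further. The following is only a minimal sketch under some assumptions: `notes_from_dir` is a hypothetical helper (not part of officer), and `notes.csv` is a placeholder output name. It pastes the text runs of each notes file into a single string and orders the files by the number in their name, because `list.files()` sorts alphabetically and would put `notesSlide10.xml` before `notesSlide2.xml`.

```r
library(xml2)

# Hypothetical helper (not part of officer): collapse the text runs of each
# notes XML file in `notes_dir` into one string and return a data frame.
notes_from_dir <- function(notes_dir) {
  files <- list.files(notes_dir, pattern = "\\.xml$", full.names = TRUE)
  # list.files() sorts alphabetically, so reorder by the number in the file name.
  files <- files[order(as.integer(gsub("\\D", "", basename(files))))]
  text <- vapply(files, function(f) {
    runs <- xml_text(xml_find_all(read_xml(f), "//a:t"))
    paste(runs, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
  data.frame(file = basename(files), notes = text, stringsAsFactors = FALSE)
}

# Usage with the unpacked presentation from the code above; both file names
# are placeholders.
if (file.exists("mypresentation.pptx")) {
  p <- officer::read_pptx("mypresentation.pptx")
  notes_df <- notes_from_dir(file.path(p$package_dir, "ppt", "notesSlides"))
  write.csv(notes_df, "notes.csv", row.names = FALSE)
}
```

Two caveats worth checking against your own deck: the number in a `notesSlide` file name is not guaranteed to match the slide number, and `//a:t` may also pick up the slide-number field that PowerPoint places on the notes page, so some cleanup of the `notes` column may still be needed.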
Solution 2:[2]
Assuming you are using the Open XML SDK (the DocumentFormat.OpenXml package) in C#, a more native way would be:
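using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;
public static SlidePart GetSlidePart(PresentationDocument pptxDoc, int index)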
{
// Get the relationship ID of the slide at the given zero-based index.
PresentationPart presentationPart = pptxDoc.PresentationPart;
OpenXmlElementList slideIds = presentationPart.Presentation.SlideIdList.ChildElements;
string relId = (slideIds[index] as SlideId).RelationshipId;
// Get the slide part from the relationship ID.
return (SlidePart)presentationPart.GetPartById(relId);
}
public static string GetNoteText(PresentationDocument pptxDoc, int index)
{
//Get the Slide Part
SlidePart slidePart = GetSlidePart(pptxDoc, index);
//Extract the Note text
// A slide without notes has no NotesSlidePart, so return an empty string in that case.
return slidePart.NotesSlidePart?.NotesSlide.InnerText ?? string.Empty;
}
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | jmerrill2001 |
