While converting a PDF to regular text using PDFBox, how can I surround superscript text with braces?
Using PDFBox, I want to convert a very large PDF file into plain text and mark any superscript text with braces. I am relatively new to PDFBox, so how can I surround superscript text with braces?
Example:
PDF: This is text with the X being superscript.
Output: This is text with the (X) being superscript.
Hope you can help. I have seen this post, but that one does not give an easy approach.
My code so far is:
try (PDDocument document = PDDocument.load(new File("files/my-input.pdf"));
     FileWriter fileWriter = new FileWriter("files/my-output.txt")) {
    PDFTextStripper tStripper = new PDFTextStripper();
    int numberOfPages = document.getNumberOfPages();
    for (int i = 1; i <= numberOfPages; i++) {
        tStripper.setStartPage(i);
        tStripper.setEndPage(i);
        tStripper.writeText(document, fileWriter);
        fileWriter.flush();
    }
}
Subclassing the PDFTextStripper class and simply overriding writeString() does not work, because it interferes with the original method. The string.getHeight() call returns the height of the character, so it could be used to detect superscript text.
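To make the height idea concrete, here is a minimal sketch of the detection logic on its own, without PDFBox on the classpath. The `Glyph` class is a hypothetical stand-in for the per-character data I would expect to read from PDFBox's `TextPosition` (for example `getUnicode()`, `getYDirAdj()` and `getHeightDir()`) inside an override of `writeString(String, List<TextPosition>)`. The 0.8 and 0.1 thresholds are arbitrary assumptions and would probably need tuning for a real document.

```java
import java.util.ArrayList;
import java.util.List;

public class SuperscriptMarker {

    /** Hypothetical stand-in for the data a PDFBox TextPosition carries. */
    public static final class Glyph {
        final String text;
        final float y;      // vertical position, growing downward on the page
        final float height; // character height

        public Glyph(String text, float y, float height) {
            this.text = text;
            this.y = y;
            this.height = height;
        }
    }

    /** Joins the glyphs of one line, wrapping raised, smaller runs in parentheses. */
    public static String mark(List<Glyph> line) {
        // Assumption: the tallest glyph on the line represents normal text.
        Glyph ref = null;
        for (Glyph g : line) {
            if (ref == null || g.height > ref.height) {
                ref = g;
            }
        }

        StringBuilder out = new StringBuilder();
        boolean inSuper = false;
        for (Glyph g : line) {
            // Superscript here means: clearly smaller AND clearly higher on the page.
            boolean isSuper = g.height < 0.8f * ref.height
                    && g.y < ref.y - 0.1f * ref.height;
            if (isSuper && !inSuper) {
                out.append('(');
            }
            if (!isSuper && inSuper) {
                out.append(')');
            }
            out.append(g.text);
            inSuper = isSuper;
        }
        if (inSuper) {
            out.append(')'); // close a run that reaches the end of the line
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Glyph> line = new ArrayList<>();
        for (String s : new String[] {"E", "=", "m", "c"}) {
            line.add(new Glyph(s, 100f, 10f));
        }
        line.add(new Glyph("2", 95f, 6f)); // smaller and raised
        System.out.println(mark(line));    // prints E=mc(2)
    }
}
```

Comparing both height and vertical position keeps subscripts and ordinary small text from being marked. In an actual PDFTextStripper subclass, the marked string would presumably be handed to the stripper's normal output path rather than printed.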
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
