While converting a PDF to regular text using PDFBox, how can I surround superscript text with braces?

Using PDFBox, I want to convert a very large PDF file into regular text, and I would like to mark any superscript text with braces. I am relatively new to PDFBox: how can I surround superscript text with braces?

Example:

PDF: This is text with the X being superscript.
Output: This is text with the {X} being superscript.

Hope you can help. I have seen this post, but that one does not give an easy approach.

My code so far is:

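import java.io.File;
import java.io.FileWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Both resources are closed automatically at the end of the try block.
try (PDDocument document = PDDocument.load(new File("files/my-input.pdf"));
     FileWriter fileWriter = new FileWriter("files/my-output.txt")) {
    PDFTextStripper tStripper = new PDFTextStripper();
    int numberOfPages = document.getNumberOfPages();
    // Extract one page at a time (page numbers are 1-based).
    for (int i = 1; i <= numberOfPages; i++) {
        tStripper.setStartPage(i);
        tStripper.setEndPage(i);
        tStripper.writeText(document, fileWriter);
        fileWriter.flush();
    }
}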

Subclassing the PDFTextStripper class and simply overriding writeString() does not work, because it interferes with the original method. The string.getHeight() call returns the height of the character, so it could be used.
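
To make the idea of using glyph geometry concrete, here is a minimal sketch of the detection logic on its own, without PDFBox on the classpath. The Glyph class, the mark() method and the 20% threshold are all hypothetical and only for illustration. In a PDFTextStripper subclass, the same values could presumably be read from each TextPosition passed to writeString(String, List<TextPosition>), via getUnicode(), getYDirAdj() and getFontSizeInPt(). The sketch assumes Y grows downward, so a raised glyph has a smaller Y than the body baseline.

```java
import java.util.ArrayList;
import java.util.List;

public class SuperscriptMarker {

    /** Hypothetical stand-in for the per-glyph data a PDFBox TextPosition carries. */
    public static final class Glyph {
        final String text;      // in PDFBox: TextPosition.getUnicode()
        final float baselineY;  // in PDFBox: TextPosition.getYDirAdj(), assumed to grow downward
        final float fontSize;   // in PDFBox: TextPosition.getFontSizeInPt()

        public Glyph(String text, float baselineY, float fontSize) {
            this.text = text;
            this.baselineY = baselineY;
            this.fontSize = fontSize;
        }
    }

    // Assumed threshold: raised by more than 20% of the body font size means superscript.
    private static final float RISE_FRACTION = 0.2f;

    /** Returns the text of one line with each run of raised glyphs wrapped in braces. */
    public static String mark(List<Glyph> line) {
        // Treat the largest font size on the line as the body size.
        float bodySize = 0f;
        for (Glyph g : line) {
            bodySize = Math.max(bodySize, g.fontSize);
        }
        // Treat the lowest baseline among body-sized glyphs as the body baseline.
        float bodyBaseline = Float.NEGATIVE_INFINITY;
        for (Glyph g : line) {
            if (g.fontSize == bodySize) {
                bodyBaseline = Math.max(bodyBaseline, g.baselineY);
            }
        }

        StringBuilder out = new StringBuilder();
        boolean inSuper = false;
        for (Glyph g : line) {
            boolean isSuper = bodyBaseline - g.baselineY > RISE_FRACTION * bodySize;
            if (isSuper && !inSuper) {
                out.append('{');   // a superscript run starts here
            } else if (!isSuper && inSuper) {
                out.append('}');   // the superscript run ended before this glyph
            }
            out.append(g.text);
            inSuper = isSuper;
        }
        if (inSuper) {
            out.append('}');       // close a run that reaches the end of the line
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Glyph> line = new ArrayList<>();
        for (String s : new String[] {"E", "=", "m", "c"}) {
            line.add(new Glyph(s, 700f, 12f));
        }
        line.add(new Glyph("2", 695f, 8f));   // smaller and raised by 5 units
        System.out.println(mark(line));       // prints E=mc{2}
    }
}
```

Comparing positions within a line, rather than looking at the height of a single character, keeps the check independent of the document's absolute font sizes. The threshold would likely need tuning for a real PDF, and subscripts or mixed-size headings would need extra handling.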



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
