Java iTextPdf 7: NullPointerException when removing data using PdfSweep
I'm trying to use iTextPdf 7.2.1 and PdfSweep 3.0.0 to anonymise a set of documents. (The compatibility matrix shows that this combination is supported.)
The anonymisation process is quite simple: I remove the data inside a specific rectangle, given by its x/y position and its width/height.
For this I'm using the sample code provided here: https://kb.itextpdf.com/home/it7kb/examples/removing-content-with-pdfsweep
    protected void manipulatePdf(String dest) throws IOException {
        PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC), new PdfWriter(dest));
        List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<PdfCleanUpLocation>();

        // The arguments of the PdfCleanUpLocation constructor: the number of the page to be cleaned up,
        // a Rectangle defining the area on the page we want to clean up,
        // a color which will be used while filling the cleaned area.
        PdfCleanUpLocation location = new PdfCleanUpLocation(1, new Rectangle(97, 405, 383, 40),
                ColorConstants.GRAY);
        cleanUpLocations.add(location);

        PdfCleaner.cleanUp(pdfDoc, cleanUpLocations);
        pdfDoc.close();
    }
This works very well for most of the documents and it's pretty fast, but one specific type of document is not processed properly: I always get the same exception when calling the close() method:
    2022.02.09 14:56:40,465 [main] [ERROR] java.lang.NullPointerException
        at com.itextpdf.kernel.pdf.PdfName.generateValue(PdfName.java:1021)
        at com.itextpdf.kernel.pdf.PdfName.getValue(PdfName.java:989)
        at com.itextpdf.kernel.pdf.PdfName.compareTo(PdfName.java:1002)
        at com.itextpdf.kernel.pdf.PdfName.compareTo(PdfName.java:53)
        at java.util.TreeMap.compare(TreeMap.java:1294)
        at java.util.TreeMap.put(TreeMap.java:538)
        at com.itextpdf.kernel.pdf.PdfDictionary.put(PdfDictionary.java:313)
        at com.itextpdf.kernel.font.PdfType3Font.flushFontData(PdfType3Font.java:516)
        at com.itextpdf.kernel.font.PdfType3Font.flush(PdfType3Font.java:363)
        at com.itextpdf.kernel.pdf.PdfDocument.flushFonts(PdfDocument.java:2174)
        at com.itextpdf.kernel.pdf.PdfDocument.close(PdfDocument.java:968)
        at com.rur.script.inner.pdf.Pdf.removeZones(Pdf.java:40)
Since iTextPdf 7 is an open-source project, I dug into the code to try to find a solution. Apparently the exception comes from a PdfName object whose 'content' attribute is null.
I confirmed this by enabling assertions: an AssertionError is indeed thrown at PdfName.java:960:
    /**
     * Create a PdfName from the passed string
     *
     * @param value string value, shall not be null.
     */
    public PdfName(String value) {
        super();
        assert value != null; // line 960
        this.value = value;
    }
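The failure mode in the stack trace can be reproduced without iText. The sketch below uses a hypothetical `Name` class (a simplified stand-in, not iText's `PdfName`) as the key of a `TreeMap`, which is where the trace shows `PdfDictionary.put` ending up. Assuming a key that was built from a null string, the comparison performed during `put` is what throws the NullPointerException:

```java
import java.util.TreeMap;

// Hypothetical stand-in for a PDF name object; this is NOT the iText class.
final class Name implements Comparable<Name> {
    private final String value; // may be null, which is the problem case

    Name(String value) {
        this.value = value;
    }

    @Override
    public int compareTo(Name other) {
        // Throws NullPointerException when this.value is null.
        return value.compareTo(other.value);
    }
}

class NullKeyDemo {
    // Returns true if inserting a null-valued key into a sorted map fails with an NPE.
    static boolean putFailsForNullValuedKey() {
        TreeMap<Name, String> dict = new TreeMap<>();
        dict.put(new Name("a"), "glyph-a");
        try {
            // TreeMap must compare the new key with existing keys to place it.
            dict.put(new Name(null), "broken");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE on put: " + putFailsForNullValuedKey());
    }
}
```

If this matches what happens in my case, the null name is created earlier (where the assertion fires) and only blows up later, when the Type 3 font is flushed on close().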
I'm wondering how I can solve, debug, or work around this issue. Since it is specific to one type of PDF file (I received thousands of PDF files from different sources, and only the PDFs from one source, but all of them, are affected), I believe this is not directly a bug in the library itself.
Also, I was initially using iTextPdf 5.5.13 and upgraded to the latest version in the hope of fixing this issue; unfortunately that didn't work.
I also tried re-saving the document in Adobe Acrobat Reader DC to see if that would "fix" the PDF. Although the size increased a bit (53 KB -> 57 KB), the issue is still there.
Is there any way to debug this? Is there an option I can enable or a process I should follow?
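One way to check that every failing file shares the same root cause is to run the whole batch and group the failures by exception type and top stack frame. This is only a minimal sketch: `FailureTriage`, `groupFailures` and the `processor` callback are hypothetical names, and the callback is assumed to wrap the clean-up call and rethrow checked exceptions such as IOException as unchecked ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class FailureTriage {
    // Runs `processor` on every input and groups the failing inputs by
    // exception class plus the top frame of the stack trace.
    static Map<String, List<String>> groupFailures(List<String> inputs,
                                                   Consumer<String> processor) {
        Map<String, List<String>> failures = new LinkedHashMap<>();
        for (String input : inputs) {
            try {
                processor.accept(input);
            } catch (RuntimeException | AssertionError e) {
                // AssertionError is caught too, so the run also works with -ea.
                StackTraceElement[] trace = e.getStackTrace();
                String top = trace.length > 0 ? trace[0].toString() : "<no stack trace>";
                String key = e.getClass().getName() + " at " + top;
                failures.computeIfAbsent(key, k -> new ArrayList<>()).add(input);
            }
        }
        return failures;
    }
}
```

A single group containing only the files from the problematic source would support the idea that one construct in those PDFs triggers the exception.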
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
