Java iTextPdf 7: NullPointerException when removing data using PdfSweep

I'm trying to use iTextPdf 7.2.1 and PdfSweep 3.0.0 to anonymise a set of documents. (The compatibility matrix shows that this combination is supported.)

The anonymisation process is quite simple: I remove the data inside a specific rectangle, given as x/y/width/height.

For this I'm using the sample code provided here: https://kb.itextpdf.com/home/it7kb/examples/removing-content-with-pdfsweep

protected void manipulatePdf(String dest) throws IOException {
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC), new PdfWriter(dest));

    List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<PdfCleanUpLocation>();

    // The arguments of the PdfCleanUpLocation constructor: the number of page to be cleaned up,
    // a Rectangle defining the area on the page we want to clean up,
    // a color which will be used while filling the cleaned area.
    PdfCleanUpLocation location = new PdfCleanUpLocation(1, new Rectangle(97, 405, 383, 40),
            ColorConstants.GRAY);
    cleanUpLocations.add(location);

    PdfCleaner.cleanUp(pdfDoc, cleanUpLocations);

    pdfDoc.close();
}

This works very well for most of the documents and it's pretty fast, but one specific type of document is not processed properly. I always get the same exception when calling the close() method:

2022.02.09 14:56:40,465 [main] [ERROR] java.lang.NullPointerException
        at com.itextpdf.kernel.pdf.PdfName.generateValue(PdfName.java:1021)
        at com.itextpdf.kernel.pdf.PdfName.getValue(PdfName.java:989)
        at com.itextpdf.kernel.pdf.PdfName.compareTo(PdfName.java:1002)
        at com.itextpdf.kernel.pdf.PdfName.compareTo(PdfName.java:53)
        at java.util.TreeMap.compare(TreeMap.java:1294)
        at java.util.TreeMap.put(TreeMap.java:538)
        at com.itextpdf.kernel.pdf.PdfDictionary.put(PdfDictionary.java:313)
        at com.itextpdf.kernel.font.PdfType3Font.flushFontData(PdfType3Font.java:516)
        at com.itextpdf.kernel.font.PdfType3Font.flush(PdfType3Font.java:363)
        at com.itextpdf.kernel.pdf.PdfDocument.flushFonts(PdfDocument.java:2174)
        at com.itextpdf.kernel.pdf.PdfDocument.close(PdfDocument.java:968)
        at com.rur.script.inner.pdf.Pdf.removeZones(Pdf.java:40)

Since iTextPdf 7 is an open-source project, I dug into the code to look for a solution. Apparently the exception comes from a PdfName object whose 'content' attribute is null.

I confirmed this by enabling assertions: an AssertionError is indeed thrown at PdfName.java:960:

/**
 * Create a PdfName from the passed string
 *
 * @param value string value, shall not be null.
 */
public PdfName(String value) {
    super();
    assert value != null; // line 960
    this.value = value;
}
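
To illustrate what I think is happening, here is a minimal, library-free sketch of the mechanism. It is only a simplified model under my own assumptions, not iText's actual code: the `Name` class and the `putFails` helper are hypothetical names I made up for this example. It shows how a name built from a null string can be constructed without complaint (the assert is skipped when assertions are disabled) and only fails later, when a sorted map calls compareTo on it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class NullNameDemo {

    /** Hypothetical stand-in for a name object; NOT iText's PdfName. */
    static final class Name implements Comparable<Name> {
        private String value;          // decoded form, may be filled in lazily
        private final byte[] content;  // raw form, only present when parsed from a file

        Name(String value) {
            // No runtime check here, which mimics an assert that is disabled by default.
            this.value = value;
            this.content = null;
        }

        String getValue() {
            if (value == null) {
                // Lazy decoding: throws NullPointerException when content is null too.
                value = new String(content, StandardCharsets.ISO_8859_1);
            }
            return value;
        }

        @Override
        public int compareTo(Name other) {
            return getValue().compareTo(other.getValue());
        }
    }

    /** Returns true if putting a Name built from rawName into a sorted map throws an NPE. */
    static boolean putFails(String rawName) {
        Map<Name, String> dict = new TreeMap<>();
        dict.put(new Name("a0"), "glyph stream");
        Name key = new Name(rawName); // construction itself never fails
        try {
            dict.put(key, "glyph stream"); // TreeMap calls compareTo here
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("valid name fails: " + putFails("a1"));
        System.out.println("null name fails:  " + putFails(null));
    }
}
```

If this model is right, the null name is created somewhere before close() and the exception only surfaces once the font dictionary is filled during flushing, which would explain why the stack trace points at TreeMap.put rather than at the place where the name was built.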

I'm wondering how I can solve, debug or work around this issue. It is specific to one type of PDF file: I received thousands of PDF files from different sources, and all of the PDFs from one source, and only those, are affected. For that reason I believe this is not directly an issue in the library itself.

Also, at first I was using iTextPdf 5.5.13 and I upgraded to the latest version in the hope of fixing this issue; unfortunately that didn't work.

I also tried to "resave" the document in Adobe Acrobat Reader DC to see if that would "fix" the PDF. Although the size increased a bit (53 KB -> 57 KB), the issue is still there.

Is there any way to debug this? Is there an option I can enable or a process I should follow?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
