How to extract rotated images from PDF with iText

I need to extract images from a PDF. I know that some of the images are rotated 90 degrees (I checked with online tools).

I'm using this code:

PdfRenderListener:

public class PdfRenderListener : IExtRenderListener
{
    // other methods ...

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        try
        {
            var mtx = renderInfo.GetImageCTM();
            var image = renderInfo.GetImage();
            var fillColor = renderInfo.GetCurrentFillColor();
            var color = Color.FromArgb(fillColor?.RGB ?? Color.Empty.ToArgb());
            var fileType = image.GetFileType();
            var extension = "." + fileType;
            var bytes = image.GetImageAsBytes();
            var height = mtx[Matrix.I22];
            var width = mtx[Matrix.I11];

            // rotated image
            if (height == 0 && width == 0)
            {
                var h = Math.Abs(mtx[Matrix.I12]);
                var w = Math.Abs(mtx[Matrix.I21]);
            }

            // save image
        }
        catch (Exception e)
        {
            Console.WriteLine(e);
        }
    }
}

When I save images with this code, the rotated images come out distorted.

I have read the post "iText 7 ImageRenderInfo Matrix contains negative height on Even number Pages" and mkl's answer to it.

The current transformation matrix (mtx) contains these values:

0 841.9 0
-595.1 0 0
595.1 0 1

I know the image is rotated 90 degrees. How can I transform it to get a normal image?
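
For reference, the rotation and the drawn size can be read off the matrix values above. What follows is only a minimal Java sketch (Java rather than C#, with no iText dependency), assuming the values are laid out as `a b 0 / c d 0 / e f 1`. The class `CtmInfo` and its members are hypothetical names, not part of iText. The lengths it reports are in page units, so the pixel size of the image still has to come from the image itself.

```java
// Sketch: recover rotation and drawn size from an image CTM laid out as
//   a b 0
//   c d 0
//   e f 1
// CtmInfo is a hypothetical helper, not an iText class.
final class CtmInfo {
    final double rotationDegrees; // counter-clockwise angle of the image's width edge
    final double drawnWidth;      // length of the image's width edge, in page units
    final double drawnHeight;     // length of the image's height edge, in page units

    private CtmInfo(double rotationDegrees, double drawnWidth, double drawnHeight) {
        this.rotationDegrees = rotationDegrees;
        this.drawnWidth = drawnWidth;
        this.drawnHeight = drawnHeight;
    }

    static CtmInfo fromMatrix(double a, double b, double c, double d) {
        // The image's width edge (1, 0) maps to the vector (a, b),
        // its height edge (0, 1) maps to the vector (c, d).
        double width = Math.hypot(a, b);
        double height = Math.hypot(c, d);
        double angle = Math.toDegrees(Math.atan2(b, a));
        return new CtmInfo(angle, width, height);
    }

    public static void main(String[] args) {
        // Values taken from the matrix in the question.
        CtmInfo info = fromMatrix(0, 841.9, -595.1, 0);
        System.out.println("rotation (degrees): " + info.rotationDegrees);
        System.out.println("drawn width:  " + info.drawnWidth);
        System.out.println("drawn height: " + info.drawnHeight);
    }
}
```

With the values from the question, the width edge points straight up the page, which is a quarter turn counter-clockwise. Using the vector lengths instead of the diagonal entries avoids the zero width and height that the diagonal yields for a 90-degree rotation.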



Solution 1:[1]

As @mkl mentioned, the real cause was not the rotation of the image but the filter applied to it.

I analyzed the PDF file with iText RUPS and found that the image was encoded with the CCITTFaxDecode filter.

Next, I looked for ways to decode this filter and found these questions:

  1. Extracting image from PDF with /CCITTFaxDecode filter.
  2. How to use Bit Miracle LibTiff.Net to write the image to a MemoryStream

I used the BitMiracle.LibTiff.NET library and wrote this method:

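    using System;
    using System.IO;
    using BitMiracle.LibTiff.Classic;

    // Wraps the raw CCITT-encoded bytes in an in-memory TIFF and returns the bytes of that TIFF file.
    private byte[] DecodeInternal(byte[] rawBytes, int width, int height, int k, int bitsPerComponent)
    {
        var compression = GetCompression(k);

        using var ms = new MemoryStream();
        var tms = new TiffStream();

        using var tiff = Tiff.ClientOpen("in-memory", "w", ms, tms);
        tiff.SetField(TiffTag.IMAGEWIDTH, width);
        tiff.SetField(TiffTag.IMAGELENGTH, height);
        tiff.SetField(TiffTag.COMPRESSION, compression);
        tiff.SetField(TiffTag.BITSPERSAMPLE, bitsPerComponent);
        tiff.SetField(TiffTag.SAMPLESPERPIXEL, 1);

        // Write the still-compressed data as a single strip; a TIFF reader decodes it later.
        var writeResult = tiff.WriteRawStrip(0, rawBytes, rawBytes.Length);
        if (writeResult == -1)
        {
            Console.WriteLine("Decoding error");
        }

        // Flush the directory so the stream holds a complete TIFF before it is copied out.
        tiff.CheckpointDirectory();
        var decodedBytes = ms.ToArray();
        tiff.Close();

        return decodedBytes;
    }

    // Maps the K entry of the PDF decode parameters to a TIFF compression scheme.
    private Compression GetCompression(int k)
    {
        return k switch
        {
            < 0 => Compression.CCITTFAX4, // K < 0: Group 4, pure two-dimensional encoding
            0 => Compression.CCITTFAX3,   // K = 0: Group 3, one-dimensional encoding
            _ => throw new NotImplementedException("K > 0"), // K > 0: mixed Group 3, not handled here
        };
    }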

After decoding and rotating the image, I was able to save a normal image. Thanks everyone for the help.
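
The rotation step is not shown above. As a rough illustration only, here is a minimal Java sketch (Java rather than C#) of a quarter-turn rotation on decoded pixels using the standard `BufferedImage` class. The class and method names are hypothetical. The counter-clockwise direction is an assumption taken from the matrix in the question, not necessarily what was used in this answer.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: rotates a decoded image by a quarter turn.
final class QuarterTurn {
    // Returns a new image rotated 90 degrees counter-clockwise.
    static BufferedImage rotate90CounterClockwise(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        // Width and height swap places after a quarter turn.
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // The source pixel (x, y) lands at (y, w - 1 - x):
                // the top-right corner of the source becomes the top-left corner.
                dst.setRGB(y, w - 1 - x, src.getRGB(x, y));
            }
        }
        return dst;
    }
}
```

A clockwise turn would instead write the source pixel (x, y) to (h - 1 - y, x).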

Solution 2:[2]

You can try this. I'm using iText 7 for Java. You still need to write your own listener:

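import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo;
import com.itextpdf.kernel.pdf.canvas.parser.listener.IEventListener;
import com.itextpdf.kernel.pdf.xobject.PdfImageXObject;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Set;

public class MyImageRenderListener implements IEventListener {

    // Format pattern for the output files, filled with the object number and the extension
    protected String path;

    protected String extension;

    public MyImageRenderListener(String path) {
        this.path = path;
    }

    @Override
    public void eventOccurred(IEventData data, EventType type) {
        switch (type) {
            case RENDER_IMAGE:
                try {
                    ImageRenderInfo renderInfo = (ImageRenderInfo) data;
                    PdfImageXObject image = renderInfo.getImage();
                    if (image == null) {
                        return;
                    }
                    // true requests decoded image bytes rather than the raw encoded stream
                    byte[] imageByte = image.getImageBytes(true);
                    extension = image.identifyImageFileExtension();
                    String filename = String.format(path, image.getPdfObject().getIndirectReference().getObjNumber(), extension);
                    // try-with-resources closes the stream even if write() fails
                    try (FileOutputStream os = new FileOutputStream(filename)) {
                        os.write(imageByte);
                    }
                } catch (com.itextpdf.io.exceptions.IOException | IOException e) {
                    System.out.println(e.getMessage());
                }
                break;

            default:
                break;
        }
    }

    @Override
    public Set<EventType> getSupportedEvents() {
        // null tells the processor that this listener accepts all event types
        return null;
    }
}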

I tested this with a PDF containing an image rotated by an arbitrary angle and one rotated by 90 degrees; in both cases the extracted picture came out without distortion. I ran the listener like this:

public void manipulatePdf() throws IOException {
    // A reader-only PdfDocument is enough here, since nothing is written back
    PdfDocument pdfDoc = new PdfDocument(new PdfReader("path to pdf"));
    MyImageRenderListener listener = new MyImageRenderListener("path to resulting image");

    PdfCanvasProcessor parser = new PdfCanvasProcessor(listener);
    for (int i = 1; i <= pdfDoc.getNumberOfPages(); i++) {
        parser.processPageContent(pdfDoc.getPage(i));
    }
    pdfDoc.close();
}
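
One detail worth spelling out: the listener passes its `path` string to `String.format` together with the object number and the extension, so the string is expected to be a pattern with two placeholders. A plain path without placeholders would make every image overwrite the same file, because `String.format` ignores the extra arguments. A minimal sketch, where the pattern `extracted/image_%s.%s` and the helper class are hypothetical:

```java
// Hypothetical helper showing how the listener's format pattern expands.
final class ImagePathPattern {
    // First placeholder: PDF object number, second placeholder: file extension.
    static final String PATTERN = "extracted/image_%s.%s";

    static String fileNameFor(int objectNumber, String extension) {
        return String.format(PATTERN, objectNumber, extension);
    }

    public static void main(String[] args) {
        // Prints the file name for object 12 with a "png" extension.
        System.out.println(fileNameFor(12, "png"));
    }
}
```

Such a pattern would be passed to the listener as `new MyImageRenderListener("extracted/image_%s.%s")`, assuming the `extracted` directory already exists.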

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Zhenya Prudnikov