iText 7 bug with GetCtm() on pages > 1

Context:

  • extract text from a pdf
  • using the IEventListener - TextRenderInfo
  • a pdf document with more than one page
  • C# .NET Core program

Issue: To calculate the exact X,Y position of a text I use this code:

var textMatrix = textRenderInfo.GetTextMatrix().Multiply(textRenderInfo.GetGraphicsState().GetCtm());
float X = textMatrix.Get(6);
float Y = textMatrix.Get(7);

This works for the first page. For subsequent pages, the CTM seems to be computed as Power(ctm, pageNumber), so the resulting X,Y position is incorrect.

More clarification: I have a document with a date repeated on every page at the exact same location. Consequently, its text matrix is the same on every page. But the CTM looks like this for page 1:

{0,05   0   0
0   0,05    0
0   0   1}

For page 2:

{0,0025000002   0   0
0   0,0025000002    0
0   0   1}

For page 3:

{0,000125   0   0
0   0,000125    0
0   0   1}

Etc. So it looks as if each value is raised to the power of the page number. Could this be a bug?



Solution 1:[1]

Could this be a bug?

More likely a case of incorrect API usage...

Unfortunately you don't show your pivotal code. I assume, though, that you re-use the same PdfCanvasProcessor for all pages. Have you considered the note in the ProcessPageContent documentation?

/// <summary>Processes PDF syntax.</summary>
/// <remarks>
/// Processes PDF syntax.
/// <strong>Note:</strong> If you re-use a given
/// <see cref="PdfCanvasProcessor"/>
/// , you must call
/// <see cref="Reset()"/>
/// </remarks>
/// <param name="page">the page to process</param>
public virtual void ProcessPageContent(PdfPage page)

(PdfCanvasProcessor.cs)

I.e.

Note: If you re-use a given PdfCanvasProcessor, you must call Reset() [between ProcessPageContent calls]
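If the processor is indeed re-used, the graphics state left over from the previous page presumably carries into the next one, so each page's scaling gets concatenated onto the old CTM. That would match the observed 0,05, then 0,05 × 0,05, and so on. A minimal sketch of the loop with the Reset() call follows. The PositionListener class name and the "input.pdf" path are placeholders I made up for illustration, not taken from your code:

```csharp
using System;
using System.Collections.Generic;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Data;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Hypothetical listener: prints each text chunk together with its position,
// computed the same way as in the question.
class PositionListener : IEventListener
{
    public void EventOccurred(IEventData data, EventType type)
    {
        if (data is TextRenderInfo textRenderInfo)
        {
            Matrix textMatrix = textRenderInfo.GetTextMatrix()
                .Multiply(textRenderInfo.GetGraphicsState().GetCtm());
            float x = textMatrix.Get(6);
            float y = textMatrix.Get(7);
            Console.WriteLine($"{textRenderInfo.GetText()} @ ({x}, {y})");
        }
    }

    public ICollection<EventType> GetSupportedEvents()
    {
        // Only text rendering events are of interest here.
        return new HashSet<EventType> { EventType.RENDER_TEXT };
    }
}

class Program
{
    static void Main(string[] args)
    {
        // "input.pdf" is a placeholder path.
        using (PdfDocument pdf = new PdfDocument(new PdfReader("input.pdf")))
        {
            // One processor re-used for all pages.
            var processor = new PdfCanvasProcessor(new PositionListener());
            for (int i = 1; i <= pdf.GetNumberOfPages(); i++)
            {
                // Clear the graphics state left over from the previous page.
                processor.Reset();
                processor.ProcessPageContent(pdf.GetPage(i));
            }
        }
    }
}
```

Alternatively, creating a new PdfCanvasProcessor for each page should avoid the stale state as well, at the cost of one extra object per page.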

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mkl