iText 7 bug with GetCtm() on pages > 1
Context:
- extracting text from a PDF
- using an IEventListener with TextRenderInfo
- a PDF document with more than one page
- C# .NET Core program
Issue: To calculate the exact X,Y position of a piece of text, I use this code:
var textMatrix = textRenderInfo.GetTextMatrix().Multiply(textRenderInfo.GetGraphicsState().GetCtm());
float X = textMatrix.Get(6);
float Y = textMatrix.Get(7);
This works fine for the first page. For subsequent pages the CTM seems to be calculated as Power(ctm, pagenumber), so the resulting X,Y is not correct.
More clarification: I have a document with a date repeated on every page at exactly the same location. Consequently, its text matrix is the same on every page. But the CTM looks like this for page 1:
{0,05 0 0
0 0,05 0
0 0 1}
For page 2:
{0,0025000002 0 0
0 0,0025000002 0
0 0 1}
For page 3:
{0,000125 0 0
0 0,000125 0
0 0 1}
And so on. So it looks as if each value is raised to the power of the page number. Could this be a bug?
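As a quick sanity check of that observation: the reported scale values 0,05, 0,0025 and 0,000125 are what you get when the same 0.05 scale is multiplied onto the previous result once per page. A minimal sketch of that arithmetic follows; the class and method names are hypothetical and not part of iText.

```csharp
using System;

static class CtmScaleCheck
{
    // Hypothetical helper: the scale factor that results when the same
    // page-level scale is multiplied onto the previous value once per page.
    public static float ScaleAfterPages(float pageScale, int pages)
    {
        float scale = 1f;
        for (int i = 0; i < pages; i++)
        {
            scale *= pageScale;
        }
        return scale;
    }

    static void Main()
    {
        // Print the accumulated scale for pages 1 to 3.
        for (int page = 1; page <= 3; page++)
        {
            Console.WriteLine($"page {page}: {ScaleAfterPages(0.05f, page)}");
        }
    }
}
```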
Solution 1:[1]
Could this be a bug?
More likely a case of incorrect API usage...
Unfortunately you don't show your pivotal code. I assume, though, that you re-use the same PdfCanvasProcessor for all pages. Have you considered the note in the ProcessPageContent documentation?
/// <summary>Processes PDF syntax.</summary>
/// <remarks>
/// Processes PDF syntax.
/// <strong>Note:</strong> If you re-use a given
/// <see cref="PdfCanvasProcessor"/>
/// , you must call
/// <see cref="Reset()"/>
/// </remarks>
/// <param name="page">the page to process</param>
public virtual void ProcessPageContent(PdfPage page)
I.e.
Note: If you re-use a given PdfCanvasProcessor, you must call Reset() [between ProcessPageContent calls]
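Since the question does not show the processing loop, the following is only a sketch of what a corrected loop might look like, assuming a single PdfCanvasProcessor is shared across all pages. The listener class PositionListener and the file name "input.pdf" are placeholders made up for illustration; the position calculation is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Data;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Hypothetical listener that prints the position of each text chunk.
class PositionListener : IEventListener
{
    public void EventOccurred(IEventData data, EventType type)
    {
        if (data is TextRenderInfo info)
        {
            // Same calculation as in the question: text matrix times CTM.
            Matrix m = info.GetTextMatrix().Multiply(info.GetGraphicsState().GetCtm());
            float x = m.Get(Matrix.I31); // index 6
            float y = m.Get(Matrix.I32); // index 7
            Console.WriteLine($"{info.GetText()} at {x}, {y}");
        }
    }

    public ICollection<EventType> GetSupportedEvents()
    {
        // Only text rendering events are of interest here.
        return new List<EventType> { EventType.RENDER_TEXT };
    }
}

class Program
{
    static void Main(string[] args)
    {
        // "input.pdf" is a placeholder path.
        PdfDocument pdf = new PdfDocument(new PdfReader("input.pdf"));
        try
        {
            PdfCanvasProcessor processor = new PdfCanvasProcessor(new PositionListener());
            for (int page = 1; page <= pdf.GetNumberOfPages(); page++)
            {
                // Reset the processor before each page, as the documentation
                // note requires when a processor is re-used.
                processor.Reset();
                processor.ProcessPageContent(pdf.GetPage(page));
            }
        }
        finally
        {
            pdf.Close();
        }
    }
}
```

Creating a new PdfCanvasProcessor for each page instead of re-using one should avoid the issue as well, since the documentation note only applies to re-used processors.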
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mkl |
