Is there any way to read the font size from a PDF? I'm trying to extract a specific part of the text that differs in font size

I'm trying to extract a specific part of the text from a PDF using C#. The part always starts with a specific keyword, and the line after that keyword has a different font size. The text can appear in many regions of the PDF, so I think the only way to extract it is to use the font size. I have tried various C# libraries, but none of them helped.



Solution 1:[1]

I'll answer in Java, but the C# API is similar. In iText, text is extracted from a PDF document through strategies, all of which implement the ITextExtractionStrategy interface. The library already ships several strategies, but none of them reports the font size. You can easily write one yourself.

Main

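import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Main {

    public static final String DEST = "ExtractPageContent.txt";
    public static final String PREFACE = "preface.pdf";

    public static void main(String[] args) throws IOException {
        CustomTextExtractionStrategy strategy = new CustomTextExtractionStrategy();

        // try-with-resources closes the writer and the document even if parsing fails
        try (PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(new FileOutputStream(DEST), StandardCharsets.UTF_8));
             PdfDocument pdfDoc = new PdfDocument(new PdfReader(PREFACE))) {

            // The processor sends the events of every page to the strategy
            PdfCanvasProcessor parser = new PdfCanvasProcessor(strategy);
            for (int i = 1; i <= pdfDoc.getNumberOfPages(); i++) {
                parser.processPageContent(pdfDoc.getPage(i));
            }

            // Write whatever the strategy collected; it may return null
            String result = strategy.getResultantText();
            if (result != null) {
                out.print(result);
            }
        }
    }
}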

All the information about the text is in TextRenderInfo, so the font-size logic lives in eventOccurred(), which receives it. The other methods can be extended as needed, for example getResultantText() (examples can be found in iText's other text extraction strategy classes).

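import com.itextpdf.kernel.geom.Vector;
import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.data.TextRenderInfo;
import com.itextpdf.kernel.pdf.canvas.parser.listener.ITextExtractionStrategy;

import java.util.Collections;
import java.util.Set;

public class CustomTextExtractionStrategy implements ITextExtractionStrategy {

    // Collects one "Text: ... FontSize: ..." line per text chunk
    private final StringBuilder result = new StringBuilder();

    @Override
    public String getResultantText() {
        return result.toString();
    }

    @Override
    public void eventOccurred(IEventData iEventData, EventType eventType) {
        if (eventType.equals(EventType.RENDER_TEXT)) {
            TextRenderInfo renderInfo = (TextRenderInfo) iEventData;
            String text = renderInfo.getText();

            Vector curBaseline = renderInfo.getBaseline().getStartPoint();
            Vector topRight = renderInfo.getAscentLine().getEndPoint();

            // Chunk height = ascent line y minus baseline y, in page units.
            // This is the ascent height, which scales with the rendered font size
            // (valid for unrotated text). Note that Rectangle takes
            // (x, y, width, height), so it cannot be built from two corner points.
            float curFontSize = topRight.get(Vector.I2) - curBaseline.get(Vector.I2);

            String line = "Text: " + text + " FontSize: " + curFontSize;
            System.out.println(line);
            result.append(line).append(System.lineSeparator());
        }
    }

    @Override
    public Set<EventType> getSupportedEvents() {
        // Only text-render events are needed
        return Collections.singleton(EventType.RENDER_TEXT);
    }
}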

Now, with curFontSize available for every piece of text, you can do whatever filtering you need.
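As a rough illustration of the filtering the question asks for, here is a minimal sketch that uses only the Java standard library. It assumes each text chunk and its measured size have already been collected into a list. The names KeywordFontFilter, TextChunk and textAfterKeyword, and the tolerance parameter, are hypothetical and not part of iText. The sketch returns the chunks that follow the keyword and whose size differs from the keyword's size, and it stops collecting once the size returns to that of the keyword.

```java
import java.util.ArrayList;
import java.util.List;

public class KeywordFontFilter {

    /** Hypothetical holder for one extracted chunk and its measured size. */
    public static final class TextChunk {
        final String text;
        final float fontSize;

        public TextChunk(String text, float fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    /**
     * Returns the text of chunks that come after a chunk containing the keyword
     * and whose size differs from the keyword chunk's size by more than tolerance.
     * Each differently sized run ends when the size returns to the keyword's size.
     */
    public static List<String> textAfterKeyword(List<TextChunk> chunks,
                                                String keyword,
                                                float tolerance) {
        List<String> result = new ArrayList<>();
        float keywordSize = Float.NaN; // NaN means "no keyword seen yet"
        boolean inRun = false;         // true while inside a differently sized run

        for (TextChunk chunk : chunks) {
            if (chunk.text.contains(keyword)) {
                // Remember the keyword's size and start looking for a new run
                keywordSize = chunk.fontSize;
                inRun = false;
                continue;
            }
            if (Float.isNaN(keywordSize)) {
                continue; // nothing to compare against yet
            }
            boolean differs = Math.abs(chunk.fontSize - keywordSize) > tolerance;
            if (differs) {
                result.add(chunk.text);
                inRun = true;
            } else if (inRun) {
                // Size is back to the keyword's size: the run is over
                keywordSize = Float.NaN;
                inRun = false;
            }
        }
        return result;
    }
}
```

To use it with the strategy, you would add a TextChunk to a list inside eventOccurred() instead of (or in addition to) printing, then call textAfterKeyword(chunks, "YourKeyword", 0.5f) after all pages are processed. The tolerance absorbs small floating-point differences between chunks that are visually the same size; the value 0.5f is only a guess and should be tuned against your documents.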

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Zhenya Prudnikov