Is it possible to filter out references and footnotes from article PDFs (one-column or two-column) using Python?

I have to extract the main text from article PDFs. The articles come in different formats: some use one column and some use two, some have section and subsection titles and some do not.

Is there any way to filter out unwanted sections (references, footnotes, author details, publication details) from the article PDFs using PyPDF2, pdfminer, or other tools or techniques?
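
To make the goal concrete, here is a minimal sketch of the kind of heuristic filtering I have in mind. It assumes the text has already been extracted in reading order (for example with `pdfminer.high_level.extract_text` from pdfminer.six), and it works on plain text only. The function name `filter_main_text` and all the patterns are hypothetical choices for illustration, not part of any library, and they will certainly miss some layouts.

```python
import re

# Assumption: back matter starts at a line that is only a heading such as
# "References" or "7. Bibliography". Extend the word list as needed.
BACK_MATTER = re.compile(
    r"^\s*(\d+\.?\s+)?(references|bibliography)\s*$", re.IGNORECASE
)
# Assumption: footnotes and author notes often start with a symbol marker.
SYMBOL_NOTE = re.compile(r"^\s*[*†‡]")
# Assumption: a line containing an e-mail address is author detail.
EMAIL = re.compile(r"\S+@\S+\.\S+")


def filter_main_text(text: str) -> str:
    """Drop likely footnote/author lines and everything after the references heading."""
    kept = []
    for line in text.splitlines():
        if BACK_MATTER.match(line):
            break  # stop at the reference list; the rest is discarded
        if SYMBOL_NOTE.match(line) or EMAIL.search(line):
            continue  # skip lines that look like notes or author details
        kept.append(line)
    return "\n".join(kept)


sample = (
    "Title\n"
    "alice@example.org\n"
    "Main body line.\n"
    "* Corresponding author\n"
    "More body.\n"
    "References\n"
    "[1] Someone. A paper.\n"
)
print(filter_main_text(sample))
```

This only looks at line content. It does not use font size or page position, which is probably what is needed to catch numbered footnotes and publication details reliably.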



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
