How to extract data from a particular area in a PDF file

See this PDF.

I want to extract the following data from it:

<?php
$data = array(
 "CertificateID" => "91815380284",
 "BeneficiaryName"=>"Kavita",
 "Gender" => "Female",
 "IDVerified" => "Aadhaar # XXXXXXXX3661",
 "BeneficiaryReferenceID" => "34684952644017",
 "VaccinationStatus" => "Fully Vaccinated (2 Doses)"
);
?>

Solution 1:^[1]

This task is not well suited to PHP on its own.

It is better to use an external program such as pdftotext (https://www.xpdfreader.com/pdftotext-man.html).

The main problem with PDFs is that they are not plain text files but a binary format.

You can invoke pdftotext with PHP's shell_exec function and capture its output in PHP for further processing.
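A minimal sketch of that call, assuming pdftotext is installed and on the PATH. The helper names buildPdftotextCommand and pdfToText are hypothetical. Passing "-" as the output file asks pdftotext to write the text to standard output instead of a file.

```php
<?php
// Build the shell command. escapeshellarg() protects file names that
// contain spaces or parentheses, such as "certificate (9).pdf".
function buildPdftotextCommand(string $pdfPath): string
{
    // "-" as the output file sends the extracted text to stdout.
    return 'pdftotext -layout ' . escapeshellarg($pdfPath) . ' -';
}

// Run pdftotext and return the extracted text, or null if nothing came back
// (for example because pdftotext is missing or the file is unreadable).
function pdfToText(string $pdfPath): ?string
{
    $output = shell_exec(buildPdftotextCommand($pdfPath));
    return (is_string($output) && $output !== '') ? $output : null;
}

// Example call (not executed here):
// $text = pdfToText('certificate (9).pdf');
```

Keeping the command builder separate from the call makes it easy to check the exact command line before running it.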

Once you have the extracted text in PHP, use regular expressions to pick out the lines you want.
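A sketch of the regex step under stated assumptions: the label spellings below (Certificate ID, Beneficiary Name, and so on) and the one-label-per-line layout are guesses at what pdftotext prints for this certificate, and parseCertificate is a hypothetical helper. The sample text reuses the values from the question.

```php
<?php
// Assumed PDF labels => array keys wanted in the question.
const FIELD_LABELS = [
    'Certificate ID'           => 'CertificateID',
    'Beneficiary Name'         => 'BeneficiaryName',
    'Gender'                   => 'Gender',
    'ID Verified'              => 'IDVerified',
    'Beneficiary Reference ID' => 'BeneficiaryReferenceID',
    'Vaccination Status'       => 'VaccinationStatus',
];

// For each label, capture whatever follows it on the same line.
function parseCertificate(string $text): array
{
    $data = [];
    foreach (FIELD_LABELS as $label => $key) {
        // Line starts with the label, then an optional colon, then the value.
        $pattern = '/^\s*' . preg_quote($label, '/') . '\h*:?\h+(\S.*?)\s*$/mi';
        if (preg_match($pattern, $text, $m)) {
            $data[$key] = $m[1];
        }
    }
    return $data;
}

// Hypothetical pdftotext output; the real layout may differ.
$sample = <<<TXT
Certificate ID              91815380284
Beneficiary Name            Kavita
Gender                      Female
ID Verified                 Aadhaar # XXXXXXXX3661
Beneficiary Reference ID    34684952644017
Vaccination Status          Fully Vaccinated (2 Doses)
TXT;

$data = parseCertificate($sample);
print_r($data);
```

If the real output puts a value on the line after its label, the pattern would need adjusting.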

This is a reasonable general approach for PDFs like this one.

Another option is to use more specialised tools. Python has many good libraries for this job:

https://textract.readthedocs.io/en/latest/
https://tabula.technology/
and more

Solution 2:^[2]

You nominated pdftotext as your application, and your sample is regular enough to use command-line cropping, so to get a text output file containing only that area you can use

pdftotext -nopgbrk -marginl 200 -margint 150 -marginb 500 -layout "certificate (9).pdf" test.txt

However, since your code does not show a conversion method, you will need to adapt the output lines (either ignoring the age line or extracting two or more chunks) to get your desired array entries:

 "CertificateID" => "91815380284",
 "BeneficiaryName"=>"Kavita",
 "Gender" => "Female",
 "IDVerified" => "Aadhaar # XXXXXXXX3661",
 "BeneficiaryReferenceID" => "34684952644017",
 "VaccinationStatus" => "Fully Vaccinated (2 Doses)"
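One way to adapt the cropped lines, sketched under assumptions: each line of the output is taken to be a label and a value separated by a run of two or more spaces, the label spellings are guesses, and mapCroppedLines is a hypothetical helper. Lines whose label is not wanted, such as Age, are skipped. The Age value in the sample is a placeholder, not real data.

```php
<?php
// Assumed label spellings in the PDF => array keys wanted in the question.
$wanted = [
    'Certificate ID'           => 'CertificateID',
    'Beneficiary Name'         => 'BeneficiaryName',
    'Gender'                   => 'Gender',
    'ID Verified'              => 'IDVerified',
    'Beneficiary Reference ID' => 'BeneficiaryReferenceID',
    'Vaccination Status'       => 'VaccinationStatus',
];

function mapCroppedLines(string $text, array $wanted): array
{
    $data = [];
    foreach (preg_split('/\R/', $text) as $line) {
        // Split "label    value" on the first run of 2+ whitespace characters.
        $parts = preg_split('/\s{2,}/', trim($line), 2);
        if (count($parts) !== 2) {
            continue; // blank line or a line without a value column
        }
        [$label, $value] = $parts;
        if (isset($wanted[$label])) {
            $data[$wanted[$label]] = $value; // unwanted labels are ignored
        }
    }
    return $data;
}

// In real use the text would come from file_get_contents('test.txt').
$text = <<<TXT
Certificate ID              91815380284
Beneficiary Name            Kavita
Age                         (placeholder)
Gender                      Female
ID Verified                 Aadhaar # XXXXXXXX3661
Beneficiary Reference ID    34684952644017
Vaccination Status          Fully Vaccinated (2 Doses)
TXT;

$data = mapCroppedLines($text, $wanted);
print_r($data);
```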

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Tzvetan Koshutanski
Solution 2
