Extract tables as a pandas DataFrame or a CSV from the JSON output of Azure Form Recognizer

This is my problem: I have a huge JSON file produced as output by Azure Form Recognizer. I need to extract the two tables shown in the screenshots (the 1st and 2nd tables as they appear in the PDF).

The JSON output file contains both tables as extracted by Azure Form Recognizer (the JSON file and both PDFs are attached for reference). I need to load both tables into pandas DataFrames, append them into one table, and then write the result out as a CSV. Could anyone please help with this?

The JSON and PDF files are linked here (since there is no way to attach files directly): https://drive.google.com/drive/folders/18gAPDuXsp8Td9WysoNcH_l1HoijOf8BK?usp=sharing



Solution 1:[1]

tabula does an okay job reading the tables straight from the PDF, especially considering how little work it takes:

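import tabula  # from the tabula-py package (needs a Java runtime)

# Parse pages 1 and 2 as a single table and take the first DataFrame returned.
df = tabula.read_pdf('file.pdf', pages=[1, 2], multiple_tables=False,
                     pandas_options={'header': 3, 'skipfooter': 1, 'engine': 'python'})[0]
print(df)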

Output:

               Unnamed: 0                                         Unnamed: 1  \
0                  Pickle     20 Pickle Drive, Whitedale,\rMARINA BEACH 0632   
1                  Pickle     20 Pickle Drive, Whitedale,\rMARINA BEACH 0632   
2        Pimbiliki pilapi                   23 Popes Road,\rDIGIRIDAPPA 2105   
3        Pimbiliki pilapi                   23 Popes Road,\rDIGIRIDAPPA 2105   
4                  Towers  Towers Tower 1, Tower\r1/2-6 Gilmour , Beijing...   
5   Customers\rHouse Quay  Level 5 Unit 1/36\rCustomhouse Quay\rBeijing C...   
6           Orbital Drive         6 Orbit Drive, Whudale,\rMARINA BEACH 0632   
7                 Sa palo             84 Smith Street, Sa Aro,\rBEIJING 6011   
8                 Chennai  395 Madras Street,\rChennai Central,\rCHENNAI ...   
9                Mountain         25 Mountain, Freemans\rNay, MADURAI 628002   
10               Mountain         25 Mountain, Freemans\rBay, MADURAI 628002   
11                 Gandhi  Ground,3/25Gandhi\rStreet,MaduraiCentral,\rMAD...   
12                  Total                                                NaN   

    Unnamed: 2                                         Unnamed: 3   Unnamed: 4  
0        500.0   BAN Service Premium\rHigh Availability - Primary      $ 25.00  
1        500.0  BAN Service Premium\rHigh Availability - Secon...      $ 25.00  
2        500.0   BAN Service Premium\rHigh Availability - Primary      $ 25.00  
3        500.0  BAN Service Premium\rHigh Availability - Secon...      $ 25.00  
4         50.0                                BAN Service Premium    $ 9000.00  
5        500.0                                BAN Service Premium  $ 1,9000.00  
6        100.0                                BAN Service Premium     $ 560.00  
7        100.0                                BAN Service Premium     $ 560.00  
8         50.0                                BAN Service Premium    $ 9000.00  
9        500.0   BAN Service Premium\rHigh Availability - Primary   $ 1,150.00  
10       500.0  BAN Service Premium\rHigh Availability - Secon...   $ 1,200.00  
11       100.0                                BAN Service Premium     $ 850.00  
12         NaN                                                NaN  $ 10,746.00
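
If the "Unnamed" headers and the \r line breaks in that output get in the way, a small clean-up pass before writing the CSV might look like the sketch below. The function name tidy_table and the column names in the usage note are placeholders I made up, not the real headers from the PDF, so substitute your own.

```python
import pandas as pd


def tidy_table(df: pd.DataFrame, column_names: list[str]) -> pd.DataFrame:
    """Return a copy with readable headers and single-line cell text.

    `column_names` is supplied by the caller; nothing here is read from the PDF.
    """
    out = df.copy()
    out.columns = column_names
    # In-cell line breaks show up as "\r" in the output above; collapse each
    # break (plus any surrounding whitespace) into one space. Numeric cells
    # and missing values are left untouched by a regex replace.
    return out.replace(r"\s*[\r\n]+\s*", " ", regex=True)
```

Usage, with hypothetical column names: tidy_table(df, ['name', 'address', 'quantity', 'service', 'amount']).to_csv('tables.csv', index=False)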

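To work from the Form Recognizer JSON itself rather than the PDF, here is a minimal sketch. It assumes the file follows one of two layouts: tables under analyzeResult.tables with the cell text in "content", or tables under analyzeResult.pageResults[].tables with the cell text in "text". I have not checked it against the linked file, so verify the keys first. It also assumes the first row of each table is its header and that both tables share the same columns; merged cells are only written at their top-left position. All function names are my own.

```python
import json

import pandas as pd


def find_tables(result: dict) -> list[dict]:
    """Collect table objects from either assumed layout of the JSON."""
    analyze = result.get("analyzeResult", result)
    tables = list(analyze.get("tables", []))       # assumed newer layout
    for page in analyze.get("pageResults", []):    # assumed older layout
        tables.extend(page.get("tables", []))
    return tables


def table_to_frame(table: dict) -> pd.DataFrame:
    """Place each cell at (rowIndex, columnIndex); row 0 becomes the header."""
    cells = table["cells"]
    n_rows = max(c["rowIndex"] for c in cells) + 1
    n_cols = max(c["columnIndex"] for c in cells) + 1
    grid = [[""] * n_cols for _ in range(n_rows)]
    for c in cells:
        # Cell text is assumed to live under "content" or "text".
        grid[c["rowIndex"]][c["columnIndex"]] = c.get("content", c.get("text", ""))
    return pd.DataFrame(grid[1:], columns=grid[0])


def tables_to_csv(json_path: str, csv_path: str) -> pd.DataFrame:
    """Read the JSON, stack every table found, and write one CSV."""
    with open(json_path, encoding="utf-8") as fh:
        result = json.load(fh)
    frames = [table_to_frame(t) for t in find_tables(result)]
    # pd.concat raises ValueError if no tables were found.
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(csv_path, index=False)
    return combined
```

Usage, with hypothetical file names: tables_to_csv('output.json', 'tables.csv'). If the second table has no header row of its own, build its DataFrame from the full grid and assign the first table's columns before concatenating.
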
Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 BeRT2me