Write specific pages from multiple PDF files to a new PDF file

I have multiple PDF files, and I want to extract a specific group of pages from each one; the set of pages differs from file to file. I have created a dictionary whose keys are the PDF filenames and whose values are the lists of pages to extract from each file. I intend to extract those pages from each file and write them all to one new PDF file so that I can do data extraction on this final file. I have tried PyPDF4 as well as FPDF, but with no success so far: I get either a large PDF with blank pages, a PDF with only one or two of the pages, or an error saying that the PDF object cannot be found. I am hoping for some guidance on where my approach goes wrong. Below is my code:

import PyPDF4
from PyPDF4 import PdfFileReader, PdfFileWriter

for pdf,pgs in dic_11_1.items():
  pdf=list(dic_11_1.keys())
  pgs=list(dic_11_1.values())
  for i in range(0,len(pdf)):
    pages = pgs[i]
    object = open(pdf[i],'rb') 
    pdfinput=PyPDF4.PdfFileReader(object,'rb')
    if pdfinput.isEncrypted:
        pdfinput.decrypt('')
    else:
        pdfinput
    for p in pages:
        page=pdfinput.getPage(p)
        pdf_writer=PyPDF4.PdfFileWriter()
        pdf_writer.addPage(page)
        with open('F111.pdf',mode='wb') as output:
            pdf_writer.write(output)

The error that I get is 'PdfReadError: Could not find object.'
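Independently of that error, the loop above can never collect more than one page: a fresh `PdfFileWriter` is created for every page, and `open('F111.pdf', mode='wb')` truncates the file each time it is opened, so every iteration replaces what the previous one wrote. (The outer `for pdf,pgs in dic_11_1.items()` loop also repeats the whole inner loop once per dictionary entry.) The same truncation effect can be shown with plain file I/O; in this sketch the byte strings stand in for PDF pages, and the function names are hypothetical:

```python
import os
import tempfile


def write_inside_loop(chunks, path):
    """Mirror the question's structure: reopen the output in 'wb' for every item."""
    for chunk in chunks:
        with open(path, "wb") as out:  # 'wb' truncates the file on every open
            out.write(chunk)


def write_once(chunks, path):
    """Open the output a single time and write every item into it."""
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)


def demo():
    """Return what each strategy leaves on disk for three stand-in 'pages'."""
    pages = [b"page1\n", b"page2\n", b"page3\n"]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "F111.bin")
        write_inside_loop(pages, path)
        with open(path, "rb") as f:
            looped = f.read()
        write_once(pages, path)
        with open(path, "rb") as f:
            once = f.read()
    return looped, once
```

`demo()` shows that only the last chunk survives the in-loop version, while the write-once version keeps all three. The same reasoning suggests creating `pdf_writer` once before the loops and calling `pdf_writer.write()` once after them.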

When I try FPDF with the following code, it runs for a long time and gives me a large, empty PDF file:

from fpdf import FPDF 
import os
for pdf,pgs in dic_11_1.items():
  pdf_in=open(pdf,'rb')
  inputpdf=PdfFileReader(pdf_in,'rb')
  if inputpdf.isEncrypted:
    inputpdf.decrypt('')
  else:
    inputpdf
  for p in pgs:
    content=inputpdf.getPage(p).extractText()
    pdf = FPDF('P','mm','A4') 
    pdf.add_page() 
    pdf.set_font("arial", size = 10) 
    for text in content: 
        text2=text.encode('latin-1', 'replace').decode('latin-1')
        pdf.write(10,text2) 
        pdf.ln(8)
        pdf.close()
        return_byte_string=pdf.output('F_11_1.pdf','S').encode('latin-1')
    pdf_file=open('F_11_1.pdf','wb')
    pdf_file.write(return_byte_string)
    pdf_file.close()
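In the FPDF attempt, `extractText()` returns one string per page, and iterating over a Python string yields single characters, so `for text in content:` calls `pdf.write()` and `pdf.ln(8)` once per character, which is one plausible reason for the long run time. A new `FPDF` object is also created for every page, and `F_11_1.pdf` is reopened in `'wb'` mode for every page. More fundamentally, FPDF is designed for generating new PDFs, so rebuilding pages from extracted text discards the original layout; copying whole pages with a PDF-manipulation library fits this task better. The stdlib-only sketch below (function name and sample text are hypothetical) just compares how many items each loop style visits:

```python
def compare_loops(content):
    """Return (items seen by `for text in content`, items seen by a per-line loop)."""
    per_character = [ch for ch in content]  # iterating a str yields characters
    per_line = content.splitlines()         # one item per line of extracted text
    return per_character, per_line


# Hypothetical text standing in for one page's extractText() result
page_text = "Invoice 42\nTotal 10.00"
chars, lines = compare_loops(page_text)
print(len(chars), "write() calls per character vs", len(lines), "per line")
```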

Any guidance would be greatly appreciated. Thank you in advance.

The solution provided by @SUTerliakov was great, but it only wrote the last page (or the pages of the last document) from the dictionary's lists of pages. A minor indentation change in the code resolved this and got me all my data. Thanks again @SUTerliakov for starting me on the correct path! Here is your adjusted code:

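from PyPDF4 import PdfFileReader, PdfFileWriter

# dic_11_1 maps each source PDF filename to a list of 0-based page indices
pdf_writer = PdfFileWriter()  # one writer collects the pages from every file
open_files = []  # sources must stay open until pdf_writer.write() has run

try:
    for filename, pgs in dic_11_1.items():
        src = open(filename, 'rb')
        open_files.append(src)

        pdfinput = PdfFileReader(src)

        if pdfinput.isEncrypted:
            pdfinput.decrypt('')

        # Copy the requested pages whether or not the file was encrypted
        print(f'Extracting relevant pages from {filename} to central repository')
        for p in pgs:
            print(f'{filename} pg{p}')
            pdf_writer.addPage(pdfinput.getPage(p))
        print(f'Writing {len(pgs)} pages to central file')

    # Write the combined file once, after all sources have been processed
    with open('F_11_1.pdf', 'wb') as stream:
        pdf_writer.write(stream)
finally:
    print('Closing source files...')
    for f in open_files:
        f.close()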
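The same approach can be packaged as a reusable function. This is a sketch under stated assumptions: it relies on PyPDF4's `PdfFileReader`/`PdfFileWriter` method names (`isEncrypted`, `decrypt`, `getNumPages`, `getPage`, `addPage`), while the function name `merge_selected_pages` and its `open_reader` parameter are hypothetical. It extracts pages from every file (decrypting only the encrypted ones), checks each 0-based index against the page count, and leaves writing to the caller, who still has to keep the source files open until `write()` has finished. Because it only receives reader and writer objects, it can be exercised with stand-ins:

```python
def merge_selected_pages(selection, open_reader, writer):
    """Append chosen pages from several PDFs to one writer (hypothetical helper).

    selection   -- dict mapping a PDF filename to a list of 0-based page indices
    open_reader -- callable returning a PyPDF4-style reader for a filename
    writer      -- object with an addPage(page) method, e.g. PdfFileWriter
    Returns the number of pages appended.
    """
    added = 0
    for filename, pages in selection.items():  # dicts keep insertion order
        reader = open_reader(filename)
        # PyPDF4's decrypt() returns 0 when the password fails
        if reader.isEncrypted and reader.decrypt("") == 0:
            raise ValueError(f"{filename}: empty password did not decrypt it")
        count = reader.getNumPages()
        for p in pages:
            if not 0 <= p < count:
                raise IndexError(f"{filename}: page index {p} outside 0..{count - 1}")
            writer.addPage(reader.getPage(p))  # every file, encrypted or not
            added += 1
    return added


# Usage with PyPDF4 (not executed here); ExitStack keeps every source file
# open until writer.write() has copied the pages, then closes them all:
#
# from contextlib import ExitStack
# from PyPDF4 import PdfFileReader, PdfFileWriter
#
# writer = PdfFileWriter()
# with ExitStack() as stack:
#     def open_reader(name):
#         return PdfFileReader(stack.enter_context(open(name, "rb")))
#     merge_selected_pages(dic_11_1, open_reader, writer)
#     with open("F_11_1.pdf", "wb") as out:
#         writer.write(out)
```

Raising on an out-of-range index gives a clearer message than the library's own error when a page list was written with 1-based page numbers by mistake.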


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
