Merge PDFs and handle images with any extension
I have code that merges PDF files from multiple subfolders, and it works well. However, I am stuck at one point: some subfolders also contain images with various extensions, and I need to treat these images as if they were PDF files so that they are merged in the same way.
for key, value in grouped_files.items():
    print('Processing PDF Merger -->', key)
    pdfs = value
    merger = PdfFileMerger()
    for pdf in pdfs:
        merger.append(pdf)
    merger.write(os.path.join(os.getcwd(), OUTPUT_DIR, f'{key}.pdf'))
    merger.close()
The code throws the following error when one of the files is an image:
raise utils.PdfReadError("EOF marker not found")
PyPDF2.utils.PdfReadError: EOF marker not found
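The error is raised because every file in the group is parsed as a PDF, and an image does not end with the `%%EOF` trailer that the reader looks for. One possible way to tell the two kinds of file apart before appending is to check the first bytes rather than the extension. This is a minimal sketch, assuming a hypothetical helper named `looks_like_pdf` (it is not part of PyPDF2) and assuming the PDF header `%PDF-` sits at the very start of the file:

```python
PDF_MAGIC = b"%PDF-"  # header that opens a well-formed PDF file


def looks_like_pdf(path):
    """Return True if the file starts with the PDF header (hypothetical helper)."""
    with open(path, "rb") as fh:
        return fh.read(len(PDF_MAGIC)) == PDF_MAGIC
```

A file for which this returns `False` could then be converted or skipped instead of being passed straight to `merger.append`.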
How can I treat an image as a PDF so that it is merged with the other files? I had an idea but couldn't implement it: check whether the file's extension is jpg or png, convert the image to a PDF file, and then merge the converted file instead of the image.
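The idea described above could be sketched roughly as follows. This assumes the Pillow library is installed (`pip install Pillow`), which is not used in the original code. The names `IMAGE_EXTENSIONS`, `image_to_pdf_stream` and `append_any` are hypothetical, and the extension list is only a guess at what "any extension" needs to cover. The image is converted in memory, so no temporary PDF is written to disk:

```python
import io
from pathlib import Path

from PIL import Image  # Pillow, assumed to be installed

# Assumed set of extensions to treat as images; extend as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff"}


def image_to_pdf_stream(path):
    """Convert one image to a single-page PDF held in memory."""
    buffer = io.BytesIO()
    with Image.open(path) as img:
        # Some Pillow versions cannot write modes such as RGBA to PDF,
        # so normalise to RGB first.
        img.convert("RGB").save(buffer, format="PDF")
    buffer.seek(0)
    return buffer


def append_any(merger, path):
    """Append a PDF as is, or an image after converting it (hypothetical helper)."""
    if Path(path).suffix.lower() in IMAGE_EXTENSIONS:
        merger.append(image_to_pdf_stream(path))
    else:
        merger.append(str(path))
```

With this in place, the inner loop would call `append_any(merger, pdf)` instead of `merger.append(pdf)`. It relies on `PdfFileMerger.append` accepting a file-like object as well as a path.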
Here's the full code to make the question clearer:
import os
import shutil
from pathlib import Path

from PyPDF2 import PdfFileMerger


def list_files(dir):
    # Collect the full path of every file under dir, including subfolders.
    r = []
    for root, dirs, files in os.walk(dir):
        for name in files:
            r.append(os.path.join(root, name))
    return r


BASE_DIR = Path.cwd()
MAIN_DIR = BASE_DIR / 'MAIN'
OUTPUT_DIR = BASE_DIR / 'OUTPUT'

# Start from an empty OUTPUT folder; ignore the error if it does not exist yet.
try:
    shutil.rmtree(OUTPUT_DIR)
except OSError:
    pass
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

mylist = list_files(MAIN_DIR)

# Group the paths by file name without the extension.
grouped_files = {}
for el in mylist:
    file_name = os.path.basename(el).split('.')[0]
    if file_name not in grouped_files:
        grouped_files[file_name] = []
    grouped_files[file_name].append(el)

# Merge each group into OUTPUT/<name>.pdf.
for key, value in grouped_files.items():
    print('Processing PDF Merger -->', key)
    pdfs = value
    merger = PdfFileMerger()
    for pdf in pdfs:
        print(pdf)
        merger.append(pdf)
    merger.write(os.path.join(os.getcwd(), OUTPUT_DIR, f'{key}.pdf'))
    merger.close()
The code merges PDF files that share the same name across all the subfolders inside the MAIN folder, and the merged output PDF files are stored in the OUTPUT folder.
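The grouping step described here can also be written with `pathlib` alone. This is a rough sketch rather than a drop-in replacement: `group_by_name` is a hypothetical name, and it keeps the same key as the code above (the file name up to the first dot), so `a.pdf` and `a.png` from different subfolders land in the same group:

```python
from collections import defaultdict
from pathlib import Path


def group_by_name(main_dir):
    """Map each base file name to the sorted list of matching paths under main_dir."""
    groups = defaultdict(list)
    for path in sorted(Path(main_dir).rglob("*")):
        if path.is_file():
            # Same key as the code above: the name up to the first dot.
            groups[path.name.split(".")[0]].append(str(path))
    return dict(groups)
```

Sorting the paths makes the merge order within each group predictable, which the `os.walk` version does not guarantee.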
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
