Merge PDFs and handle images with any extension

I have code that merges PDF files from multiple subfolders, and it works well. However, I am stuck at one point: some of the subfolders contain images with various extensions, and I need to treat these images as if they were PDF files so that they are merged in the same way.

for key, value in grouped_files.items():
    print('Processing PDF Merger -->', key)
    pdfs = value
    merger = PdfFileMerger()

    for pdf in pdfs:
        merger.append(pdf)

    merger.write(os.path.join(os.getcwd(), OUTPUT_DIR, f'{key}.pdf'))
    merger.close()

The code throws the following error when it encounters an image:

    raise utils.PdfReadError("EOF marker not found")
PyPDF2.utils.PdfReadError: EOF marker not found

How can I treat an image as a PDF so that it is merged with the other files? I had an idea but couldn't implement it: check whether the file's extension is jpg or png, convert the image to a PDF file, and then merge that PDF in place of the image.
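The idea above could be sketched roughly as follows. This is only a minimal sketch, not a tested solution for this project: it assumes the Pillow library is installed, and `IMAGE_EXTENSIONS`, `is_image` and `as_pdf_source` are hypothetical names introduced here for illustration. It relies on `PdfFileMerger.append()` accepting either a path or a file-like object.

```python
import io
from pathlib import Path

# Assumption: the image types found in the subfolders; extend the set as needed.
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.gif', '.tif', '.tiff'}


def is_image(path):
    """Return True if the file extension marks the path as an image."""
    return Path(path).suffix.lower() in IMAGE_EXTENSIONS


def as_pdf_source(path):
    """Return the path unchanged for non-images, or an in-memory
    one-page PDF (a BytesIO object) for an image."""
    if not is_image(path):
        return path

    # Pillow is imported here so it is only required when an image is found.
    from PIL import Image

    buffer = io.BytesIO()
    with Image.open(path) as img:
        # Pillow cannot write every image mode as PDF, so normalise to RGB first.
        img.convert('RGB').save(buffer, format='PDF')
    buffer.seek(0)
    return buffer
```

In the merge loop, `merger.append(pdf)` would then become `merger.append(as_pdf_source(pdf))`. Converting in memory avoids leaving temporary PDF files next to the images, at the cost of holding each converted image in RAM until the merger is written.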

Here's the full code to make the question clearer:

import os
import shutil
from pathlib import Path

from PyPDF2 import PdfFileMerger

def list_files(dir):
    # Collect the full path of every file under dir, including subfolders
    r = []
    for root, dirs, files in os.walk(dir):
        for name in files:
            r.append(os.path.join(root, name))
    return r

BASE_DIR = Path.cwd()
MAIN_DIR = BASE_DIR / 'MAIN'
OUTPUT_DIR = BASE_DIR / 'OUTPUT'

# Start from an empty OUTPUT folder; ignore the error if it does not exist yet
shutil.rmtree(OUTPUT_DIR, ignore_errors=True)
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

mylist = list_files(MAIN_DIR)
grouped_files = {}

for el in mylist:
    # Group by file name without extension; os.path.basename handles the platform's path separator
    file_name = os.path.basename(el).split('.')[0]
    if file_name not in grouped_files:
        grouped_files[file_name] = []
    grouped_files[file_name].append(el)

for key, value in grouped_files.items():
    print('Processing PDF Merger -->', key)
    pdfs = value
    merger = PdfFileMerger()

    for pdf in pdfs:
        print(pdf)
        merger.append(pdf)

    merger.write(os.path.join(os.getcwd(), OUTPUT_DIR, f'{key}.pdf'))
    merger.close()

The code merges PDF files that share the same name across all the subfolders inside the MAIN folder, and the merged PDF files are stored in the OUTPUT folder.
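The grouping step described above (collecting files that share a name regardless of subfolder or extension) could also be written with `pathlib`. This is a sketch under assumptions: `group_by_stem` is a hypothetical helper name, and it groups by `Path.stem`, which for a name such as `a.b.pdf` yields `a.b` rather than the `a` produced by `split('.')[0]` in the code above.

```python
from collections import defaultdict
from pathlib import Path


def group_by_stem(main_dir):
    """Map each file stem to the sorted list of matching files under main_dir."""
    groups = defaultdict(list)
    # rglob('*') walks every subfolder; sorting keeps the merge order predictable
    for path in sorted(Path(main_dir).rglob('*')):
        if path.is_file():
            groups[path.stem].append(path)
    return dict(groups)
```

With this helper, a PDF and an image that share a stem in different subfolders end up in the same group, so both would be passed to the merger for the same output file.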



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow