Download images from a PDF document based on color detection
I have a lot of PDF documents to scrape in order to download the images they contain. I know how to do this with Python, but sometimes there's a problem: the images aren't embedded as regular images but more like SVG (vector) elements, so you can copy and paste the text inside them, yet the method I was using doesn't detect them as images.
I couldn't come up with a solution until I noticed that all my images are surrounded by a grey rectangle and background, as in the example given below:
The solution I came up with is:
- First, render each PDF page as a PNG (or any other image format).
- Detect all the grey pixels within the page.
- Take the boundaries of the grey region and export what is inside as an image.
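The three steps above can be sketched roughly as follows. This is a minimal sketch, not a tested solution for these particular documents: it assumes PyMuPDF (`fitz`) for rendering and Pillow for cropping, and the "grey" thresholds (`tol`, `low`, `high`) and `min_size` are guesses you would tune against your own pages. The helper names are hypothetical. A grey pixel is taken to be one whose R, G and B values are nearly equal and whose brightness is mid-to-light; connected grey pixels are grouped with a flood fill, and each group's bounding box is the region to export.

```python
from collections import deque


def is_grey(rgb, tol=12, low=150, high=235):
    """True if the channels are nearly equal (no hue) and brightness is mid-to-light.

    The thresholds are assumptions; tune them to the grey used in your PDFs.
    """
    r, g, b = rgb[:3]
    return max(r, g, b) - min(r, g, b) <= tol and low <= (r + g + b) / 3 <= high


def grey_boxes(pixels, width, height, min_size=20):
    """Bounding boxes (x0, y0, x1, y1), right/bottom exclusive, of connected grey regions.

    `pixels` is a flat, row-major sequence of RGB tuples, e.g. list(img.getdata()).
    Regions narrower or shorter than `min_size` pixels are treated as noise.
    """
    mask = [is_grey(p) for p in pixels]
    seen = [False] * (width * height)
    boxes = []
    for start in range(width * height):
        if not mask[start] or seen[start]:
            continue
        # Breadth-first flood fill over 4-connected grey neighbours.
        seen[start] = True
        queue = deque([start])
        x0 = x1 = start % width
        y0 = y1 = start // width
        while queue:
            i = queue.popleft()
            x, y = i % width, i // width
            x0, x1 = min(x0, x), max(x1, x)
            y0, y1 = min(y0, y), max(y1, y)
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < width and 0 <= ny < height:
                    j = ny * width + nx
                    if mask[j] and not seen[j]:
                        seen[j] = True
                        queue.append(j)
        if x1 - x0 + 1 >= min_size and y1 - y0 + 1 >= min_size:
            boxes.append((x0, y0, x1 + 1, y1 + 1))
    return boxes


def extract_grey_boxed_images(pdf_path, out_prefix, zoom=2.0):
    """Render every page, find grey boxes and save each one as a PNG crop."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it
    from PIL import Image

    with fitz.open(pdf_path) as doc:
        for page in doc:
            # Step 1: rasterize the page (zoom=2 is 144 dpi instead of the default 72).
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), alpha=False)
            img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            # Steps 2 and 3: locate grey regions and crop their bounding boxes.
            for k, box in enumerate(grey_boxes(list(img.getdata()), img.width, img.height)):
                img.crop(box).save(f"{out_prefix}_p{page.number + 1}_{k}.png")
```

Usage would be something like `extract_grey_boxed_images("report.pdf", "out/report")` (file names hypothetical). Because the grey frame surrounds the picture, the whole frame forms one connected region and its bounding box covers the picture inside it. The pure-Python flood fill is easy to follow but slow on large pages; building the mask with NumPy and labelling it with `scipy.ndimage.label` or OpenCV's `cv2.findContours` does the same job much faster. Other grey page elements (table shading, headers) would also be picked up, so some filtering by size or aspect ratio may be needed.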
But I have no idea where to start or what to look at to turn this idea into something that works.
Could you help me, please? Or do you have any other idea for achieving what I'm trying to do?
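One other idea, not from the original post: since these images are vector drawings, the grey frames may be visible directly in the PDF's drawing commands, without rasterizing the page first. PyMuPDF's `Page.get_drawings()` returns one dict per path, including its `fill` color and bounding `rect`, and `Page.get_pixmap(clip=...)` can render just one area. The sketch below assumes the frames are drawn as grey filled paths; the thresholds and `min_side` (in PDF points) are guesses, and the function names are hypothetical.

```python
def find_grey_rects(drawings, tol=0.03, low=0.6, high=0.95, min_side=36):
    """Return (x0, y0, x1, y1) of filled paths whose fill colour looks grey.

    `drawings` is the list returned by PyMuPDF's page.get_drawings(); colour
    components are floats in 0..1. The thresholds are assumptions to tune.
    """
    boxes = []
    for d in drawings:
        fill = d.get("fill")
        if not fill:
            continue  # stroke-only path, no fill colour
        if len(fill) == 1:
            level = fill[0]  # single-component grey
        elif len(fill) == 3 and max(fill) - min(fill) <= tol:
            level = sum(fill) / 3  # RGB with (almost) equal channels
        else:
            continue  # coloured fill
        if not (low <= level <= high):
            continue  # too dark or (near) white
        x0, y0, x1, y1 = d["rect"]
        if x1 - x0 >= min_side and y1 - y0 >= min_side:
            boxes.append((x0, y0, x1, y1))
    return boxes


def export_grey_boxed_regions(pdf_path, out_prefix, zoom=2.0):
    """Render only the areas inside grey frames and save them as PNGs."""
    import fitz  # PyMuPDF; imported here so find_grey_rects works without it

    with fitz.open(pdf_path) as doc:
        for page in doc:
            rects = find_grey_rects(page.get_drawings())
            for k, (x0, y0, x1, y1) in enumerate(rects):
                pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom),
                                      clip=fitz.Rect(x0, y0, x1, y1))
                pix.save(f"{out_prefix}_p{page.number + 1}_{k}.png")
```

Compared with pixel scanning, this avoids guessing at anti-aliased edges and is much faster, but it only works if the grey background really is a filled path; if it turns out to be an embedded bitmap or is built from many small shapes, the rasterize-and-scan approach is the safer fallback.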
Thanks!
Sorry for not providing an example file, but the documents contain sensitive data, so I can't share them.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
