OCRMYPDF: 'pages' parameter not working as expected even with optimization disabled

I'm using ocrmypdf and I only want the first page of each file to have its characters recognized. I'm trying to do this with:

ocrmypdf -l por --force-ocr --pages 1 --optimize 0 input.pdf output.pdf

but even then it outputs

Start processing 10 pages concurrently

The files are in Portuguese, and some of them contain text in fonts that I can't read in Python because the extracted string becomes a series of "(cid:)" tokens. That's why I use --force-ocr.

Also, I have a lot of files (the files are actually a parameter for an application I'm developing), so this is taking too much time.

My operating system is Windows, if that helps.



Solution 1:[1]

When Maven "translates" your source code into a package, it changes the folder structure.

In a jar packaging:

  • Compiled sources from src/main/java go to the jar's root (keeping Java packages as a folder structure).
  • Files from src/main/resources go to the jar's root too.

So your file, once the jar is packaged, is in the root of the archive. Actually jar files are just zip files with a different extension, so you can use any zip manager to open it and explore it.
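To see this layout without a zip manager, the entries of an archive can also be listed from Java. The following is a minimal sketch, not part of the original answer: the class name JarLayoutDemo, the helper names, and the entry com/example/Main.class are hypothetical, and the sample archive is built by hand only to mimic what a packaged jar might contain.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class JarLayoutDemo {

    // Build a tiny archive that mimics the layout described above
    // (hypothetical entries: one compiled class and one resource at the root).
    static Path buildSampleJar() throws IOException {
        Path jar = Files.createTempFile("sample", ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("com/example/Main.class"));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("credentials.json"));
            zip.write("{}".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return jar;
    }

    // Read the archive with the plain zip API and collect the entry names.
    static List<String> listEntries(Path jar) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path jar = buildSampleJar();
        listEntries(jar).forEach(System.out::println);
        Files.delete(jar);
    }
}
```

Pointing listEntries at a real jar produced by the build (for example one under target/) should show the resource file at the top level, next to the package folders.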

To access the file, do exactly what you are already doing: load it as a resource through the jar's class loader. Any class from your jar will do, since it delegates the lookup to its class loader. Just change the path:

InputStream is = Main.class.getResourceAsStream("/credentials.json");
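Because getResourceAsStream returns null when the resource is not found, it may help to guard the call and close the stream afterwards. The following is a sketch under assumptions: the class ResourceLoader and the helper readResource are hypothetical names, not from the original answer, and readAllBytes requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class ResourceLoader {

    // Returns the resource text, or an empty Optional if the path
    // cannot be found on the classpath (getResourceAsStream gives null).
    static Optional<String> readResource(Class<?> anchor, String path) throws IOException {
        try (InputStream is = anchor.getResourceAsStream(path)) {
            if (is == null) {
                return Optional.empty();
            }
            return Optional.of(new String(is.readAllBytes(), StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        // The leading slash makes the lookup start at the root of the classpath.
        Optional<String> json = readResource(ResourceLoader.class, "/credentials.json");
        System.out.println(json.orElse("credentials.json not found on the classpath"));
    }
}
```

Returning an Optional instead of a raw stream keeps the null check in one place, so a wrong path shows up as a clear message rather than a NullPointerException later on.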

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
