HOCR Combine from input subfolder to output subfolder
I was looking for a function that combines the hOCR files found in a single input subfolder and writes the combined result, under a given file name, to a corresponding output subfolder.
#!/usr/bin/env python
from __future__ import print_function
import argparse
from lxml import etree, html

################################################################
# main program
################################################################

parser = argparse.ArgumentParser(
    description="combine multiple hOCR documents into one")
parser.add_argument(
    "filenames", help="hOCR files", nargs='+')
args = parser.parse_args()

# The first file is the base document; pages from the other files are appended to it.
doc = html.parse(args.filenames[0])
# Folder paths belong in the file names given on the command line, not in the XPath.
pages = doc.xpath("//*[@class='ocr_page']")
container = pages[-1].getparent()

for fname in args.filenames[1:]:
    doc2 = html.parse(fname)
    pages = doc2.xpath("//*[@class='ocr_page']")
    for page in pages:
        container.append(page)

print(etree.tostring(doc, pretty_print=True).decode('UTF-8'))
Source: https://github.com/ocropus/hocr-tools/blob/master/hocr-combine
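The hocr-combine script takes its input files from the command line and prints the combined document to standard output. For the folder-to-folder case asked about here, a minimal sketch could wrap the same logic in a function. The function name `combine_hocr_dir`, the `*.hocr` file pattern, and the lexicographic page order are assumptions for illustration and are not part of hocr-tools.

```python
from pathlib import Path

from lxml import etree, html


def combine_hocr_dir(input_dir, output_file, pattern="*.hocr"):
    """Merge all hOCR files in input_dir into one document at output_file.

    Hypothetical helper: the name, the default pattern and the sort order
    are assumptions, not something hocr-tools provides.
    """
    # Sort so that pages are appended in a predictable (lexicographic) order.
    files = sorted(Path(input_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(
            "no files matching %s in %s" % (pattern, input_dir))

    # The first file is the base document that receives all other pages.
    doc = html.parse(str(files[0]))
    pages = doc.xpath("//*[@class='ocr_page']")
    if not pages:
        raise ValueError("no ocr_page element in %s" % files[0])
    container = pages[-1].getparent()

    # Move every ocr_page element of the remaining files into the base document.
    for path in files[1:]:
        for page in html.parse(str(path)).xpath("//*[@class='ocr_page']"):
            container.append(page)

    # Create the output subfolder if needed, then write the combined document.
    output_file = Path(output_file)
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_bytes(
        etree.tostring(doc, pretty_print=True, encoding="utf-8"))
    return len(files)
```

With the folders from the question, a call might look like `combine_hocr_dir("F:/Testing/input/1", "F:/Testing/output/1/combined.hocr")`, where the output file name is only an example. If the page files are numbered without zero padding (`2.hocr`, `10.hocr`), a numeric sort key would be needed in place of the plain `sorted` call.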
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
