XSLT 1.0 (xsltproc) - Unable to Parse Huge XML
I am trying to parse an input XML file that is 1,300,000 lines long (56 MB) using xsltproc. I get the error below:
input.xml:245393: parser error : internal error: Huge input lookup
"description" : "List of values for possible department codes"
^
unable to parse input.xml
The same xsltproc was able to process an XML file that was 930,000 lines long (48 MB).
I also tried cutting the failing file down to 600,000 lines by removing unnecessary parts, but I still get the same error. That is strange, because xsltproc can parse a 930,000-line file but not this 600,000-line one.
How do I resolve this issue?
Solution 1:[1]
Running the transformation in Oxygen XML Editor (Xalan) instead of xsltproc resolved my issue.
Solution 2:[2]
Write your own xsltproc in Python based on this snippet:
import argparse
from lxml import etree

parser = argparse.ArgumentParser()
parser.add_argument('stylesheet', help='XSLT style sheet', type=argparse.FileType('r', encoding='utf-8'))
parser.add_argument('input', help='XML input file(s)', nargs='*', type=argparse.FileType('r', encoding='utf-8'))
# required=True: without an output file, write_output() would receive None and fail.
parser.add_argument('--output', help='The output file to create.', type=argparse.FileType('wb'), required=True)
args = parser.parse_args()

# Compile the stylesheet once and reuse it for every input file.
transform = etree.XSLT(etree.parse(args.stylesheet))
# huge_tree=True lifts libxml2's built-in size limits for the input parser.
xml_parser = etree.XMLParser(huge_tree=True)
for xml in args.input:
    transform(etree.parse(xml, xml_parser)).write_output(args.output)
This uses lxml as suggested in this answer.
The huge_tree=True argument sets the corresponding parser option in libxml2 (XML_PARSE_HUGE), which relaxes the parser's hardcoded limits and thus enables it to process large files. See Parser options for more information.
Solution 3:[3]
libxslt 1.1.35 added a --huge option to xsltproc that disables some internal limits, such as XML_MAX_LOOKUP_LIMIT.
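To stay consistent with the Python snippet in Solution 2, here is a minimal sketch of calling xsltproc with --huge from a script. The helper names (supports_huge, build_command, run_transform) are hypothetical. The version check assumes that `xsltproc --version` reports libxslt as an integer such as 10135 for 1.1.35; verify that against your own build before relying on it.

```python
import re
import subprocess


def supports_huge(version_output):
    """Guess from `xsltproc --version` output whether --huge is available.

    Assumption: the output contains text like "libxslt 10135", where
    10135 encodes version 1.1.35 (the release that added --huge).
    """
    match = re.search(r"libxslt (\d+)", version_output)
    return bool(match) and int(match.group(1)) >= 10135


def build_command(stylesheet, xml_input, output=None, huge=True):
    """Build the xsltproc argument list, optionally with --huge and -o."""
    cmd = ["xsltproc"]
    if huge:
        cmd.append("--huge")
    if output is not None:
        cmd += ["-o", output]
    cmd += [stylesheet, xml_input]
    return cmd


def run_transform(stylesheet, xml_input, output):
    """Run xsltproc, adding --huge only when the installed version has it."""
    version = subprocess.run(
        ["xsltproc", "--version"], capture_output=True, text=True, check=True
    ).stdout
    cmd = build_command(stylesheet, xml_input, output, huge=supports_huge(version))
    subprocess.run(cmd, check=True)
```

For example, run_transform("style.xsl", "input.xml", "out.xml") would execute `xsltproc --huge -o out.xml style.xsl input.xml` on a new enough libxslt. On older versions it falls back to the plain command, which will still hit the original limit.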
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | AutoTester999 |
| Solution 2 | Adrian W |
| Solution 3 | nwellnhof |
