Check/resolve cross-references in separate XML files

Starting point

Let's say we have a book in XML format. The book consists of many assets, which can reference each other through a ref-asset element with a path attribute. [Path mask: {id}|{version} of the target asset, e.g. 3|1.1.]
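To make the path mask concrete, here is a minimal Java sketch that splits a path such as 3|1.1 into id and version. AssetPath is a hypothetical helper name, and splitting on the first | assumes that ids never contain a | themselves.

```java
import java.util.Objects;

/** Hypothetical value type for a ref-asset path of the form "{id}|{version}", e.g. "3|1.1". */
record AssetPath(String id, String version) {

    /** Splits on the first '|'; rejects paths without a separator or with an empty part. */
    static AssetPath parse(String path) {
        Objects.requireNonNull(path, "path");
        int bar = path.indexOf('|');
        if (bar <= 0 || bar == path.length() - 1) {
            throw new IllegalArgumentException("Not an {id}|{version} path: " + path);
        }
        return new AssetPath(path.substring(0, bar), path.substring(bar + 1));
    }

    /** Rebuilds the "{id}|{version}" form, usable as a lookup key. */
    String key() {
        return id + "|" + version;
    }
}
```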

Important: each asset is a separate file; there is no merged, complete file.

Example XML (merged here only for readability)

<book>
    <!-- file a.xml -->
    <asset id="1" version="1.0">
        <name>Prolog</name>
    </asset>
    <!-- file b.xml -->
    <asset id="2" version="2">
        <name>Table of content</name>
        <list>
            <item><ref-asset path="1|1.0">Prolog</ref-asset></item>
            <item><ref-asset path="2|2.0">Table of content</ref-asset></item>
            <item><ref-asset path="3|1.1">FooBar</ref-asset></item>
        </list>
    </asset>
    <!-- file c.xml -->
    <asset id="3" version="1.1">
        <name>FooBar</name>
    </asset>
</book>

Request

  • Check every ref-asset to see whether its linked target exists in the book.
  • Create a report of the results [exists, does not exist, asset exists but with a wrong version, ...]
  • [Optional: replace each reference with the content of its target.]
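The report categories above can be sketched in Java as a lookup against an index of asset id to the versions present in the book. This is a minimal sketch: RefStatus and RefChecker are hypothetical names, the MALFORMED category is an added assumption, and versions are compared as plain strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical report categories, following the request list. */
enum RefStatus { EXISTS, NOT_EXISTS, WRONG_VERSION, MALFORMED }

final class RefChecker {

    /** Index built beforehand: asset id -> versions present in the book. */
    private final Map<String, Set<String>> versionsById;

    RefChecker(Map<String, Set<String>> versionsById) {
        this.versionsById = versionsById;
    }

    /** Classifies one "{id}|{version}" path; versions are compared as plain strings. */
    RefStatus check(String path) {
        int bar = path.indexOf('|');
        if (bar <= 0 || bar == path.length() - 1) {
            return RefStatus.MALFORMED;
        }
        String id = path.substring(0, bar);
        String version = path.substring(bar + 1);
        Set<String> versions = versionsById.get(id);
        if (versions == null) {
            return RefStatus.NOT_EXISTS;
        }
        return versions.contains(version) ? RefStatus.EXISTS : RefStatus.WRONG_VERSION;
    }

    /** Builds the report in input order: path -> status. */
    Map<String, RefStatus> report(List<String> paths) {
        Map<String, RefStatus> result = new LinkedHashMap<>();
        for (String p : paths) {
            result.put(p, check(p));
        }
        return result;
    }
}
```

With the example data, the reference 2|2.0 comes out as WRONG_VERSION, because asset 2 declares version="2". Whether "2" and "2.0" should count as equal is a policy decision that string comparison does not make for you.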

Settings

  • Saxon 9.6.x EE (XSLT 2.0)
  • Java
  • 100 up to several thousand individual documents (combined file size: several hundred MB)

Possible approaches

First approach: collection() + document()

Find all individual asset files on the file system via collection(), load each one into the process via document(), and search them for matching targets.
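For comparison outside XSLT, the loading step of this approach looks roughly like the Java sketch below: it parses every *.xml file in one directory with the JDK's built-in DOM parser. It only approximates what collection() does and is not Saxon's API; AssetLoader is a hypothetical name, and keeping every DOM in memory at once may be expensive at several hundred MB.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical loader: parses every *.xml file in one directory (not recursive). */
final class AssetLoader {

    static List<Document> loadAll(Path dir) {
        List<Document> docs = new ArrayList<>();
        try {
            // One builder reused sequentially for all files (DocumentBuilder is not thread-safe).
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The glob filters out non-XML files; iteration order is unspecified.
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.xml")) {
                for (Path file : files) {
                    docs.add(builder.parse(file.toFile()));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Cannot parse assets in " + dir, e);
        }
        return docs;
    }
}
```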

Second approach: a merged, complete file

Merge all individual assets into one book file and match references via xsl:key or similar techniques.
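The merge step itself is simple. This minimal Java sketch wraps separately parsed asset documents in a new book root by deep-copying each document element. BookMerger is a hypothetical name, and the merged tree needs heap space for the whole book.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: builds one in-memory <book> from separately parsed asset documents. */
final class BookMerger {

    private static DocumentBuilder builder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Deep-copies each asset's root element into a new <book> root, in list order. */
    static Document merge(List<Document> assets) {
        Document book = builder().newDocument();
        Element root = book.createElement("book");
        book.appendChild(root);
        for (Document asset : assets) {
            // importNode copies the subtree so it belongs to the book document
            root.appendChild(book.importNode(asset.getDocumentElement(), true));
        }
        return book;
    }

    /** Convenience parser for XML strings (for demos and tests). */
    static Document parse(String xml) {
        try {
            return builder().parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("Unparseable XML", e);
        }
    }
}
```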


Question(s)

  • Can collection() load thousands of documents and still perform well when each asset is then processed with document()?
  • How can documents loaded at runtime be "indexed" [still via xsl:key?] so that they can be searched efficiently?
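On the indexing question, the idea behind an xsl:key can be sketched in Java as a single hash map over all assets, built in one pass and then queried once per reference. AssetIndex is a hypothetical name, and the key is assumed to be the id|version string from the path mask.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical global index, similar in spirit to an xsl:key on concat(@id, '|', @version). */
final class AssetIndex {

    private final Map<String, Element> byPath = new HashMap<>();

    /** One pass over all loaded documents; the first asset seen for an id|version wins. */
    AssetIndex(List<Document> docs) {
        for (Document doc : docs) {
            NodeList assets = doc.getElementsByTagName("asset");
            for (int i = 0; i < assets.getLength(); i++) {
                Element a = (Element) assets.item(i);
                byPath.putIfAbsent(a.getAttribute("id") + "|" + a.getAttribute("version"), a);
            }
        }
    }

    /** Hash lookup by a ref-asset path such as "3|1.1"; returns null if no asset matches. */
    Element lookup(String path) {
        return byPath.get(path);
    }

    /** Convenience parser for XML strings (for demos and tests). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unparseable XML", e);
        }
    }
}
```

In XSLT 2.0 terms, key() accepts an optional third argument that selects the document to search, e.g. key('asset', $path, $doc), but each lookup covers only one tree. A single index over all assets, as above, or a merged file avoids repeating the lookup across thousands of documents for every reference.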

Further hints are highly appreciated. No specific stylesheet is needed [I will write it myself as soon as I know which way to go].


EDIT: collection() already returns a sequence of document nodes, so document() may be unnecessary.



Solution 1:[1]

I have written an npm package, xml-path-resolver, that resolves references in XML. I hope it serves your purpose: https://www.npmjs.com/package/xml-path-resolver. The package takes XML and returns JSON with the referenced paths resolved.

CODE USAGE

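// Resolve cross-references with xml-path-resolver (install with: npm install xml-path-resolver)
const xmlPathResolver = require("xml-path-resolver");

const xmlString = `
<?xml version="1.0" encoding="utf-8"?>  
<note id="1212"  importance="high" logged="true" x_note="23">
    <title>Happy</title>
     <todo>Work</todo>
     <todo>Play</todo>
</note>
<note id="23" importance="high" logged="true">
</note>
<note importance="high" logged="true">
</note>
<person x_note="1212">
</person>
`;

// Attributes whose names match crossReference (here x_note) are treated as references
const resolvedJSON = xmlPathResolver(xmlString, { crossReference: /x_(.*)/ });
console.log(JSON.stringify(resolvedJSON, null, 2));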

Example input:

<?xml version="1.0" encoding="utf-8"?>  
<note id="1212"  importance="high" logged="true" x_note="23">
    <title>Happy</title>
     <todo>Work</todo>
     <todo>Play</todo>
</note>
<note id="23" importance="high" logged="true">
</note>
<note importance="high" logged="true">
</note>
<person x_note="1212">
</person>

The XML above contains cross-reference paths (the x_note attributes). In the resolved JSON output, each x_note value is replaced by the note element whose id it names:

{
  "_declaration": {
    "_attributes": {
      "version": "1.0",
      "encoding": "utf-8"
    }
  },
  "note": [
    {
      "_attributes": {
        "id": "1212",
        "importance": "high",
        "logged": "true",
        "x_note": {
          "_attributes": {
            "id": "23",
            "importance": "high",
            "logged": "true"
          }
        }
      },
      "title": {
        "_text": "Happy"
      },
      "todo": [
        {
          "_text": "Work"
        },
        {
          "_text": "Play"
        }
      ]
    },
    {
      "_attributes": {
        "id": "23",
        "importance": "high",
        "logged": "true"
      }
    },
    {
      "_attributes": {
        "importance": "high",
        "logged": "true"
      }
    }
  ],
  "person": {
    "_attributes": {
      "x_note": {
        "_attributes": {
          "id": "1212",
          "importance": "high",
          "logged": "true",
          "x_note": {
            "_attributes": {
              "id": "23",
              "importance": "high",
              "logged": "true"
            }
          }
        },
        "title": {
          "_text": "Happy"
        },
        "todo": [
          {
            "_text": "Work"
          },
          {
            "_text": "Play"
          }
        ]
      }
    }
  }
}

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1