XQuery - How to query based on number of elements in XML document?
I'm still new to XQuery / MarkLogic and I'm having trouble understanding how to query based on the number of elements in the XML document. For example, imagine I have a database of XML documents roughly similar to the following:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
</book>
<book category="web">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
As you can see in book[2], price is missing. Most documents in the database I'm working with would either have the child element price for each book or no price element attached to any of the book elements. My goal is to find only the documents where some of the child elements are missing (like the above XML); and ignore the documents where either all the child elements exist or where none of the child elements exist. So in my head the logic is something along the lines of "return results where the number of price elements is < the number of book elements AND > 0."
The best I can do so far is the following query:
let $some-docs := cts:search(fn:collection('/my/collection'),
  cts:and-query((
    cts:element-query(xs:QName("book"), cts:true-query()),
    cts:not-query(cts:element-query(xs:QName("price"), cts:true-query()))
  )))
return (xdmp:node-uri($some-docs))
But this obviously only returns documents where book elements exist and no price elements exist. I need a way of indicating I want the documents where the price element exists, but is missing for some books.
I'd prefer a solution that uses the cts:search function, but any help is appreciated.
Solution 1:[1]
"I need a way of indicating I want the documents where the price element exists, but is missing for some books."
So basically you need to find documents that contain both book elements with a price child (<bookstore><book><price/></book></bookstore>) and book elements that are missing the child <price/> element?
The simplest thing to do is to modify the existing documents using a tool like CORB, either adding an element that indicates the document matches your criteria or placing the matching documents in a distinct collection. Then just use CTS to return the documents with that added indicator.
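As a rough illustration of the "distinct collection" idea, the following one-off update is a minimal sketch rather than a CORB job. The collection name /my/collection/partially-priced is hypothetical, and the sketch assumes that price only ever appears as a child of book. On a large database the work would need to be batched, which is what CORB is for.

```xquery
xquery version "1.0-ml";

(: Hypothetical collection name, chosen only for this illustration. :)
let $target := "/my/collection/partially-priced"
for $doc in cts:search(fn:collection('/my/collection'),
  cts:and-query((
    cts:element-query(xs:QName("book"), cts:true-query()),
    cts:element-query(xs:QName("price"), cts:true-query())
  ))
)[bookstore/book[not(price)]]
(: Tag each document that has at least one price and at least one unpriced book. :)
return xdmp:document-add-collections(xdmp:node-uri($doc), $target)
```

Once the documents are tagged, a later query can select them with cts:collection-query on the same collection name, without any filtering at query time.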
If you don't want to touch the dataset, you could create two field range indexes, one on /bookstore/book/price and one on /bookstore/book[not(./price)]/title (called field1 and field2 in the query below). Then you just need to query for documents where both indexes are present, with something like:
cts:and-query((
  cts:field-word-query("field1", "*", ("wildcarded")),
  cts:field-word-query("field2", "*", ("wildcarded"))
))
Solution 2:[2]
The count of elements within a document isn't exposed by the indexes, so it isn't available to a query. You could instead apply a predicate filter to the bookstore documents returned by the search and test whether any book lacks a price:
cts:search(fn:collection('/my/collection'),
  cts:element-query(xs:QName("book"), cts:true-query())
)[bookstore/book[not(price)]]
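The predicate above also matches documents in which no book has a price. One way to exclude those, sketched here on the assumption that price only ever appears as a child of book, is to require a price element in the index-resolved part of the search and keep the predicate for the "some book has no price" test:

```xquery
xquery version "1.0-ml";

(: Index-resolved part: documents containing at least one book and one price. :)
(: Filter part: keep only documents where some book has no price child. :)
cts:search(fn:collection('/my/collection'),
  cts:and-query((
    cts:element-query(xs:QName("book"), cts:true-query()),
    cts:element-query(xs:QName("price"), cts:true-query())
  ))
)[bookstore/book[not(price)]]
```

The predicate is evaluated against each candidate document after the index lookup, so it costs more as the number of candidates grows. Narrowing the candidates with the extra element query keeps that filtering step smaller.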
Solution 3:[3]
"return results where the number of price elements is < the number of book elements AND > 0"
You could write not(count(//price) = (count(//book), 0))
or perhaps
exists(//price) and exists(//book[not(price)])
It seems a very strange query though. Perhaps you should be using a schema for validation?
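To make the counting condition concrete, here is a small self-contained sketch in plain XQuery. The function name local:partially-priced and the three inline sample documents are made up for illustration. It counts books that have a price child rather than all price elements, which is equivalent as long as price only appears inside book.

```xquery
xquery version "1.0";

(: Hypothetical helper: true when some, but not all, books have a price. :)
declare function local:partially-priced($doc as node()) as xs:boolean {
  let $books  := count($doc//book)
  let $priced := count($doc//book[price])
  return $priced > 0 and $priced < $books
};

let $some := <bookstore><book><price>30.00</price></book><book/></bookstore>
let $all  := <bookstore><book><price>30.00</price></book><book><price>39.95</price></book></bookstore>
let $none := <bookstore><book/><book/></bookstore>
return (
  local:partially-priced($some),  (: true: one of two books is priced :)
  local:partially-priced($all),   (: false: every book is priced :)
  local:partially-priced($none)   (: false: no book is priced :)
)
```

In MarkLogic the same function could be used as a predicate on search results, but it would run as a filter on each candidate document rather than being resolved from the indexes.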
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rob S. |
| Solution 2 | Mads Hansen |
| Solution 3 | Michael Kay |
