Count the number of XML elements from the Linux shell

My XML looks something like this:

<elements>
<elem>
....bunch of other elements
</elem>
</elements>

Is there a way to count the number of occurrences of the elem tag in an XML file through the Linux shell? With perl/python or anything else that might work as a one-liner?

I could try something like grep -c "elem" myfile.xml and divide the result by 2 to get the count. Is there something similar that works as a one-liner?

EDIT:

I'm looking for an alternative to the grep solution.



Solution 1:[1]

The xml_grep tool does what you want - try the following:

xml_grep --count //elem example.xml

That utility is in the xml-twig-tools package on Debian / Ubuntu, and the documentation is here.

Solution 2:[2]

You can also use xmllint:

xmllint --xpath "count(//elem)" myfile.xml
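Since the question also asks about Python, the same count can be sketched with the standard library's xml.etree.ElementTree. This is a minimal sketch rather than part of the original answer: count_elements and SAMPLE are illustrative names, and the sample document is made up to mirror the shape shown in the question.

```python
import xml.etree.ElementTree as ET


def count_elements(xml_text: str, tag: str) -> int:
    """Count elements named `tag` anywhere in the document, root included."""
    root = ET.fromstring(xml_text)
    # Element.iter(tag) walks the whole tree in document order and
    # yields only the elements whose tag equals `tag`.
    return sum(1 for _ in root.iter(tag))


# Hypothetical sample: two elem on one line, one spread over several lines.
SAMPLE = """<elements>
<elem>a</elem><elem>b</elem>
<elem>
c
</elem>
</elements>"""

print(count_elements(SAMPLE, "elem"))
```

To count in a real file, one could obtain the root with ET.parse("myfile.xml").getroot() instead of ET.fromstring.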

Solution 3:[3]

DO NOT USE REGULAR EXPRESSIONS TO PARSE OR SCAN XML FILES

With the mandatory disclaimer out of the way, here's my solution:

xmllint --nocdata --format myfile.xml | grep -c '</elem>'

xmllint is part of libxml2, which is fairly common on many Linux distros. This solution passes the following regex/XML traps:

  • spurious spaces (--format)
  • several closing tags on single line (--format)
  • CDATA sections (--nocdata)

However, you can still be caught out by namespace declarations and defaults: a prefixed closing tag such as </ns:elem>, for example, does not match the pattern.
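To see why the traps listed above matter, here is a small Python sketch that compares three ways of counting on the same input: matching lines (what grep -c reports), raw substring occurrences, and a real parser. The TRICKY document and the helper names are invented for illustration; the input has two closing tags on one line and a CDATA section whose text contains </elem>.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: three real elem elements.
TRICKY = """<elements>
<elem>one</elem><elem>two</elem>
<elem><![CDATA[looks like </elem> but is text]]></elem>
</elements>"""


def count_matching_lines(text: str, needle: str = "</elem>") -> int:
    """Count lines containing `needle`, similar to grep -c with a fixed string."""
    return sum(1 for line in text.splitlines() if needle in line)


def count_parsed(text: str, tag: str = "elem") -> int:
    """Count elements named `tag` using an XML parser."""
    return sum(1 for _ in ET.fromstring(text).iter(tag))


print(count_matching_lines(TRICKY))  # 2: undercounts, two tags share a line
print(TRICKY.count("</elem>"))       # 4: overcounts, CDATA text is included
print(count_parsed(TRICKY))          # 3: the actual number of elem elements
```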

Solution 4:[4]

London,

Try fgrep -c '</elem>' "$filename"

fgrep is a standard Unix utility (equivalent to grep -F), not at all sure about Linux though. The -c switch counts matching lines, so this assumes each closing tag sits on its own line.

Cheers. Keith.

PS: It's almost always easier to count CLOSING tags, coz they don't have attributes ;-)

Solution 5:[5]

grep alone won't help in all cases, but this is an easy case for XMLStarlet. You can match elem with XMLStarlet, print a newline for each match, and then count the newlines with wc -l. The number of newlines minus 1 is the number of elements.

Example YOURFILE.xml:

<elements>
<elem>....bunch of other elements</elem><elem>....bunch of other elements</elem>
<elem>
....bunch of other elements
....bunch of other elements
</elem>
</elements>

Use XMLStarlet and wc -l:

echo $(($(xmlstarlet sel -t -m //elem -n YOURFILE.xml | wc -l)-1))

Output: 3

Solution 6:[6]

Here's a refinement to @bluenote10's xmllint answer that also works for arbitrary namespace prefixes:

xmllint --xpath "count(//*[local-name()='elem'])" myfile.xml
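The same local-name idea can be sketched in Python with xml.etree.ElementTree, which spells a namespaced tag as "{uri}local". This is only an illustration under assumptions: local_name, count_local, NAMESPACED, and the urn:example:* namespace URIs are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one elem in the default namespace, one prefixed.
NAMESPACED = """<elements xmlns="urn:example:default" xmlns:p="urn:example:p">
<elem/>
<p:elem/>
<p:other/>
</elements>"""


def local_name(tag: str) -> str:
    """Strip the "{uri}" part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def count_local(xml_text: str, name: str) -> int:
    """Count elements whose local name is `name`, whatever their namespace."""
    root = ET.fromstring(xml_text)
    return sum(
        1
        for el in root.iter()
        # Guard against non-string tags (comments or processing
        # instructions, if a parser is configured to keep them).
        if isinstance(el.tag, str) and local_name(el.tag) == name
    )


print(count_local(NAMESPACED, "elem"))
# A plain tag match finds nothing here, because every tag is namespaced:
print(sum(1 for _ in ET.fromstring(NAMESPACED).iter("elem")))
```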

(Already tried to add this as a response to @Ryan_Pelletier's question below the original answer, but kept running into formatting issues so created a separate answer instead).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mark Longair
Solution 2 bluenote10
Solution 3 Robert Bossy
Solution 4 corlettk
Solution 5
Solution 6