Iterate over XML with xmlstarlet and output parent and child node values
I have this format in multiple XML files:
<bad>
  <objdesc>
    <desc id="butwba10.1.wc.01" dbi="BUTWBA10.1.1.WC">
      <physdesc>adfa;sdfkjad</physdesc>
      <related objectid="bb435.1.comdes.02"/>
      <related objectid="but614r.1.penc.01"/>
      <related objectid="but611.1.wc.01"/>
      <related objectid="but612.1.wd.01"/>
      <related objectid="bb515.1.comb.12"/>
    </desc>
    <desc id="butwba10.1.wc.02" dbi="BUTWBA10.1.2.WC">
      <physdesc>alkdjfa;sfjsdf</physdesc>
      <related objectid="but621r.1.penc.01"/>
      <related objectid="bb435.1.comdes.03"/>
    </desc>
  </objdesc>
</bad>
I want output that looks like this:
butwba10.1.wc.01 dbi="BUTWBA10.1.1.WC" related="bb435.1.comdes.02, but614r.1.penc.01, but611.1.wc.01, but612.1.wd.01, bb515.1.comb.12"
butwba10.1.wc.02 dbi="BUTWBA10.1.2.WC" related="but621r.1.penc.01, bb435.1.comdes.03"
I have a bash script that uses xmlstarlet to iterate over the XML files in a directory, but it dumps all the "related" values after the last desc id. It needs to pair each desc id with its own set of "related" values, and it needs to include the dbi value for each id.
#!/bin/bash
for x in *.xml
do
    id=$(xml sel -t -v '//bad/objdesc/desc/@id' "$x")
    arr=( $(xml sel -t -v '//bad/objdesc/desc/related/@objectid' "$x") )
    cat<<EOF >> new_file
$id related="$(perl -e 'print join ",", @ARGV' "${arr[@]}")"
EOF
done
Solution 1:[1]
#!/bin/bash
for x in *.xml; do
    # Number of desc elements in this file
    count=$(xml sel -t -v 'count(//bad/objdesc/desc/@id)' "$x")
    for ((i=1; i<=count; i++)); do
        # Query the i-th desc only, so each id stays paired with its own related values
        id=$(xml sel -t -v "//bad/objdesc/desc[$i]/@id" "$x")
        arr=( $(xml sel -t -v "//bad/objdesc/desc[$i]/related/@objectid" "$x") )
        cat<<EOF
$id related="$(perl -e 'print join ", ", @ARGV' "${arr[@]}")"
EOF
    done
done
It seems like this is a job for XSLT, but shell can handle it too.
Can you do the rest for dbi? It is better to try to understand what is involved here than to just cut and paste.
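For readers who want to check their attempt at the dbi exercise, here is one possible sketch that extends the loop above. It assumes the binary is installed as either `xmlstarlet` or `xml`, and that `-v` prints one value per line when several nodes match; the function name `print_desc_lines` is made up for illustration.

```shell
#!/bin/bash
# Sketch only: the loop from Solution 1, extended with the dbi attribute.

# print_desc_lines is a hypothetical helper name, not part of xmlstarlet.
print_desc_lines() {
    local file=$1 xml_bin count i id dbi related
    # Assumption: the tool is installed as "xmlstarlet" or as "xml"
    xml_bin=$(command -v xmlstarlet || command -v xml) || {
        echo "xmlstarlet not found" >&2
        return 1
    }
    count=$("$xml_bin" sel -t -v 'count(/bad/objdesc/desc)' "$file")
    for ((i = 1; i <= count; i++)); do
        id=$("$xml_bin" sel -t -v "/bad/objdesc/desc[$i]/@id" "$file")
        dbi=$("$xml_bin" sel -t -v "/bad/objdesc/desc[$i]/@dbi" "$file")
        # One objectid per line from xmlstarlet; awk joins the lines with ", "
        related=$("$xml_bin" sel -t -v "/bad/objdesc/desc[$i]/related/@objectid" "$file" |
            awk 'NR > 1 { printf ", " } { printf "%s", $0 }')
        printf '%s dbi="%s" related="%s"\n' "$id" "$dbi" "$related"
    done
}

# Process any files given on the command line
for f in "$@"; do
    print_desc_lines "$f"
done
```

Saved under a name of your choice, it could be run as `bash script.sh *.xml`. It still starts xmlstarlet several times per desc element, so it is slow on large inputs.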
Solution 2:[2]
I agree with sputnick that XSLT is the right tool. Nevertheless, here is a Perl answer using an event-based XML parser (XML::Parser). It has the advantage that it processes each file only once instead of repeatedly invoking xmlstarlet:
#!perl
use strict;
use warnings;
use XML::Parser;

my (@related, @desc); # boo, global variables

# Called for every opening tag
sub start {
    my ($x, $elem, %attrs) = @_;
    if ($elem eq "desc") {
        @desc = @attrs{'id', 'dbi'};   # remember id and dbi of the current desc
        @related = ();                 # start a fresh list for this desc
    }
    elsif ($elem eq "related") {
        push @related, $attrs{objectid};
    }
}

# Called for every closing tag: print one line per completed desc
sub end {
    my ($x, $elem) = @_;
    if ($elem eq "desc") {
        printf qq{%s dbi="%s" related="%s"\n}, @desc, join(', ', @related);
    }
}

my $parser = XML::Parser->new( Handlers => {Start => \&start, End => \&end} );
$parser->parsefile($ARGV[0]);
In action:
$ perl parse.pl file
butwba10.1.wc.01 dbi="BUTWBA10.1.1.WC" related="bb435.1.comdes.02, but614r.1.penc.01, but611.1.wc.01, but612.1.wd.01, bb515.1.comb.12"
butwba10.1.wc.02 dbi="BUTWBA10.1.2.WC" related="but621r.1.penc.01, bb435.1.comdes.03"
Solution 3:[3]
$ xml sel -t -m bad/objdesc/desc -v "concat(@id,' dbi=',@dbi,' ')" -m related -v @objectid -i "number(count(./preceding-sibling::related))+1<number(count(./../related))" -o ", " --else -n -b file.xml
butwba10.1.wc.01 dbi=BUTWBA10.1.1.WC bb435.1.comdes.02, but614r.1.penc.01, but611.1.wc.01, but612.1.wd.01, bb515.1.comb.12
butwba10.1.wc.02 dbi=BUTWBA10.1.2.WC but621r.1.penc.01, bb435.1.comdes.03
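The output above drops the quotes and the `related=` label from the format asked for in the question. A possible variant that adds them in a single xmlstarlet call is sketched below. It assumes the binary is named `xmlstarlet` or `xml`, and `desc_lines` is a hypothetical wrapper name. `-T` requests text output, and `position() != last()` is evaluated inside the inner `-m related` loop so that no separator follows the final value.

```shell
#!/bin/bash
# Sketch only: one xmlstarlet invocation per file, quoted output.

# desc_lines is a hypothetical wrapper name, not part of xmlstarlet.
desc_lines() {
    local xml_bin
    # Assumption: the tool is installed as "xmlstarlet" or as "xml"
    xml_bin=$(command -v xmlstarlet || command -v xml) || {
        echo "xmlstarlet not found" >&2
        return 1
    }
    # Outer -m: one output line per desc. Inner -m: its related children.
    # The first -b closes the -i test, the second closes the inner -m.
    "$xml_bin" sel -T -t \
        -m '/bad/objdesc/desc' \
            -v '@id' -o ' dbi="' -v '@dbi' -o '" related="' \
            -m 'related' \
                -v '@objectid' \
                -i 'position() != last()' -o ', ' -b \
            -b \
            -o '"' -n \
        "$1"
}

# Process any files given on the command line
for f in "$@"; do
    desc_lines "$f"
done
```

Because the whole template runs in one pass, each file is parsed once, which avoids the repeated invocations of the loop-based approach.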
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | glenn jackman |
| Solution 3 | focog77269 |
