Parsing an XML file with duplicate tags
I currently use an XML parser to extract the name of a route from a GPX (XML) file.
Each GPX file contains a single "name" tag, which is what I've been extracting.
Here's the script:
#! /bin/bash
gpxpath=/mnt/gpxfiles; export gpxpath
for file in $gpxpath/*
do
filename=`ls $file`; export filename
gpxname=`$scripts/xmlparse.pl "$file"`
echo $filename " "$gpxname >> gpxparse.tmp
done
sort -k 2,2 gpxparse.tmp > gpxparse.out
cat gpxparse.out
And here's xmlparse.pl:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
XML::Twig->new(
twig_handlers => {
'name' => sub { print $_->text }
}
)->parse( <> );
Here's an example GPX file:
<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.1" creator="creator" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd" xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<metadata>
<referrer>Referrer</referrer>
<time>2019-06-17T06:02:23.000Z</time>
</metadata>
<trk>
<name>Another GPX file</name>
<trkseg>
<trkpt lon="-1.91990" lat="53.00131">
<ele>112.1</ele>
<time>2019-06-17T06:02:23.000Z</time>
</trkpt>
<trkpt lon="-1.91966" lat="53.00126">
<ele>113.6</ele>
<time>2019-06-17T06:02:25.000Z</time>
</trkpt>
<trkpt lon="-1.91962" lat="53.00125">
<ele>114.1</ele>
<time>2019-06-17T06:02:25.000Z</time>
</trkpt>
<trkpt lon="-1.91945" lat="53.00120">
<ele>115.5</ele>
<time>2019-06-17T06:02:26.000Z</time>
</trkpt>
</trkseg>
</trk>
</gpx>
I can successfully extract the name of the route using the scripts above. However, I'd additionally like to extract the first co-ordinate pair in each file.
A track is defined by a "trk" element, and within a track there can be multiple segments ("trkseg"). Finally, within a trkseg are multiple "trkpt" (track point) elements.
A track point usually consists of a latitude and longitude co-ordinate pair along with elevation and timestamp information.
I'm only looking to extract the first lat and lon within the first trkpt of the GPX file. Ideally, once the script has found the first co-ordinate pair it should exit and move onto the next file.
I've tried crafting an additional Perl parse script using XML::Twig, but it seems to stumble when there are multiple elements with duplicate names.
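For what it's worth, one way to keep XML::Twig from tripping over the repeated trkpt elements is to keep state in the handlers and abort the parse once the first track point has been seen (die inside an eval is a common idiom for stopping a SAX-style parse early). This is a hypothetical sketch, not the script referred to above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my ( $name, $lat, $lon );
my $twig = XML::Twig->new(
    twig_handlers => {
        'trk/name' => sub { $name = $_->text },
        'trkpt'    => sub {
            # first track point only; then stop parsing this file
            ( $lat, $lon ) = ( $_->att('lat'), $_->att('lon') );
            die "done\n";
        },
    },
);
eval { $twig->parsefile( $ARGV[0] ) };
die $@ if $@ && $@ ne "done\n";
print join( ",", $name // '', $lat // '', $lon // '' ), "\n";
```

Invoked the same way as xmlparse.pl, it would print name,lat,lon for each file and never read past the first track point.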
Solution 1:[1]
Since you were originally going for a Perl solution,
perl -MXML::LibXML -e'
my $doc = XML::LibXML->load_xml( location => $ARGV[0] );
my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs( gpx => "http://www.topografix.com/GPX/1/1" );
CORE::say
join ",",
$xpc->findvalue(q{/gpx:gpx/gpx:trk/gpx:name}, $doc),
$xpc->findvalue(q{/gpx:gpx/gpx:trk/gpx:trkseg/gpx:trkpt[1]/@lat}, $doc),
$xpc->findvalue(q{/gpx:gpx/gpx:trk/gpx:trkseg/gpx:trkpt[1]/@lon}, $doc);
' "$file"
(I used XML::LibXML instead of XML::Twig because I'm more familiar with that.)
Unlike the solution in the earlier answer,
- This solution doesn't make fragile assumptions about what the default namespace might be.
- This solution doesn't make fragile assumptions about where name elements might or might not appear.
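The namespace point matters because GPX elements live in a default namespace, so an unprefixed XPath like //name matches nothing. A minimal illustration (hypothetical snippet, using the same XML::LibXML module):

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string => <<'XML' );
<gpx xmlns="http://www.topografix.com/GPX/1/1"><trk><name>n</name></trk></gpx>
XML

# Unprefixed XPath: no match, because <name> is in the default namespace
print $doc->findnodes('//name')->size, "\n";

# With a registered prefix the same element is found
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( gpx => 'http://www.topografix.com/GPX/1/1' );
print $xpc->findnodes('//gpx:name')->size, "\n";
```

The first query reports 0 nodes, the second 1, which is why the one-liner registers the gpx prefix before querying.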
Solution 2:[2]
This is very easy for xidel:
xidel -s input.xml -e 'join((//name,//trkpt[1]/@*),",")'
Another GPX file,-1.91990,53.00131
Ideally, once the script has found the first co-ordinate pair it should exit and move onto the next file.
xidel, together with the integrated EXPath File Module, can do this very efficiently:
xidel -se 'file:list("/mnt/gpxfiles")' # lists all files in '/mnt/gpxfiles' (and subdirs!)
xidel -se 'file:list("/mnt/gpxfiles",false(),"*.xml")' # lists all xml-files in '/mnt/gpxfiles'
xidel -se '
for $x in file:list("/mnt/gpxfiles") return
doc("/mnt/gpxfiles/"||$x)/join((//name,//trkpt[1]/@*),",")
' # iterate over and parse all xml-files in '/mnt/gpxfiles' AND extract the info you need.
Solution 3:[3]
I see some more elegant methods in other answers, but I'd probably use a brute force method:
grep name {file} | head -1
grep "trkpt lon" {file} | head -1
and then use perl or sed to edit the result to the parts wanted.
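For instance, the editing step could look like this; a sketch that assumes the lon="…" lat="…" attribute order seen in the sample file (the sample data is inlined here only to make the snippet self-contained):

```shell
# Create a minimal sample file so the snippet runs stand-alone
cat > /tmp/sample.gpx <<'EOF'
<trk>
  <name>Another GPX file</name>
  <trkpt lon="-1.91990" lat="53.00131">
</trk>
EOF

# First <name> element's text
name=$(grep '<name>' /tmp/sample.gpx | head -1 \
  | sed 's:.*<name>\(.*\)</name>.*:\1:')

# First track point's lon,lat (assumes lon precedes lat, as in the samples)
coords=$(grep 'trkpt lon' /tmp/sample.gpx | head -1 \
  | sed 's/.*lon="\([^"]*\)"[[:space:]]*lat="\([^"]*\)".*/\1,\2/')

echo "$name,$coords"
```

This prints the same name,lon,lat line as the xidel approach, at the cost of depending on the files' exact formatting.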
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Reino |
| Solution 3 | WGroleau |
