Finding an XML element given two children with different attributes
I would like to count, in an OpenStreetMap XML document (root), the children (way) that contain two grandchildren (nd) with two given attribute values (ref). Here is what the OSM XML document looks like:
<osm version="0.6" generator="osmium/1.13.2">
...
<way id="654822858">
<nd ref="3311110418"/>
<nd ref="6340618164"/>
<nd ref="6135961734"/>
<nd ref="8197878242"/>
<tag k="highway" v="residential"/>
<tag k="name" v="Avenida Décima Cerrada Las Torres"/>
</way>
<way id="654822862">
<nd ref="6135961736"/>
<nd ref="6135961745"/>
<nd ref="6340618150"/>
<nd ref="8197878242"/>
<tag k="highway" v="residential"/>
</way>
...
</osm>
I successfully used ElementTree with the following code to count the ways that contain both startnode and endnode:
startnode = "6135961736"
endnode = "6340618150"
len(root.findall("./way/nd/[@ref ='"+ startnode +"'].." and "./way/nd/[@ref ='"+ endnode +"'].."))
The problem is that this takes a very long time. Extrapolating to the number of ways it needs to check (~397,000), it would take about 9 days. I would like help finding a faster method.
Thank you
Solution 1:[1]
Could you test and time the two other methods in the code below against the full data set? The %time lines are IPython/Jupyter magics, so run the code in a notebook or an IPython shell.
from io import StringIO
from lxml import etree
import timeit
f = StringIO('''\
<osm version="0.6" generator="osmium/1.13.2">
...
<way id="654822858">
<nd ref="3311110418"/>
<nd ref="6340618164"/>
<nd ref="6135961734"/>
<nd ref="8197878242"/>
<tag k="highway" v="residential"/>
<tag k="name" v="Avenida Décima Cerrada Las Torres"/>
</way>
<way id="654822862">
<nd ref="6135961736"/>
<nd ref="6135961745"/>
<nd ref="6340618150"/>
<nd ref="8197878242"/>
<tag k="highway" v="residential"/>
</way>
...
</osm>
''')
tree = etree.parse(f)
startnode = "6135961736"
endnode = "6340618150"
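print('Your method:')
# Caution: Python evaluates "a" and "b" to "b", so findall() only receives the endnode path here.
%time len(tree.findall("./way/nd/[@ref ='"+ startnode +"'].." and "./way/nd/[@ref ='"+ endnode +"'].."))
print('\n')
print('XPATH method:')
# One predicate on <way> requires both nd children. contains() is a substring test; @ref="..." would match exactly.
%time len(tree.xpath('./way[ ./nd[contains(@ref, "'+ startnode +'")] and ./nd[contains(@ref, "'+ endnode +'")] ]'))
print('\n')
print('XPATH + f-string method:')
# Same XPath expression as above, built with an f-string.
%time len(tree.xpath(f'./way[ ./nd[contains(@ref, "{startnode}")] and ./nd[contains(@ref, "{endnode}")] ]'))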
Results:
Your method:
CPU times: user 103 µs, sys: 0 ns, total: 103 µs
Wall time: 111 µs
XPATH method:
CPU times: user 98 µs, sys: 0 ns, total: 98 µs
Wall time: 103 µs
XPATH + f-string method:
CPU times: user 69 µs, sys: 3 µs, total: 72 µs
Wall time: 76.1 µs
1
Results from another run:
Your method:
CPU times: user 142 µs, sys: 0 ns, total: 142 µs
Wall time: 148 µs
XPATH method:
CPU times: user 110 µs, sys: 0 ns, total: 110 µs
Wall time: 114 µs
XPATH + f-string method:
CPU times: user 62 µs, sys: 0 ns, total: 62 µs
Wall time: 65.8 µs
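If, as the question suggests, the same check has to be repeated for many start/end pairs, another option is to walk the document once and build a lookup from each ref to the ways that contain it. Each query then becomes a set intersection rather than a scan of the whole tree, and refs are compared as whole values instead of substrings. The following is only a minimal sketch, assuming the parsed file fits in memory; it uses the standard-library ElementTree, the helper names build_ref_index and ways_with_both are made up for illustration, and it has not been timed on the full data set.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed stand-in for the OSM extract shown in the question.
SAMPLE = """\
<osm version="0.6" generator="osmium/1.13.2">
  <way id="654822858">
    <nd ref="3311110418"/>
    <nd ref="6340618164"/>
    <nd ref="6135961734"/>
    <nd ref="8197878242"/>
  </way>
  <way id="654822862">
    <nd ref="6135961736"/>
    <nd ref="6135961745"/>
    <nd ref="6340618150"/>
    <nd ref="8197878242"/>
  </way>
</osm>
"""


def build_ref_index(root):
    """Map each nd ref to the set of way ids containing it (single pass)."""
    index = defaultdict(set)
    for way in root.iter("way"):
        way_id = way.get("id")
        for nd in way.findall("nd"):
            index[nd.get("ref")].add(way_id)
    return index


def ways_with_both(index, startnode, endnode):
    """Return the ids of ways that contain both refs (exact match)."""
    # .get() avoids creating empty entries for refs that never occur.
    return index.get(startnode, set()) & index.get(endnode, set())


root = ET.fromstring(SAMPLE)
index = build_ref_index(root)  # built once, reused for every query

startnode = "6135961736"
endnode = "6340618150"
print(len(ways_with_both(index, startnode, endnode)))
```

The index is built once, so its cost is shared across all subsequent lookups; for a single one-off query the XPath predicate above is probably simpler.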
Solution 2:[2]
After taking a break, I looked it over and realised that I forgot to include the "=" symbol >.<
So the line should be j += mazeWidth
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Drakax |
| Solution 2 | Ayu Crystal |
