Finding an XML element given two children with different attributes

I would like to count how many children (way) of the root of an OpenStreetMap XML document contain two nd elements (grandchildren of the root) whose ref attributes match two given, different values. Here is what the OSM XML document looks like:

<osm version="0.6" generator="osmium/1.13.2">
...
 <way id="654822858">
    <nd ref="3311110418"/>
    <nd ref="6340618164"/>
    <nd ref="6135961734"/>
    <nd ref="8197878242"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="Avenida Décima Cerrada Las Torres"/>
  </way>
  <way id="654822862">
    <nd ref="6135961736"/>
    <nd ref="6135961745"/>
    <nd ref="6340618150"/>
    <nd ref="8197878242"/>
    <tag k="highway" v="residential"/>
  </way>
...
</osm>

I successfully used ElementTree with the following code to count the ways that contain both startnode and endnode:

startnode = "6135961736"
endnode = "6340618150"
len(root.findall("./way/nd/[@ref ='"+ startnode +"'].." and "./way/nd/[@ref ='"+ endnode +"'].."))

The issue is that this takes a very long time. Extrapolating to the number of ways it needs to check (~397000), it would take about 9 days. I would like help finding a faster method.

Thank you



Solution 1:[1]

Could you test and time the two other methods in the code below against the full data set?

from io import StringIO
from lxml import etree
import timeit

f = StringIO('''\
<osm version="0.6" generator="osmium/1.13.2">
...
 <way id="654822858">
    <nd ref="3311110418"/>
    <nd ref="6340618164"/>
    <nd ref="6135961734"/>
    <nd ref="8197878242"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="Avenida Décima Cerrada Las Torres"/>
  </way>
  <way id="654822862">
    <nd ref="6135961736"/>
    <nd ref="6135961745"/>
    <nd ref="6340618150"/>
    <nd ref="8197878242"/>
    <tag k="highway" v="residential"/>
  </way>
...
</osm>
''')

tree = etree.parse(f)

startnode = "6135961736"
endnode = "6340618150"

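# Note: %time is an IPython/Jupyter magic, so run this in a notebook or an IPython shell.

print('Your method:')
# Python evaluates `"a" and "b"` to "b", so only the endnode path is actually searched here.
%time len(tree.findall("./way/nd/[@ref ='"+ startnode +"'].." and "./way/nd/[@ref ='"+ endnode +"'].."))
print('\n')

print('XPATH method:')
# A single query with both conditions. contains() matches substrings; use @ref="..." for an exact match.
%time len(tree.xpath('./way[ ./nd[contains(@ref, "'+ startnode +'")] and ./nd[contains(@ref, "'+ endnode +'")] ]'))
print('\n')

print('XPATH + f-string method:')
# Same query as above, built with an f-string instead of concatenation.
%time len(tree.xpath(f'./way[ ./nd[contains(@ref, "{startnode}")] and ./nd[contains(@ref, "{endnode}")] ]'))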

Results:

Your method:
CPU times: user 103 µs, sys: 0 ns, total: 103 µs
Wall time: 111 µs


XPATH method:
CPU times: user 98 µs, sys: 0 ns, total: 98 µs
Wall time: 103 µs


XPATH + f-string method:
CPU times: user 69 µs, sys: 3 µs, total: 72 µs
Wall time: 76.1 µs
1

Results from another run:

Your method:
CPU times: user 142 µs, sys: 0 ns, total: 142 µs
Wall time: 148 µs


XPATH method:
CPU times: user 110 µs, sys: 0 ns, total: 110 µs
Wall time: 114 µs


XPATH + f-string method:
CPU times: user 62 µs, sys: 0 ns, total: 62 µs
Wall time: 65.8 µs
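
If the long runtime comes from repeating the query for many start/end pairs (an assumption, since the question does not show the surrounding loop), one option is to walk the document once, build a lookup from each node ref to the ways that contain it, and then answer each pair with a set intersection. The sketch below uses the standard-library ElementTree; build_node_index and ways_with_both are hypothetical helper names, and the sample data is trimmed from the question.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample trimmed from the question; the "..." placeholders are left out.
OSM_XML = """\
<osm version="0.6" generator="osmium/1.13.2">
  <way id="654822858">
    <nd ref="3311110418"/>
    <nd ref="6340618164"/>
    <nd ref="6135961734"/>
    <nd ref="8197878242"/>
    <tag k="highway" v="residential"/>
  </way>
  <way id="654822862">
    <nd ref="6135961736"/>
    <nd ref="6135961745"/>
    <nd ref="6340618150"/>
    <nd ref="8197878242"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""


def build_node_index(root):
    """Map each node ref to the set of way ids containing it (hypothetical helper)."""
    index = defaultdict(set)
    for way in root.iter("way"):
        way_id = way.get("id")
        for nd in way.iter("nd"):
            index[nd.get("ref")].add(way_id)
    return index


def ways_with_both(index, startnode, endnode):
    """Return the ids of ways that contain both node refs (exact match)."""
    # .get() avoids inserting empty entries into the defaultdict for unknown refs.
    return index.get(startnode, set()) & index.get(endnode, set())


root = ET.fromstring(OSM_XML)
node_index = build_node_index(root)  # one pass over the whole document

print(ways_with_both(node_index, "6135961736", "6340618150"))       # {'654822862'}
print(len(ways_with_both(node_index, "8197878242", "6340618164")))  # 1
```

After the index is built, each lookup is a dictionary access plus a set intersection, so the cost per pair no longer depends on the size of the document. Unlike contains(), this compares ref values exactly.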

Solution 2:[2]

After taking a break, I looked it over and realised that I forgot to include the "=" symbol.

So it should be: j += mazeWidth

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Drakax
Solution 2    Ayu Crystal