python lxml.objectify gives no attribute access to gco:CharacterString node
I have a geometadata file:
<gmd:MD_Metadata
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco" >
  <gmd:fileIdentifier>
    <gco:CharacterString>2ce585df-df23-45f6-b8e1-184e64e7e3b5</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:language>
    <gmd:LanguageCode codeList="https://www.loc.gov/standards/iso639-2/" codeListValue="ger">ger</gmd:LanguageCode>
  </gmd:language>
</gmd:MD_Metadata>
Using lxml.objectify, I can parse it easily:
from lxml import objectify
data = objectify.parse('rec.xml').getroot()
In data, the 'language' element is exposed as:
(Pdb) data.language.LanguageCode
'ger'
but the 'fileIdentifier' is not:
(Pdb) data.fileIdentifier.CharacterString
*** AttributeError: no such child: {http://www.isotc211.org/2005/gmd}CharacterString
Obviously lxml is looking in the wrong namespace for "CharacterString".
But the information is there:
(Pdb) data.fileIdentifier['{http://www.isotc211.org/2005/gco}CharacterString']
'2ce585df-df23-45f6-b8e1-184e64e7e3b5'
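A minimal diagnostic sketch makes the namespace mismatch visible: it splits each child's qualified tag into namespace URI and local name with etree.QName. It assumes the sample record is held in a string (the hypothetical variable RECORD) instead of being read from rec.xml, and child_namespaces is a hypothetical helper name.

```python
from lxml import etree, objectify

# Hypothetical inline copy of the sample record (instead of reading rec.xml).
RECORD = b"""<gmd:MD_Metadata
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>2ce585df-df23-45f6-b8e1-184e64e7e3b5</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:language>
    <gmd:LanguageCode codeList="https://www.loc.gov/standards/iso639-2/" codeListValue="ger">ger</gmd:LanguageCode>
  </gmd:language>
</gmd:MD_Metadata>"""

data = objectify.fromstring(RECORD)


def child_namespaces(root):
    """Return (parent name, child name, parent ns, child ns) for each grandchild of root."""
    rows = []
    # iterchildren() walks real children; plain iteration over an
    # objectified element would walk same-tag siblings instead.
    for section in root.iterchildren():
        parent = etree.QName(section)
        for child in section.iterchildren():
            qn = etree.QName(child)
            rows.append((parent.localname, qn.localname, parent.namespace, qn.namespace))
    return rows


for row in child_namespaces(data):
    print(row)
```

For the sample record this reports the gco namespace for CharacterString while its parent fileIdentifier is in gmd, whereas LanguageCode shares the gmd namespace with its parent language.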
How can I convince lxml to use the correct namespace? Any help is appreciated.
Volker
Solution 1
This is expected and is simply how lxml.objectify works. Since {http://www.isotc211.org/2005/gco}CharacterString is in a different namespace from its parent element ({http://www.isotc211.org/2005/gmd}fileIdentifier), it cannot be retrieved from the parent with simple (dot) attribute lookup.
Instead, you have to use index access (just as you already did) or getattr:
>>> from lxml import etree, objectify
>>> data = objectify.fromstring(
... """<gmd:MD_Metadata
... xmlns:gmd="http://www.isotc211.org/2005/gmd"
... xmlns:gco="http://www.isotc211.org/2005/gco" >
... <gmd:fileIdentifier>
... <gco:CharacterString>2ce585df-df23-45f6-b8e1-184e64e7e3b5</gco:CharacterString>
... </gmd:fileIdentifier>
... <gmd:language>
... <gmd:LanguageCode codeList="https://www.loc.gov/standards/iso639-2/" codeListValue="ger">ger</gmd:LanguageCode>
... </gmd:language>
... </gmd:MD_Metadata>
... """)
>>> data.fileIdentifier["{http://www.isotc211.org/2005/gco}CharacterString"]
'2ce585df-df23-45f6-b8e1-184e64e7e3b5'
>>> getattr(data.fileIdentifier, "{http://www.isotc211.org/2005/gco}CharacterString")
'2ce585df-df23-45f6-b8e1-184e64e7e3b5'
>>>
The reason is that {http://www.isotc211.org/2005/gco}CharacterString is not a valid Python identifier, so it cannot be written after a dot, and it makes sense to restrict unqualified lookup to children in the same namespace as the parent. See also "Namespace handling" in the official docs: https://lxml.de/objectify.html#the-lxml-objectify-api
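To avoid hard-coding the gco URI in every lookup, one option is to build the qualified key from the prefix the document itself declares (via the element's nsmap, assuming the record declares gco on the root as the sample does) and pass it to getattr. The sketch also shows, as a separate technique outside objectify's dot lookup, lxml's namespace-aware find with an explicit prefix mapping; RECORD and NS are hypothetical names chosen for illustration.

```python
from lxml import etree, objectify

# Hypothetical inline copy of the sample record.
RECORD = b"""<gmd:MD_Metadata
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>2ce585df-df23-45f6-b8e1-184e64e7e3b5</gco:CharacterString>
  </gmd:fileIdentifier>
</gmd:MD_Metadata>"""

data = objectify.fromstring(RECORD)

# Build "{uri}CharacterString" from the gco prefix declared in the document
# (assumes the prefix is in scope on the root element).
gco_key = etree.QName(data.nsmap["gco"], "CharacterString").text
file_id = getattr(data.fileIdentifier, gco_key)  # same as indexing with gco_key
print(file_id.text)

# Alternative: namespace-aware ElementPath with an explicit prefix mapping.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
same_value = data.find("gmd:fileIdentifier/gco:CharacterString", namespaces=NS).text
print(same_value)
```

The nsmap route keeps objectify's attribute style, while the find route trades it for path expressions that can cross namespaces in one call.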
Best, Holger
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Holger Joukl |
