How do I get the xml:id of an element using ElementTree in Python?
I'm sorry if this is a really basic question, but I've been stuck on this problem for hours and just can't make it work.
I'm working with the British National Corpus (whose files are in XML format), and I want to extract the attributes of the different persons in those files. The part I'm working with is structured like this:
<bncDoc>
  <teiHeader>
    <profileDesc>
      <particDesc n="C196">
        <person ageGroup="X" xml:id="PS21Y" role="unspecified" sex="f" soc="UU" dialect="NONE" firstLang="EN-GBR" educ="X">
          <persName>j. hammond</persName>
          <occupation>interviewer</occupation>
        </person>
        <person ageGroup="X" xml:id="PS220" role="unspecified" sex="m" soc="UU" dialect="XIS" firstLang="EN-GBR" educ="X">
          <persName>Bhagan</persName>
        </person>
      </particDesc>
    </profileDesc>
  </teiHeader>
</bncDoc>
I'm trying to extract "id", "sex", "soc", and "ageGroup" from the "person" elements, but I don't know how to handle the "xml:id" attribute. The approach shown below works for "sex", "soc", and "ageGroup", but not for "xml:id". Does anyone know how to make it work? That would help me a lot! :)
for i in root.findall('teiHeader/profileDesc/particDesc/person'):
    tmp = []
    tmp.append(i.get('id'))
    tmp.append(i.get('sex'))
    tmp.append(i.get('soc'))
    tmp.append(i.get('ageGroup'))
Solution 1:[1]
It works if you use
i.get('{http://www.w3.org/XML/1998/namespace}id')
This looks a bit ugly, but it is needed because xml: is a special namespace prefix that is always bound to the http://www.w3.org/XML/1998/namespace URI, and ElementTree stores namespaced attribute names in the expanded {URI}localname form. See https://www.w3.org/XML/1998/namespace.
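As a minimal sketch of how this fits into the loop from the question: the snippet below parses the sample document from an inline string (an assumption for illustration; the real corpus files would be read with ET.parse) and collects the four attributes per person. The names XML_ID, SAMPLE, and persons are hypothetical helpers, not part of the original code.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound to this URI by definition, so ElementTree
# exposes xml:id under the expanded name "{URI}id".
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Inline copy of the sample from the question (illustration only).
SAMPLE = """<bncDoc>
  <teiHeader>
    <profileDesc>
      <particDesc n="C196">
        <person ageGroup="X" xml:id="PS21Y" role="unspecified" sex="f" soc="UU" dialect="NONE" firstLang="EN-GBR" educ="X">
          <persName>j. hammond</persName>
          <occupation>interviewer</occupation>
        </person>
        <person ageGroup="X" xml:id="PS220" role="unspecified" sex="m" soc="UU" dialect="XIS" firstLang="EN-GBR" educ="X">
          <persName>Bhagan</persName>
        </person>
      </particDesc>
    </profileDesc>
  </teiHeader>
</bncDoc>"""

root = ET.fromstring(SAMPLE)

persons = []
for person in root.findall("teiHeader/profileDesc/particDesc/person"):
    persons.append([
        person.get(XML_ID),       # xml:id needs the expanded name
        person.get("sex"),        # un-prefixed attributes work as before
        person.get("soc"),
        person.get("ageGroup"),
    ])

print(persons)
# [['PS21Y', 'f', 'UU', 'X'], ['PS220', 'm', 'UU', 'X']]
```

Keeping the expanded name in a constant avoids repeating the long URI and makes the lookup easier to read.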
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mzjn |
