XML parsing to retrieve specific tags

I have an XML annotation file that contains <action> tags. For each action I want to find its <origin> tag and read the value (to check whether it is Blur or not), and I also want to return the action's <start_time> and <stop_time>. How can I do this? Is there a toolkit for it, or do I need to read every tag and walk through all of its children?

<action>
    <temporal_region>
        <start_time>2683480</start_time>
        <stop_time>2684448</stop_time>
    </temporal_region>
    <action_type/>
    <state>1</state>
    <actuator>Incident</actuator>
    <description/><verb/><affected_list/><instrument_list/><recipient/>
    <origin>Blur</origin>
    <destination/>
</action>

Edit:

The suggested code, slightly extended so that the XML contains multiple actions:

from bs4 import BeautifulSoup as bs

xml = """
<action>
<temporal_region>
<start_time>2683480</start_time>
<stop_time>2684448</stop_time>
</temporal_region>
<action_type/>
<state>1</state>
<actuator>Incident</actuator>
<description/><verb/><affected_list/><instrument_list/><recipient/>
<origin>Blur</origin>
<destination/>
</action>
<action>
<temporal_region>
<start_time>2683480</start_time>
<stop_time>2684448</stop_time>
</temporal_region>
<action_type/>
<state>1</state>
<actuator>Incident</actuator>
<description/><verb/><affected_list/><instrument_list/><recipient/>
<origin>Blur</origin>
<destination/>
</action>"""

soup = bs(xml, 'html.parser')
origin = soup.find('origin').text
print(len(origin))
start_time = soup.find('start_time').text
stop_time = soup.find('stop_time').text

if origin == 'Blur':
    print("success")

This prints 4, which I suppose counts the opening and closing tags of origin, although I only have 2 elements.



Solution 1:[1]

Another solution, using SimplifiedDoc from the simplified_scrapy package:

from simplified_scrapy.simplified_doc import SimplifiedDoc

xml = """
<action>
<temporal_region>
<start_time>2683480</start_time>
<stop_time>2684448</stop_time>
</temporal_region>
<action_type/>
<state>1</state>
<actuator>Incident</actuator>
<description/><verb/><affected_list/><instrument_list/><recipient/>
<origin>Blur</origin>
<destination/>
</action>
<action>
<temporal_region>
<start_time>2683480</start_time>
<stop_time>2684448</stop_time>
</temporal_region>
<action_type/>
<state>1</state>
<actuator>Incident</actuator>
<description/><verb/><affected_list/><instrument_list/><recipient/>
<origin>Blur</origin>
<destination/>
</action>"""

doc = SimplifiedDoc(xml)
actions = doc.selects('action')
for action in actions:
    print(action.start_time)
    print(action.stop_time)
    print(action.origin)

Here's an example of SimplifiedDoc: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples

Solution 2:[2]

You can use BeautifulSoup for that.

from bs4 import BeautifulSoup as bs

xml = """
<action>
<temporal_region>
<start_time>2683480</start_time>
<stop_time>2684448</stop_time>
</temporal_region>
<action_type/>
<state>1</state>
<actuator>Incident</actuator>
<description/><verb/><affected_list/><instrument_list/><recipient/>
<origin>Blur</origin>
<destination/>
</action>"""

soup = bs(xml, 'html.parser')
origin = soup.find('origin').text
start_time = soup.find('start_time').text
stop_time = soup.find('stop_time').text

if origin == 'Blur':
    print("success")
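The snippet above reads only the first <action>, because find() returns the first match. The 4 printed in the question's edit is len('Blur'), the number of characters in the string, not a count of tags or elements. Below is a minimal sketch of handling every action separately. It swaps in the standard library's xml.etree.ElementTree instead of BeautifulSoup. The read_actions helper, the <actions> wrapper element, and the second action's values ("Sharp" and its timestamps) are made up for illustration and are not part of the original data.

```python
import xml.etree.ElementTree as ET

# Two sample actions. The second one ("Sharp" and its timestamps) is
# invented here purely to show a non-Blur case.
xml = """
<action>
<temporal_region>
<start_time>2683480</start_time>
<stop_time>2684448</stop_time>
</temporal_region>
<origin>Blur</origin>
</action>
<action>
<temporal_region>
<start_time>2690000</start_time>
<stop_time>2691000</stop_time>
</temporal_region>
<origin>Sharp</origin>
</action>"""


def read_actions(xml_text):
    """Return one dict per <action> with its origin and start/stop times."""
    # The fragment has several top-level elements, and ElementTree rejects
    # input with more than one root, so wrap it in a hypothetical <actions> root.
    root = ET.fromstring(f"<actions>{xml_text}</actions>")
    results = []
    for action in root.iter("action"):
        # <origin> is a direct child of <action>; default covers a missing tag.
        origin = action.findtext("origin", default="")
        results.append({
            "origin": origin,
            "is_blur": origin == "Blur",
            # ".//" searches at any depth below this particular action.
            "start_time": action.findtext(".//start_time"),
            "stop_time": action.findtext(".//stop_time"),
        })
    return results


for info in read_actions(xml):
    print(info)
```

The same per-action pattern should carry over to BeautifulSoup: iterate over soup.find_all('action') and call action.find('origin').text on each result rather than on the whole soup. To count elements instead of characters, take len() of the list that find_all returns.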

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 dabingsou
Solution 2