Problem parsing XML when retrieving from a URL

I am doing an assignment from a Coursera Python course. The goal is to sum up the counts for each username and get a final tally.

XML: http://py4e-data.dr-chuck.net/comments_42.xml

If I copy and paste that XML and parse it with the following program, it works just fine.

import xml.etree.ElementTree as ET

input = (XML string goes here)
ct = 0
stuff = ET.fromstring(input)
lst = stuff.findall('comments/comment')
for item in lst:
    print('Name', item.find('name').text)
    print('Count', item.find('count').text)
    ct = ct + int(item.find('count').text)
print(ct)

The problem is when I try to get it directly from the URL. In that case I have tried two approaches:

import urllib.request,urllib.parse, urllib.error
import xml.etree.ElementTree as ET

uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')


data = uh.read()
print(data.decode())
tree = ET.fromstring(data)
lst = commentinfo.findall('comments/comment')   
for item in lst:
    print('Count', item.find('count').text)

This leads to the following error:

Traceback (most recent call last):
  File "C:\Users\patri\Desktop\PY4E\Materials\code3\urllib1.py", line 10, in <module>
    lst = commentinfo.findall('comments/comment')
NameError: name 'commentinfo' is not defined

The second approach is the one suggested by the assignment, which accesses the counts like this:

counts = tree.findall('.//count')

And so I wrote the following code:

import urllib.request,urllib.parse, urllib.error
import xml.etree.ElementTree as ET

uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')


data = uh.read()
print(data.decode())
tree = ET.fromstring(data)
counts = tree.findall('.//count')
for item in counts:
    print('Count', item.find('count').text)

This apparently leads to a None type and I cannot do anything with that:

Traceback (most recent call last):
  File "C:\Users\patri\Desktop\PY4E\Materials\code3\urllib1.py", line 12, in <module>
    print('Count', item.find('count').text)
AttributeError: 'NoneType' object has no attribute 'text'

python xml

Solution 1:^[1]

In the first code snippet, the error NameError: name 'commentinfo' is not defined occurs because the variable commentinfo is used but never defined:

import urllib.request,urllib.parse, urllib.error
import xml.etree.ElementTree as ET

uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')


data = uh.read()
print(data.decode())
tree = ET.fromstring(data)
# commentinfo not declared
lst = commentinfo.findall('comments/comment')   
for item in lst:
    print('Count', item.find('count').text)

Replace it with the variable tree, which holds the root element returned by ET.fromstring, to make the code work:

import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET

uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')

data = uh.read()
print(data.decode())
# ET.fromstring returns the root element; search from it, not from commentinfo
tree = ET.fromstring(data)
lst = tree.findall('comments/comment')
for item in lst:
    print('Count', item.find('count').text)
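
To see why tree is the right object to search, here is a minimal sketch that runs without network access. The SAMPLE string is a hypothetical stand-in shaped like the feed (a commentinfo root containing comments/comment entries); the names and counts in it are made up and not taken from the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the feed; the real data is not reproduced here.
SAMPLE = """<commentinfo>
  <comments>
    <comment><name>Alice</name><count>3</count></comment>
    <comment><name>Bob</name><count>5</count></comment>
  </comments>
</commentinfo>"""

# fromstring returns the root Element itself (here, <commentinfo>).
tree = ET.fromstring(SAMPLE)
root_tag = tree.tag

# The path is relative to the root, so it starts at its child <comments>.
pairs = [
    (comment.find('name').text, int(comment.find('count').text))
    for comment in tree.findall('comments/comment')
]
print(root_tag, pairs)
```

Under that assumption, commentinfo is only the tag name of the root element, not a Python variable, which is probably where the original name came from.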

In the second code snippet, the expression tree.findall('.//count') already returns a list of count elements. So when item.find('count') is called in the loop, it looks for a child named count inside each count element, finds none, and returns None, which leads to the error AttributeError: 'NoneType' object has no attribute 'text'. To fix it, remove item.find('count') from the loop and read item.text directly:

import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET

uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')

data = uh.read()
print(data.decode())
tree = ET.fromstring(data)
# './/count' matches every count element at any depth below the root
counts = tree.findall('.//count')
for item in counts:
    # each item is already a count element, so read its text directly
    print('Count', item.text)
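
Since the stated goal of the assignment is a final tally, the following sketch extends the fix to sum the counts. It is only an illustration: sum_counts is a hypothetical helper name, and SAMPLE is made-up data shaped like the feed, used so the example runs offline. In the real script, the bytes returned by uh.read() could be passed in instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the feed; values are invented for illustration.
SAMPLE = """<commentinfo>
  <comments>
    <comment><name>Alice</name><count>3</count></comment>
    <comment><name>Bob</name><count>5</count></comment>
  </comments>
</commentinfo>"""


def sum_counts(xml_data):
    """Return the sum of all <count> values; accepts str or bytes."""
    root = ET.fromstring(xml_data)
    return sum(int(el.text) for el in root.findall('.//count'))


total = sum_counts(SAMPLE)

# Reproduce the original failure: a <count> element has no <count> child,
# so find() returns None and accessing .text on it would raise AttributeError.
first_count = ET.fromstring(SAMPLE).find('.//count')
nested = first_count.find('count')

print('Total', total)
print('Nested lookup', nested)
```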

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Alexandra Dudkina
