Problem parsing XML when retrieving it from a URL
I am doing an assignment from a Coursera Python course. The goal is to sum up the counts for each username and get a final tally.
XML: http://py4e-data.dr-chuck.net/comments_42.xml
If I copy and paste that XML and parse it with the following program, it works just fine.
import xml.etree.ElementTree as ET
input = """..."""  # paste the XML string here
ct = 0
stuff = ET.fromstring(input)
lst = stuff.findall('comments/comment')
for item in lst:
    print('Name', item.find('name').text)
    print('Count', item.find('count').text)
    ct = ct + int(item.find('count').text)
print(ct)
The problem arises when I try to read it directly from the URL. I have tried two approaches:
import urllib.request,urllib.parse, urllib.error
import xml.etree.ElementTree as ET
uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')
data = uh.read()
print(data.decode())
tree = ET.fromstring(data)
lst = commentinfo.findall('comments/comment')
for item in lst:
    print('Count', item.find('count').text)
This leads to the following error:
Traceback (most recent call last):
  File "C:\Users\patri\Desktop\PY4E\Materials\code3\urllib1.py", line 10, in <module>
    lst = commentinfo.findall('comments/comment')
NameError: name 'commentinfo' is not defined
The second approach is the one suggested by the assignment, which accesses the counts like this:
counts = tree.findall('.//count')
And so I wrote the following code:
import urllib.request,urllib.parse, urllib.error
import xml.etree.ElementTree as ET
uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')
data = uh.read()
print(data.decode())
tree = ET.fromstring(data)
counts = tree.findall('.//count')
for item in counts:
    print('Count', item.find('count').text)
This apparently produces None, and I cannot do anything with it:
Traceback (most recent call last):
  File "C:\Users\patri\Desktop\PY4E\Materials\code3\urllib1.py", line 12, in <module>
    print('Count', item.find('count').text)
AttributeError: 'NoneType' object has no attribute 'text'
Solution 1:[1]
In the first code snippet, the error NameError: name 'commentinfo' is not defined occurs because the variable commentinfo is never defined:
import urllib.request,urllib.parse, urllib.error
import xml.etree.ElementTree as ET
uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')
data = uh.read()
print(data.decode())
tree = ET.fromstring(data)
# commentinfo is never defined
lst = commentinfo.findall('comments/comment')
for item in lst:
    print('Count', item.find('count').text)
Replace it with the variable tree to make the code work:
import urllib.request,urllib.parse, urllib.error
import xml.etree.ElementTree as ET
uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')
data = uh.read()
print(data.decode())
tree = ET.fromstring(data)
lst = tree.findall('comments/comment')
for item in lst:
    print('Count', item.find('count').text)
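The fix works because ET.fromstring returns the root element of the document (an Element, not an ElementTree object), so tree already refers to the outermost element and 'comments/comment' is evaluated relative to it. The sketch below checks this on a small hypothetical sample that only mimics the layout of comments_42.xml; the names and counts in it are made up, and nothing is fetched from the network.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mimicking the layout of comments_42.xml;
# the names and counts are invented for illustration.
SAMPLE = """<commentinfo>
  <note>Sample data</note>
  <comments>
    <comment><name>Alice</name><count>3</count></comment>
    <comment><name>Bob</name><count>5</count></comment>
  </comments>
</commentinfo>"""

# fromstring() returns the root Element, i.e. <commentinfo> itself
root = ET.fromstring(SAMPLE)
print(root.tag)  # commentinfo

# The path is relative to the root, so no 'commentinfo/' prefix is needed
comments = root.findall('comments/comment')
print(len(comments))  # 2

# Repeating the root's own tag in the path matches nothing
print(len(root.findall('commentinfo/comments/comment')))  # 0
```

This is also why the variable name tree is slightly misleading: it holds an Element, which is exactly what the name commentinfo was presumably meant to refer to.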
In the second code snippet, the expression tree.findall('.//count') already returns a list of count elements. Inside the loop, item.find('count') then searches each count element for a child named count; there is none, so find returns None, and reading .text on None raises AttributeError: 'NoneType' object has no attribute 'text'. To fix it, remove item.find('count') from the loop and read item.text directly:
import urllib.request,urllib.parse, urllib.error
import xml.etree.ElementTree as ET
uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')
data = uh.read()
print(data.decode())
tree = ET.fromstring(data)
counts = tree.findall('.//count')
for item in counts:
    print('Count', item.text)
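Since the assignment asks for a total rather than a list of printed counts, either loop can accumulate int(...) of the count text. The sketch below wraps both approaches in hypothetical helpers (sum_counts_by_comment, sum_counts_by_path and fetch_xml are names introduced here, not part of the course code) and runs them on a small made-up sample; fetch_xml is only a thin wrapper around urllib.request.urlopen and is left commented out so the sketch runs without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical sample mimicking comments_42.xml (values are made up)
SAMPLE = b"""<commentinfo>
  <comments>
    <comment><name>Alice</name><count>3</count></comment>
    <comment><name>Bob</name><count>5</count></comment>
  </comments>
</commentinfo>"""


def sum_counts_by_comment(data):
    """Walk each <comment> and read its <count> child (first approach)."""
    root = ET.fromstring(data)
    total = 0
    for comment in root.findall('comments/comment'):
        total += int(comment.find('count').text)
    return total


def sum_counts_by_path(data):
    """Find every <count> at any depth and read its text (assignment's approach)."""
    root = ET.fromstring(data)
    return sum(int(count.text) for count in root.findall('.//count'))


def fetch_xml(url):
    """Download the raw bytes; ET.fromstring accepts bytes directly."""
    with urllib.request.urlopen(url) as response:
        return response.read()


if __name__ == '__main__':
    # A <count> element has no <count> child, so find() returns None;
    # this is what triggered the AttributeError in the question.
    first_count = ET.fromstring(SAMPLE).find('.//count')
    print(first_count.find('count'))  # None
    print(first_count.text)           # 3

    print(sum_counts_by_comment(SAMPLE))  # 8
    print(sum_counts_by_path(SAMPLE))     # 8

    # Uncomment to run against the real assignment file (needs network access):
    # data = fetch_xml('http://py4e-data.dr-chuck.net/comments_42.xml')
    # print(sum_counts_by_path(data))
```

Both helpers should give the same total; the './/count' version is shorter because it skips the intermediate comment elements entirely.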
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Alexandra Dudkina |
