Scraping random Wikipedia articles with Beautiful Soup works for about 1,000 iterations, then fails with an AttributeError
Code I have used in a Jupyter Notebook:
import requests
from bs4 import BeautifulSoup

corpus = ""
for x in range(10000):
    URL = "https://en.wikipedia.org/wiki/Special:Random"
    page = requests.get(URL)
    html = page.text
    soup = BeautifulSoup(html)
    # Take the text of the first <p> tag on the page
    text = soup.p.text
    # Strip bracketed citation markers [1] through [9]
    text = text.replace('[1]', '')
    text = text.replace('[2]', '')
    text = text.replace('[3]', '')
    text = text.replace('[4]', '')
    text = text.replace('[5]', '')
    text = text.replace('[6]', '')
    text = text.replace('[7]', '')
    text = text.replace('[8]', '')
    text = text.replace('[9]', '')
    text = text.strip()
    corpus += text
    print(x)

with open('Wikipedia Corpus.txt', 'w') as f:
    f.write(corpus)
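(As an aside, I know the chain of replace calls could be collapsed into a single regex, which would also catch citation numbers above 9; that is unrelated to the error:)

import re
text = re.sub(r'\[\d+\]', '', text)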
Error I get:
AttributeError Traceback (most recent call last)
/tmp/ipykernel_8985/763917129.py in <module>
11
12 soup = BeautifulSoup(html)
---> 13 text = soup.p.text
14
15 text = text.replace('[1]', '')
AttributeError: 'NoneType' object has no attribute 'text'
Could this error have been caused by a temporary internet disconnection? I do not understand why the code stops working after about 1,000 iterations.
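As a sketch of what I plan to try next: the variant below assumes the failure means some responses simply contain no <p> tag at all (for example an error or rate-limit page, which is only a guess on my part). It uses soup.find('p') and checks for None before touching .text, and logs the HTTP status of the failing response so it can be inspected:

import re
import time

import requests
from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/Special:Random"
corpus = ""

for x in range(10000):
    page = requests.get(URL)
    soup = BeautifulSoup(page.text, "html.parser")
    paragraph = soup.find("p")  # returns None when no <p> exists; check before .text
    if paragraph is None:
        # Log what came back so the failing response can be inspected later
        print(f"iteration {x}: no <p> tag, HTTP status {page.status_code}")
        time.sleep(1)  # back off, in case this is rate limiting (an assumption)
        continue
    text = re.sub(r'\[\d+\]', '', paragraph.text).strip()
    corpus += text
    print(x)

with open('Wikipedia Corpus.txt', 'w') as f:
    f.write(corpus)

If the logged status turned out to be 429 or similar, I suppose setting a User-Agent header or a longer delay would be the next thing to try.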
Sources
Stack Overflow, licensed under CC BY-SA 3.0.