(Python) Filling a column by web scraping data from a website. Getting an error: UnicodeError: label empty or too long
I have a dataset that looks like this:
| ID | Link |
|---|---|
| 1 | 'https://www.example.com/hello/details-5565558.html' |
| 2 | 'https://www.example.com/hello/details-5489292.html' |
| 3 | 'https://www.example.com/hello/details-5538258.html' |
| 4 | 'https://www.example.com/hello/details-5523020.html' |
| 5 | 'https://www.example.com/hello/details-5543794.html' |
These links lead to the same website but to different pages of it. It is a real estate marketplace, and each link points to a property page that contains a description of the property. What I need to do is extract the name of the property from each page, so that in the end the data looks like this:
| ID | Link | Name |
|---|---|---|
| 1 | 'https://www.example.com/hello/details-5565558.html' | The One Townhouses |
| 2 | 'https://www.example.com/hello/details-5489292.html' | Twin Villas |
| 3 | 'https://www.example.com/hello/details-5538258.html' | City Park |
| 4 | 'https://www.example.com/hello/details-5523020.html' | The Sky |
| 5 | 'https://www.example.com/hello/details-5543794.html' | La Mer |
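For reproducibility, here is a minimal sketch of the input as a pandas DataFrame (the URLs are anonymized placeholders for the real property pages, and the column names match the tables above):

```python
import pandas as pd

# minimal reproducible version of the input table (URLs are placeholders)
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Link': [
        'https://www.example.com/hello/details-5565558.html',
        'https://www.example.com/hello/details-5489292.html',
        'https://www.example.com/hello/details-5538258.html',
        'https://www.example.com/hello/details-5523020.html',
        'https://www.example.com/hello/details-5543794.html',
    ],
})
```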
To do this, I tried to scrape the pages in the following way:
```python
import requests
from bs4 import BeautifulSoup

links = ['https://www.example.com/hello/details-5565558.html', 'https://www.example.com/hello/details-5489292.html', 'https://www.example.com/hello/details-5538258.html', 'https://www.example.com/hello/details-5523020.html', 'https://www.example.com/hello/details-5543794.html']

data = []
for link in links:
    html_text = requests.get(link).content
    soup = BeautifulSoup(html_text, 'lxml')
    # the property name sits in an <a> tag with this class on the page
    project = soup.find_all('a', class_='_146bd1c5')
    data.append({
        'link': link,
        'project': project
    })
```
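If the scraping itself worked, my rough plan for filling the 'Name' column was something like this (a sketch assuming pandas, where `df` is the DataFrame shown above and the property name is the text of the first matching tag):

```python
import pandas as pd

# sketch: turn the scraped results into a 'Name' column (assumes the loop above succeeds)
names = pd.DataFrame(data)
# take the text of the first matching <a> tag, or None if nothing was found
names['Name'] = names['project'].apply(lambda tags: tags[0].get_text(strip=True) if tags else None)
df = df.merge(names[['link', 'Name']], left_on='Link', right_on='link').drop(columns='link')
```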
But I got an error: UnicodeError: label empty or too long
How can I solve this issue? Or can you recommend other ways to fill the 'Name' column that don't involve web scraping?
Thank you!