(Python) Filling a column by web scraping data from a website. Getting an error: UnicodeError: label empty or too long
I have a dataset that looks like this:
| ID | Link |
|---|---|
| 1 | 'https://www.example.com/hello/details-5565558.html' |
| 2 | 'https://www.example.com/hello/details-5489292.html' |
| 3 | 'https://www.example.com/hello/details-5538258.html' |
| 4 | 'https://www.example.com/hello/details-5523020.html' |
| 5 | 'https://www.example.com/hello/details-5543794.html' |
These links lead to the same website but to different pages of it. It is a real estate marketplace, and each link points to a property page that contains a description of the property. What I need to do is extract the name of the property from each page, so that in the end the data looks like this:
| ID | Link | Name |
|---|---|---|
| 1 | 'https://www.example.com/hello/details-5565558.html' | The One Townhouses |
| 2 | 'https://www.example.com/hello/details-5489292.html' | Twin Villas |
| 3 | 'https://www.example.com/hello/details-5538258.html' | City Park |
| 4 | 'https://www.example.com/hello/details-5523020.html' | The Sky |
| 5 | 'https://www.example.com/hello/details-5543794.html' | La Mer |
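For reproducibility, here is a minimal sketch of the input as a pandas DataFrame (the URLs are anonymized placeholders for the real property pages, and the column names match the tables above):

```python
import pandas as pd

# minimal reproducible version of the input table (URLs are placeholders)
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Link': [
        'https://www.example.com/hello/details-5565558.html',
        'https://www.example.com/hello/details-5489292.html',
        'https://www.example.com/hello/details-5538258.html',
        'https://www.example.com/hello/details-5523020.html',
        'https://www.example.com/hello/details-5543794.html',
    ],
})
```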
To do this, I tried to scrape the pages in the following way:
```python
import requests
from bs4 import BeautifulSoup

links = ['https://www.example.com/hello/details-5565558.html', 'https://www.example.com/hello/details-5489292.html', 'https://www.example.com/hello/details-5538258.html', 'https://www.example.com/hello/details-5523020.html', 'https://www.example.com/hello/details-5543794.html']

data = []
for link in links:
    html_text = requests.get(link).content
    soup = BeautifulSoup(html_text, 'lxml')
    # the property name sits in an <a> tag with this class on the page
    project = soup.find_all('a', class_='_146bd1c5')
    data.append({
        'link': link,
        'project': project
    })
```
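If the scraping itself worked, my rough plan for filling the 'Name' column was something like this (a sketch assuming pandas, where `df` is the DataFrame shown above and the property name is the text of the first matching tag):

```python
import pandas as pd

# sketch: turn the scraped results into a 'Name' column (assumes the loop above succeeds)
names = pd.DataFrame(data)
# take the text of the first matching <a> tag, or None if nothing was found
names['Name'] = names['project'].apply(lambda tags: tags[0].get_text(strip=True) if tags else None)
df = df.merge(names[['link', 'Name']], left_on='Link', right_on='link').drop(columns='link')
```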
But I got an error: UnicodeError: label empty or too long
How can I solve this issue? Or can you recommend other ways to fill the 'Name' column that don't involve web scraping?
Thank you!