(Python) Fill a column by web scraping data from a website. Getting an error: UnicodeError: label empty or too long

I have a dataset that looks like this:

ID  Link
1   'https://www.example.com/hello/details-5565558.html'
2   'https://www.example.com/hello/details-5489292.html'
3   'https://www.example.com/hello/details-5538258.html'
4   'https://www.example.com/hello/details-5523020.html'
5   'https://www.example.com/hello/details-5543794.html'

These links all point to the same website, but to different pages of it. The site is a real estate marketplace, and each link leads to a property page that contains a description of that property. I need to extract the name of the property from each page, so that the result looks like this:

ID  Link                                                  Name
1   'https://www.example.com/hello/details-5565558.html'  The One Townhouses
2   'https://www.example.com/hello/details-5489292.html'  Twin Villas
3   'https://www.example.com/hello/details-5538258.html'  City Park
4   'https://www.example.com/hello/details-5523020.html'  The Sky
5   'https://www.example.com/hello/details-5543794.html'  La Mer
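To make the target concrete, here is a minimal sketch of filling the Name column row by row while letting a single failing link leave its name empty instead of stopping the whole run. The helper `add_names`, the stand-in `fake_fetch`, and the sample rows are hypothetical placeholders; in real use, `fetch_name` would be whatever function downloads a page and returns the property name.

```python
def add_names(rows, fetch_name):
    """Return copies of `rows` with a 'Name' key filled by fetch_name(link).

    `fetch_name` is any callable that takes a link and returns a name.
    If it raises, the name is set to None and processing continues.
    """
    result = []
    for row in rows:
        try:
            name = fetch_name(row["Link"])
        except Exception as exc:  # one bad link should not stop the loop
            print(f"Could not get a name for {row['Link']}: {exc}")
            name = None
        result.append({**row, "Name": name})
    return result


# Placeholder data and a stand-in fetcher (no network access involved).
sample_rows = [
    {"ID": 1, "Link": "https://www.example.com/hello/details-5565558.html"},
    {"ID": 2, "Link": "https://www.example.com/hello/details-5489292.html"},
]
known_names = {
    "https://www.example.com/hello/details-5565558.html": "The One Townhouses",
}


def fake_fetch(link):
    # Raises KeyError for links that are not in the lookup table.
    return known_names[link]


filled_rows = add_names(sample_rows, fake_fetch)
for row in filled_rows:
    print(row["ID"], row["Link"], row["Name"])
```

Passing the fetching function in as an argument keeps the loop itself testable without touching the website.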

To do this, I tried to scrape these pages as follows:

import requests
from bs4 import BeautifulSoup

links = [
    'https://www.example.com/hello/details-5565558.html',
    'https://www.example.com/hello/details-5489292.html',
    'https://www.example.com/hello/details-5538258.html',
    'https://www.example.com/hello/details-5523020.html',
    'https://www.example.com/hello/details-5543794.html',
]
data = []

for link in links:
    # Download the property page and parse it
    html_text = requests.get(link).content
    soup = BeautifulSoup(html_text, 'lxml')
    # Collect the <a> elements that should hold the project name
    project = soup.find_all('a', class_='_146bd1c5')
    data.append({
        'link': link,
        'project': project
    })

But I got an error: UnicodeError: label empty or too long
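One thing worth checking first: as far as I know, this message is raised by Python's built-in IDNA hostname encoding when a dot-separated part (label) of the hostname is empty or longer than 63 characters, for example because of a doubled dot or stray characters in a stored link. The sketch below uses only the standard library to list such labels for each link. The function name `bad_host_labels` and the sample URLs are illustrative assumptions, and nothing here contacts the site.

```python
from urllib.parse import urlsplit

MAX_LABEL_LENGTH = 63  # DNS limit for a single dot-separated hostname label


def bad_host_labels(url):
    """Return the hostname labels of `url` that are empty or too long.

    A URL with no recognizable hostname is reported as one empty label.
    """
    host = urlsplit(url.strip()).hostname or ""
    if host.endswith("."):
        host = host[:-1]  # a single trailing dot is a valid fully qualified name
    return [
        label
        for label in host.split(".")
        if not label or len(label) > MAX_LABEL_LENGTH
    ]


# Illustrative links: the first is well formed, the others are not.
sample_links = [
    "https://www.example.com/hello/details-5565558.html",
    "https://www..example.com/hello/details-5489292.html",
    "https://" + "a" * 64 + ".example.com/hello/details-5538258.html",
]

for link in sample_links:
    problems = bad_host_labels(link)
    if problems:
        print(f"Suspicious hostname in {link!r}: {problems!r}")
```

Running a check like this over the whole Link column before calling `requests.get` should show which rows, if any, contain a malformed hostname.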

How can I solve this issue? Or can you recommend another way to fill the 'Name' column that does not involve web scraping?

Thank you!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
