Web scraping words and phrases from WordHippo
I have a bit of code that I use to extract a list of words of a given length (in this case 10) from WordHippo. Here I am trying to get all the 10-character synonyms for "overview", but any synonym that contains a space doesn't come through. For example, "brain wave" is missing even though it is 10 characters long (counting the space) and I specify a length of 10.
My code produces a list of only the words that do not contain spaces. How do I amend it to also get synonyms with spaces in them, such as "brain wave"?
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

page = requests.get("https://www.wordhippo.com/what-is/another-word-for/overview.html")
soup = BeautifulSoup(page.content, 'html.parser')

# Change this
keyword = "overview"
listing = []
length = 10

# Each ".relatedwords" element holds one section of synonyms
synonyms = soup.select('.relatedwords')
for i in range(len(synonyms)):  # only the sections actually found, instead of a fixed 100
    print('synonyms section ' + str(i + 1))
    csvname = str(keyword) + str(i) + ".xlsx"
    # Split the section text into separate words
    DataFrame = pd.DataFrame((synonyms[i].text.strip().split()))
    InitList = DataFrame[0].tolist()
    # Keep only the words with the requested length
    for item in InitList:
        if len(item) == length:
            listing.append(item)
    print(listing)
    DataFrame1 = pd.DataFrame(listing)
    out_path = "FIlePathName" + csvname  # placeholder for the real output folder
    DataFrame1.to_excel(out_path, index=False)
```
I have the correct file path in my actual code.
Thanks a lot in advance.
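The effect described in the question can be reproduced without fetching the page. The sample string below is hypothetical: it stands in for `synonyms[i].text` and assumes each synonym sits on its own line, since the real WordHippo markup isn't shown here.

```python
# Hypothetical stand-in for synonyms[i].text: one synonym per line.
section_text = "\nsummary\nbrain wave\nsynopsis\nassessment\n"

length = 10

# str.split() with no argument splits on ANY run of whitespace,
# so the phrase "brain wave" is broken into two separate tokens.
tokens = section_text.strip().split()
print(tokens)  # ['summary', 'brain', 'wave', 'synopsis', 'assessment']

# "brain" (5) and "wave" (4) can never pass a length-10 test,
# even though len("brain wave") is 10 because the space counts.
matches = [t for t in tokens if len(t) == length]
print(matches)  # ['assessment']
```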
Solution 1:[1]
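I figured out that I needed to change the following line of code:

```python
DataFrame = pd.DataFrame((synonyms[i].text.strip().split()))
```

to this line of code:

```python
DataFrame = pd.DataFrame((synonyms[i].text.split("\n")))
```

Calling `split()` with no arguments splits on every run of whitespace, including the space inside "brain wave", so the phrase arrives as "brain" and "wave" and neither piece is 10 characters long. Splitting on `"\n"` splits only at line breaks, which, presumably because each synonym sits on its own line in the section's text, keeps phrases with spaces together as single items.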
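If the same fix is wanted without pandas, it can be wrapped in a small helper. This is a sketch, not the answerer's code: the name `words_of_length` and the sample text are hypothetical, and it assumes, as the answer implies, that each synonym sits on its own line of the section text. Unlike the one-line fix, it also strips each line and skips blank ones, which guards against stray spaces or `\r` characters.

```python
from typing import List


def words_of_length(section_text: str, length: int) -> List[str]:
    """Return the entries of one synonyms section whose length equals `length`.

    Assumes each synonym or phrase sits on its own line of the section text,
    so splitting on newlines keeps multi-word phrases such as "brain wave" intact.
    """
    matches = []
    for line in section_text.split("\n"):
        entry = line.strip()  # drop surrounding spaces or a stray "\r"
        if entry and len(entry) == length:  # skip blank lines
            matches.append(entry)
    return matches


# Hypothetical sample standing in for synonyms[i].text.
sample = "\nsummary\nbrain wave\nsynopsis\nassessment\n"
print(words_of_length(sample, 10))  # ['brain wave', 'assessment']
```

In the question's loop, `listing.extend(words_of_length(synonyms[i].text, length))` could take the place of the `DataFrame` and `InitList` steps.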
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | cordelia |
