Is there a way to separate strings in HTML?
I'm trying to get the address of some companies from WSJ.com. However, I couldn't figure out a reliable way to separate the city and the state/province from the HTML page.
Here's my code and output:
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent  # assumed source of `ua`

ua = UserAgent()
code = "TURN"
url = "https://www.wsj.com/market-data/quotes/{}".format(code)
headers = {'User-Agent': str(ua.random)}
page = requests.get(url, headers=headers)
page.encoding = page.apparent_encoding
pageText = page.text
soup = BeautifulSoup(pageText, 'html.parser')
address = soup.find('div', {"class": "WSJTheme--contact--bDuH_KYx"}).contents[0]
print(address.contents[2])
Output: <span class="">Montclair New Jersey 07042</span>
I want to get a result like [Montclair, New Jersey]. However, I can't simply split the string on spaces, since there are inputs like "San Diego California 92130" or "Beijing Beijing 100022" which require different rules to separate them.
They appear as separate strings in the original HTML code; I'm not sure if this helps.
<span class="">
"Montclair"
"New Jersey"
"07042"
</span>
Solution 1:[1]
I would suggest grabbing the zip code and then looking up the city and state with a library like: https://pypi.org/project/zipcodes/
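A zip-code lookup only covers US addresses, so it would not handle something like "Beijing Beijing 100022". As a rough, dependency-free sketch of a related idea, you could strip the trailing postal code and match the end of the remaining text against a list of region names you expect. The `KNOWN_REGIONS` set and the `split_city_region` helper below are hypothetical names made up for illustration, and the set is only a tiny sample that you would need to extend.

```python
import re

# Illustrative subset only (an assumption); extend it with every
# state/province you expect to encounter.
KNOWN_REGIONS = {"New Jersey", "California", "New York", "Beijing"}


def split_city_region(address, regions=KNOWN_REGIONS):
    """Return (city, region), or None if no known region ends the text."""
    # Drop a trailing numeric postal code, optionally in ZIP+4 form.
    text = re.sub(r"\s*\d+(?:-\d+)?\s*$", "", address).strip()
    # Try longer names first so a multi-word region is matched whole.
    for region in sorted(regions, key=len, reverse=True):
        if text.endswith(" " + region):
            city = text[: -len(region)].strip()
            return city, region
    return None


for raw in ("Montclair New Jersey 07042",
            "San Diego California 92130",
            "Beijing Beijing 100022"):
    print(split_city_region(raw))
```

This only handles purely numeric postal codes; alphanumeric formats would need a different pattern.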
Solution 2:[2]
If the HTML really looks the way you showed it, you can simply split at the quotes.
a = address.contents[2].text
b = a.split('"', 4)
city = b[1]
state = b[3]
print(f"{city}, {state}")
Output: Montclair, New Jersey
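Browser developer tools typically draw quotes around each text node, so the quotes may not be present in the text you actually download. If the span instead holds several separate text nodes, you could collect them one by one. The sketch below uses only the standard library's `html.parser`; the `sample` markup, with empty comments between the text nodes, is an assumption about what the page source might look like, and `TextNodeCollector` is a hypothetical helper name. Check `page.text` to see whether the real markup keeps the pieces separate. In BeautifulSoup, the `stripped_strings` generator on a tag serves a similar purpose.

```python
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collect each non-empty text node as its own list entry."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called once per run of text between tags or comments.
        data = data.strip()
        if data:
            self.parts.append(data)


# Hypothetical markup (an assumption): text nodes split by empty comments.
sample = ('<span class="">Montclair<!-- --> <!-- -->'
          'New Jersey<!-- --> <!-- -->07042</span>')

collector = TextNodeCollector()
collector.feed(sample)
collector.close()

city, state, postal_code = collector.parts
print(f"{city}, {state}")
```

If the downloaded markup turns out to contain one single text node, this yields just one entry and the unpacking fails, so it is only useful when the separation survives in the source.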
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | LhasaDad |
| Solution 2 | |
