Trouble with Beautiful Soup parsing all items
I'm trying to scrape the drink menu from the Purple Pig restaurant's website (the URL is in the code below).
I got this working for the names, descriptions and extra info right off the bat with this code:

    import requests
    from bs4 import BeautifulSoup

    URL = "https://thepurplepigchicago.com/drink"
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, "html.parser")
    menu = soup.find_all("div", class_="menu-item")

    def clean_name(name):
        # Keep the part after '|' when the title has one (e.g. "7525 | CHAMPAGNE ...").
        if name is None:
            return ''
        elif '|' in name.text:
            name = name.text.split('|')[1]
        else:
            name = name.text
        return str(name.strip())

    def clean_extra(extra):
        # Split multiple prices on '/' into a list; otherwise return one cleaned string.
        if extra is None:
            return ''
        elif '/' in extra.text:
            return [x.strip() for x in str(extra.text).split('/')]
        else:
            return str(extra.text.replace('\n','').strip())

    def clean_description(x):
        if x is None:
            return ''
        else:
            return str(x.text.strip())

    all_info = []
    for section in menu:
        name = clean_name(section.find("div", class_="menu-item-title"))
        description = clean_description(section.find("div", class_="menu-item-description"))
        extra = clean_extra(section.find("div", class_="menu-item-price-bottom"))
        if len(name) > 1:
            all_info.append((name,description,extra))

However, I wanted to go one level higher and add the category to each item, so I changed the selector to:

    menu = soup.find_all("div", class_="menu-section")

and added the following:

    all_info = []
    for section in menu:
        if clean_category(section.find("div", class_="menu-item-title")):
            category = clean_category(section.find("div", class_="menu-item-title"))
        name = clean_name(section.find("div", class_="menu-item-title"))
        description = clean_description(section.find("div", class_="menu-item-description"))
        extra = clean_extra(section.find("div", class_="menu-item-price-bottom"))
        all_info.append(str(category)+'|'+name+'|'+description+'|'+extra+'\n')

The output is this:

    2019 | PINOT NOIR|PINOT NOIR|Domaine de la Ferté|Givry Premier Cru ‘Clos de la Servoisine’, Burgundy, France
    NELSON'S "GREEN BRIER" TENNESSEE WHISKEY|NELSON'S "GREEN BRIER" TENNESSEE WHISKEY|Corn, Wheat and Malted Barley|$14
    7525 | CHAMPAGNE (CHARDONNAY + PINOT NOIR)|CHAMPAGNE (CHARDONNAY + PINOT NOIR)|Drappier, Carte d'Or Brut, Champagne France | NV|$22 / $53 / $101
    4009 | SYRAH + CINSAULT|SYRAH + CINSAULT|Domaine Rimbert “ Le Petit Cochon Bronze” | Languedoc-Roussillon, France | 2020|$14

So it doesn't appear to be grabbing the categories. As a simple test I wrote this:

    for section in menu:
        print(section.find("div", class_="menu-section-title").text)
        print(clean_name(section.find("div", class_="menu-item-title")))

Oddly, this does get the category, but it then lists only the first item in each category. I believe there is a nesting issue I am not seeing, and after 5 hours of trying to get this to work I'm turning here for help. The output of the last code block is:

    CORAVIN WINE FEATURE
    PINOT NOIR
    WHISKEY OF THE MONTH
    NELSON'S "GREEN BRIER" TENNESSEE WHISKEY
    SPARKLING BY THE GLASS
    CHAMPAGNE (CHARDONNAY + PINOT NOIR)
    ROSÉ BY THE GLASS
    SYRAH + CINSAULT
    ...

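The "first item only" behaviour above is consistent with how `find` works: it returns only the first matching descendant, while `find_all` returns every match. A minimal sketch of the difference, assuming a trimmed-down, hypothetical copy of one menu section (the markup and the variable names `first_title` and `all_titles` are illustrative, not taken from the live page):

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup modelled on one menu section.
html = """
<div class="menu-section">
  <div class="menu-section-title">SPARKLING BY THE GLASS</div>
  <div class="menu-item"><div class="menu-item-title">7525 | CHAMPAGNE</div></div>
  <div class="menu-item"><div class="menu-item-title">7523 | LAMBRUSCO GRASPAROSSA</div></div>
</div>
"""
section = BeautifulSoup(html, "html.parser").find("div", class_="menu-section")

# find() stops at the first matching descendant of the section.
first_title = section.find("div", class_="menu-item-title").text

# find_all() collects every matching descendant of the section.
all_titles = [t.text for t in section.find_all("div", class_="menu-item-title")]

print(first_title)
print(all_titles)
```

With one `find` call per section, each section can contribute at most one item title, which would explain one row per category in the output.
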
I also tried looping through the menu variable twice:

    for section in menu:
        ...
        for item in section:
            ...

with no success. I'm sure I'm missing something obvious, but I just can't figure it out.
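One possible reason the double loop does not help: iterating over a Beautiful Soup `Tag` yields its direct children (including whitespace text nodes), not the `menu-item` divs nested further down. A small sketch under that assumption, again using hypothetical shortened markup and illustrative names (`child_classes`, `items`):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical, shortened markup modelled on one menu section.
html = """<div class="menu-section">
<div class="menu-section-header"><div class="menu-section-title">SPARKLING BY THE GLASS</div></div>
<div class="menu-items">
<div class="menu-item"><div class="menu-item-title">7525 | CHAMPAGNE</div></div>
<div class="menu-item"><div class="menu-item-title">7523 | LAMBRUSCO GRASPAROSSA</div></div>
</div>
</div>"""
section = BeautifulSoup(html, "html.parser").find("div", class_="menu-section")

# Iterating the section walks its direct children only; skip the text nodes.
child_classes = [child.get("class") for child in section if isinstance(child, Tag)]

# Searching inside the section reaches the nested menu items.
items = section.find_all("div", class_="menu-item")

print(child_classes)
print(len(items))
```

Here the direct children are the header and the `menu-items` wrapper, so an inner `for item in section` never sees an individual item; an inner `section.find_all(...)` does.
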
Below is the output I am trying to build, followed by an example section of the HTML it should come from:

    SPARKLING BY THE GLASS|CHAMPAGNE (CHARDONNAY + PINOT NOIR)|Drappier, Carte d'Or Brut, Champagne France | NV|$22 $53 $101
    SPARKLING BY THE GLASS | LAMBRUSCO GRASPAROSSA|Carra di Casatico, La Luna Secco, Emilia-Roma

.

    [<div class="menu-section">
      <div class="menu-section-header">
        <div class="menu-section-title">SPARKLING BY THE GLASS</div>
      </div>
      <div class="menu-items">
        <div class="menu-item">
          <span class="menu-item-price-top">
            <span class="currency-sign">$</span>22
            /
            <span class="currency-sign">$</span>53
            /
            <span class="currency-sign">$</span>101
          </span>
          <div class="menu-item-title">7525 | CHAMPAGNE (CHARDONNAY + PINOT NOIR)</div>
          <div class="menu-item-description">Drappier, Carte d'Or Brut, Champagne France | NV</div>
          <div class="menu-item-price-bottom">
            <span class="currency-sign">$</span>22
            /
            <span class="currency-sign">$</span>53
            /
            <span class="currency-sign">$</span>101
          </div>
        </div>
        <div class="menu-item">
          <span class="menu-item-price-top">
            <span class="currency-sign">$</span>13
            /
            <span class="currency-sign">$</span>32
            /
            <span class="currency-sign">$</span>61
          </span>
          <div class="menu-item-title">7523 | LAMBRUSCO GRASPAROSSA</div>
          <div class="menu-item-description">Carra di Casatico, La Luna Secco, Emilia-Romagna | NV</div>
          <div class="menu-item-price-bottom">
            <span class="currency-sign">$</span>13
            /
            <span class="currency-sign">$</span>32
            /
            <span class="currency-sign">$</span>61
          </div>
        </div>

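Given a section shaped like the sample above, one way to attach the category to every item is a nested loop: read `menu-section-title` once per section, then call `find_all` for the `menu-item` divs inside that same section. This is only a sketch under assumptions: the markup is a closed, shortened copy of the sample (the `menu-item-price-top` spans are omitted), and the helper `squash` and the list `rows` are hypothetical names introduced for illustration.

```python
from bs4 import BeautifulSoup

# Closed, shortened copy of the sample section (an assumption, not the live page).
html = """
<div class="menu-section">
  <div class="menu-section-header">
    <div class="menu-section-title">SPARKLING BY THE GLASS</div>
  </div>
  <div class="menu-items">
    <div class="menu-item">
      <div class="menu-item-title">7525 | CHAMPAGNE (CHARDONNAY + PINOT NOIR)</div>
      <div class="menu-item-description">Drappier, Carte d'Or Brut, Champagne France | NV</div>
      <div class="menu-item-price-bottom">
        <span class="currency-sign">$</span>22
        /
        <span class="currency-sign">$</span>53
        /
        <span class="currency-sign">$</span>101
      </div>
    </div>
    <div class="menu-item">
      <div class="menu-item-title">7523 | LAMBRUSCO GRASPAROSSA</div>
      <div class="menu-item-description">Carra di Casatico, La Luna Secco, Emilia-Romagna | NV</div>
      <div class="menu-item-price-bottom">
        <span class="currency-sign">$</span>13
        /
        <span class="currency-sign">$</span>32
        /
        <span class="currency-sign">$</span>61
      </div>
    </div>
  </div>
</div>
"""


def squash(tag):
    """Return the tag's text with whitespace runs collapsed, or '' if tag is None."""
    return " ".join(tag.get_text().split()) if tag is not None else ""


soup = BeautifulSoup(html, "html.parser")
rows = []
for section in soup.find_all("div", class_="menu-section"):
    # The category is read once per section ...
    category = squash(section.find("div", class_="menu-section-title"))
    # ... and reused for every item nested inside that section.
    for item in section.find_all("div", class_="menu-item"):
        title = squash(item.find("div", class_="menu-item-title"))
        name = title.split("|", 1)[-1].strip()  # drop the numeric prefix, if any
        description = squash(item.find("div", class_="menu-item-description"))
        price_text = squash(item.find("div", class_="menu-item-price-bottom"))
        prices = " ".join(p.strip() for p in price_text.split("/"))
        rows.append("|".join([category, name, description, prices]))

for row in rows:
    print(row)
```

Because descriptions such as `... | NV` already contain `|`, a different field separator (or a tuple per row) may be safer if the rows are parsed again later.
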
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
