Extract business hours from Google using only Beautiful Soup

Goal

Extract a business's hours and its closed status from the Google Search results.

Screenshot with the highlighted working hours and closed status (example URL):

google knowledge panel restaurant working hours

Screenshot with the highlighted working hours in the popup (example URL):

working hours when clicking on a row in the knowledge panel

Problem

soup.find() with the specific selector returns None.

Description

I am trying to create a voice-activated AI similar to Google Home or Alexa that I can pair up with something cool. Currently, I'm trying to use data from the Google knowledge panel for specific search queries.

Code

import bs4
import requests


def service(self, business):
    # Let requests URL-encode the query instead of formatting it into the URL by hand.
    response = requests.get(
        "https://www.google.com/search",
        params={"q": "{} hours".format(business)},
    )

    if response.status_code == 200:
        soup = bs4.BeautifulSoup(response.text, "lxml")

        # This span class should hold the hours shown for that day, or just "Closed".
        string = soup.find("span", attrs={"class": "TLou0b JjSWRd"})

        print(string)
        # returns None

    if response.status_code == 404:
        print("Error")
        return "Error 404"

How can I extract the working hours and the closed status of the business?

PS. I'm on a Raspberry Pi 4. I don't want to use Selenium and its drivers. But I'm open to suggestions.



Solution 1:[1]

Selector for the business hours: [data-attrid='kc:/location/location:hours'] table tr.

.TLou0b.JjSWRd is a selector for the Google Answer Box.

google answer box

From what I understand, you're looking for the business hours from the Google Knowledge Panel.

google knowledge panel
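
To illustrate the difference between the two selectors, here is a minimal sketch that probes two hand-written HTML fragments. The markup is a simplified assumption, not real Google output, and `probe` is a hypothetical helper name. It suggests why a lookup aimed at the answer box comes back empty when the page only contains a knowledge panel.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup for the two result types (assumption:
# real Google HTML is much larger and may change at any time).
ANSWER_BOX_HTML = '<div><span class="TLou0b JjSWRd">Closed</span></div>'
KNOWLEDGE_PANEL_HTML = """
<div data-attrid="kc:/location/location:hours">
  <table><tr><td>Monday</td><td>Closed</td></tr></table>
</div>
"""


def probe(html):
    """Return (answer box found?, number of knowledge panel hour rows)."""
    soup = BeautifulSoup(html, "html.parser")
    answer_box = soup.select_one(".TLou0b.JjSWRd")
    panel_rows = soup.select("[data-attrid='kc:/location/location:hours'] table tr")
    return answer_box is not None, len(panel_rows)


print(probe(ANSWER_BOX_HTML))       # (True, 0)
print(probe(KNOWLEDGE_PANEL_HTML))  # (False, 1)
```

Checking both selectors this way on the fetched page is a quick way to see which kind of result Google actually returned.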

Code to extract business hours:

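import logging

logger = logging.getLogger(__name__)


# Wrapped in a function (the name is chosen here) so that the early return is valid.
def extract_business_hours(soup):
    # The knowledge panel block that holds the open/closed state and the hours table.
    hours_wrapper_node = soup.select_one("[data-attrid='kc:/location/location:hours']")

    if hours_wrapper_node is None:
        logger.info("Business hours node is not found")
        return None

    business_hours = {"open_closed_state": "", "hours": []}

    business_hours["open_closed_state"] = hours_wrapper_node.select_one(
        ".JjSWRd span span span"
    ).text.strip()

    # Each table row holds two cells: the day of the week and its hours.
    location_hours_rows_nodes = hours_wrapper_node.select("table tr")
    for location_hours_rows_node in location_hours_rows_nodes:
        day_of_week, hours = [
            td.text.strip() for td in location_hours_rows_node.select("td")
        ]

        business_hours["hours"].append(
            {"day_of_week": day_of_week, "business_hours": hours}
        )

    return business_hours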

Output:

{
    "hours": [
        {"business_hours": "5:30–10PM", "day_of_week": "Wednesday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Thursday"},
        {"business_hours": "5:30–11PM", "day_of_week": "Friday"},
        {"business_hours": "5:30–11PM", "day_of_week": "Saturday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Sunday"},
        {"business_hours": "Closed", "day_of_week": "Monday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Tuesday"},
    ],
    "open_closed_state": "Closed",
}
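
For trying the extraction without hitting Google, here is a self-contained sketch that runs the same selectors against a hand-written fragment. The sample markup and the name `parse_business_hours` are assumptions made for illustration. It also skips rows that do not have exactly two cells and tolerates a missing state node, which the code above does not do.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified stand-in for the knowledge panel markup.
SAMPLE_HTML = """
<div data-attrid="kc:/location/location:hours">
  <span class="JjSWRd"><span><span><span>Closed</span></span></span></span>
  <table>
    <tr><td>Monday</td><td>Closed</td></tr>
    <tr><td>Tuesday</td><td>5:30–10PM</td></tr>
  </table>
</div>
"""


def parse_business_hours(html):
    """Return the open/closed state and per-day hours, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    wrapper = soup.select_one("[data-attrid='kc:/location/location:hours']")
    if wrapper is None:
        return None

    # The state node may be missing; fall back to an empty string.
    state_node = wrapper.select_one(".JjSWRd span span span")
    state = state_node.get_text(strip=True) if state_node else ""

    hours = []
    for row in wrapper.select("table tr"):
        cells = [td.get_text(strip=True) for td in row.select("td")]
        if len(cells) != 2:
            continue  # skip header or malformed rows
        hours.append({"day_of_week": cells[0], "business_hours": cells[1]})

    return {"open_closed_state": state, "hours": hours}


result = parse_business_hours(SAMPLE_HTML)
print(result["open_closed_state"])  # Closed
print(len(result["hours"]))         # 2
```

Returning `None` when the wrapper is missing lets the caller tell "no knowledge panel on this page" apart from "panel present but empty".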

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Illia Zub