Can anyone tell me why crux-component-title is used and where it comes from?

This code works, but I do not understand some parts of it:

import requests
from bs4 import BeautifulSoup

def get_products(url):
    soup = BeautifulSoup(requests.get(url).content, "html.parser")

    out = []
    for title in soup.select(".crux-component-title"):
        out.append(title.get_text(strip=True))

    return out


url = "https://www.consumerreports.org/cro/coffee-makers.htm"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

all_data = []
for category_link in soup.select("h3.crux-product-title a"):
    u = "https://www.consumerreports.org" + category_link["href"]
    print("Getting {}".format(u))
    all_data.extend(get_products(u))

for i, title in enumerate(all_data, 1):
    print("{:<5} {}".format(i, title))

I do not understand why crux-component-title is used in get_products or where it comes from.



Solution 1:[1]

crux-component-title is a CSS class name. It is not defined anywhere in your Python code; it comes from the HTML of each page that is found in the loop and passed to the get_products function.

This is your code:

# Loop over the links ("a" tags) that are inside
# "h3" tags with the crux-product-title class:
for category_link in soup.select("h3.crux-product-title a"):
    # Build the full URL from the link's "href" value:
    u = "https://www.consumerreports.org" + category_link["href"]
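As a side note, the string concatenation above only works when every href is a site-relative path. A minimal sketch of a more forgiving alternative uses urljoin from the standard library; the href values below are made up for illustration and are not taken from the real page:

```python
from urllib.parse import urljoin

base = "https://www.consumerreports.org"

# Hypothetical href values, as they might appear in category_link["href"].
relative_href = "/cro/coffee-makers/example-category.htm"
absolute_href = "https://www.consumerreports.org/cro/other-page.htm"

# A site-relative href is resolved against the base URL.
full_from_relative = urljoin(base, relative_href)

# An href that is already absolute is returned unchanged.
full_from_absolute = urljoin(base, absolute_href)

print(full_from_relative)
print(full_from_absolute)
```

In the loop, this would amount to replacing the concatenation with urljoin("https://www.consumerreports.org", category_link["href"]).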

The following line calls the get_products function, which makes a request to that page (i.e. the URL built in the loop):

all_data.extend(get_products(u))

Inside get_products, the code downloads the page passed in as the url parameter (the value of u) and collects its titles. On that page, each title sits in an HTML element that has the crux-component-title class, and the leading dot in the selector means "match by class":

for title in soup.select(".crux-component-title"):
    out.append(title.get_text(strip=True))
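To see where the class name comes from, it may help to run the same selector against a small piece of HTML. The markup below is invented for illustration (the real page's structure is an assumption here); only the class name is taken from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one category page.
html = """
<div>
  <h2 class="crux-component-title">Drip Coffee Makers</h2>
  <p class="description">Not selected: different class.</p>
  <h2 class="crux-component-title highlighted">  Espresso Machines  </h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# ".crux-component-title" matches every element whose class attribute
# contains crux-component-title, whatever the tag name is.
titles = [tag.get_text(strip=True) for tag in soup.select(".crux-component-title")]

print(titles)
```

The two h2 elements are matched and the p element is skipped. On the real site you can find such class names by opening the page in a browser and inspecting a title element with the developer tools.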

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Marco Aurelio Fernandez Reyes