Web scraping promotion information with Python [closed]
I'm new to Python and am trying to scrape a website using BeautifulSoup. I can get information such as the title and price, but I can't get the promotion information.
Website: https://www.vitaminstore.nl/product/vitacura-vitamine-c-500-mg-calcium-ascorbaat-tabletten-1306065
Information needed: "Vitacura Vitamine C 1+1 gratis"
import:
import requests
from glob import glob
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
from time import sleep
HEADERS = ({'User-Agent':
'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36',
'Accept-Language': 'en-US, en;q=0.5'})
Promotion = soup.find("div", { "class" : "o-Promotions__Info" }).find_all('span', { "class" : "o-Promotions__Title" })
Could anyone help me fix this?
Many thanks!
Solution 1:[1]
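Selenium is well suited to pages that rely on JavaScript. The promotion text appears to be loaded by a script after the initial page load, so it is not in the HTML that requests downloads and BeautifulSoup alone never sees it. You can achieve your objective like this: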
from bs4 import BeautifulSoup as BS
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
options = webdriver.ChromeOptions()
options.add_argument('--headless')
CLASS = 'o-Promotions__Title'
with webdriver.Chrome(options=options) as driver:
    driver.get('https://www.vitaminstore.nl/product/vitacura-vitamine-c-500-mg-calcium-ascorbaat-tabletten-1306065')
    # wait up to 5 seconds for the relevant class to be observable
    WebDriverWait(driver, 5).until(
        EC.presence_of_element_located((By.CLASS_NAME, CLASS)))
    span = BS(driver.page_source, 'lxml').select_one(f'span.{CLASS}')
    print(span.text.strip())
Output:
Vitacura Vitamine C 1+1 gratis
Note:
You will need to install chromedriver for this. See the Selenium website for details.
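A quick way to check whether a page needs a browser at all is to test whether the target class shows up in the HTML you already have. The sketch below uses only the standard library. The two HTML strings are made-up stand-ins for the raw and the rendered page, not captured from the site, and ClassTextFinder and find_text_by_class are hypothetical helper names.

```python
from html.parser import HTMLParser


class ClassTextFinder(HTMLParser):
    """Collect the text of every element that carries a given CSS class.

    Simplification: a matching element is considered closed at the first
    end tag with the same tag name, so nested same-name tags are not handled.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.open_tag = None   # tag name of the matching element we are inside
        self.matches = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if self.open_tag is None and self.css_class in classes:
            self.open_tag = tag
            self.matches.append('')

    def handle_endtag(self, tag):
        if tag == self.open_tag:
            self.open_tag = None

    def handle_data(self, data):
        if self.open_tag is not None:
            self.matches[-1] += data


def find_text_by_class(html, css_class):
    """Return the stripped text of each element with css_class in html."""
    finder = ClassTextFinder(css_class)
    finder.feed(html)
    return [text.strip() for text in finder.matches]


# Hypothetical stand-ins: what requests might receive vs. what a browser renders.
RAW_HTML = '<div class="o-Promotions__Info"></div>'
RENDERED_HTML = ('<div class="o-Promotions__Info">'
                 '<span class="o-Promotions__Title"> Vitacura Vitamine C 1+1 gratis </span>'
                 '</div>')

print(find_text_by_class(RAW_HTML, 'o-Promotions__Title'))
print(find_text_by_class(RENDERED_HTML, 'o-Promotions__Title'))
```

If the list is empty for the HTML returned by requests but not for driver.page_source, the content is most likely added by JavaScript, and either a browser or the underlying data request is needed.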
Solution 2:[2]
You don't need Selenium when you can fetch the data directly from the endpoint the page itself calls. Just feed the product ID into the URL.
import requests
productId = '1306065'
url = f'https://www.vitaminstore.nl/Promotion/GetProductPromotion?productId={productId}'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36'}
jsonData = requests.get(url, headers=headers).json()
promotions = jsonData['RelatedPromotions']
for promotion in promotions:
    print(promotion['Name'])
Output:
Vitacura Vitamine C 1+1 gratis
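The product ID here matches the trailing number of the product page URL (1306065 in both). A minimal sketch, assuming that pattern holds for other products and that the response keeps the RelatedPromotions / Name shape used above. The helper names are hypothetical, and the sample payload is hand-written for illustration, not a real response.

```python
import re
from urllib.parse import urlparse

ENDPOINT = 'https://www.vitaminstore.nl/Promotion/GetProductPromotion?productId={}'


def product_id_from_url(product_url):
    """Return the trailing number of the URL path (assumed to be the product ID)."""
    path = urlparse(product_url).path
    match = re.search(r'-(\d+)/?$', path)
    if match is None:
        raise ValueError(f'no trailing product id in {product_url!r}')
    return match.group(1)


def promotion_url(product_url):
    """Build the promotion endpoint URL for a product page URL."""
    return ENDPOINT.format(product_id_from_url(product_url))


def promotion_names(json_data):
    """Extract promotion names, tolerating a missing or empty promotion list."""
    promotions = json_data.get('RelatedPromotions') or []
    return [p['Name'] for p in promotions if p.get('Name')]


page = ('https://www.vitaminstore.nl/product/'
        'vitacura-vitamine-c-500-mg-calcium-ascorbaat-tabletten-1306065')
print(promotion_url(page))

# Hand-written sample in the shape the answer above relies on.
sample = {'RelatedPromotions': [{'Name': 'Vitacura Vitamine C 1+1 gratis'}]}
print(promotion_names(sample))
```

In real use, the dictionary passed to promotion_names would be the result of requests.get(promotion_url(page), headers=headers).json(), as in the code above.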
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Albert Winestein |
| Solution 2 | chitown88 |
