Unexpected data from web scraper
I'm scraping a webpage and the results are not turning out as expected. This is the code I'm running with Requests, BeautifulSoup, and Selenium.
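import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
#Beep beep so lets ride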
driver = webdriver.Chrome('chromedriver.exe')
driver.get('https://www.cardkingdom.com/')
driver.maximize_window()
#Search card, grab URL
search = driver.find_element(By.ID,"tags")
search.send_keys('Silver Myr')
search.send_keys(Keys.RETURN)
url = driver.current_url
#Making a soup
request = requests.get(url)
soup = BeautifulSoup(request.text, 'html.parser')
#Parse for correct card
findset = soup.find_all(['input'])
print(findset)
I expect to get several lines of HTML containing info on the different versions of the MTG card "Silver Myr". For example:
<input type="hidden" name="category" value="Scars of Mirrodin">
<input type="hidden" name="price" value="0.35">
and so on, with more lines like these.
But instead I get something that looks like this:
<input name="md" type="hidden" value="RfORPzNURs_n059RoHnU2L51HXgZQBQvwGBvEn4MX4U-1647612983-0-AfhGlkGttGt49azWEYz9rEsee-yEC56PMZaCaTeA4vIEdChp-08thT9V2k_-2xor9a55ZZ1L1zRuGjBk9TZ-CzLuEGoPi7ExjndPPjhbmWoIFvKf_A695H-soqHzJsfeECscLD8XmtU5gcy1e6YQandkZqZtS_xeDqReorLLzJNBbU5az-QTuFIUnlHa8RjVOxixc8LObyob3bbBcktPf4Z00F_F2mPe8ZAjmjd8CLXitHZyKpjauOq4I8VnqTMl5qVKpcuc6RJzA0iAk-vHXQaDG2C4yiNWnEJybNSAtna2RZNXvwXU5OvEj2qBY_BKO47-j7QAauX1CYYCP_rrZ5U2mVaGZfpl9mPoTYV6tS_Z4Th7P8Y5h8LMhhMaPW3gw01YZBiHRbHaZRqyzC4Sr5qaw0ixtGGrqM-Z9pGhq60hagxyJ5MYmxVrLSfEG5Wmb9OSm05TUDRZ3ySltM2SXMF7JNUeRZokxFrdMO54KS0G7qjx3B6KiKIuJUqd1JFMOtBo264PylnfSU59u5iYq93AU9uz0AwMnEsPC0rKkheZ0NJq3y4-095oR_OvQzGiIwb3PZtvHvz2EHhRwJykWbTUziF2pNLfcZ8BcQx-H2LKsDfqVY4zpP53UGnc_mYMlFaE4vpOuySeVpkrKE1dE_uB9ANbDLpJ7vdyhr9bcghkTt1SNiBf1wfjN1G0WS1w168Z0fIBWk37TaObHhQ2YrNSi3RzXPs94I2isqQk06SrWDA2sAvpE3e6yBXuxhG0LoBM7isDwdWSP9L68KUv-8w"/>
Am I haunted, or is there a real-world reason that this data appears obfuscated?
Solution 1:[1]
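You can get this data without Selenium. Your plain requests.get(url) sends no browser-like headers, so the site probably returned an anti-bot challenge page instead of the search results, which would explain the obfuscated "md" input. With a browser user-agent and the search parameters, the data is in the static HTML response: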
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
search = 'Silver Myr'
url = 'https://www.cardkingdom.com/catalog/search'
# Send a browser-like user-agent with the request
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
# Query-string parameters for the search
payload = {
    'search': 'header',
    'filter[name]': search}
response = requests.get(url, headers=headers, params=payload)
soup = BeautifulSoup(response.text, 'html.parser')
# Each <li> whose class starts with "itemAddToCart" holds one card/condition listing
findset = soup.find_all('li', {'class':re.compile("^itemAddToCart")})
rows = []
for each in findset:
    row = {}
    inputs = each.find_all('input')
    for i in inputs:
        # Key the value by the input's first class if it has one, else by its name
        if 'class' in i.attrs.keys():
            row[i['class'][0]] = i['value']
        else:
            row[i['name']] = i['value']
        # Record the text of the next "styleQty" span, if one follows this input
        if i.find_next('span',{'class':'styleQty'}):
            qtyAvail = i.find_next('span',{'class':'styleQty'}).text
            row['qtyAvail'] = qtyAvail
    rows.append(row)
df = pd.DataFrame(rows)
Output:
print(df.to_string())
| | product_id | style | qty | maxQty | category | model | name | price | slug | title | variation | qtyAvail |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 132098 | NM | 0 | 61 | Scars of Mirrodin | mtg_card | Scars of Mirrodin: Silver Myr | 0.35 | silver-myr | Scars of Mirrodin: Scars of Mirrodin: Silver Myr | | 20 |
| 1 | 132098 | EX | 0 | 58 | Scars of Mirrodin | mtg_card | Scars of Mirrodin: Silver Myr | 0.28 | silver-myr | Scars of Mirrodin: Scars of Mirrodin: Silver Myr | | 7 |
| 2 | 132098 | VG | 0 | 7 | Scars of Mirrodin | mtg_card | Scars of Mirrodin: Silver Myr | 0.25 | silver-myr | Scars of Mirrodin: Scars of Mirrodin: Silver Myr | | 20 |
| 3 | 132098 | G | 0 | | Scars of Mirrodin | mtg_card | Scars of Mirrodin: Silver Myr | 0.18 | silver-myr | Scars of Mirrodin: Scars of Mirrodin: Silver Myr | | 20 |
| 4 | 257350 | NM | 0 | 243 | Kamigawa: Neon Dynasty Commander Decks | mtg_card | Kamigawa: Neon Dynasty Commander Decks: Silver Myr | 0.25 | silver-myr | Kamigawa: Neon Dynasty Commander Decks: Kamigawa: Neon Dynasty Commander Decks: Silver Myr | | 20 |
| 5 | 257350 | EX | 0 | | Kamigawa: Neon Dynasty Commander Decks | mtg_card | Kamigawa: Neon Dynasty Commander Decks: Silver Myr | 0.20 | silver-myr | Kamigawa: Neon Dynasty Commander Decks: Kamigawa: Neon Dynasty Commander Decks: Silver Myr | | 20 |
| 6 | 257350 | VG | 0 | | Kamigawa: Neon Dynasty Commander Decks | mtg_card | Kamigawa: Neon Dynasty Commander Decks: Silver Myr | 0.18 | silver-myr | Kamigawa: Neon Dynasty Commander Decks: Kamigawa: Neon Dynasty Commander Decks: Silver Myr | | 20 |
| 7 | 257350 | G | 0 | | Kamigawa: Neon Dynasty Commander Decks | mtg_card | Kamigawa: Neon Dynasty Commander Decks: Silver Myr | 0.13 | silver-myr | Kamigawa: Neon Dynasty Commander Decks: Kamigawa: Neon Dynasty Commander Decks: Silver Myr | | 20 |
| 8 | 72240 | NM | 0 | | Mirrodin | mtg_card | Mirrodin: Silver Myr | 0.39 | silver-myr | Mirrodin: Mirrodin: Silver Myr | | 20 |
| 9 | 72240 | EX | 0 | 44 | Mirrodin | mtg_card | Mirrodin: Silver Myr | 0.31 | silver-myr | Mirrodin: Mirrodin: Silver Myr | | 20 |
| 10 | 72240 | VG | 0 | 23 | Mirrodin | mtg_card | Mirrodin: Silver Myr | 0.27 | silver-myr | Mirrodin: Mirrodin: Silver Myr | | 1 |
| 11 | 72240 | G | 0 | | Mirrodin | mtg_card | Mirrodin: Silver Myr | 0.20 | silver-myr | Mirrodin: Mirrodin: Silver Myr | | 1 |
| 12 | 127702 | NM | 0 | 1 | Planechase | mtg_card | Planechase: Silver Myr | 0.79 | silver-myr | Planechase: Planechase: Silver Myr | | 7 |
| 13 | 127702 | EX | 0 | 7 | Planechase | mtg_card | Planechase: Silver Myr | 0.63 | silver-myr | Planechase: Planechase: Silver Myr | | 1 |
| 14 | 127702 | VG | 0 | 1 | Planechase | mtg_card | Planechase: Silver Myr | 0.55 | silver-myr | Planechase: Planechase: Silver Myr | | 1 |
| 15 | 127702 | G | 0 | | Planechase | mtg_card | Planechase: Silver Myr | 0.40 | silver-myr | Planechase: Planechase: Silver Myr | | 1 |
| 16 | 132036 | NM | 0 | 1 | Duel Decks: Elspeth Vs. Tezzeret | mtg_card | Duel Decks: Elspeth Vs. Tezzeret: Silver Myr | 0.69 | silver-myr | Duel Decks: Elspeth Vs. Tezzeret: Duel Decks: Elspeth Vs. Tezzeret: Silver Myr | | 9 |
| 17 | 132036 | EX | 0 | 9 | Duel Decks: Elspeth Vs. Tezzeret | mtg_card | Duel Decks: Elspeth Vs. Tezzeret: Silver Myr | 0.55 | silver-myr | Duel Decks: Elspeth Vs. Tezzeret: Duel Decks: Elspeth Vs. Tezzeret: Silver Myr | | 4 |
| 18 | 132036 | VG | 0 | 4 | Duel Decks: Elspeth Vs. Tezzeret | mtg_card | Duel Decks: Elspeth Vs. Tezzeret: Silver Myr | 0.48 | silver-myr | Duel Decks: Elspeth Vs. Tezzeret: Duel Decks: Elspeth Vs. Tezzeret: Silver Myr | | NaN |
| 19 | 132036 | G | 0 | | Duel Decks: Elspeth Vs. Tezzeret | mtg_card | Duel Decks: Elspeth Vs. Tezzeret: Silver Myr | 0.35 | silver-myr | Duel Decks: Elspeth Vs. Tezzeret: Duel Decks: Elspeth Vs. Tezzeret: Silver Myr | | NaN |
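The answer passes the search terms as query-string parameters rather than typing them into the page with Selenium. As a rough illustration of what that request URL looks like, here is a minimal sketch using only the standard library. The helper name build_search_url is hypothetical (it is not part of the answer), and the encoding shown is what urllib produces, which should match what requests builds from params=payload.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical helper, for illustration only: builds the search URL
# from the same parameter names the answer passes via params=payload.
def build_search_url(card_name, base_url='https://www.cardkingdom.com/catalog/search'):
    payload = {
        'search': 'header',
        'filter[name]': card_name,
    }
    # urlencode percent-encodes the brackets and turns spaces into '+'
    return base_url + '?' + urlencode(payload)

url = build_search_url('Silver Myr')
print(url)

# Decoding the query string recovers the original parameter names and values
print(parse_qs(urlsplit(url).query))
```

Pasting the printed URL into a browser is a quick way to compare what the browser shows with what the script receives.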
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | chitown88 |
