How to get the product ID and UPC from the page source on Target?
I am trying to scrape the product ID and UPC of products on Target using Selenium in Python. I cannot find the product ID or UPC on the rendered product page, so I opened the page source and found them there. I tried using Selenium and bs4 to extract them, but it doesn't work. Can anyone explain how to scrape data from the page source? Thanks.
The page source looks like this:
<tr>
<td class="line-content">
<span class="html-comment"><!-- --></span>
"12049604"
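import time
from bs4 import BeautifulSoup
from selenium import webdriver
# raw string so the backslashes in the Windows path are not treated as escapes
driver = webdriver.Chrome(r'D:\chromedriver\chromedriver_win32new\chromedriver_win32 (2)\chromedriver.exe')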
driver.get('https://www.target.com/p/zyrtec-24-hour-allergy-relief-capsules-cetirizine-hcl/-/A-15075282?preselect=12049604#lnk=sametab')
time.sleep(3)
soup = BeautifulSoup(driver.page_source,'html.parser')
a = soup.find("span", {"class":"html-comment"}).get_text()
print(a)
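One likely reason this lookup finds nothing: the `line-content` and `html-comment` classes belong to Chrome's view-source viewer, not to the page itself, so `driver.page_source` contains a plain `<!-- -->` comment and no such `<span>`. The sketch below illustrates this with the standard library's `html.parser`; the `SAMPLE` fragment and the `NodeLogger` class are hypothetical stand-ins, not Target's actual markup.

```python
from html.parser import HTMLParser

# Hypothetical fragment standing in for the real page source.
SAMPLE = '<div><b>TCIN</b>: <!-- -->12049604</div>'


class NodeLogger(HTMLParser):
    """Records the start tags, comments, and text seen while parsing."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.comments = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        self.text.append(data)


parser = NodeLogger()
parser.feed(SAMPLE)
parser.close()

# No <span> elements exist, so there is no "html-comment" class to search for.
span_classes = [attrs.get("class") for tag, attrs in parser.tags if tag == "span"]
# The comment is a separate node; the ID is ordinary text next to it.
joined_text = "".join(parser.text)

print(span_classes)
print(parser.comments)
print(joined_text)
```

In other words, the value has to be located through the element that actually contains it in the rendered page rather than through the viewer's markup.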
Solution 1:[1]
It should look something like this:
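from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# raw string so the backslashes in the Windows path are not treated as escapes
driver = webdriver.Chrome(r'D:\chromedriver\chromedriver_win32new\chromedriver_win32 (2)\chromedriver.exe')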
driver.get('https://www.target.com/p/zyrtec-24-hour-allergy-relief-capsules-cetirizine-hcl/-/A-15075282?preselect=12049604#lnk=sametab')
# "Show more" must be clicked before the TCIN and UPC appear in the page
timeout = 5
xpath = '//*[@id="tabContent-tab-Details"]/div/button'
element_present = EC.presence_of_element_located((By.XPATH, xpath))
WebDriverWait(driver, timeout).until(element_present)
driver.find_element(by=By.XPATH, value=xpath).click()
soup = BeautifulSoup(driver.page_source,'html.parser')
# collect the text of every element under "Specifications"
div = soup.find("div", {"class":"styles__StyledCol-sc-ct8kx6-0 iKGdHS h-padding-h-tight"})
items = [d.text for d in div.find_all("div")]
# pick out the entries that start with the labels we want
tcin = [v for v in items if v.startswith("TCIN")]
upc = [v for v in items if v.startswith("UPC")]
print(tcin)
print(upc)
The output will look like this:
['TCIN: 12049604']
['UPC: 300450204448']
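If the bare values are needed rather than the labelled strings, the entries can be split on the colon. This is a minimal sketch, assuming every entry of interest has the `Label: value` shape seen in the output above; `parse_specs` is a hypothetical helper name, and the third sample entry is made up to show that lines without a colon are skipped.

```python
def parse_specs(lines):
    """Turn 'Label: value' strings into a {label: value} dict."""
    specs = {}
    for line in lines:
        label, sep, value = line.partition(":")
        if sep:  # skip entries that contain no colon
            specs[label.strip()] = value.strip()
    return specs


# The first two strings come from the output above; the last is a hypothetical extra entry.
items = ["TCIN: 12049604", "UPC: 300450204448", "Specifications"]
specs = parse_specs(items)

print(specs["TCIN"])
print(specs["UPC"])
```

Keeping the values as strings, rather than converting them to `int`, avoids losing any leading zeros a UPC might have.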
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Andrew Horowitz |