How to get the product ID and UPC from the page source on Target?

I am trying to scrape the product ID and UPC of products on Target using Selenium in Python. I cannot find the product ID or UPC on the product page itself, so I opened the page source and found them there. I tried to use Selenium and bs4 to extract them, but it doesn't work. Can anyone explain how to scrape data from the page source? Thanks.

The page source looks like this:

<tr>
   <td class="line-content">
     <span class="html-comment"><!-- --></span>
     "12049604"

My code:

import time

from bs4 import BeautifulSoup
from selenium import webdriver

# Raw string so the backslashes in the Windows path are not treated as escapes
driver = webdriver.Chrome(r'D:\chromedriver\chromedriver_win32new\chromedriver_win32 (2)\chromedriver.exe')
driver.get('https://www.target.com/p/zyrtec-24-hour-allergy-relief-capsules-cetirizine-hcl/-/A-15075282?preselect=12049604#lnk=sametab')
time.sleep(3)

soup = BeautifulSoup(driver.page_source,'html.parser')
a = soup.find("span", {"class":"html-comment"}).get_text()
print(a)


Solution 1:[1]

It should look something like this:

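from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Raw string so the backslashes in the Windows path are not treated as escapes
driver = webdriver.Chrome(r'D:\chromedriver\chromedriver_win32new\chromedriver_win32 (2)\chromedriver.exe')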
driver.get('https://www.target.com/p/zyrtec-24-hour-allergy-relief-capsules-cetirizine-hcl/-/A-15075282?preselect=12049604#lnk=sametab') 

# The "Show more" button must be clicked to reveal the TCIN and UPC
timeout = 5
xpath = '//*[@id="tabContent-tab-Details"]/div/button'
element_present = EC.presence_of_element_located((By.XPATH, xpath))
WebDriverWait(driver, timeout).until(element_present)
driver.find_element(by=By.XPATH, value=xpath).click()

soup = BeautifulSoup(driver.page_source,'html.parser')

# Get the text of every element under "Specifications"
div = soup.find("div", {"class":"styles__StyledCol-sc-ct8kx6-0 iKGdHS h-padding-h-tight"})
texts = [d.text for d in div.find_all("div")]

# Pick out the entries that start with the labels we want
tcin = [v for v in texts if v.startswith("TCIN")]
upc = [v for v in texts if v.startswith("UPC")]

print(tcin)
print(upc)

The output will look like this:

['TCIN: 12049604']
['UPC: 300450204448']
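
The class name used above looks auto-generated, so it may change whenever the site is rebuilt. As a rough alternative, here is a minimal sketch that pulls the values out of plain text with a regular expression instead. The function name extract_labeled_ids and the sample string are made up for illustration, and it assumes the labels show up in the visible text as "TCIN: <digits>" and "UPC: <digits>", as in the output above.

```python
import re

# Assumption: each label is followed by a colon and a run of digits.
# The pattern tolerates extra whitespace around the colon.
LABEL_PATTERN = re.compile(r"\b(TCIN|UPC)\s*:\s*(\d+)")


def extract_labeled_ids(text):
    """Return a dict such as {"TCIN": "...", "UPC": "..."} found in text."""
    found = {}
    for label, value in LABEL_PATTERN.findall(text):
        # Keep only the first occurrence of each label.
        found.setdefault(label, value)
    return found


# Hypothetical sample text standing in for the page's visible text.
sample = "Specifications TCIN: 12049604 UPC : 300450204448"
print(extract_labeled_ids(sample))
# {'TCIN': '12049604', 'UPC': '300450204448'}
```

In the script above, the text to search could come from something like soup.get_text(" ") after clicking "Show more", though whether that yields the labels in exactly this form on the live page is untested here.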

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Andrew Horowitz