Scraping text in meta tag with selenium

I'm trying to get the book description from the following webpage: https://bookshop.org/books/lucky-9798200961177/9781668002452

This is what I've got so far:

***EDIT***
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")

driver = webdriver.Chrome('path_to_my_driver_on_local', options=options)
driver.get('https://bookshop.org/a/16709/9781668002452')
description = driver.find_element_by_xpath("//meta[@name='description']").get_attribute("content")
description

Basically, I'm trying to get the text inside of this html:


<meta name="description" content="REESE'S BOOK CLUB PICK NEW YORK TIMES BESTSELLER A thrilling roller-coaster ride about a heist gone terribly wrong, with a plucky protagonist who will win readers' hearts. What if you had the winning ticket ....">

I end up with the following error:

 Message: no such element: Unable to locate element: {"method":"xpath","selector":"//meta[@name='description']"}

Solution 1:^[1]

Locate the <meta> element with an XPath on its name attribute, then read its content attribute:

from selenium.webdriver.common.by import By

elem = driver.find_element(By.XPATH, "//meta[@name='description']")
print(elem.get_attribute("content"))

Solution 2:^[2]

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

options = Options()
options.add_argument("--headless")

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('path_to_my_driver_on_local'), options=options)

driver.get('https://bookshop.org/books/lucky-9798200961177/9781668002452')
html = driver.page_source
driver.quit()

# Parse the rendered page and read the visible description block
soup = BeautifulSoup(html, 'lxml')
description = soup.find_all('div', class_="title-description")
print(description[0].text)

Solution 3:^[3]

You need to target the element with the correct xpath. Your value for the xpath //meta[@content] is returning the first meta element that contains a content attribute. I would recommend using the xpath //meta[@name="description"] or the css selector meta[name="description"] for a more precise selection. This works perfectly:

# imports and boilerplate
....

from selenium.webdriver.common.by import By  # Selenium 4 locator style

description_meta_element = driver.find_element(By.CSS_SELECTOR, 'meta[name="description"]')
description_meta_content = description_meta_element.get_attribute('content')
print(description_meta_content)
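
To see why the broader XPath picks the wrong element, here is a minimal sketch that needs no browser. It uses the standard library's xml.etree.ElementTree, which supports a small XPath subset, on a made-up, well-formed snippet. The SAMPLE markup and its values are assumptions for illustration, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page's <head>
SAMPLE = """<html>
  <head>
    <meta name="viewport" content="width=device-width"/>
    <meta name="description" content="A thrilling heist novel."/>
  </head>
  <body/>
</html>"""

root = ET.fromstring(SAMPLE)

# Broad selector: first <meta> that has any content attribute
broad = root.find(".//meta[@content]")
# Precise selector: the <meta> whose name attribute is "description"
precise = root.find(".//meta[@name='description']")

broad_content = broad.get("content")
precise_content = precise.get("content")

print(broad_content)    # content of the viewport tag, not the description
print(precise_content)  # content of the description tag
```

The same distinction applies to the Selenium locators: the broad selector stops at whichever meta tag with a content attribute comes first in the document.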

Solution 4:^[4]

This <meta> tag...

<meta name="description" content="REESE'S BOOK CLUB PICK NEW YORK TIMES BESTSELLER A thrilling roller-coaster ride about a heist gone terribly wrong, with a plucky protagonist who will win readers' hearts. What if you had the winning ticket ....">

...is within the <head> section. So Selenium won't be able to scrape this element.

Solution

In this case your best bet would be to use BeautifulSoup with urllib.request as follows:

from bs4 import BeautifulSoup
from urllib.request import urlopen #  In python3, urllib2 has been split into urllib.request and urllib.error

webpage = urlopen('https://bookshop.org/books/lucky-9798200961177/9781668002452').read()
soup = BeautifulSoup(webpage, "lxml")
my_meta = soup.find("meta",{"name":"description"})
print(my_meta["content"])
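
If installing BeautifulSoup and lxml is not an option, the same lookup can be sketched with the standard library's html.parser. This is only an illustration run on a made-up HTML string: the MetaDescriptionParser class, the extract_description helper, and SAMPLE_HTML are hypothetical names, and the real page may need the downloaded markup passed in instead.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Collects the content attribute of the first <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == "description":
            self.description = attributes.get("content")


# Hypothetical sample markup standing in for the downloaded page
SAMPLE_HTML = """
<html>
  <head>
    <meta charset="utf-8">
    <meta name="description" content="A thrilling roller-coaster ride about a heist gone terribly wrong.">
  </head>
  <body><p>Book page</p></body>
</html>
"""


def extract_description(html):
    """Return the meta description text, or None if the tag is absent."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description


print(extract_description(SAMPLE_HTML))
```

Returning None when the tag is missing lets the caller decide how to handle pages without a description instead of raising an error.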

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Arundeep Chohan
Solution 2	ma9
Solution 3	Dharman
Solution 4	undetected Selenium
