Passing selenium webdriver session to requests gets error 403 in Cloudflare
I have a small crawler I wrote some time ago using requests and Beautiful Soup, but I stopped working on it because it got stuck on a Cloudflare CAPTCHA.
I then tried to circumvent the CAPTCHA using webdriver, and it worked. I wrote a small proof of concept using Selenium 3.141.0 and geckodriver 0.29.1, the latest versions at the time.
import re

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def VBulletinLoginSelenium(login_url='', login_data={}, test_url=''):
    driver = webdriver.Firefox()
    driver.get(login_url)
    element = driver.find_element(By.ID, "navbar_username")
    element.send_keys(login_data['vb_login_username'])
    element = driver.find_element(By.ID, "navbar_password")
    element.send_keys(login_data['vb_login_password'])
    element.send_keys(Keys.RETURN)
    timeout = 10
    # element_present = EC.presence_of_element_located((By.ID, 'element_id'))
    # WebDriverWait(driver, timeout).until(element_present)
    WebDriverWait(driver, timeout).until(EC.url_changes(login_url))
    headers = {
        "User-Agent":
            "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"
    }
    s = requests.session()
    s.headers.update(headers)
    # Copy the Selenium session's cookies into the requests session
    for cookie in driver.get_cookies():
        s.cookies.update({cookie['name']: cookie['value']})
    driver.close()
    cookie_bbimloggedin = s.cookies.get('bbimloggedin', default='no')
    if cookie_bbimloggedin == 'yes':
        current_page = s.get(test_url)
        if current_page.status_code != requests.codes.ok:
            return
        soup = BeautifulSoup(current_page.text, features="html.parser")
        regex_id = re.compile("edit([0-9]{9})")
        all_posts = soup.find_all('div', id=regex_id, recursive=True)


if __name__ == '__main__':
    VBulletinLoginSelenium(
        login_data={...})
This worked fine: I could log in to the forum (I check for a 'bbimloggedin' cookie) and then start parsing the pages I needed.
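For reference, the cookie handoff at the heart of this approach can be isolated into a small helper. The sample cookie list below is hypothetical, shaped like the list of dicts that `driver.get_cookies()` returns:

```python
def cookies_from_driver(driver_cookies):
    # driver.get_cookies() returns a list of dicts with 'name', 'value',
    # 'domain', etc.; a requests session only needs a name -> value mapping.
    return {c['name']: c['value'] for c in driver_cookies}

# Hypothetical cookies in the shape Selenium returns:
sample = [
    {'name': 'bbimloggedin', 'value': 'yes', 'domain': '.example.com'},
    {'name': 'cf_clearance', 'value': 'abc123', 'domain': '.example.com'},
]
print(cookies_from_driver(sample))
# {'bbimloggedin': 'yes', 'cf_clearance': 'abc123'}
```

The resulting dict can then be passed to `s.cookies.update(...)`, exactly as the loop in the function above does inline.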
Today I tried to add this to my first project. Login is successful (I can check the cookie and see the post-login forum page), but I keep running into a 403 error (the CAPTCHA problem) again and again when using requests.
I don't know if this problem is on my side (I just copied some code from one PyCharm instance to another) or something to do with Cloudflare.
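One detail worth checking in the handoff: Cloudflare is commonly reported to tie its clearance cookie to the User-Agent that passed the check, so sending the cookies with a hardcoded Chrome UA while Firefox did the login could itself produce 403s. A minimal sketch of matching the two (the Firefox UA string below is a made-up placeholder; in the real script it would be read from the live driver):

```python
def session_headers(browser_user_agent):
    # Reuse the exact User-Agent of the browser that passed the Cloudflare
    # check, instead of a hardcoded string for a different browser.
    return {"User-Agent": browser_user_agent}

# In the real script, read the UA from the Selenium-driven browser:
#   ua = driver.execute_script("return navigator.userAgent")
ua = "Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
headers = session_headers(ua)
print(headers["User-Agent"] == ua)
# True
```

This is only a plausible cause, not a confirmed diagnosis; `driver.execute_script` is a standard Selenium call, so the check is cheap to try.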
Some more tests that I've done:
I created two VirtualEnvs: in one I use Python 3.6 and in the other Python 3.8
When using the 3.6 venv it works, but when I use the 3.8 venv I get 403 errors again.
In the Python 3.6 virtualenv I can't install the most recent version of Selenium (4.10.0), so I have to use Selenium 3.141.0.
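Since the behaviour differs between the two virtualenvs, printing exactly what each one runs can narrow things down. A minimal diagnostic sketch (note `importlib.metadata` is stdlib only from Python 3.8, hence the fallback; on the 3.6 venv `pip freeze` serves the same purpose):

```python
import sys

print("Python", ".".join(map(str, sys.version_info[:3])))

try:
    from importlib.metadata import version, PackageNotFoundError  # 3.8+
except ImportError:
    version = None  # on the 3.6 venv, use pkg_resources or `pip freeze` instead

if version is not None:
    for pkg in ("selenium", "requests", "beautifulsoup4"):
        try:
            print(pkg, version(pkg))
        except PackageNotFoundError:
            print(pkg, "not installed")
```

Comparing this output between the two venvs shows whether the 403 correlates with a library version rather than with the interpreter itself.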
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow


