Passing a Selenium WebDriver session to requests gets a 403 error from Cloudflare

I have a small crawler that I wrote some time ago using requests and Beautiful Soup, but I stopped working on it because it got stuck on a Cloudflare CAPTCHA.

Then I tried to circumvent the CAPTCHA using WebDriver, and it worked. I wrote a small proof of concept using Selenium 3.141.0 and geckodriver 0.29.1, the latest versions at the time.

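import re

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def VBulletinLoginSelenium(login_url='', login_data={}, test_url=''):
    # Log in through a real browser so that the Cloudflare check is passed.
    driver = webdriver.Firefox()
    driver.get(login_url)
    element = driver.find_element(By.ID, "navbar_username")
    element.send_keys(login_data['vb_login_username'])
    element = driver.find_element(By.ID, "navbar_password")
    element.send_keys(login_data['vb_login_password'])
    element.send_keys(Keys.RETURN)
    timeout = 10
    # element_present = EC.presence_of_element_located((By.ID, 'element_id'))
    # WebDriverWait(driver, timeout).until(element_present)
    # Wait until the browser has left the login page.
    WebDriverWait(driver, timeout).until(EC.url_changes(login_url))

    headers = {
        "User-Agent":
            "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"
    }
    s = requests.session()
    s.headers.update(headers)

    # Copy the browser's cookies (name and value only) into the requests session.
    for cookie in driver.get_cookies():
        c = {cookie['name']: cookie['value']}
        s.cookies.update(c)

    # quit() also shuts down the geckodriver process, not just the window.
    driver.quit()
    # The forum sets bbimloggedin=yes after a successful login.
    cookie_bbimloggedin = s.cookies.get('bbimloggedin', default='no')
    if cookie_bbimloggedin != 'yes':
        return

    current_page = s.get(test_url)
    if current_page.status_code != requests.codes.ok:
        return
    soup = BeautifulSoup(current_page.text, features="html.parser")
    # Post containers have ids such as "edit123456789".
    regex_id = re.compile("edit([0-9]{9})")
    all_posts = soup.find_all('div', id=regex_id, recursive=True)
    return all_posts


# Press the green button in the gutter to run the script.
if __name__ == '__main__':
    VBulletinLoginSelenium(
        login_data={...})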

This worked fine: I could log onto the forum (I check for a cookie 'bbimloggedin') and then start parsing the pages I needed.

Today I tried to add this to my first project. Login is successful (I can check the cookie and see the forum's post-login page), but I keep running into a 403 error (the CAPTCHA problem) as soon as I use requests.
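
One way to confirm that the 403 really comes from Cloudflare, and not from the forum itself, is to look at the response headers. The sketch below is only a heuristic: the helper name `looks_like_cloudflare_challenge` is hypothetical, and it assumes Cloudflare identifies itself with a `Server: cloudflare` header and marks challenge pages with `cf-mitigated: challenge`.

```python
def looks_like_cloudflare_challenge(status_code, headers):
    """Guess whether a response is a Cloudflare block or challenge page.

    Heuristic only: assumes the "Server: cloudflare" and
    "cf-mitigated: challenge" response headers are present.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    served_by_cloudflare = lowered.get("server", "") == "cloudflare"
    challenged = lowered.get("cf-mitigated", "") == "challenge"
    return challenged or (status_code in (403, 503) and served_by_cloudflare)
```

With the code above it could be called as `looks_like_cloudflare_challenge(current_page.status_code, current_page.headers)` right after `s.get(test_url)`.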

I don't know whether the problem is on my side (I just copied some code from one PyCharm instance to another) or something to do with Cloudflare.
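
One thing worth ruling out on the client side: the proof of concept drives Firefox but sends a hard-coded Chrome User-Agent from requests, and it copies only cookie names and values. Cloudflare's clearance cookie is reported to be tied to the User-Agent that passed the check, so a mismatch is a plausible (unconfirmed) cause of the 403. The sketch below copies the browser's real User-Agent and keeps each cookie's domain and path; `copy_browser_identity` is a hypothetical helper, and it only assumes the objects behave like a Selenium driver and a `requests.Session`.

```python
def copy_browser_identity(driver, session):
    """Copy the browser's User-Agent and cookies into a requests-style session."""
    # Ask the browser for the exact User-Agent string it sends.
    user_agent = driver.execute_script("return navigator.userAgent")
    session.headers.update({"User-Agent": user_agent})

    for cookie in driver.get_cookies():
        # Keep domain and path so each cookie is only sent where
        # the browser itself would send it.
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session
```

In the function above this would replace the `headers` dictionary and the cookie loop, for example `s = copy_browser_identity(driver, requests.session())`, called before the driver is closed.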

Some more tests that I've done:

I created two virtualenvs: one uses Python 3.6 and the other Python 3.8.

When using the 3.6 venv it works, but when I use the 3.8 venv I get 403 errors again.

In the Python 3.6 virtualenv I can't install the most recent version of Selenium (4.10.0), so I have to use Selenium 3.141.0.
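
Since the same code behaves differently in the two virtualenvs, it may help to compare exactly what each one has installed. The sketch below prints the interpreter, OpenSSL, and package versions so the two outputs can be diffed. The package list is an assumption about what is relevant here, `environment_report` is a hypothetical helper, and whether any of these differences actually explains the 403 is not confirmed.

```python
import ssl
import sys

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Python 3.6/3.7: fall back to setuptools' pkg_resources.
    from pkg_resources import DistributionNotFound as PackageNotFoundError
    from pkg_resources import get_distribution

    def version(name):
        return get_distribution(name).version


def environment_report(packages=("selenium", "requests", "urllib3", "beautifulsoup4")):
    """Collect interpreter, OpenSSL, and package versions for one virtualenv."""
    report = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "openssl": ssl.OPENSSL_VERSION,
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("{}: {}".format(key, value))
```

Running the script once inside each virtualenv and comparing the two outputs shows which versions differ besides Selenium.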



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
