Clear CAPTCHA Success From HTML
I'm trying to scrape some site data. I manually cleared the CAPTCHA I was triggering, but I still get the CAPTCHA success page after I close and reopen my session:
Code:
import urllib, os, urllib.request, time, requests, random, pandas as pd
from datetime import date
from time import sleep
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.chrome.options import Options
from google_trans_new import google_translator
chrome_options = Options()
chrome_options.add_argument("user-data-dir=C:\\environments\\selenium")
# Selenium 4 expects options=; the older chrome_options= keyword was deprecated and later removed
driver = webdriver.Chrome(options=chrome_options)
driver.maximize_window()
driver.implicitly_wait(10)
driver.get("https://ca.indeed.com/")
search_company = driver.find_element(By.XPATH,"//*[@id='text-input-what']")
search_company.send_keys(Keys.CONTROL + "a")
search_company.send_keys(Keys.DELETE)
search_company.send_keys("Sales")
search_loc = driver.find_element(By.XPATH,"//*[@id='text-input-where']")
search_loc.send_keys(Keys.CONTROL + "a")
search_loc.send_keys(Keys.DELETE)
search_loc.send_keys("Quebec")
click_search = driver.find_element(By.XPATH,"//*[@id='jobsearch']/button")
click_search.click()
After running this block, I run:
page = driver.current_url
html = requests.get(page,verify=False)
soup = BeautifulSoup(html.content, 'html.parser', from_encoding = 'utf-8')
soup
But the HTML that comes back is the CAPTCHA page rather than the search results, so I have nothing to scrape:
hCaptcha solve page
How do I stop getting the CAPTCHA success page and get back to the page I'm trying to scrape? I've pointed Chrome at a persistent user-data-dir to try to retain the cookies, but I'm at a loss on how to proceed.
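One possible explanation, offered as an assumption rather than a confirmed diagnosis: `requests.get(page)` opens a fresh HTTP session that shares none of the cookies stored in the Chrome profile, so the site sees an unverified client and serves the CAPTCHA again. The sketch below copies the cookies returned by `driver.get_cookies()` into a `requests.Session`. The helper name `session_from_selenium_cookies` and its `user_agent` parameter are hypothetical, introduced only for illustration, and the site may still challenge the request if it checks more than cookies.

```python
import requests


def session_from_selenium_cookies(selenium_cookies, user_agent=None):
    """Build a requests.Session that carries cookies exported from Selenium.

    selenium_cookies: list of dicts in the shape returned by driver.get_cookies().
    user_agent: optional string; hypothetical parameter for matching the browser's UA.
    """
    session = requests.Session()
    for cookie in selenium_cookies:
        # Each Selenium cookie dict has at least "name" and "value";
        # "domain" and "path" are copied when present.
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    if user_agent:
        session.headers["User-Agent"] = user_agent
    return session
```

With a live driver this might be used as `session = session_from_selenium_cookies(driver.get_cookies())` followed by `session.get(driver.current_url)`. A simpler alternative that avoids the second request entirely is to parse what the browser already rendered: `BeautifulSoup(driver.page_source, 'html.parser')`.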
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow