Scraping a site using Selenium, but having issues with content inside the class
I am trying to scrape a site for actor information. The HTML looks like this:
```html
<fieldset class="table_content">
  <legend class="hidden nolineHt">Select the actor</legend>
  <div id="mainDiv">
    <div id="wrapperDiv">
      <div class="firstRow">
        <input id="actorRow_0_check" name="actorDTO.actors[0].selectedActor" class="sno" type="checkbox" value="true"><input type="hidden" name="_actorDTO.actors[0].selectedActor" value="on">
        <label class="lablemargin1 actor_row_class" id="actorRow_0" for="actorRow_0_check">
          <span class="table_content_colone">
            <span class="nameBold">JOHN KRAZINSKI, P.H.D.</span><br>
            <span class="smallFont" aria-disable="true" aria-hidden="true">9255 W NEVERLAND RANCH </span><br>
            <span class="smallFont" aria-disable="true" aria-hidden="true">Los MAMA
            , </span>
            <span class="smallFont" aria-disable="true" aria-hidden="true">CA 99999</span>
            <br>
            <span class="left smallFont tpPad10" aria-hidden="true" aria-disable="true">1-545-555-5555</span>
          </span>
          <span class="table_content_coltwo textSmall" aria-disable="true" aria-hidden="true">
            <span class="" aria-disabled="disabled" aria-hidden="true">ACTOR </span> <br>
            <span class=""> </span>
          </span>
          <span class="table_content_colthree textSmall" aria-disable="true" aria-hidden="true">Accepting New JOBS</span>
          <span class="table_content_colfour textSmall" aria-disable="true" aria-hidden="true">0.69 miles</span>
        </label>
      </div>
      <div class="firstRow">
        <input id="actorRow_1_check" name="actorDTO.actors[1].selectedActor" class="sno" type="checkbox" value="true"><input type="hidden" name="_actorDTO.actors[1].selectedActor" value="on">
        <label class="lablemargin1 actor_row_class" id="actorRow_1" for="actorRow_1_check">
          <span class="table_content_colone">
            <span class="nameBold">PAMELA BEASLEY, L.C.S.W.</span><br>
            <span class="smallFont" aria-disable="true" aria-hidden="true">2222 PENNSYLVANIA AVE </span><br>
            <span class="smallFont" aria-disable="true" aria-hidden="true">West PITTSBURGH
            , </span>
            <span class="smallFont" aria-disable="true" aria-hidden="true">CA 99999</span>
            <br>
            <span class="left smallFont tpPad10" aria-hidden="true" aria-disable="true">1-555-555-5555</span>
          </span>
          <span class="table_content_coltwo textSmall" aria-disable="true" aria-hidden="true">
            <span class="" aria-disabled="disabled" aria-hidden="true">ACTOR </span> <br>
            <span class="" aria-disabled="disabled" aria-hidden="true">ACTOR </span> <br>
            <span class=""> </span>
          </span>
          <span class="table_content_colthree textSmall" aria-disable="true" aria-hidden="true">Accepting New JOBS</span>
          <span class="table_content_colfour textSmall" aria-disable="true" aria-hidden="true">0.79 miles</span>
        </label>
      </div>
```
For each actor, I would like to get the name, address, city, phone number, job, and whether or not they are currently employed.
I tried selecting each of those elements individually, but when I do that, the XPath for the first actor's name is:
`//*[@id="actorRow_0"]/span[1]/span[1]`
but when I check the XPath for the second actor, it is:
`//*[@id="actorRow_1"]/span[1]/span[1]`
The XPath is different for every actor (the `id` changes from `actorRow_0` to `actorRow_1`), so when I run the scraper I only get the information for the first actor.
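Since only the numeric suffix of the `id` changes between rows, one option is to build the XPath per row index, or to use a single XPath that matches every row through the shared `actorRow_` prefix. The sketch below assumes Selenium 4 and a `driver` that has already loaded the page; `name_xpath`, `ALL_NAMES_XPATH` and `scrape_names` are hypothetical names. The string `"xpath"` is the value of Selenium's `By.XPATH` constant, so this module does not need to import Selenium itself.

```python
# Hypothetical helpers for the per-row XPath problem described above.

def name_xpath(index: int) -> str:
    """XPath of the name <span> for the actor in row `index` (0, 1, 2, ...)."""
    return f'//*[@id="actorRow_{index}"]/span[1]/span[1]'


# One XPath for the name <span> in *every* row: anchor on the common id prefix
# of the <label> elements instead of one specific id. The <input> ids
# ("actorRow_0_check") are not matched because the step requires a <label>.
ALL_NAMES_XPATH = '//label[starts-with(@id, "actorRow_")]/span[1]/span[1]'


def scrape_names(driver) -> list[str]:
    """Return the text of every actor name on the page.

    `driver` is assumed to be a Selenium 4 WebDriver; "xpath" equals By.XPATH.
    """
    return [el.text.strip() for el in driver.find_elements("xpath", ALL_NAMES_XPATH)]
```

With the sample HTML, `scrape_names(driver)` should return one entry per `actorRow_N` label. Positional steps such as `span[1]/span[1]` break easily if the layout changes; matching on a class name, for example `//label[starts-with(@id, "actorRow_")]//span[@class="nameBold"]`, is usually sturdier.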
I then tried to avoid selecting each element individually and settled for grabbing each block of text as a whole. This is the code I currently have:
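```python
from selenium.webdriver.common.by import By

# `driver` is an already-initialised WebDriver that has loaded the page.
# (find_element_by_* helpers were removed in Selenium 4.3; use find_element(By...).)
main = driver.find_element(By.ID, 'mainDiv')
sections = main.find_elements(By.CLASS_NAME, 'firstRow')
actor_info = []
for actor in sections:
    # Look up the columns relative to the current row element
    first_blox = actor.find_element(By.CLASS_NAME, 'table_content_colone').text
    second_blox = actor.find_element(By.CLASS_NAME, 'table_content_coltwo').text
    actor_items = {
        'first_block': [first_blox],
        'second_block': [second_blox],
    }
    actor_info.append(actor_items)  # one dict per actor
```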
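Going one step further than grabbing whole blocks, each field can be read separately by searching inside the current row with CSS selectors built from the class names in the sample HTML. This is a sketch under assumptions, not a tested scraper: it assumes Selenium 4 (`"css selector"` is the value of `By.CSS_SELECTOR`), assumes every row has the same four `.smallFont` spans as the sample (street, city, state/ZIP, phone), and reads `innerText` as in the EDIT at the end of the question. `parse_row`, `scrape_actors` and the dict keys are hypothetical names.

```python
CSS = "css selector"  # string value of selenium.webdriver.common.by.By.CSS_SELECTOR


def _text(el) -> str:
    """innerText of an element with the surrounding whitespace removed."""
    return (el.get_attribute("innerText") or "").strip()


def parse_row(row) -> dict:
    """Read one actor's fields from a single div.firstRow element.

    Every lookup is relative to `row`, so the same code works for every actor.
    """
    # Assumes exactly four .smallFont spans in column one, in this order
    # (the phone span also carries the smallFont class).
    street, city, state_zip, phone = [
        _text(el) for el in row.find_elements(CSS, ".table_content_colone .smallFont")
    ]
    # Column two holds the job title(s); the second sample actor lists "ACTOR"
    # twice, and the empty trailing span is dropped below.
    jobs = [_text(el) for el in row.find_elements(CSS, ".table_content_coltwo span")]
    return {
        "name": _text(row.find_element(CSS, ".nameBold")),
        "address": street,
        "city": city.rstrip(",").strip(),  # the city span ends in a stray ", "
        "state_zip": state_zip,
        "phone": phone,
        "jobs": [job for job in jobs if job],
        "status": _text(row.find_element(CSS, ".table_content_colthree")),
        "distance": _text(row.find_element(CSS, ".table_content_colfour")),
    }


def scrape_actors(driver) -> list[dict]:
    """One dict per actor row inside #mainDiv."""
    return [parse_row(row) for row in driver.find_elements(CSS, "#mainDiv .firstRow")]
```

Because `parse_row` only ever searches inside the row it is given, the loop cannot accidentally read another actor's data, and `actor_info = scrape_actors(driver)` yields one dict per actor. A row with a missing phone number would make the four-way unpacking raise `ValueError`, so real pages may need a more forgiving lookup.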
The issue I am running into now is that, because I am pulling a list of values, only the last value is returned at the end.
Any help with either approach would be appreciated.
Thanks.
EDIT: I found the answer to my second question. I was reading `.text` from the `table_content_colone` element; replacing `.text` with `.get_attribute('innerText')` let me get the whole block I was scraping.
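If you keep the whole-block approach from the EDIT, the block's text still has to be split into fields. A minimal sketch, assuming the `innerText` of a `table_content_colone` block puts the name, the street, the city/state/ZIP and the phone number on separate lines, with the city part possibly spread over several lines; the exact line layout depends on the browser and the page's CSS, and the sample string below is hypothetical, not captured from the site. `split_block` is a hypothetical helper.

```python
def split_block(block: str) -> dict:
    """Split the innerText of one table_content_colone block into fields.

    Assumed layout (hypothetical): name, street, "city , STATE ZIP" (possibly
    over several lines), phone. Blank lines are ignored; at least four
    non-blank lines are expected.
    """
    lines = [line.strip() for line in block.splitlines() if line.strip()]
    name, street, phone = lines[0], lines[1], lines[-1]
    # Everything between the street and the phone is the city/state/ZIP part.
    city_line = " ".join(lines[2:-1])
    # partition on the first comma only; the name may contain commas, but it
    # is not part of city_line.
    city, _, state_zip = city_line.partition(",")
    return {
        "name": name,
        "address": street,
        "city": city.strip(),
        "state_zip": state_zip.strip(),
        "phone": phone,
    }


sample = "JOHN KRAZINSKI, P.H.D.\n9255 W NEVERLAND RANCH\nLos MAMA , CA 99999\n1-545-555-5555"
print(split_block(sample))
```

Parsing text positions like this is more fragile than reading each `span` by its class, so it is mainly useful when the whole block is already in hand.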
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
