Scraping site using Selenium but issues with content inside the Class

I am trying to scrape a site to get actor information that looks like this:

<fieldset class="table_content">
                        <legend class="hidden nolineHt">Select the actor</legend>
                        <div id="mainDiv">    
                        <div id="wrapperDiv">
                       
                                    <div class="firstRow">
                                               
                                                           
                                                           
                                                                        <input id="actorRow_0_check" name="actorDTO.actors[0].selectedActor" class="sno" type="checkbox" value="true"><input type="hidden" name="_actorDTO.actors[0].selectedActor" value="on">
                                                           
                                               
                                               
                                                <label class="lablemargin1 actor_row_class" id="actorRow_0" for="actorRow_0_check">
                                                            <span class="table_content_colone">
                                                                        <span class="nameBold">JOHN KRAZINSKI, P.H.D.</span><br>
                                                                        <span class="smallFont" aria-disable="true" aria-hidden="true">9255 W NEVERLAND RANCH </span><br>
                                                                        <span class="smallFont" aria-disable="true" aria-hidden="true">Los MAMA
                                                                        ,  </span>
                                                                        <span class="smallFont" aria-disable="true" aria-hidden="true">CA 99999</span>
                                                                        <br>
                                                        <span class="left smallFont tpPad10" aria-hidden="true" aria-disable="true">1-545-555-5555</span>
                                                            </span>          
                                                            <span class="table_content_coltwo textSmall" aria-disable="true" aria-hidden="true">
                                                           
                                                                        <span class="" aria-disabled="disabled" aria-hidden="true">ACTOR </span>         <br>                                       
                                                           
                                                           
                                                                       
                                                                       
                                                                                    <span class="">&nbsp;</span>
                                                                       
                                                           
                                                            </span>
                                                           
                                                                       
                                                                                    <span class="table_content_colthree textSmall" aria-disable="true" aria-hidden="true">Accepting New JOBS</span>           
                                                                       
                                                                       
                                                           
                                                           
                                                                       
                                                                        <span class="table_content_colfour textSmall" aria-disable="true" aria-hidden="true">0.69 miles</span>
                                                           
                                                </label>
                                    </div>
                                                           
                       
                                    <div class="firstRow">
                                               
                                                           
                                                           
                                                                        <input id="actorRow_1_check" name="actorDTO.actors[1].selectedActor" class="sno" type="checkbox" value="true"><input type="hidden" name="_actorDTO.actors[1].selectedActor" value="on">
                                                           
                                               
                                               
                                                <label class="lablemargin1 actor_row_class" id="actorRow_1" for="actorRow_1_check">
                                                            <span class="table_content_colone">
                                                                        <span class="nameBold">PAMELA BEASLEY, L.C.S.W.</span><br>
                                                                        <span class="smallFont" aria-disable="true" aria-hidden="true">2222 PENNSYLVANIA AVE </span><br>
                                                                        <span class="smallFont" aria-disable="true" aria-hidden="true">West PITTSBURGH
                                                                        ,  </span>
                                                                        <span class="smallFont" aria-disable="true" aria-hidden="true">CA 99999</span>
                                                                        <br>
                                                        <span class="left smallFont tpPad10" aria-hidden="true" aria-disable="true">1-555-555-5555</span>
                                                            </span>          
                                                            <span class="table_content_coltwo textSmall" aria-disable="true" aria-hidden="true">
                                                           
                                                                        <span class="" aria-disabled="disabled" aria-hidden="true">ACTOR </span>         <br>                                       
                                                           
                                                                        <span class="" aria-disabled="disabled" aria-hidden="true">ACTOR </span>         <br>                                       
                                                           
                                                           
                                                                       
                                                                       
                                                                                    <span class="">&nbsp;</span>
                                                                       
                                                           
                                                            </span>
                                                           
                                                                       
                                                                                    <span class="table_content_colthree textSmall" aria-disable="true" aria-hidden="true">Accepting New JOBS</span>           
                                                                       
                                                                       
                                                           
                                                           
                                                                       
                                                                        <span class="table_content_colfour textSmall" aria-disable="true" aria-hidden="true">0.79 miles</span>
                                                           
                                                </label>
                                    </div>

The issue is that I would like to get each actor's name, address, city, phone, job, and whether they're currently employed or not.

I have tried to select each of those elements individually, but when I do that, the XPath I get for the first actor is this:

//*[@id="actorRow_0"]/span[1]/span[1]

but when I check the XPath for the second actor, I get this:

//*[@id="actorRow_1"]/span[1]/span[1]

The XPath is different for each actor (the row number in the id changes), so when I run the scraper I only get the information for the first actor.
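Since the two XPaths above differ only in the row number inside the id (actorRow_0 vs. actorRow_1), one option is a single expression that matches every row instead of one copied path per actor. This is a minimal sketch, not a tested answer for this site: it assumes `driver` is an already-open Selenium WebDriver on the results page, the names `ALL_NAMES_XPATH`, `name_xpath` and `all_names` are made up for illustration, and the string "xpath" is the value of Selenium's `By.XPATH` constant.

```python
# XPath that matches the name span of every actor row, not just one.
# It relies on the label ids following the actorRow_<n> pattern shown above.
ALL_NAMES_XPATH = '//label[starts-with(@id, "actorRow_")]/span[1]/span[1]'


def name_xpath(index):
    """Build the per-row XPath from the question for a given row index."""
    return f'//*[@id="actorRow_{index}"]/span[1]/span[1]'


def all_names(driver):
    """Return the name text of every actor row on the current page.

    `driver` is assumed to be a Selenium WebDriver (hypothetical here);
    "xpath" is the locator strategy string that By.XPATH stands for.
    """
    elements = driver.find_elements("xpath", ALL_NAMES_XPATH)
    return [element.text.strip() for element in elements]
```

If the per-row form is preferred, `name_xpath(i)` could be called in a loop over the row indexes, but that requires knowing how many rows there are up front.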

Instead of grabbing each element individually, I settled for grabbing each block of text as a whole. This is the code I currently have:

main = driver.find_element_by_id('mainDiv')
sections = main.find_elements_by_class_name('firstRow')


actor_info = []

#print(section.text)
for actor in sections:
    # Search inside the current row (actor), not some other element
    first_blox = actor.find_element_by_class_name('table_content_colone').text
    second_blox = actor.find_element_by_class_name('table_content_coltwo').text

    actor_items = {
        'first_block' : [first_blox],
        'second_block' : [second_blox]
    }

    actor_info.append(actor_items)

The issue I am encountering now is that, since I'm pulling a list of values, only the last value gets returned to me at the end.

Any help on either outcome would be appreciated.

Thanks.

EDIT - Hey guys, just wanted to let everyone know that the answer to my second question was that I was putting .text at the end of actor.find_element_by_class_name('table_content_colone').text. The solution was to replace text with innerText, and I was then able to acquire the whole block I was scraping.
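The innerText approach from the edit can be combined with lookups relative to each row to pull out the individual fields asked about earlier. What follows is a rough sketch under assumptions, not the code the edit refers to: it reads `innerText` through `get_attribute`, assumes each row is a `firstRow` element inside `mainDiv` as in the HTML above, and passes "css selector" (the value of Selenium's `By.CSS_SELECTOR`) as the locator strategy. The function names `inner_text`, `extract_actor` and `extract_all`, and the dictionary keys, are invented for illustration.

```python
def inner_text(element):
    """Read an element's innerText and collapse runs of whitespace."""
    raw = element.get_attribute("innerText") or ""
    return " ".join(raw.split())


def extract_actor(row):
    """Collect the fields of one actor row into a dictionary.

    `row` is assumed to be a Selenium WebElement for one div.firstRow.
    "css selector" is the strategy string that By.CSS_SELECTOR stands for.
    """
    name = row.find_element("css selector", ".nameBold")
    # Street, city, state/zip and phone all carry the class smallFont,
    # in that order in the sample HTML.
    small = row.find_elements("css selector", ".table_content_colone .smallFont")
    job = row.find_element("css selector", ".table_content_coltwo")
    status = row.find_element("css selector", ".table_content_colthree")
    return {
        "name": inner_text(name),
        "address_lines": [inner_text(part) for part in small],
        "job": inner_text(job),
        "status": inner_text(status),
    }


def extract_all(driver):
    """Run extract_actor over every row; `driver` is an open WebDriver."""
    rows = driver.find_elements("css selector", "#mainDiv .firstRow")
    return [extract_actor(row) for row in rows]
```

Because every lookup starts from `row` rather than from the whole page, each dictionary only contains values from its own actor. The likely reason `innerText` helps is that `.text` returns only the text Selenium treats as visible, while `innerText` is read straight from the DOM property, though that has not been verified against this page.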

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
