Find Elements Between Divs With Selenium in Python

I have the following HTML code and I want to extract the years and names. I tried everything with no success:

<div class="Year">
    <span class="date">2019</span>
</div>



<div class="cl2">
    <span class="name">name1</span>
</div>
<div class="cl2">
    <span class="name">name2</span>
</div>
<div class="cl2">
    <span class="name">name3</span>
</div>
<div class="cl2">
    <span class="name">name4</span>
</div>



<div class="Year">
    <span class="date">2020</span>
</div>

<div class="cl2">
    <span class="name">name5</span>
</div>
<div class="cl2">
    <span class="name">name6</span>
</div>

What I want to get is:

2019
name1
name2
name3
name4
2020
name5
name6

I tried the following, using XPath:

years = driver.find_elements_by_xpath("//div[@class='year']")

for year in years:
    
    print(year.find_element_by_xpath(".//span[@class='date']").text)

names = driver.find_elements_by_xpath("//div[@class='name']")

for name in names:
    print(name.find_element_by_xpath(".//span[@class='name']").text)

I got:

2019
2020
name1
name2
name3
name4
name5
name6



Solution 1:[1]

One solution is to work with the page's HTML saved as a plain text file rather than with the HTML page directly. This approach gives much more flexibility in extracting the desired text from the source.
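The answer does not say how html_text.txt was produced. One way to create it from an existing Selenium session, assuming a driver object like the one in the question, is to write out driver.page_source, which holds the current page's HTML as a string. The helper name save_page_source is hypothetical.

```python
def save_page_source(driver, path="html_text.txt"):
    """Write the browser's current HTML to a text file (hypothetical helper)."""
    # driver.page_source is the current page's HTML serialized as a string
    html = driver.page_source
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path
```

If you do not need the file on disk, you can also skip it and use driver.page_source directly as html_text.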

First, import the re module, which provides the regular expressions used to parse the html_text file.

Then read the text file and use .split() to break the text into a list at each <div class="Year"> tag, so that each list item holds one year followed by its names. Next, iterate over the list and use re.search and re.findall to pull the date and name spans out of each chunk.

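import re

# Read the saved page source (html_text.txt must contain the page's HTML)
with open("html_text.txt", "r", encoding="utf-8") as f:
    html_text = f.read()

# Split on each Year div; element 0 is whatever comes before the first year
text_list = html_text.split('<div class="Year">')

for year in text_list[1:]:
    # The date span in this chunk is the year; the name spans are its names
    date = re.search(r'<span class="date">(.+?)</span>', year)
    names = re.findall(r'<span class="name">(.+?)</span>', year)

    if date:  # skip a Year div that has no date span
        print(date.group(1))
    for name in names:
        print(name)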
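If you need the data rather than printed lines, the same split-and-search logic can be wrapped in a function that returns a dict mapping each year to its names. This is a sketch: the function name group_by_year is hypothetical, it assumes each year appears only once (a repeated year would overwrite the earlier entry), and like the code above it assumes the markup is spelled exactly as in the question (same attribute order and double quotes).

```python
import re


def group_by_year(html_text):
    """Map each Year block's date to the names that follow it (hypothetical helper)."""
    result = {}
    # Everything before the first Year div is discarded, as with [1:] above
    for chunk in html_text.split('<div class="Year">')[1:]:
        date = re.search(r'<span class="date">(.+?)</span>', chunk)
        if date is None:
            continue  # skip a Year div that has no date span
        result[date.group(1)] = re.findall(r'<span class="name">(.+?)</span>', chunk)
    return result


sample = """
<div class="Year"><span class="date">2019</span></div>
<div class="cl2"><span class="name">name1</span></div>
<div class="cl2"><span class="name">name2</span></div>
<div class="Year"><span class="date">2020</span></div>
<div class="cl2"><span class="name">name5</span></div>
"""

print(group_by_year(sample))  # {'2019': ['name1', 'name2'], '2020': ['name5']}
```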

Running the script on the question's HTML should print:

2019
name1
name2
name3
name4
2020
name5
name6
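The regex approach depends on the exact spelling of the tags: a different attribute order or single quotes around the class value would make the patterns miss. A swapped-in alternative, not part of this answer, is Python's standard-library html.parser.HTMLParser, which tokenizes the markup and reports tags and text in document order. The class and function names below are hypothetical, and the sketch assumes the class attribute is exactly "date" or "name" (a span with several classes would not match).

```python
from html.parser import HTMLParser


class YearNameParser(HTMLParser):
    """Collect the text of date and name spans in document order (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.items = []        # one entry per matching span
        self._inside = False   # True while inside a date or name span

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the class must match exactly
        if tag == "span" and dict(attrs).get("class") in ("date", "name"):
            self._inside = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        # Text may arrive in several pieces, so append rather than assign
        if self._inside:
            self.items[-1] += data


def extract_in_order(html_text):
    """Return date and name texts in the order they appear (hypothetical helper)."""
    parser = YearNameParser()
    parser.feed(html_text)
    parser.close()
    return [item.strip() for item in parser.items]


sample = """
<div class="Year">
    <span class="date">2019</span>
</div>
<div class="cl2">
    <span class="name">name1</span>
</div>
<div class="Year">
    <span class="date">2020</span>
</div>
<div class="cl2">
    <span class='name'>name5</span>
</div>
"""

# The last span uses single quotes, which the regex version would skip
for line in extract_in_order(sample):
    print(line)  # prints 2019, name1, 2020, name5 on separate lines
```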

Hope this helps!

Solution 2:[2]

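I managed to get the text of the elements between the divs by using .get_attribute("textContent") instead of .text, following a tip from the question "Get Text from Span returns empty string". (.text returns only the text Selenium considers visible, so it can come back empty for elements that are not displayed.)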
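A minimal sketch of this approach, assuming Selenium 4, where the find_elements_by_xpath helpers used in the question were deprecated and later removed in favour of driver.find_elements(By.XPATH, ...). The sketch passes the string "xpath", which is the value of By.XPATH, so it does not need to import Selenium itself. It also fetches both span classes with one XPath query; the matches come back in document order, so each year is followed by its own names. The function name years_and_names_from_driver is hypothetical.

```python
def years_and_names_from_driver(driver):
    """Return date and name texts in page order (hypothetical helper).

    Assumes a Selenium 4 WebDriver; "xpath" is the value of By.XPATH.
    """
    # One query matching both span classes keeps years and names interleaved
    spans = driver.find_elements("xpath", "//span[@class='date' or @class='name']")
    # textContent is read instead of .text, which can be empty for hidden elements
    return [span.get_attribute("textContent").strip() for span in spans]


# Usage with a live driver:
# for line in years_and_names_from_driver(driver):
#     print(line)
```

Compared with the question's two separate loops (one for years, one for names), the single query is what preserves the interleaving.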

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1   Tyler Russin
Solution 2   Al Martins