Find Elements Between Div With Selenium in Python
I have the following HTML. I want to extract the years and names in the order they appear on the page, but nothing I have tried has worked:
<div class="Year">
<span class="date">2019</span>
</div>
<div class="cl2">
<span class="name">name1</span>
</div>
<div class="cl2">
<span class="name">name2</span>
</div>
<div class="cl2">
<span class="name">name3</span>
</div>
<div class="cl2">
<span class="name">name4</span>
</div>
<div class="Year">
<span class="date">2020</span>
</div>
<div class="cl2">
<span class="name">name5</span>
</div>
<div class="cl2">
<span class="name">name6</span>
</div>
What I want to get is:
2019
name1
name2
name3
name4
2020
name5
name6
I tried the following, using XPath:
years = driver.find_elements_by_xpath("//div[@class='Year']")
for year in years:
    print(year.find_element_by_xpath(".//span[@class='date']").text)
names = driver.find_elements_by_xpath("//div[@class='cl2']")
for name in names:
    print(name.find_element_by_xpath(".//span[@class='name']").text)
I got:
2019
2020
name1
name2
name3
name4
name5
name6
Solution 1:[1]
One solution is to save the page's HTML to a text file and parse that text, rather than querying the page element by element. Working on the raw text makes it easy to keep each year together with the names that follow it.
First, import the re library, which lets us search the HTML text with regular expressions.
Then read in the text file and use .split() to break the text into a list of chunks, one per Year div. Next, iterate over the list and use re.search and re.findall to pull the date and name spans out of each chunk.
import re

# Read the saved page source
with open("html_text.txt", "r") as f:
    html_text = f.read()

# One chunk per year; the first chunk is whatever precedes the first Year div
text_list = html_text.split('<div class="Year">')
for year in text_list[1:]:
    date = re.search(r'<span class="date">(.+?)</span>', year)
    names = re.findall(r'<span class="name">(.+?)</span>', year)
    print(date.group(1))
    for name in names:
        print(name)
Printing the results should produce output like this:
2019
name1
name2
name3
name4
2020
name5
name6
Hope this helped!!
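If you want to keep the grouping instead of just printing it, the same split-and-regex idea can return a dictionary. This is only a sketch: the function name group_names_by_year and the inline sample string are made up for illustration, and the sample is a shortened copy of the HTML from the question so that it runs without a file.

```python
import re


def group_names_by_year(html_text):
    """Map each year to the list of names that follow it in the markup."""
    grouped = {}
    # Chunk 0 is whatever comes before the first Year div, so skip it.
    for chunk in html_text.split('<div class="Year">')[1:]:
        date = re.search(r'<span class="date">(.+?)</span>', chunk)
        if date is None:
            continue  # chunk without a date span; nothing to key on
        grouped[date.group(1)] = re.findall(
            r'<span class="name">(.+?)</span>', chunk
        )
    return grouped


# Shortened sample based on the HTML in the question (hypothetical input)
sample = (
    '<div class="Year"><span class="date">2019</span></div>'
    '<div class="cl2"><span class="name">name1</span></div>'
    '<div class="cl2"><span class="name">name2</span></div>'
    '<div class="Year"><span class="date">2020</span></div>'
    '<div class="cl2"><span class="name">name5</span></div>'
)

for year, names in group_names_by_year(sample).items():
    print(year)
    for name in names:
        print(name)
```

Dictionaries keep insertion order in Python 3.7 and later, so the years come back in the order they appear in the HTML. Note that regular expressions like these depend on the exact attribute spelling and quoting in the markup, so they will miss spans whose tags are written differently.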
Solution 2:[2]
I managed to get the text of the elements inside each div by using .get_attribute("textContent") instead of .text, following a tip from the question "Get Text from Span returns empty string".
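For anyone who wants to stay inside Selenium, here is a rough sketch (not the answer author's code) that combines textContent with a single XPath union, so the dates and names come back in one list. It assumes the class names from the question's HTML and relies on the browser returning the union's matches in document order, which is the usual behaviour but worth checking on your page. The helper name ordered_texts is made up. It uses the generic find_elements(by, value) call, because the find_elements_by_xpath helpers from the question were removed in later Selenium 4 releases.

```python
# One expression matching both kinds of span; the "|" operator is an XPath union.
ORDERED_XPATH = (
    "//div[@class='Year']/span[@class='date']"
    " | //div[@class='cl2']/span[@class='name']"
)


def ordered_texts(driver):
    """Return the text of every date and name span found by ORDERED_XPATH."""
    # "xpath" is the string value of selenium.webdriver.common.by.By.XPATH
    elements = driver.find_elements("xpath", ORDERED_XPATH)
    # textContent is read from the DOM, so it is available even when the
    # element is not displayed and .text comes back as an empty string
    return [el.get_attribute("textContent").strip() for el in elements]
```

With a real browser session you would call it as print("\n".join(ordered_texts(driver))) after driver.get(...) has loaded the page.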
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tyler Russin |
| Solution 2 | Al Martins |
