Requests-HTML in Python: why does .find() return the whole page text?
I'm trying to learn web scraping with the `requests-html` library. In every tutorial, the selector `html.find('.class')` works well: it finds the element with that CSS class and returns the text inside it.
My example:

```python
from requests_html import HTMLSession

s = HTMLSession()
link = 'https://prev.lifestylegarden.uk/simple-page.html'
f = s.get(link)
title = f.html.find('.title', first=True).text
print(title)
```
In my test HTML page I have:

```html
<h1 id="title">Welcome to our simple page project</h1>
```

and I want to get back:

```
Welcome to our simple page project
```
Instead, I get the text of the whole page, from that H1 to the end, with the HTML tags stripped.
I've followed two or three different tutorials and tested against different websites. Everyone writes it this way and gets the correct string for the class, not the whole site.
Am I missing something?
Thanks for your great support.
Solution 1:[1]
I've found the root of the issue. I was using Python 3.9 when this error occurred, and I had to downgrade to Python 3.6 to make it work:
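
```shell
conda create -n envpy-3.6 python=3.6 anaconda
```

This creates a local environment called `envpy-3.6` with its own Python 3.6 installation to work on these projects. Then activate it:

```shell
conda activate envpy-3.6
```

This activates the environment. (On older conda versions on Windows, the equivalent command is plain `activate envpy-3.6`.)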
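Before rerunning the scraper, it can help to confirm that the activated environment really runs the interpreter you expect. A minimal sketch using only the standard library; `running_python` is a hypothetical helper name, not part of conda or requests-html:

```python
import sys


def running_python(major: int, minor: int) -> bool:
    """Hypothetical helper: True if the current interpreter is exactly major.minor."""
    return sys.version_info[:2] == (major, minor)


# Full version string of whichever interpreter is executing this script.
print(sys.version)
print("Python 3.6 active:", running_python(3, 6))
```

Running this with the environment's `python` should report 3.6; if it reports another version, the environment was probably not activated in that shell.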
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ButterySAM777 |
