Read table from website using pandas read_html
I want to read the table from this website using pandas.read_html. The site shows the top 100 most viewed News Channels on YouTube.
I tried to grab the table using pandas:
import pandas as pd
df = pd.read_html('https://socialblade.com/youtube/top/category/news/mostviewed')
However, it raises the following error:
HTTPError: HTTP Error 403: Forbidden
Following this thread, I pretended to be a browser, but the response's text does not seem to have a table:
import requests
import pandas as pd
url = 'https://socialblade.com/youtube/top/category/news/mostviewed'
header = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
"X-Requested-With": "XMLHttpRequest"}
df = pd.read_html(requests.get(url, headers=header).text)
ValueError: No tables found
What is the easiest way to get this table into a pandas.DataFrame object?
Solution 1:[1]
It seems you want to scrape data from the site.
However, I would say you are using the wrong tool for this purpose: if you look closely at the HTML response of the page you are fetching, it does not contain any HTML <table> tags.
The pandas read_html() function searches for <table> tags, as stated in the pandas documentation here: https://pandas.pydata.org/docs/reference/api/pandas.read_html.html#:~:text=This%20function%20searches,into%20the%20header).
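As a small illustration of that behaviour, the sketch below feeds read_html() two made-up HTML snippets (they are assumptions for the example, not the real page): one with a <table> element and one built only from <div> elements. It assumes an HTML parser that pandas supports, such as lxml, is installed.

```python
from io import StringIO

import pandas as pd

# Made-up markup for illustration only; not taken from the real site.
with_table = (
    "<table>"
    "<tr><th>Channel</th><th>Views</th></tr>"
    "<tr><td>A</td><td>10</td></tr>"
    "</table>"
)
without_table = "<div class='row'><div>A</div><div>10</div></div>"

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(with_table))
print(len(tables), tables[0].shape)

# With no <table> element in the markup, it raises ValueError.
try:
    pd.read_html(StringIO(without_table))
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```

The second call fails in the same way as the request in the question, which suggests the response text simply has no <table> element to parse.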
I would suggest using a tool that is meant for scraping, such as Beautiful Soup. It is a Python library for scraping websites.
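A minimal sketch of that approach is shown below. The markup, the class name "row", the column names, and the helper rows_to_frame are all hypothetical: you would need to inspect the real page to find which elements actually hold the rows, and in practice the HTML string would come from requests.get(url, headers=header).text as in the question.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the fetched page; the real
# structure and class names must be checked in the browser first.
SAMPLE_HTML = """
<div class="row"><div>1</div><div>Channel A</div><div>1,000</div></div>
<div class="row"><div>2</div><div>Channel B</div><div>900</div></div>
"""


def rows_to_frame(html):
    """Collect the text of each row's direct child divs into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.find_all("div", class_="row"):
        # recursive=False keeps only the row's direct children (the cells).
        cells = row.find_all("div", recursive=False)
        records.append([cell.get_text(strip=True) for cell in cells])
    # Column names are assumptions chosen for this example.
    return pd.DataFrame(records, columns=["rank", "name", "views"])


df = rows_to_frame(SAMPLE_HTML)
print(df)
```

All values come back as strings, so numeric columns such as the view counts would still need converting (for example by removing the thousands separators) before analysis.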
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Akash |
