Scraping points on a graph on a website using BeautifulSoup or Selenium
I want to get the values of the data points from the graph titled "Total Followers for 'OlympusDAO' (Monthly)" from this website:
https://socialblade.com/twitter/user/olympusdao/monthly
Here's what I tried:
import requests as req
from bs4 import BeautifulSoup as bs
import re
import pandas as pd
def sub_scraper(url, var):
    r = req.get(url)
    print(r.status_code)
    soup = bs(r.text, 'html.parser')
    script_divs = soup.find_all('script', {'type': 'text/javascript'})
    res = 0
    for i in range(len(script_divs)):
        #print(i)
        print(script_divs[i])
        if "CSV" in str(script_divs[i]):
            if var == 'count':
                res = script_divs[i]
            elif var == 'total':
                res = script_divs[i + 1]
            elif var == 'views':
                res = script_divs[i + 2]
            elif var == 'views_tot':
                res = script_divs[i + 3]
            break
    #print(res)
    lst = str(res).split('+')
    lst = [test.strip() for test in lst]
    lst = [test.replace('\\n"', '').replace('"', '') for test in lst]
    return lst
def to_df(url, name, var):
    lst = sub_scraper(url, var)
    print(len(lst))
    lst = lst[1:len(lst) - 1]
    df = pd.DataFrame()
    df['Date'] = [x.split(',')[0] for x in lst]
    df['Subs'] = [x.split(',')[1] for x in lst]
    df['Name'] = name
    return df
to_df('https://socialblade.com/twitter/user/olympusdao', 'Olympus', 'count')
However, I can't seem to figure out how to do this. How can I get these values without their API (which costs money), using Beautiful Soup or Selenium WebDriver?
Solution 1:[1]
The data sits loose in the page source as 2D arrays of [timestamp, value] pairs rather than in a dedicated element, so you'd have to pull it out with a regex.
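A minimal sketch of that regex idea in Python, assuming the page source is already in a string and that each series appears as a JSON-style array of numeric [timestamp, value] pairs. The helper name `find_point_arrays` and the sample HTML are made up for illustration; the real page may lay out its data differently.

```python
import json
import re

# One numeric pair such as [1633046400000, 1200] (assumed layout, not confirmed).
PAIR = r"\[\s*-?\d+(?:\.\d+)?\s*,\s*-?\d+(?:\.\d+)?\s*\]"
# A full 2D array: an outer bracket wrapping one or more comma-separated pairs.
ARRAY_RE = re.compile(r"\[\s*" + PAIR + r"(?:\s*,\s*" + PAIR + r")*\s*\]")


def find_point_arrays(html):
    """Return every 2D array of numeric pairs found in the page source."""
    return [json.loads(m.group(0)) for m in ARRAY_RE.finditer(html)]


# Made-up sample standing in for the real page source.
sample_html = """
<script>
  chart({ data: [[1633046400000, 1200], [1635724800000, 1500]] });
  chart({ data: [[1633046400000, 10], [1635724800000, 25.5]] });
</script>
"""

arrays = find_point_arrays(sample_html)
print(len(arrays))   # number of series found
print(arrays[0])     # first series as a list of [timestamp, value] pairs
```

Matching each pair explicitly, instead of a greedy `.*` between the brackets, keeps two arrays on the same line from being merged into one match.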
Since Selenium is allowed, how about:
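from selenium import webdriver
driver = webdriver.Chrome()  # assumption: any Selenium driver would work here
driver.get('https://socialblade.com/twitter/user/olympusdao/monthly')
data = driver.execute_script(r"""
    return document.body.innerHTML.match(/\[\[.*\]\]/g).map(JSON.parse)[6]
""")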
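The script hands back a plain list of [timestamp, value] pairs. Here is a sketch of turning such a list into dated rows, assuming (the page does not confirm this) that the timestamps are milliseconds since the Unix epoch, as many JavaScript charting libraries use. The helper `pairs_to_rows` and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def pairs_to_rows(pairs):
    """Convert [timestamp_ms, value] pairs into (ISO date, value) tuples."""
    rows = []
    for ts_ms, value in pairs:
        # Assumption: timestamps are milliseconds since the epoch, in UTC.
        day = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).date()
        rows.append((day.isoformat(), value))
    return rows


# Made-up values in the shape execute_script would return.
data = [[1633046400000, 1200], [1635724800000, 1500]]
rows = pairs_to_rows(data)
print(rows)
```

The resulting tuples can be passed straight to `pd.DataFrame(rows, columns=['Date', 'Subs'])` to get the same columns the question's `to_df` builds.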
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | pguardiario |
