Scraping points on a graph on a website using beautifulsoup or selenium

I want to get the values of the data points from the graph titled "Total Followers for 'OlympusDAO' (Monthly)" from this website:

https://socialblade.com/twitter/user/olympusdao/monthly

Here's what I tried:

import requests as req
from bs4 import BeautifulSoup as bs
import re
import pandas as pd

def sub_scraper(url, var):
    r = req.get(url)
    print(r.status_code)
    soup = bs(r.text, 'html.parser')
    script_divs = soup.find_all('script', {'type': 'text/javascript'})
    res = 0
    for i in range(len(script_divs)):
        #print(i)
        print(script_divs[i])
        if "CSV" in str(script_divs[i]):
            if var == 'count':
                res = script_divs[i]
            elif var == 'total':
                res = script_divs[i + 1]
            elif var == 'views':
                res = script_divs[i + 2]
            elif var == 'views_tot':
                res = script_divs[i + 3]
            break
    #print(res)
    lst = str(res).split('+')
    lst = [test.strip() for test in lst]
    lst = [test.replace('\\n"', '').replace('"', '') for test in lst]
    return lst

def to_df(url, name, var):
    lst = sub_scraper(url, var)
    print(len(lst))
    lst = lst[1:len(lst) - 1]
    df = pd.DataFrame()
    df['Date'] = [x.split(',')[0] for x in lst]
    df['Subs'] = [x.split(',')[1] for x in lst]
    df['Name'] = name
    return df

to_df('https://socialblade.com/twitter/user/olympusdao', 'Olympus', 'count')

However, I can't figure out how to do this. How can I get these values without their API (which costs money), using Beautiful Soup or Selenium WebDriver?



Solution 1:[1]

The data sits loose in the page source as 2D arrays of [timestamp, value] pairs, so you'd have to pull it out with a regex.
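
As a rough sketch of that regex approach in plain Python: the snippet below finds every `[[...]]` literal in a string of HTML and keeps the ones that parse as JSON. The function name `extract_series` and the sample markup are made up for illustration and are not taken from the real page, so the pattern and the index of the array you want will likely need adjusting.

```python
import json
import re

# Matches a bracketed 2D array literal such as [[1, 2], [3, 4]] on one line.
ARRAY_RE = re.compile(r"\[\[.*?\]\]")


def extract_series(html):
    """Return every [[x, y], ...] literal in html that parses as JSON."""
    series = []
    for match in ARRAY_RE.findall(html):
        try:
            parsed = json.loads(match)
        except json.JSONDecodeError:
            continue  # looked like an array but is not valid JSON
        series.append(parsed)
    return series


# Hypothetical page fragment for illustration only; not real site markup.
sample_html = """
<script type="text/javascript">
  chart({ data: [[1633046400000, 1200], [1635724800000, 1850]] });
  other({ data: [[1633046400000, 30], [1635724800000, 45]] });
</script>
"""

series = extract_series(sample_html)
followers = series[0]  # pick the array you need by position
```

On a real page you would pass in the fetched HTML (for example `r.text` from the question's code) and inspect the returned list to find which position holds the graph you want.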

Since selenium is allowed, how about:

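# driver is a Selenium WebDriver that has already loaded the page.
# The raw string avoids invalid-escape warnings for the regex backslashes;
# index 6 selects the seventh [[...]] array matched in the page.
data = driver.execute_script(r"""
  return document.body.innerHTML.match(/\[\[.*\]\]/g).map(JSON.parse)[6]
""")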

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 pguardiario