Scraping different years from Tableau

I have to scrape this table but it seems that TableauScraper does not recognise that multiple years are available.

Here is the table: https://public.tableau.com/app/profile/mapping.social.movements/viz/SocialistPartyScandinavianFederation/Story1

And this is the code I have written; it only scrapes the year 1914.

from tableauscraper import TableauScraper as TS
url = "https://public.tableau.com/views/SocialistPartyScandinavianFederation/Story1"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

sheets = workbook.getSheets()
print(sheets)

# show original data for worksheet
ws = ts.getWorksheet("tab1")
print(ws.data)

How can I scrape the rest of the years?



Solution 1:[1]

I'd go for a Selenium approach. As I understand it, the information you need is in the tables that are shown when you select the years 1914, 1915, 1916, 1917 and 1918.

I would create a for loop that:

  1. Clicks on the year.
  2. Extracts the information from the table.

The code would be something like this:


#Declaration of selenium

[...]

#Scraper

years = [1914, 1915, 1916, 1917, 1918] #The steps of the iterations

d = {} #An empty dictionary to store the global result

for i in years:

    current_item = {} #A local dictionary in the loop to store the current value
    
    #Click the button for year i (i_path is a placeholder XPath that depends on the button's HTML).
    #Selenium 4 removed find_element_by_xpath; this call needs: from selenium.webdriver.common.by import By
    driver.find_element(By.XPATH, i_path).click()
    
    #After the click the table is redrawn, so wait a bit (needs: import time). A slow connection may need longer.
    time.sleep(.5)

    #Here's the tricky part. You'll need to iterate through the table and get the information.

    d[i] = current_item #Store this year's data under its own key, so later years can't overwrite it
    
#After the loop, you can export d as a JSON file, a CSV or any other format for your data analysis.
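For the export step, here is a minimal sketch using only the standard library. It assumes the results end up as a dictionary that maps each year to a list of row dictionaries; that layout, the helper names `to_json` and `to_csv`, and the column names `col_a` and `col_b` are all made up for illustration and are not part of the original answer.

```python
import csv
import io
import json


def to_json(results):
    """Serialise the per-year results; integer years become string keys in JSON."""
    return json.dumps(results, indent=2, sort_keys=True)


def to_csv(results, fieldnames):
    """Flatten {year: [row_dict, ...]} into CSV text with a leading 'year' column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["year"] + list(fieldnames), lineterminator="\n"
    )
    writer.writeheader()
    for year in sorted(results):
        for row in results[year]:
            writer.writerow({"year": year, **row})
    return buffer.getvalue()


# Placeholder data only, NOT values from the dashboard.
sample = {
    1914: [{"col_a": "a1", "col_b": "b1"}],
    1915: [{"col_a": "a2", "col_b": "b2"}],
}

csv_text = to_csv(sample, ["col_a", "col_b"])
json_text = to_json(sample)
print(csv_text)
```

To write real files, the returned strings can be passed to `open(path, "w", newline="")` and `write`.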

The table-scraping part takes a fair amount of work, but I'm sure this approach will do the job :)
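To give an idea of the table part, here is a rough sketch of turning the cell texts of each year into records. It assumes the cells are reachable as text in the page (which may not hold for this dashboard) and that you can get them as a list of rows per year, for example from Selenium's `driver.find_elements(By.XPATH, ...)`. The browser is replaced by a stand-in function so the sketch runs on its own; `rows_to_records`, `collect_by_year`, `fake_read_rows` and the column names are hypothetical.

```python
def rows_to_records(header, rows):
    """Pair each row's cell texts with the header names, skipping malformed rows."""
    records = []
    for cells in rows:
        if len(cells) != len(header):
            continue  # row does not match the header, e.g. a spacer or title row
        records.append(dict(zip(header, (cell.strip() for cell in cells))))
    return records


def collect_by_year(years, read_rows, header):
    """Call read_rows(year) for every year and store the records under that year."""
    results = {}
    for year in years:
        results[year] = rows_to_records(header, read_rows(year))
    return results


def fake_read_rows(year):
    """Stand-in for the browser: returns placeholder cell texts, NOT real data."""
    return [[" a1 ", "b1"], ["orphan cell"]]


data = collect_by_year([1914, 1915], fake_read_rows, ["col_a", "col_b"])
print(data)
```

In the real scraper, `read_rows` would click the year, wait for the table to redraw and return the text of each cell, row by row.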

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow