Scraping different years from Tableau
I have to scrape this table but it seems that TableauScraper does not recognise that multiple years are available.
Here is the table: https://public.tableau.com/app/profile/mapping.social.movements/viz/SocialistPartyScandinavianFederation/Story1
And this is the code I have written, which only scrapes the year 1914:
from tableauscraper import TableauScraper as TS
url = "https://public.tableau.com/views/SocialistPartyScandinavianFederation/Story1"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
sheets = workbook.getSheets()
print(sheets)
# show original data for worksheet
ws = ts.getWorksheet("tab1")
print(ws.data)
How can I scrape the rest of the years?
Solution 1:[1]
I'd go for a Selenium-based approach. As I understand it, the information you need is in the tables that appear when you select each of the years 1914, 1915, 1916, 1917 and 1918.
I would create a for loop that:
- Clicks on the year.
- Extracts the information from the table.
The code would be something like this:
import time
from selenium.webdriver.common.by import By

# Declaration of Selenium (driver setup)
[...]

# Scraper
years = [1914, 1915, 1916, 1917, 1918]  # the steps of the iteration
d = {}  # global result, keyed by year
for i in years:
    current_item = {}  # local dictionary holding the data for year i
    # Click the button for year i (the i_path XPath depends on the button's HTML).
    # find_element_by_xpath was removed in Selenium 4, so use find_element(By.XPATH, ...).
    driver.find_element(By.XPATH, i_path).click()
    # Once the year is clicked the table is displayed, but you might need to
    # wait a bit depending on your internet connection.
    time.sleep(.5)
    # Here's the tricky part: iterate through the table and fill current_item.
    d[i] = current_item  # store under the year so later years don't overwrite earlier ones
# At the end, d can be exported as JSON, CSV or any other format for your data analysis.
Scraping the table itself still takes some extra work, but I'm sure this approach will do the job :)
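The loop above can also be written so that the bookkeeping is separate from the Selenium-specific parts. This is only a minimal sketch under some assumptions: `collect_by_year`, `to_csv`, `select_year` and `read_rows` are hypothetical names introduced here for illustration. The two callables stand in for the "click on the year" and "read the table" steps, which depend on the page's HTML and are not shown. The placeholder rows in the usage lines are dummy values, not data from the dashboard.

```python
import csv
import io


def collect_by_year(years, select_year, read_rows):
    """Call select_year(year), then read_rows(), for each year.

    Returns a dict mapping each year to the rows read_rows() returned,
    so that every year's data is kept under its own key.
    """
    result = {}
    for year in years:
        select_year(year)            # e.g. click the year's button with Selenium
        result[year] = read_rows()   # e.g. parse the table currently displayed
    return result


def to_csv(data_by_year, fieldnames):
    """Flatten {year: [row_dict, ...]} into CSV text with a leading 'year' column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["year", *fieldnames], lineterminator="\n"
    )
    writer.writeheader()
    for year, rows in data_by_year.items():
        for row in rows:
            writer.writerow({"year": year, **row})
    return buffer.getvalue()


# Usage with stand-in callables and dummy rows (no browser involved).
placeholder = {
    1914: [{"label": "a", "value": 1}],
    1915: [{"label": "a", "value": 2}],
}
state = {}
data = collect_by_year(
    [1914, 1915],
    select_year=lambda y: state.update(year=y),
    read_rows=lambda: placeholder[state["year"]],
)
print(to_csv(data, ["label", "value"]))
```

In a real run, `select_year` would wrap the `driver.find_element(...).click()` call plus the wait, and `read_rows` would return a list of dictionaries built from whatever elements the table is made of.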
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |