How to scrape links in a loop and store the results in their respective CSVs?
I have a scraping script that scrapes data from an RSS feed. I have a list of RSS feed links that I want to pass into a loop that stores each feed's results in its own CSV.
My feedlink_01.py
# My current approach:
import pandas as pd

df1 = pd.read_csv("feedlink_01.csv")
URL = "RSSfeedlink_01.com"
# Do some scraping into `output`
df2 = pd.DataFrame(output)
df = pd.concat([df1, df2]).drop_duplicates('name')
df.to_csv('feedlink_01.csv', index=False)
My feedlink_02.py
# My current approach:
import pandas as pd

df1 = pd.read_csv("feedlink_02.csv")
URL = "RSSfeedlink_02.com"
# Do some scraping into `output`
df2 = pd.DataFrame(output)
df = pd.concat([df1, df2]).drop_duplicates('name')
df.to_csv('feedlink_02.csv', index=False)
I have a folder of 15 scripts that are identical except for the feedlink CSV and the URL.
How do I run them all from a single file, if possible?
Solution 1:[1]
You need to use a loop. Here we use a for..in loop and derive each CSV filename with .replace(".com", ".csv") (including the dot, so that a stray "com" elsewhere in the URL is not touched). Just add your feed links to the rss_feeds list.
import pandas as pd

rss_feeds = ["RSSfeedlink_01.com", "RSSfeedlink_02.com", "RSSfeedlink_03.com"]  # ... add the rest of your feed links

for feed in rss_feeds:
    csv_file = feed.replace(".com", ".csv")
    df1 = pd.read_csv(csv_file)
    # Do some scraping into `output` here
    df2 = pd.DataFrame(output)
    df = pd.concat([df1, df2]).drop_duplicates('name')
    df.to_csv(csv_file, index=False)
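Note that in the question the CSV files are named feedlink_01.csv while the URLs start with RSSfeedlink_, so a string replace on the URL would not produce the existing filenames. If your filenames don't follow directly from the URLs, an explicit dict mapping each feed to its CSV is safer. A minimal sketch of that variant, which also guards against the CSV not existing yet on the first run; scrape_feed() is a hypothetical stand-in for your actual scraping code:

```python
import os
import pandas as pd

def scrape_feed(url):
    # Hypothetical placeholder for the real scraping logic:
    # returns a list of dicts, one per scraped item.
    return [{'name': 'example', 'link': url}]

# Explicit mapping: feed URL -> CSV path (names assumed from the question).
feeds = {
    "RSSfeedlink_01.com": "feedlink_01.csv",
    "RSSfeedlink_02.com": "feedlink_02.csv",
}

for url, csv_path in feeds.items():
    df2 = pd.DataFrame(scrape_feed(url))
    if os.path.exists(csv_path):
        # Merge with previously saved rows before de-duplicating.
        df1 = pd.read_csv(csv_path)
        df2 = pd.concat([df1, df2])
    df2.drop_duplicates('name').to_csv(csv_path, index=False)
```

Because each iteration re-reads the existing CSV, concatenates, and drops duplicates on 'name', re-running the script leaves already-stored rows in place instead of duplicating them.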
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Invizi |
