How to scrape links in a loop and store the results in their respective CSVs?

I have a scraping script that pulls data from an RSS feed. I have a list of RSS feed links that I want to loop over, storing each feed's results in its own CSV.

My feedlink_01.py

# My current approach:
import pandas as pd

df1 = pd.read_csv("feedlink_01.csv")
URL = "RSSfeedlink_01.com"
# Do some scraping
df2 = pd.DataFrame(output)
df = pd.concat([df1, df2]).drop_duplicates('name')
df.to_csv('feedlink_01.csv', index=False)

My feedlink_02.py

# My current approach:
import pandas as pd

df1 = pd.read_csv("feedlink_02.csv")
URL = "RSSfeedlink_02.com"
# Do some scraping
df2 = pd.DataFrame(output)
df = pd.concat([df1, df2]).drop_duplicates('name')
df.to_csv('feedlink_02.csv', index=False)

I have a folder of 15 scripts that are identical except for the CSV filename and the feed URL.

How do I run them all from a single file, if possible?



Solution 1:[1]

You need to use a loop. Here we iterate over the feed list with a for..in loop and derive each CSV filename from the feed URL by using .replace to swap the ".com" suffix for ".csv". Just add your feed links to the rss_feeds list.

rss_feeds = ["RSSfeedlink_01.com", "RSSfeedlink_02.com", "RSSfeedlink_03.com"]  # ... add the rest

for feed in rss_feeds:
    csv_file = feed.replace(".com", ".csv")
    df1 = pd.read_csv(csv_file)
    # Do some scraping for this feed, producing `output`
    df2 = pd.DataFrame(output)
    df = pd.concat([df1, df2]).drop_duplicates('name')
    df.to_csv(csv_file, index=False)
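If the CSV filenames don't map cleanly onto the URLs, an alternative is to keep explicit (URL, filename) pairs and handle the first run, when a CSV may not exist yet. The sketch below assumes that; the scrape_feed function is a hypothetical stand-in for your actual scraping logic, which the question elides.

```python
import pandas as pd

def scrape_feed(url):
    """Hypothetical placeholder: replace with your real scraping code."""
    return [{"name": f"item from {url}"}]

# Explicit pairs: no need to derive the filename from the URL.
feeds = [
    ("RSSfeedlink_01.com", "feedlink_01.csv"),
    ("RSSfeedlink_02.com", "feedlink_02.csv"),
]

for url, csv_path in feeds:
    df2 = pd.DataFrame(scrape_feed(url))
    try:
        df1 = pd.read_csv(csv_path)
        df = pd.concat([df1, df2]).drop_duplicates('name')
    except FileNotFoundError:
        # First run: no existing CSV to merge with.
        df = df2.drop_duplicates('name')
    df.to_csv(csv_path, index=False)
```

Keeping the pairs in one list means adding a sixteenth feed is a one-line change rather than a sixteenth script.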

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Invizi