How to only download updated CSV files via Python requests?
I am trying to download CSV files from:
Download sports fixtures, schedules and results as CSV, XLSX, ICS and JSON.
I have a Python program that downloads the files I am looking for. The problem is that the downloaded files are not up to date. It is currently May, and one of the downloaded files is out of date all the way back to November, while some are actually up to date.
There is no consistency that I can see, and I have run out of ideas on how to fix it. I have tried to touch all the files and folders involved to get the most recent timestamp. I have tried clearing all of the .pyc files. Nothing seems to work. Here is the code I am using:
import requests

base_url = 'https://fixturedownload.com/download/'
csv_file_names = [
    'epl-2021-chelsea-EasternStandardTime.csv',
    'champions-league-2021-chelsea-EasternStandardTime.csv',
    'la-liga-2021-fc-barcelona-EasternStandardTime.csv',
    'champions-league-2021-barcelona-EasternStandardTime.csv',
    'ligue-1-2021-paris-saint-germain-EasternStandardTime.csv',
    'champions-league-2021-paris-EasternStandardTime.csv',
    'epl-2021-EasternStandardTime.csv',
    'champions-league-2021-EasternStandardTime.csv',
    'mlb-2021-baltimore-orioles-EasternStandardTime.csv',
    'nfl-2020-pittsburgh-steelers-EasternStandardTime.csv'
]
count = 0
led_count = 0
for csv in csv_file_names:
    print("Downloading...", count+1, "of", len(csv_file_names), "-", csv)
    r = requests.get(base_url+csv, allow_redirects=True)
    # Write the raw response body to disk, closing the file afterwards
    with open('/home/pi/Score-Checker/CSV-Files/'+csv, 'wb') as f:
        f.write(r.content)
    count += 1
Solution 1:[1]
Have you tried pulling the JSON feed instead of the CSV download? It requires a slight change to your csv_file_names list. (If you need me to, we can instead work off your original list and just use a regex to grab the relevant parts to place in the URL.)
import requests
import pandas as pd
import re

# Each entry is [league-season, team]; an empty team requests the whole league
csv_file_names = [
    ['epl-2021', 'chelsea'],
    ['champions-league-2021', 'chelsea'],
    ['la-liga-2021', 'fc-barcelona'],
    ['champions-league-2021', 'barcelona'],
    ['ligue-1-2021', 'paris-saint-germain'],
    ['champions-league-2021', 'paris'],
    ['epl-2021', ''],
    ['champions-league-2021', ''],
    ['mlb-2021', 'baltimore-orioles'],
    ['nfl-2020', 'pittsburgh-steelers']
]

for count, each in enumerate(csv_file_names, start=1):
    url = 'https://fixturedownload.com/feed/json/%s/%s' % (each[0], each[-1])
    jsonData = requests.get(url).json()
    df = pd.DataFrame(jsonData)
    csv = '%s-%s-.csv' % (each[0], each[-1])
    print("Downloading...", count, "of", len(csv_file_names), "-", csv)

    # Unplayed matches have no score; use 99 as a temporary placeholder
    # so the columns can be cast to int and then to str
    for col in ['HomeTeamScore', 'AwayTeamScore']:
        df[col] = df[col].fillna(99).astype(int).astype(str)
    df['Result'] = df['HomeTeamScore'] + ' - ' + df['AwayTeamScore']
    # Blank out the placeholder result for matches without a score
    df['Result'] = df['Result'].replace('99 - 99', '')

    # Rename columns: any date column becomes 'Date', and CamelCase names
    # are split into words (e.g. 'HomeTeamScore' -> 'Home Team Score')
    for col in df.columns:
        if 'Date' in col:
            newColName = 'Date'
        else:
            newColName = ' '.join(re.sub('([A-Z][a-z]+)', r' \1', re.sub('([A-Z]+)', r' \1', col)).split())
        df = df.rename(columns={col: newColName})

    # The scores now live in 'Result', so drop the separate score columns
    df = df.drop(['Group', 'Home Team Score', 'Away Team Score'], axis=1)
    df.to_csv('/home/pi/Score-Checker/CSV-Files/'+csv, index=False)
[Output for 'ligue-1-2021', 'paris-saint-germain']
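If you would rather keep your original list, a regex along these lines could pull out the two parts needed for the feed URL. This is only a sketch: the pattern and the helper name split_csv_name are my own, and it assumes every file name has the shape league, four-digit season, optional team, then -EasternStandardTime.csv, which is inferred purely from the names in the question.

```python
import re

# Assumed shape, inferred from the question's file names:
#   <league>-<4-digit season>[-<team>]-EasternStandardTime.csv
NAME_PATTERN = re.compile(
    r'^(?P<league>.+?-\d{4})(?:-(?P<team>.+?))?-EasternStandardTime\.csv$'
)


def split_csv_name(file_name):
    """Return [league_season, team] for one original CSV file name.

    The team is an empty string when the name covers a whole league.
    (Hypothetical helper, not part of any library.)
    """
    match = NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError('Unexpected file name: %s' % file_name)
    return [match.group('league'), match.group('team') or '']


original_names = [
    'epl-2021-chelsea-EasternStandardTime.csv',
    'champions-league-2021-chelsea-EasternStandardTime.csv',
    'la-liga-2021-fc-barcelona-EasternStandardTime.csv',
    'champions-league-2021-barcelona-EasternStandardTime.csv',
    'ligue-1-2021-paris-saint-germain-EasternStandardTime.csv',
    'champions-league-2021-paris-EasternStandardTime.csv',
    'epl-2021-EasternStandardTime.csv',
    'champions-league-2021-EasternStandardTime.csv',
    'mlb-2021-baltimore-orioles-EasternStandardTime.csv',
    'nfl-2020-pittsburgh-steelers-EasternStandardTime.csv',
]

# Same [league_season, team] pairs as the hand-written list in the answer
feed_parts = [split_csv_name(name) for name in original_names]
```

The resulting feed_parts list can then be looped over in place of the hand-written csv_file_names list. A file name that does not fit the assumed shape raises a ValueError rather than silently producing a wrong URL.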
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |

