Webscraping: nested for loop only writing to last dataframe in the list
I am attempting to use a nested for loop to pull data from different web pages (salary information for each player on every MLB team), with each scraped web page being written to its own Pandas DataFrame. I'm getting all of the data I want, but it's only being written to the last DataFrame in the list.
l = {}
for name in name_list:
    l[name] = pd.DataFrame(columns=['Player','Salary','% of Payroll'])
#create a for loop to scrape all urls and write each to their df
for page in url_list:
    url = page
    response=requests.get(url).text
    soup=bs(response, 'html.parser')
    sleep(randint(1,10))
    table=soup.find('tbody')
    for row in table.find_all('tr'):
        col=row.find_all('a')
        player=col[0].string
        col=row.find_all('td')
        payroll=col[7].text
        percent=col[9].string
        l[name]=l[name].append({'Player':player,'Salary':payroll,'% of Payroll':percent}, ignore_index=True)
print(l)
Snippet of output:
Index: [], 'St. Louis Cardinals': Empty DataFrame
Columns: [Player, Salary, % of Payroll]
Index: [], 'Tampa Bay Rays': Empty DataFrame
Columns: [Player, Salary, % of Payroll]
Index: [], 'Texas Rangers': Empty DataFrame
Columns: [Player, Salary, % of Payroll]
Index: [], 'Toronto Blue Jays': Empty DataFrame
Columns: [Player, Salary, % of Payroll]
Index: [], 'Washington Nationals': Player Salary % of Payroll
0 Madison Bumgarner $23,000,000 23.69
1 Ketel Marte $8,500,000 11.19
2 David Peralta $8,000,000 10.53
3 Mark Melancon $6,000,000 7.90
4 Merrill Kelly $5,583,333 7.35
.. ... ... ...
919 Andres Machado -0 0.00
920 Patrick Murphy -0 0.00
921 Josh Rogers -0 0.00
922 Keibert Ruiz -0 0.00
923 Lane Thomas -0 0.00
[924 rows x 3 columns]}
The information is being pulled fully and correctly, but only written to the final DF in the list. I have a feeling the issue is something simple like indentation, etc. but I'm out of ideas and experiments! Thank you in advance.
Solution 1:[1]
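Following @AndrejKesely's advice, the fix was to iterate over both lists together:
for name, page in zip(name_list, url_list):
That fixed it immediately. My original code never iterated over the name list inside the scraping loop, so name kept the last value assigned by the earlier loop ('Washington Nationals') and every row was written to that one DataFrame.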
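To illustrate why only the last DataFrame was filled, here is a minimal sketch that uses plain lists in place of DataFrames and scraped pages. The functions collect_buggy and collect_fixed, and the placeholder team names and row values, are hypothetical and exist only for this illustration; the real code would still fetch and parse each page.

```python
def collect_buggy(names, pages):
    """Mimic the original code: the naming loop finishes before scraping starts."""
    tables = {}
    for name in names:
        tables[name] = []
    # `name` is still bound to the last element of `names` at this point.
    for rows in pages:
        for row in rows:
            tables[name].append(row)
    return tables


def collect_fixed(names, pages):
    """Pair each name with its page so rows land in the matching table."""
    tables = {name: [] for name in names}
    for name, rows in zip(names, pages):
        for row in rows:
            tables[name].append(row)
    return tables


# Placeholder data standing in for name_list and the rows parsed from url_list.
names = ["Team A", "Team B", "Team C"]
pages = [["a1", "a2"], ["b1"], ["c1", "c2", "c3"]]

buggy = collect_buggy(names, pages)
fixed = collect_fixed(names, pages)

print({k: len(v) for k, v in buggy.items()})  # {'Team A': 0, 'Team B': 0, 'Team C': 6}
print({k: len(v) for k, v in fixed.items()})  # {'Team A': 2, 'Team B': 1, 'Team C': 3}
```

Note that zip stops at the shorter input, so this only works if name_list and url_list have the same length and order. On Python 3.10 or later, zip(name_list, url_list, strict=True) raises a ValueError on a length mismatch. Separately, DataFrame.append was removed in pandas 2.0, so on current pandas it is probably simpler to collect the row dicts in a list and build each DataFrame once from that list.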
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Umar.H |
