Issues with Python's Beautiful Soup and Requests when scraping the Steam market
Just to clarify, I am new to Beautiful Soup/Requests and fairly new to coding in Python in general.
I am trying to pull a list of item names off the Steam marketplace so I can use the list in other projects. More specifically, I am pulling the listing URLs of CS:GO skins. I do this by scraping the URL of each item (10 per page), then changing the page number in the requests.get URL and repeating until my desired number of pages has been reached (set to 5 in this example). The issue is that requests.get pulls the same webpage over and over again (with some slight variation), even though I am changing the page number in the URL, which should change the Steam market page.
I presume it has something to do with requests.get, because the page number increases but the .get call returns the same information every time.
Here is my very poor code:
import requests
from bs4 import BeautifulSoup

page_number = 1
url = f'https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any' \
      '&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any' \
      '&category_730_Type%5B%5D=tag_CSGO_Type_Pistol&category_730_Type%5B%5D=tag_CSGO_Type_SMG&category_730_Type%5B' \
      '%5D=tag_CSGO_Type_Rifle&category_730_Type%5B%5D=tag_CSGO_Type_Shotgun&category_730_Type%5B%5D' \
      '=tag_CSGO_Type_SniperRifle&category_730_Type%5B%5D=tag_CSGO_Type_Machinegun&category_730_Type%5B%5D' \
      f'=tag_CSGO_Type_Knife&category_730_Type%5B%5D=tag_Type_Hands&appid=730#p{page_number}_name_asc'

while page_number != 6:
    req = requests.get(url)  # pulls the html of the current url
    soup = BeautifulSoup(req.content, 'html.parser')  # creates a soup of the html
    the_list = set()  # creates a set to use later
    for link in soup.find_all('a', href=True, class_="market_listing_row_link"):  # finds where the hrefs are located
        the_list.add(link['href'])  # adds each href link to the set
    the_list = str(the_list)  # converts the set to a string, so it can be written to a txt file
    with open('List of names.txt', 'a') as file:  # opens the txt file
        file.write(the_list)  # writes the list of hrefs into the txt file
    page_number += 1  # increases the page number by one
    print(url)  # prints the URL for debugging
Thank you for your help!
It gives me these items over and over again: {'https://steamcommunity.com/market/listings/730/%E2%98%85%20Shadow%20Daggers%20%7C%20Blue%20Steel%20%28Minimal%20Wear%29', 'https://steamcommunity.com/market/listings/730/AK-47%20%7C%20Black%20Laminate%20%28Battle-Scarred%29', 'https://steamcommunity.com/market/listings/730/Souvenir%20P90%20%7C%20Sand%20Spray%20%28Battle-Scarred%29', 'https://steamcommunity.com/market/listings/730/StatTrak%E2%84%A2%20Five-SeveN%20%7C%20Nightshade%20%28Field-Tested%29', 'https://steamcommunity.com/market/listings/730/StatTrak%E2%84%A2%20XM1014%20%7C%20Entombed%20%28Field-Tested%29', 'https://steamcommunity.com/market/listings/730/Souvenir%20MP7%20%7C%20Gunsmoke%20%28Battle-Scarred%29', 'https://steamcommunity.com/market/listings/730/StatTrak%E2%84%A2%20MAC-10%20%7C%20Heat%20%28Minimal%20Wear%29', 'https://steamcommunity.com/market/listings/730/AWP%20%7C%20Elite%20Build%20%28Well-Worn%29', 'https://steamcommunity.com/market/listings/730/Souvenir%20Nova%20%7C%20Green%20Apple%20%28Factory%20New%29', 'https://steamcommunity.com/market/listings/730/P90%20%7C%20Blind%20Spot%20%28Field-Tested%29'}
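The underlying Python behavior can be seen in isolation: an f-string is evaluated once, at the point it appears in the code, so changing page_number afterwards never updates url. A minimal sketch (the example.com URL is just a placeholder):

```python
page_number = 1
# the f-string is evaluated right here, once, with page_number == 1
url = f"https://example.com/search#p{page_number}_name_asc"

page_number = 2          # this does NOT update url
stale = url              # still the page-1 URL

# rebuilding the f-string picks up the new value
url = f"https://example.com/search#p{page_number}_name_asc"
print(stale)  # https://example.com/search#p1_name_asc
print(url)    # https://example.com/search#p2_name_asc
```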
Solution 1:[1]
I just tried it myself. The key change is to build the URL inside the loop, so each iteration actually uses the new value of page_number. When I run the code like this, I get the right result:
import requests
from bs4 import BeautifulSoup

page_number = 1
while page_number != 6:
    url = f'https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any' \
          '&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any' \
          '&category_730_Type%5B%5D=tag_CSGO_Type_Pistol&category_730_Type%5B%5D=tag_CSGO_Type_SMG&category_730_Type%5B' \
          '%5D=tag_CSGO_Type_Rifle&category_730_Type%5B%5D=tag_CSGO_Type_Shotgun&category_730_Type%5B%5D' \
          '=tag_CSGO_Type_SniperRifle&category_730_Type%5B%5D=tag_CSGO_Type_Machinegun&category_730_Type%5B%5D' \
          f'=tag_CSGO_Type_Knife&category_730_Type%5B%5D=tag_Type_Hands&appid=730#p{page_number}_name_asc'
    req = requests.get(url)  # pulls the html of the current url
    soup = BeautifulSoup(req.content, 'html.parser')  # creates a soup of the html
    the_list = set()  # creates a set to use later
    for link in soup.find_all('a', href=True, class_="market_listing_row_link"):  # finds where the hrefs are located
        the_list.add(link['href'])  # adds each href link to the set
    the_list = str(the_list)  # converts the set to a string, so it can be written to a txt file
    with open('List of names.txt', 'a') as file:  # opens the txt file
        file.write(the_list)  # writes the list of hrefs into the txt file
    print(page_number)
    page_number += 1  # increases the page number by one
    print(url)  # prints the URL for debugging
print('done')
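One small follow-up, not part of the answer above: file.write(str(the_list)) dumps Python set syntax (braces, quotes, commas) into the text file, which makes the output hard to reuse in other projects. A minimal sketch that writes one link per line instead, with placeholder URLs standing in for the scraped hrefs:

```python
links = {
    "https://steamcommunity.com/market/listings/730/item-a",  # placeholder URL
    "https://steamcommunity.com/market/listings/730/item-b",  # placeholder URL
}

with open("List of names.txt", "a") as file:
    for link in sorted(links):   # sorted() gives sets a stable output order
        file.write(link + "\n")  # one URL per line, no set syntax
```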
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
