Issues with Python's Beautiful Soup and Requests when scraping the Steam market

Just to clarify, I am new to Beautiful Soup and Requests, and fairly new to coding in Python in general.

I am trying to pull a list of item names from the Steam marketplace so I can use it in other projects. More specifically, I am collecting the listing URLs of CS:GO skins. I scrape the URL of each item (10 per page), then change the page number in the URL passed to requests.get and repeat until the desired number of pages has been reached (5 in this example). The problem is that requests.get pulls the same webpage over and over again (with some slight variation), even though I am changing the page number in the URL, which should change the Steam market page.

I presume it has something to do with requests.get, because the page number increases but the response contains the same information every time.

Here is my very poor code:

import requests
from bs4 import BeautifulSoup

page_number = 1

url = f'https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any' \
      '&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any' \
      '&category_730_Type%5B%5D=tag_CSGO_Type_Pistol&category_730_Type%5B%5D=tag_CSGO_Type_SMG&category_730_Type%5B' \
      '%5D=tag_CSGO_Type_Rifle&category_730_Type%5B%5D=tag_CSGO_Type_Shotgun&category_730_Type%5B%5D' \
      '=tag_CSGO_Type_SniperRifle&category_730_Type%5B%5D=tag_CSGO_Type_Machinegun&category_730_Type%5B%5D' \
      f'=tag_CSGO_Type_Knife&category_730_Type%5B%5D=tag_Type_Hands&appid=730#p{page_number}_name_asc'

while page_number != 6:

    req = requests.get(url)  # pulls the html file of the current url

    soup = BeautifulSoup(req.content, 'html.parser')  # creates a soup of the html file

    the_list = set()  # creates a set to collect the hrefs

    for link in soup.find_all('a', href=True, class_="market_listing_row_link"):   # finds where the hrefs are located
        the_list.add(link['href'])   # adds the href link to the set

    the_list = str(the_list)   # converts the set to a string, so it can be written to a txt file

    with open('List of names.txt', 'a') as file:    # opens the txt file
        file.write(the_list)   # writes the list of hrefs in the txt file

    page_number += 1   # increases the page number by one

    print(url)    # prints the URL for debugging

Thank you for your help!

It gives me these items over and over again: {'https://steamcommunity.com/market/listings/730/%E2%98%85%20Shadow%20Daggers%20%7C%20Blue%20Steel%20%28Minimal%20Wear%29', 'https://steamcommunity.com/market/listings/730/AK-47%20%7C%20Black%20Laminate%20%28Battle-Scarred%29', 'https://steamcommunity.com/market/listings/730/Souvenir%20P90%20%7C%20Sand%20Spray%20%28Battle-Scarred%29', 'https://steamcommunity.com/market/listings/730/StatTrak%E2%84%A2%20Five-SeveN%20%7C%20Nightshade%20%28Field-Tested%29', 'https://steamcommunity.com/market/listings/730/StatTrak%E2%84%A2%20XM1014%20%7C%20Entombed%20%28Field-Tested%29', 'https://steamcommunity.com/market/listings/730/Souvenir%20MP7%20%7C%20Gunsmoke%20%28Battle-Scarred%29', 'https://steamcommunity.com/market/listings/730/StatTrak%E2%84%A2%20MAC-10%20%7C%20Heat%20%28Minimal%20Wear%29', 'https://steamcommunity.com/market/listings/730/AWP%20%7C%20Elite%20Build%20%28Well-Worn%29', 'https://steamcommunity.com/market/listings/730/Souvenir%20Nova%20%7C%20Green%20Apple%20%28Factory%20New%29', 'https://steamcommunity.com/market/listings/730/P90%20%7C%20Blind%20Spot%20%28Field-Tested%29'}



Solution 1:[1]

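I just tried it myself. In your code the f-string for url is evaluated only once, before the loop starts, so the page number inside it stays at 1 no matter how often page_number is incremented. When I build the URL inside the loop, like this, I get the right result: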

import requests
from bs4 import BeautifulSoup

page_number = 1


while page_number != 6:

    url = f'https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any' \
        '&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any' \
        '&category_730_Type%5B%5D=tag_CSGO_Type_Pistol&category_730_Type%5B%5D=tag_CSGO_Type_SMG&category_730_Type%5B' \
        '%5D=tag_CSGO_Type_Rifle&category_730_Type%5B%5D=tag_CSGO_Type_Shotgun&category_730_Type%5B%5D' \
        '=tag_CSGO_Type_SniperRifle&category_730_Type%5B%5D=tag_CSGO_Type_Machinegun&category_730_Type%5B%5D' \
        f'=tag_CSGO_Type_Knife&category_730_Type%5B%5D=tag_Type_Hands&appid=730#p{page_number}_name_asc'

    req = requests.get(url)  # pulls the html file of the current url

    # creates a soup of the html file
    soup = BeautifulSoup(req.content, 'html.parser')

    the_list = set()  # creates a set to collect the hrefs

    # finds where the hrefs are located
    for link in soup.find_all('a', href=True, class_="market_listing_row_link"):
        the_list.add(link['href'])   # adds the href link to the set

    # converts the set to a string, so it can be written to a txt file
    the_list = str(the_list)

    with open('List of names.txt', 'a') as file:    # opens the txt file
        file.write(the_list)   # writes the list of hrefs in the txt file

    print(page_number)
    page_number += 1   # increases the page number by one
    print(url)    # prints the URL for debugging

print('done')
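
If you want to check the URL handling without hitting Steam at all, here is a minimal offline sketch. It compares a URL built once before the loop with one rebuilt on every iteration, and then strips the fragment (everything after the #), since HTTP clients do not send that part to the server. BASE is a shortened stand-in for the long search URL, and build_url and request_target are hypothetical helper names I made up for illustration; none of them come from the original code.

```python
from urllib.parse import urldefrag

# Assumption: shortened stand-in for the long search URL, query string trimmed
# for readability. The '#p{n}_name_asc' suffix is kept as in the original code.
BASE = 'https://steamcommunity.com/market/search?q=&appid=730'


def build_url(page_number):
    """Hypothetical helper: rebuild the URL so the current page number is used."""
    return f'{BASE}#p{page_number}_name_asc'


def request_target(url):
    """Return the URL without its fragment, the part an HTTP client actually sends."""
    return urldefrag(url).url


# Pattern from the question: the f-string is evaluated once, before the loop.
page_number = 1
frozen_url = f'{BASE}#p{page_number}_name_asc'

frozen, rebuilt = [], []
for page_number in range(1, 6):
    frozen.append(frozen_url)               # same string on every iteration
    rebuilt.append(build_url(page_number))  # new string on every iteration

targets = {request_target(u) for u in rebuilt}

print(len(set(frozen)))   # number of distinct URLs when built outside the loop
print(len(set(rebuilt)))  # number of distinct URLs when rebuilt inside the loop
print(targets)            # distinct URLs once the fragment is stripped
```

If targets ends up holding a single URL, the page number is not reaching the server through this URL, and any variation between responses would be coming from the server rather than from the page number. In that case the paging would have to go through whatever query parameter or endpoint the site really uses, which is not shown in this post.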

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1