OSU: Download links open the beatmap page instead of downloading the beatmap file
I noticed that about 98% of the songs in the beatmap packs officially available in OSU are ones I don't care to play. The same goes for the unofficial mega packs you can find, which hold around 20 GB of songs per year (2011, 2012, 2013, etc.).
I did find that the "most favourites" page in osu, https://osu.ppy.sh/beatmapsets?sort=favourites_desc, has a good chunk of songs that I like or would play. So I tried to create a Python script that would click the download button on every beatmap panel. I learned a lot during this process: Actions move_to_element (hover menus), waiting until an element is clickable, stale element exceptions, and scrolling the page with execute_script.
I kept having a hard time with elements disappearing from the page/DOM, which stopped a "for element in elements" loop from working properly. So I decided to have the script scroll multiple times to load more beatmaps and then scrape for href links containing the word "download". This worked great for capturing most of the links: it captured over 3000 unique ones.
I put them in a text file, which looks like this:
...
https://osu.ppy.sh/beatmapsets/1457867/download
https://osu.ppy.sh/beatmapsets/881996/download
https://osu.ppy.sh/beatmapsets/779173/download
https://osu.ppy.sh/beatmapsets/10112/download
https://osu.ppy.sh/beatmapsets/996628/download
https://osu.ppy.sh/beatmapsets/415886/download
https://osu.ppy.sh/beatmapsets/490662/download
...
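A minimal sketch of the filtering step that produces a list like this, assuming the href values have already been collected from the page (for example with `get_attribute("href")` on each anchor element). The helper name `unique_download_links` and the regular expression are hypothetical, chosen for illustration, and are not taken from the original script.

```python
import re

# Matches links of the form https://osu.ppy.sh/beatmapsets/<id>/download
DOWNLOAD_LINK = re.compile(r"^https://osu\.ppy\.sh/beatmapsets/\d+/download$")


def unique_download_links(hrefs):
    """Keep only beatmapset download links, dropping duplicates but keeping order."""
    seen = set()
    result = []
    for href in hrefs:
        if href is None:  # anchors without an href attribute
            continue
        href = href.strip()
        if DOWNLOAD_LINK.match(href) and href not in seen:
            seen.add(href)
            result.append(href)
    return result
```

Because scrolling reloads panels that were already seen, de-duplicating while preserving order keeps the text file in the same order as the page.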
The "Download" button on each panel has an href like the ones listed above. If you click the button, you download the beatmap file, which has the .osz file type. However, if you right-click the "Download" button, copy the link, and open it in a new page or tab, it redirects to the beatmap's page and does not download the file.
I made it work by using the pandas module to read an .xlsx Excel file of URLs and loop over each URL. Once the URL's page is opened, the script clicks the Download button:
import time

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# driver_path and download_file() are defined elsewhere in the script.

def read_excel():
    df = pd.read_excel('book.xlsx')  # Get all the urls from the excel
    mylist = df['urls'].tolist()  # urls is the column name
    print(mylist)  # will print all the urls
    # now loop through each url & perform actions.
    for url in mylist:
        options = webdriver.ChromeOptions()
        options.add_experimental_option('excludeSwitches', ['enable-logging'])
        options.add_argument("user-data-dir=C:\\Users\\%UserName%\\AppData\\Local\\Google\\Chrome\\User Data\\Profile1")
        # Selenium 3 style arguments; Selenium 4 takes service=Service(driver_path) and options=options instead.
        driver = webdriver.Chrome(executable_path=driver_path, chrome_options=options)
        driver.get(url)
        # Accept an alert if one shows up within 3 seconds.
        try:
            WebDriverWait(driver, 3).until(EC.alert_is_present(), 'Timed out waiting for alert.')
            alert = driver.switch_to.alert
            alert.accept()
            print("alert accepted")
        except TimeoutException:
            print("no alert")
        time.sleep(1)
        wait = WebDriverWait(driver, 10)
        # Click the Download button on the beatmap page.
        try:
            wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "body > div.osu-layout__section.osu-layout__section--full.js-content.beatmaps_show > div > div > div:nth-child(2) > div.beatmapset-header > div > div.beatmapset-header__box.beatmapset-header__box--main > div.beatmapset-header__buttons > a:nth-child(2) > span"))).click()
            time.sleep(1)
        except Exception:
            print("Can't find the Element Download")
        time.sleep(10)
        download_file()  # blocks until the download folder shows no file in progress
        driver.close()
This is a sequential, one-at-a-time function. The download_file() function is a loop that checks the download folder to see whether a file is still being downloaded; if not, the script goes on to the next URL. This works. Of course, the website has limitations: you can only download a maximum of 8 files at a time, and after 100 to 200 downloads you can't download any more and have to wait a bit. The loop keeps going and tries each URL anyway unless you stop the script. Luckily, you can see the last beatmap that was downloaded, find it in the Excel spreadsheet, remove the rows above it, and start the script again. I'm sure I can code it so that it stops the loop when no new file appears in the download folder.
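The "stop when no new file appears" idea could be sketched roughly as follows. This is not the download_file() function from the script. The name `wait_for_new_download` and its parameters are hypothetical, and the sketch assumes Chrome's habit of giving in-progress downloads a `.crdownload` suffix.

```python
import os
import time


def wait_for_new_download(download_dir, known_files, timeout=60.0, poll=1.0):
    """Return the set of new, finished .osz file names, or an empty set on timeout.

    known_files is the set of names that were in download_dir before the click.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = set(os.listdir(download_dir))
        # Chrome keeps partial downloads under a .crdownload name (assumption: Chrome is the browser).
        in_progress = any(name.endswith(".crdownload") for name in names)
        new_files = {n for n in names - set(known_files) if n.endswith(".osz")}
        if new_files and not in_progress:
            return new_files
        if time.monotonic() >= deadline:
            return set()
        time.sleep(poll)
```

In the loop, one option is to take `set(os.listdir(download_dir))` before clicking and to `break` when the function returns an empty set, since that probably means the site's download limit was reached.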
Finally, the question: is there a way to open these download links and download the file without having to click the "Download" button after opening the page? The link redirects to the beatmap page instead of downloading the file automatically. There must be something in the JavaScript/HTML that I don't know about.
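One difference between the two cases is that a click on the beatmap page sends that page's URL as the `Referer` request header, while a link opened in a fresh tab sends none. Whether the site actually checks this header is an assumption here, not something confirmed above. A minimal sketch of building such a header from a download link, with hypothetical helper names:

```python
def beatmapset_page_url(download_url):
    """Derive the beatmapset page URL from its /download link."""
    suffix = "/download"
    if not download_url.endswith(suffix):
        raise ValueError("not a download link: " + download_url)
    return download_url[: -len(suffix)]


def download_request_headers(download_url):
    """Headers imitating a click made from the beatmapset page (assumption: Referer matters)."""
    return {"Referer": beatmapset_page_url(download_url)}
```

A request sent with these headers would presumably also need the cookies of a logged-in session, which is what the Chrome profile provides in the Selenium script.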
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
