Python scraping links from buttons with event

Link I want to scrape: https://digital.auraria.edu/work/ns/8fb66c05-0ad2-4e56-8cc7-6ced34d0c126

I'm having trouble scraping the "Download" button on this page to download the PDF file using Python and Beautiful Soup. Normally there's a direct link, and I can just do

    soup = BeautifulSoup(r.content, 'lxml')
    links = soup.find_all("a")
    for link in links:
        href = link.get('href')  # None when the <a> tag has no href attribute
        if href and 'pdf' in href:  # check whether this is the book's PDF link
            i += 1
            response = requests.get(href)
            print(f"Retrieving PDF for: {title}")
            write_pdf(pdf_path, response.content)

However, I'm not sure what the PDF link is on this page. Do I need to use a headless browser, and how would I extract the link? Here is a screenshot of the button in the inspector: (image: inspecting the webpage button)



Solution 1:[1]

I found the PDF link by opening the page source and searching it for "PDF", which turned up a meta tag:

    <meta name="citation_pdf_url" content="https://dashboard.digital.auraria.edu/downloads/1e0b44c6-cd79-49a3-9eac-0b10d1a4392e"/>

I followed the link and it downloaded a PDF with the same title. With the code below, you can get the entire tag, or just its contents by adding .attrs.get('content') at the end.

Required -> pip install bs4 requests

    from bs4 import BeautifulSoup
    import requests

    url = "https://digital.auraria.edu/work/ns/8fb66c05-0ad2-4e56-8cc7-6ced34d0c126"

    # Fetch the work page and parse its HTML.
    req = requests.get(url)
    soup = BeautifulSoup(req.content, 'html.parser')

    # The PDF address is in the <meta name="citation_pdf_url"> tag.
    pdf_link = soup.find("meta", attrs={'name': "citation_pdf_url"}).attrs.get('content')

    print(pdf_link)
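
To go from the printed link to a saved file, something like the following might work. It is only a sketch: the function names `extract_pdf_url` and `download_pdf` and the output name `work.pdf` are made up for illustration, and it assumes the page keeps exposing the `citation_pdf_url` meta tag and that the download URL needs no login or cookies.

```python
from pathlib import Path
from typing import Optional

import requests
from bs4 import BeautifulSoup


def extract_pdf_url(html: str) -> Optional[str]:
    """Return the content of <meta name="citation_pdf_url">, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", attrs={"name": "citation_pdf_url"})
    return tag.get("content") if tag else None


def download_pdf(page_url: str, dest: Path) -> Path:
    """Fetch the work page, locate the PDF URL, and stream the file to dest."""
    page = requests.get(page_url, timeout=30)
    page.raise_for_status()

    pdf_url = extract_pdf_url(page.text)
    if pdf_url is None:
        raise ValueError(f"No citation_pdf_url meta tag found on {page_url}")

    # Stream the response so a large PDF is not held in memory all at once.
    with requests.get(pdf_url, stream=True, timeout=60) as response:
        response.raise_for_status()
        with open(dest, "wb") as fh:
            for chunk in response.iter_content(chunk_size=8192):
                fh.write(chunk)
    return dest


if __name__ == "__main__":
    url = "https://digital.auraria.edu/work/ns/8fb66c05-0ad2-4e56-8cc7-6ced34d0c126"
    # "work.pdf" is an arbitrary output name chosen for this example.
    print(download_pdf(url, Path("work.pdf")))
```

Keeping the tag lookup in its own function means a page without the meta tag produces a clear error instead of an AttributeError on None.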

Good luck and let me know if you face any other issues!

Solution 2:[2]

Just scrape the filename and append it to the link below. I got the link by downloading the file and copying its download address, then removing the filename and adding a different one to test it. It works like a charm.

https://storage.googleapis.com/us-fedora/fcrepo.binary.directory/ea/05/2a/ea052a597af18fb6a46c44254e9e596a7e93571f?GoogleAccessId=k8s-storage%40repositories-307112.iam.gserviceaccount.com&Expires=1643411246&Signature=ICo51cFbe3By7JPJol8nLfxcic%2BV%2Bv1uvGYjodATCXJc2I6XWSi7JWC8l%2BM6BTSVFOL8A0YioZOQggY8Afc0JJtiwInkxFHmVjleQ41he3RK5pwF4IwONeuQxcgUXYzd8p94sA5L0YZC6drAFb9mx4AJLwTdKQt7dZh146FmaQYY8ElGT6BpHX2t%2BK31UGP0pC75uFGUq6b3IDK11gPOCSvnrLGSAM1yulE8togDgZmw0BU77nLPkinXSIATCTjlHNxf5aUxlJkg0%2FtSM21b53JFvHGHHCQf8QSKtST4WCBA1up6BVX1YLbGLZXxQ07mf8K7jnQ4U%2FXfnw6IoTpQxw%3D%3D&response-content-disposition=attachment%3B%20filename%3D

So, for your example, the link would look like this:

https://storage.googleapis.com/us-fedora/fcrepo.binary.directory/ea/05/2a/ea052a597af18fb6a46c44254e9e596a7e93571f?GoogleAccessId=k8s-storage%40repositories-307112.iam.gserviceaccount.com&Expires=1643411246&Signature=ICo51cFbe3By7JPJol8nLfxcic%2BV%2Bv1uvGYjodATCXJc2I6XWSi7JWC8l%2BM6BTSVFOL8A0YioZOQggY8Afc0JJtiwInkxFHmVjleQ41he3RK5pwF4IwONeuQxcgUXYzd8p94sA5L0YZC6drAFb9mx4AJLwTdKQt7dZh146FmaQYY8ElGT6BpHX2t%2BK31UGP0pC75uFGUq6b3IDK11gPOCSvnrLGSAM1yulE8togDgZmw0BU77nLPkinXSIATCTjlHNxf5aUxlJkg0%2FtSM21b53JFvHGHHCQf8QSKtST4WCBA1up6BVX1YLbGLZXxQ07mf8K7jnQ4U%2FXfnw6IoTpQxw%3D%3D&response-content-disposition=attachment%3B%20filename%3DIR00000195_00001.pdf
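
The append step described in this answer could be sketched roughly as below. The helper name `build_download_url` and the `example.org` prefix are hypothetical stand-ins; in practice the prefix would be the full address copied from the browser, ending in `filename%3D`. The address in this answer also carries `Expires` and `Signature` parameters, so it is presumably time-limited and a fresh one may have to be copied.

```python
from urllib.parse import quote


def build_download_url(prefix: str, filename: str) -> str:
    """Append a percent-encoded filename to a prefix that ends in 'filename%3D'."""
    if not prefix.endswith("filename%3D"):
        raise ValueError("prefix is expected to end with 'filename%3D'")
    # safe="" also encodes '/' so the filename cannot alter the URL structure.
    return prefix + quote(filename, safe="")


if __name__ == "__main__":
    # Hypothetical, shortened prefix standing in for the long address above.
    prefix = (
        "https://example.org/object"
        "?response-content-disposition=attachment%3B%20filename%3D"
    )
    print(build_download_url(prefix, "IR00000195_00001.pdf"))
```

Percent-encoding the filename is a precaution for names containing spaces or other reserved characters; a plain name such as `IR00000195_00001.pdf` passes through unchanged.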

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 subzero_flow
Solution 2