Coursera URL web scraping
I have Python code that scrapes Coursera course details such as course_title, ratings, and number of students, but I want the course link as well. Can someone help me get each course URL from Coursera?
Solution 1:[1]
I had a look at coursera.org and found a way to scrape the course URLs too.
Here is what you want to do:
- Scrape all `a` elements with the attribute `data-click-key=search.search.click.search_card`.
- Make a list of the `href` of each element from the elements list.
Here is the code:
# Assume that you searched for Python courses and that `soup` is the
# BeautifulSoup object built from the search results page
base = "https://www.coursera.org"
titles = soup.find_all("h2", class_="card-title")
urls = soup.find_all("a", attrs={"data-click-key": "search.search.click.search_card"})
# In case you need a list of URLs (these are site-relative paths)
url_list = [i['href'] for i in urls]
# Pair each title with its link and print the absolute URL
for title, url in zip(titles, urls):
    print(title.text + ": " + base + url['href'])
Output:
Python for Everybody: https://www.coursera.org/specializations/python
Python 3 Programming: https://www.coursera.org/specializations/python-3-programming
IBM Data Science: https://www.coursera.org/professional-certificates/ibm-data-science
Google IT Automation with Python: https://www.coursera.org/professional-certificates/google-it-automation
Applied Data Science with Python: https://www.coursera.org/specializations/data-science-python
Programming for Everybody (Getting Started with Python): https://www.coursera.org/learn/python
Crash Course on Python: https://www.coursera.org/learn/python-crash-course
Python for Data Science and AI: https://www.coursera.org/learn/python-for-applied-data-science-ai
Introducción a la programación en Python I: Aprendiendo a programar con Python: https://www.coursera.org/learn/aprendiendo-programar-python
Python Basics: https://www.coursera.org/learn/python-basics
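The two steps can also be tried without fetching the live page. The following is only a minimal sketch: it swaps BeautifulSoup for the standard library's `html.parser`, runs on a hypothetical `sample_html` string shaped like the cards described above, and uses `urljoin` instead of string concatenation so that an `href` that is already absolute is left unchanged. The names `CourseLinkParser`, `extract_course_links`, and `sample_html` are made up for illustration, and the `data-click-key` value is the one from this answer, which Coursera may change at any time.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "https://www.coursera.org"
# Attribute value taken from the answer above; assumed to still mark search cards
CARD_KEY = "search.search.click.search_card"


class CourseLinkParser(HTMLParser):
    """Collect the href of every <a> tag marked as a search card."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if attr_map.get("data-click-key") == CARD_KEY and href:
            # urljoin resolves relative paths and keeps absolute URLs as they are
            self.links.append(urljoin(BASE, href))


def extract_course_links(html):
    """Return absolute URLs for all search-card links found in the HTML string."""
    parser = CourseLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical markup for illustration only, not copied from coursera.org
sample_html = """
<a data-click-key="search.search.click.search_card" href="/learn/python">Course</a>
<a data-click-key="search.search.click.search_card" href="https://www.coursera.org/specializations/python">Spec</a>
<a href="/about">About</a>
"""

links = extract_course_links(sample_html)
print(links)
```

If the real page no longer uses this attribute, or renders its results with JavaScript, the function simply returns an empty list, which is a quick way to notice that the selector needs updating.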
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Just for fun |
