Count how many links each page has and put the count in a DataFrame column
I'm doing a project to scrape how many links a series of webpages have.
My idea is to add the link count for each page to a column of a Pandas DataFrame, ending up with something like this:
   title  count links
0  page1            2
1  page2            3
2  page3            0
This is the code I wrote:
import requests
import pandas as pd
from bs4 import BeautifulSoup

links_bs4 = ['page1', 'page2']

article_title = []
links = []

for item in links_bs4:
    page = requests.get(item)
    soup = BeautifulSoup(page.content, 'html.parser')
    title = soup.find('title')
    article_title.append(title.string)
    body_text = soup.find('div', class_='article-body')
    for link in body_text.find_all('a'):
        links.append(link.get('href'))

count_of_links = len(links)

s1 = pd.Series(article_title, name='title')
s2 = pd.Series(count_of_links, name='count links')
df = pd.concat([s1, s2], axis=1)
It partly works, but count_of_links = len(links) gives the count of all links across all pages combined.
I want the count per page, not the running total I'm getting now. How can I do this? My loop keeps appending to the same list across every URL. Should I create a new list for each URL I scrape, or use something else in Python?
I'm clearly missing some part of the logic.
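The fix is to compute the count inside the loop, once per page, and append it to a per-page list instead of letting one list accumulate across all URLs. Here is a minimal sketch of that restructuring; it uses the standard library's html.parser and hard-coded HTML strings (hypothetical stand-ins for the fetched pages, since the real URLs aren't given), so it runs without network access, but the same pattern applies unchanged to the BeautifulSoup code above:

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Counts <a> tags in a page's HTML."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.count += 1

# Hypothetical pages standing in for the HTML fetched by requests.get.
pages = {
    'page1': '<div class="article-body"><a href="/a">a</a><a href="/b">b</a></div>',
    'page2': '<div class="article-body"><a href="/c">c</a></div>',
}

counts = []  # one entry per page, not one running total
for title, html in pages.items():
    parser = LinkCounter()  # fresh parser per page, so the count resets
    parser.feed(html)
    counts.append(parser.count)

print(counts)  # [2, 1]
```

In the BeautifulSoup version, the equivalent change is to append len(body_text.find_all('a')) to a counts list inside the for loop, then build the 'count links' Series from that list so it lines up row-for-row with the titles.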
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
