How can I plug this section of code into my BeautifulSoup script?

I am new to Python and Beautiful Soup. The project I am working on is a script that scrapes the pages behind the hyperlinks on this page:

https://bitinfocharts.com/top-100-richest-dogecoin-addresses-2.html

Currently, the script has a filter that only scrapes pages whose "Last Out" date is after a certain date.

I am trying to add an additional filter to the script, which does the following:

  1. Scrape the "Profit from price change:" section on the page inside the hyperlink (example page: https://bitinfocharts.com/dogecoin/address/D8WhgsmFUkf4imvsrwYjdhXL45LPz3bS1S)

  2. Convert the profit into a float

  3. Compare the profit to a variable called "goal", which has a float assigned to it

  4. If the profit is greater than or equal to goal, scrape the contents of the page. If it is not, skip the page and continue the script.
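Steps 2 and 3 boil down to a small parsing helper. A minimal sketch, assuming the profit cell reads something like "1,234,567 USD" (the exact format on the page may differ):

```python
def parse_profit(raw):
    """Turn a profit cell like '1,234,567 USD' into a float."""
    cleaned = raw.replace('USD', '').replace(',', '').replace(' ', '')
    return float(cleaned)

def meets_goal(raw, goal):
    """True when the parsed profit is greater than or equal to goal."""
    return parse_profit(raw) >= goal
```

For example, `meets_goal('60,000 USD', 50000.0)` would come back True, and step 4 is then just an if statement on that result.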

Here is the snippet of code I am using to try and do this:

#Get the profit

sections = soup.find_all(class_='table-striped')

for section in sections:
    oldprofit = section.find_all('td')[11].text
    removetext = oldprofit.replace('USD', '')
    removetext = removetext.replace(' ', '')
    removetext = removetext.replace(',', '')
    profit = float(removetext)

# Compare profit to goal

goal = float(50000)

if profit >= goal:
    # scrape the page here (this is the part I am unsure how to hook in)

Basically, I am trying to run an if statement on a value from the webpage: if the statement is true, scrape the page; if it is false, skip the page and continue the script.

Here is the entire script that I am trying to plug this into:

import csv
import requests
from bs4 import BeautifulSoup as bs
from datetime import datetime

headers = []
datarows = []
# define 1-1-2020 as a datetime object
after_date = datetime(2020, 1, 1)

with requests.Session() as s:
    s.headers = {"User-Agent": "Safari/537.36"}
    r = s.get('https://bitinfocharts.com/top-100-richest-dogecoin-addresses-2.html')
    soup = bs(r.content, 'lxml')

    # select all tr elements (minus the first one, which is the header)
    table_elements = soup.select('tr')[1:]
    address_links = []
    for element in table_elements:
        children = element.contents  # get children of table element
        url = children[1].a['href']
        last_out_str = children[8].text
        # check to make sure the date field isn't empty
        if last_out_str != "":
            # load date into datetime object for comparison (second part is defining the layout of the date as years-months-days hour:minute:second timezone)
            last_out = datetime.strptime(last_out_str, "%Y-%m-%d %H:%M:%S %Z")
            # if check to see if the date is after 2020/1/1
            if last_out > after_date:
                address_links.append(url)

    for url in address_links:

        r = s.get(url)
        soup = bs(r.content, 'lxml')
        table = soup.find(id="table_maina")

        #Get the Doge Address for the filename

        item = soup.find('h1').text
        newitem = item.replace('Dogecoin', '')
        finalitem = newitem.replace('Address', '')


        #Get the profit

        sections = soup.find_all(class_='table-striped')

        for section in sections:
            oldprofit = section.find_all('td')[11].text
            removetext = oldprofit.replace('USD', '')
            removetext = removetext.replace(' ', '')
            removetext = removetext.replace(',', '')
            profit = float(removetext)

        # Compare profit to goal

        goal = float(50000)

        if profit >= goal

        if table:

            for row in table.find_all('tr'):
                heads = row.find_all('th')
                if heads:
                    headers = [th.text for th in heads]
                else:
                    datarows.append([td.text for td in row.find_all('td')])

            fcsv = csv.writer(open(f'{finalitem}.csv', 'w', newline=''))
            fcsv.writerow(headers)
            fcsv.writerows(datarows)
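As a side note, the `csv.writer(open(...))` pattern at the end of the script never closes the file handle explicitly; a `with` block handles that. A sketch, keeping the script's variable names:

```python
import csv

def write_rows(filename, headers, datarows):
    # 'with' closes the file even if a write fails partway through.
    with open(filename, 'w', newline='') as f:
        fcsv = csv.writer(f)
        fcsv.writerow(headers)
        fcsv.writerows(datarows)
```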

I am familiar with if statements; however, I am unsure how to plug this into the existing code and have it accomplish what I am trying to do. If anyone has any advice, I would greatly appreciate it. Thank you.



Solution 1:[1]

From my understanding, all that you are asking is how to have the script continue when a page fails that criterion, in which case you just need to do

if profit < goal:
    continue

Note, though, that the for loop in your snippet only keeps the final value of profit; if there are other profit values you need to look at, they are not being evaluated.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Andrew Ryan