Scrape and transform dates in BeautifulSoup

I am scraping data from several web pages, and the data contains several dates. The code that extracts the information I want looks like this (only the part that deals with the dates is shown):

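import pandas as pd
import requests
from bs4 import BeautifulSoup

# urlsjugement and headers are defined earlier in the script (not shown)
data = []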
for url in urlsjugement:
    soup = BeautifulSoup(
        requests.get(url, headers=headers).content, "html.parser"
    )
    title = soup.select_one("#identite_deno").get_text(strip=True)
    
    try:
        active = soup.select_one('td:-soup-contains("Jugement") + td').get_text(
            strip=True
        )
    except AttributeError:
        # select_one() returned None: the page has no "Jugement" row
        active = "In activity"

    date = soup.select_one(
        'td:-soup-contains("Date création entreprise") + td'
    ).get_text(strip=True)

    data.append([title, active, date])

df = pd.DataFrame(
    data,
    columns=["Title", "Active", "Date"],
)

print(df.to_markdown())

First, I would like to split the judgment and the judgment date into two separate fields so that I can compare the two dates. Each business has a creation date and a closing date, and I would like to compute its lifespan from them. Is that possible?


|    | Title                       | Active                                | Date       |
|---:|:----------------------------|:--------------------------------------|:-----------|
|  0 | 1804 TRANSPORT              | Liquidation judiciaire le 07-01-2022- | 28-01-2013 |

The Active column holds two pieces of information, and I want to separate them. After that, I want to calculate the time between the two dates. Thanks for your help!



Solution 1:[1]

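I only tried it with your first URL, but inside your for loop I would make the change below. It needs `from datetime import datetime` at the top of the script, and since each row now has four values, the `columns` list passed to `pd.DataFrame` needs four names as well.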

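title = soup.select_one("#identite_deno").text
# Creation date: first child of the cell next to "Date création entreprise"
start = list(soup.select_one('td:-soup-contains("Date création entreprise") + td'))[0].text.strip()
# Judgment date: the text after "le " in the first string of the red cell
end = list(soup.select_one('td.red').stripped_strings)[0].split('le ')[1]
# Lifespan as a timedelta between the two parsed dd-mm-YYYY dates
days = datetime.strptime(end, '%d-%m-%Y') - datetime.strptime(start, '%d-%m-%Y')
data.append([title, start, end, days.days])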
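If you would rather keep the scraping loop from the question as it is and do the split afterwards, here is a rough pandas sketch that works on the DataFrame itself. It runs without network access on the sample row from the question. The second row, the column names `Judgment`, `JudgmentDate` and `LifespanDays`, and the assumption that the Active text always looks like `<judgment> le DD-MM-YYYY` are mine, not something I checked against the site.

```python
import pandas as pd

# First row copied from the question's output; the second row is a
# hypothetical company that hit the "In activity" fallback.
df = pd.DataFrame(
    [
        ["1804 TRANSPORT", "Liquidation judiciaire le 07-01-2022-", "28-01-2013"],
        ["EXAMPLE SARL", "In activity", "15-06-2015"],
    ],
    columns=["Title", "Active", "Date"],
)

# Capture the judgment label and the date that follows " le ".
# Rows that do not match the pattern get NaN in both columns.
parts = df["Active"].str.extract(
    r"^(?P<Judgment>.*?)\s+le\s+(?P<JudgmentDate>\d{2}-\d{2}-\d{4})"
)

# Keep the original text as the label when there is no judgment date.
df["Judgment"] = parts["Judgment"].fillna(df["Active"])
df["JudgmentDate"] = pd.to_datetime(parts["JudgmentDate"], format="%d-%m-%Y")
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Lifespan in days; stays missing (NaN) for companies without a judgment date.
df["LifespanDays"] = (df["JudgmentDate"] - df["Date"]).dt.days

print(df[["Title", "Judgment", "JudgmentDate", "LifespanDays"]])
```

Converting both columns to real datetimes first means the subtraction handles month lengths and leap years for you, and companies still in activity simply end up with an empty lifespan instead of raising an error.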

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jack Fleeting