How do I get data from an infobox?

I need to find an article on Wikipedia and extract the name for each level of the classification (kingdom, phylum, class, and so on) from the infobox table.

I have this code:

import requests
from bs4 import BeautifulSoup

def get_infobox(url):
    response = requests.get(url)
    bs = BeautifulSoup(response.text)

    table = bs.find('table', {'class': 'infobox'})
    result = {}
    row_count = 0
    if table is None:
        pass
    else:
        for tr in table.find_all('tr'):
            if tr.find('th'):
                pass
            else:
                row_count += 1
        if row_count > 1:
            if tr is not None:
                result[tr.find('td').text.strip()] = tr.find('td').text
        return result

print(get_infobox(""))


Solution 1:[1]

Checking whether a row has exactly two cells seems to be the easiest way. This works for me:

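import requests
from bs4 import BeautifulSoup

def get_infobox(url):
    response = requests.get(url)
    # Name the parser explicitly so bs4 does not have to guess one.
    bs = BeautifulSoup(response.text, 'html.parser')
    table = bs.find('table', {'class': 'infobox'})
    result = {}

    # No infobox on this page.
    if table is None:
        return None

    for tr in table.find_all('tr'):
        tds = tr.find_all('td')
        # Keep only rows that are a label cell followed by a value cell.
        if len(tds) == 2:
            key, value = tds
            result[key.text.strip()] = value.text.strip()
    return result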

print(get_infobox("https://en.wikipedia.org/wiki/Cat"))

Result:

{'Kingdom:': 'Animalia', 'Phylum:': 'Chordata', 'Class:': 'Mammalia', 'Order:': 'Carnivora', 'Suborder:': 'Feliformia', 'Family:': 'Felidae', 'Subfamily:': 'Felinae', 'Genus:': 'Felis', 'Species:': 'F.\xa0catus[1]'}

You can clean up results as necessary.
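As one possible cleanup, here is a minimal sketch that works on the dictionary shown above. The helper name `clean_infobox` is hypothetical, and the rules it applies (drop the trailing colon from labels, replace non-breaking spaces, remove footnote markers such as `[1]`) are assumptions about what "clean" means for your use case:

```python
import re

def clean_infobox(raw):
    """Normalize keys and values scraped from an infobox (hypothetical helper)."""
    cleaned = {}
    for key, value in raw.items():
        # Drop the trailing colon from labels such as 'Kingdom:'.
        key = key.strip().rstrip(":")
        # Replace non-breaking spaces with ordinary spaces.
        value = value.replace("\xa0", " ")
        # Remove footnote markers such as '[1]'.
        value = re.sub(r"\[\d+\]", "", value).strip()
        cleaned[key] = value
    return cleaned

raw = {'Kingdom:': 'Animalia', 'Species:': 'F.\xa0catus[1]'}
print(clean_infobox(raw))
# {'Kingdom': 'Animalia', 'Species': 'F. catus'}
```

It takes the scraped dictionary as input, so it can be applied to the return value of `get_infobox` without changing that function.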

Solution 2:[2]

For the Russian page, you can do it like this:

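def get_infobox(url):
    response = requests.get(url)
    bs = BeautifulSoup(response.text, features='lxml')  # requires the lxml package
    rows = bs.find_all('div', class_='ts-Taxonomy-rang-row')
    # Split each row's text on the first colon only, giving a (label, value) pair.
    return dict(row.get_text().split(":", 1) for row in rows)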


print(get_infobox('https://ru.wikipedia.org/wiki/%D0%9A%D0%BE%D1%88%D0%BA%D0%B0'))
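Building a dictionary from `label:value` text assumes every row contains a usable separator. If some rows might not, a more defensive variant could look like the sketch below. The helper name `rows_to_dict` and the sample strings are made up for illustration and are not taken from the actual page:

```python
def rows_to_dict(row_texts):
    """Turn 'label: value' strings into a dict, skipping rows without a colon."""
    result = {}
    for text in row_texts:
        # partition splits on the first colon only and never raises.
        label, sep, value = text.partition(":")
        if not sep:
            # No colon in this row, so there is no label/value pair to store.
            continue
        result[label.strip()] = value.strip()
    return result

sample = ["Kingdom: Animalia", "Note: a: b", "no separator"]
print(rows_to_dict(sample))
# {'Kingdom': 'Animalia', 'Note': 'a: b'}
```

In the scraper, the input would be the text of each matched `div`, for example `[x.get_text() for x in bs.find_all('div', class_='ts-Taxonomy-rang-row')]`.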

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Yevhen Kuzmovych
Solution 2 Sergey K