Retrieving table values from HTML with the same tag names using Beautiful Soup in Python

I am trying to retrieve the text of every td in the table below using Beautiful Soup. Because all the cells share the same tag name, I either get only the first element or see some elements printed repeatedly, and I am not sure how to proceed.

Below is the HTML table snippet:

<div>Table</div>
<table class="Auto" width="100%">
    <tr>
       <td class="Auto_head">Address</td>
       <td class="Auto_head">Name</td>
       <td class="Auto_head">Type</td>
       <td class="Auto_head">Value IN</td>
       <td class="Auto_head">AUTO Statement</td>
       <td class="Auto_head">Value OUT</td>
       <td class="Auto_head">RESULT</td>
       <td class="Auto_head"></td>
    </tr>
    <tr>
           <td class="Auto_body">1</td>
           <td class="Auto_body">abc</td>
           <td class="Auto_body">yes</td>
           <td class="Auto_body">abc123</td>
           <td class="Auto_body">jar</td>
           <td class="Auto_body">123abc</td>
           <td class="Auto_body">PASS</td>
           <td class="Auto_body">na</td>
    </tr>

What I want is the text content of every cell, with each header paired with the body cell in the same position: the first Auto_head corresponds to the first Auto_body (Address = 1), and so on for the remaining columns.

I have tried find, findAll, findNext and next_sibling without success. Here is my current Python code:

self.table = self.soup_file.findAll(class_="Table")
self.headers = [tab.find(class_="Auto_head").findNext('td',class_="Auto_head").contents[0] for tab in self.table]
self.data = [data.find(class_="Auto_body").findNext('td').contents[0] for data in self.table]


Solution 1:[1]

Get the headers first, then use zip(...) to combine them with the values in each row:

from bs4 import BeautifulSoup

data = '''\
<table class="Auto" width="100%">
    <tr>
       <td class="Auto_head">Address</td>
       <td class="Auto_head">Name</td>
       <td class="Auto_head">Type</td>
    </tr>
    <tr>
           <td class="Auto_body">1</td>
           <td class="Auto_body">abc</td>
           <td class="Auto_body">yes</td>
    </tr>
    <tr>
           <td class="Auto_body">2</td>
           <td class="Auto_body">def</td>
           <td class="Auto_body">no</td>
    </tr>
    <tr>
           <td class="Auto_body">3</td>
           <td class="Auto_body">ghi</td>
           <td class="Auto_body">maybe</td>
    </tr>
</table>
'''

soup = BeautifulSoup(data, 'html.parser')

for table in soup.select('table.Auto'):
    # get rows
    rows = table.select('tr')
    # get headers
    headers = [td.text for td in rows[0].select('td.Auto_head')]
    # get details
    for row in rows[1:]:
        values = [td.text for td in row.select('td.Auto_body')]
        print(dict(zip(headers, values)))

My output:

{'Address': '1', 'Name': 'abc', 'Type': 'yes'}
{'Address': '2', 'Name': 'def', 'Type': 'no'}
{'Address': '3', 'Name': 'ghi', 'Type': 'maybe'}
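
If the rows are needed later rather than just printed, the same header-plus-zip idea can collect them into a list. The following is a minimal sketch and not part of the original answer: the helper name `table_to_records` and the shortened sample markup are hypothetical, and `get_text(strip=True)` is used in place of `.text` on the assumption that surrounding whitespace in a cell is not wanted.

```python
from bs4 import BeautifulSoup

html = '''\
<table class="Auto" width="100%">
    <tr>
       <td class="Auto_head">Address</td>
       <td class="Auto_head">Name</td>
    </tr>
    <tr>
       <td class="Auto_body"> 1 </td>
       <td class="Auto_body">abc</td>
    </tr>
    <tr>
       <td class="Auto_body">2</td>
       <td class="Auto_body">def</td>
    </tr>
</table>
'''


def table_to_records(markup):
    """Hypothetical helper: return one dict per body row of each table.Auto."""
    soup = BeautifulSoup(markup, 'html.parser')
    records = []
    for table in soup.select('table.Auto'):
        rows = table.select('tr')
        if not rows:
            continue  # empty table, nothing to pair
        # First row holds the column names.
        headers = [td.get_text(strip=True) for td in rows[0].select('td.Auto_head')]
        # Every later row is paired with the headers by position.
        for row in rows[1:]:
            values = [td.get_text(strip=True) for td in row.select('td.Auto_body')]
            records.append(dict(zip(headers, values)))
    return records


records = table_to_records(html)
print(records)
```

Be aware that `dict(zip(...))` keeps only the last value when two headers have the same text, so a table with duplicate or several empty header cells would need a different container, such as a list of pairs.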

Solution 2:[2]

Collect the header cells and the body cells separately, then iterate over both with zip:

from bs4 import BeautifulSoup

s = '''<div>Table</div>
<table class="Auto" width="100%">
    <tr>
       <td class="Auto_head">Address</td>
       <td class="Auto_head">Name</td>
       <td class="Auto_head">Type</td>
       <td class="Auto_head">Value IN</td>
       <td class="Auto_head">AUTO Statement</td>
       <td class="Auto_head">Value OUT</td>
       <td class="Auto_head">RESULT</td>
       <td class="Auto_head"></td>
    </tr>
    <tr>
           <td class="Auto_body">1</td>
           <td class="Auto_body">abc</td>
           <td class="Auto_body">yes</td>
           <td class="Auto_body">abc123</td>
           <td class="Auto_body">jar</td>
           <td class="Auto_body">123abc</td>
           <td class="Auto_body">PASS</td>
           <td class="Auto_body">na</td>
    </tr></table>'''

soup = BeautifulSoup(s, features='html.parser')
head = soup.find_all(name='td',class_='Auto_head')
body = soup.find_all(name='td',class_='Auto_body')
for one,two in zip(head,body):
    print(f'{one.text}={two.text}')

Output:

Address=1
Name=abc
Type=yes
Value IN=abc123
AUTO Statement=jar
Value OUT=123abc
RESULT=PASS
=na

See "Searching by CSS class" in the Beautiful Soup documentation for details on the class_ argument.
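
Zipping the two flat lists pairs cells correctly only when the table has a single body row, because zip stops at the end of the shorter list. As a hedged sketch (not from the original answer), the body cells can be cut into slices the size of the header row. This assumes every body row has exactly as many cells as the header row; the sample markup and the names `width`, `rows` and `pairs` are illustrative.

```python
from bs4 import BeautifulSoup

s = '''<table class="Auto">
    <tr>
       <td class="Auto_head">Address</td>
       <td class="Auto_head">Name</td>
       <td class="Auto_head">RESULT</td>
    </tr>
    <tr>
       <td class="Auto_body">1</td>
       <td class="Auto_body">abc</td>
       <td class="Auto_body">PASS</td>
    </tr>
    <tr>
       <td class="Auto_body">2</td>
       <td class="Auto_body">def</td>
       <td class="Auto_body">FAIL</td>
    </tr>
</table>'''

soup = BeautifulSoup(s, 'html.parser')
head = [td.text for td in soup.find_all('td', class_='Auto_head')]
body = [td.text for td in soup.find_all('td', class_='Auto_body')]

# Assumption: each body row has len(head) cells, so the flat list
# can be cut into consecutive row-sized slices.
width = len(head)
rows = [body[i:i + width] for i in range(0, len(body), width)]

# Pair every row with the headers by position.
pairs = [list(zip(head, row)) for row in rows]
for row_pairs in pairs:
    print(', '.join(f'{name}={value}' for name, value in row_pairs))
# Address=1, Name=abc, RESULT=PASS
# Address=2, Name=def, RESULT=FAIL
```

If rows can have missing cells, iterating over the tr elements and pairing within each row is the safer choice, since the slicing here relies on a fixed row width.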

Solution 3:[3]

The simplest approach is to chain find_all onto the result of find, so your code becomes:

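import requests
from bs4 import BeautifulSoup

source = requests.get('YOUR URL')
soup = BeautifulSoup(source.text, 'html.parser')

# find('tr') returns the first row; index into its cells
first_cell = soup.find('tr').find_all('td')[0]
second_cell = soup.find('tr').find_all('td')[1]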

and so on: change the final index (0, 1, 2, ...) to pick a different cell, or use a for loop to go through all of them.
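
The loop mentioned above might look like the following minimal sketch. It parses an inline string instead of fetching a URL so that it runs without network access; the markup and the name `cells_by_row` are illustrative. Since `soup.find('tr')` returns only the first row, the sketch uses `find_all('tr')` to visit every row.

```python
from bs4 import BeautifulSoup

html = '''<table class="Auto">
    <tr><td class="Auto_head">Address</td><td class="Auto_head">Name</td></tr>
    <tr><td class="Auto_body">1</td><td class="Auto_body">abc</td></tr>
</table>'''

soup = BeautifulSoup(html, 'html.parser')

cells_by_row = []
for tr in soup.find_all('tr'):
    # Collect the text of every cell in this row, in document order.
    cells = [td.text for td in tr.find_all('td')]
    cells_by_row.append(cells)
    print(cells)
```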

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Justin Ezequiel
Solution 2    wwii
Solution 3    Milan