Python 3 BeautifulSoup4 Selecting specific <td> tags from each <tr>

I am scraping from an HTML table in this format:

<table>

    <tr>
        <th>Name</th>
        <th>Date</th>
        <th>Number</th>
        <th>Address</th>

    </tr>

    <tr> 1

        <td> Name-1 </td>
        <td> Date-1 </td>
        <td> Number-1 </td>
        <td> Address-1 </td>

    </tr>

    <tr> 2

        <td> Name-2 </td>
        <td> Date-2 </td>
        <td> Number-2 </td>
        <td> Address-2 </td>

    </tr>

</table>

It is the only table on that page. I want to pair each TD cell with its corresponding TH header to build a list, and eventually save it to a CSV. The actual values don't end in -number; that's just to illustrate. The page has hundreds of table rows, all with the same set of fields formatted this way.

Basically, I want the 'name' to be the 1st TD cell in each TR row, the date to be the 2nd, and so on.

I can't seem to find a way to do this with Python 3 and BeautifulSoup4. I know there's a way; I'm just too new.

Thank you all for your help, I am learning a lot as I go.



Solution 1:[1]

Assuming the data is uniform, the following basic example should work:

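# soup is assumed to be a BeautifulSoup object already built from the page's HTML
table_rows = soup.find_all("tr")  # list of all <tr> tags
for row in table_rows:
    cells = row.find_all("td")  # list of all <td> tags within a row
    if not cells:  # skip rows without <td> elements, such as the header row
        continue
    # unpack the four <td> tags; call .get_text(strip=True) on each to get its text
    name, date, number, address = cells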
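
To go from that loop to the header/value pairing and the CSV the question asks for, here is a minimal sketch. It assumes the header row uses `<th>` cells and every data row has the same number of `<td>` cells. The inline `html` string stands in for the real page, and the names `records` and `csv_text` are just illustrative.

```python
import csv
import io

from bs4 import BeautifulSoup

# Sample markup mirroring the table in the question (assumed, for illustration).
html = """
<table>
    <tr><th>Name</th><th>Date</th><th>Number</th><th>Address</th></tr>
    <tr><td> Name-1 </td><td> Date-1 </td><td> Number-1 </td><td> Address-1 </td></tr>
    <tr><td> Name-2 </td><td> Date-2 </td><td> Number-2 </td><td> Address-2 </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Column names come from the <th> cells of the header row.
headers = [th.get_text(strip=True) for th in soup.find_all("th")]

records = []
for row in soup.find_all("tr"):
    cells = row.find_all("td")
    if not cells:  # the header row has only <th> cells, so skip it
        continue
    values = [td.get_text(strip=True) for td in cells]
    # Pair each value with the header in the same position.
    records.append(dict(zip(headers, values)))

# Written to an in-memory buffer here so the sketch has no side effects;
# open("output.csv", "w", newline="") could be used to write a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=headers)
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
```

Each entry in `records` is a dict keyed by the header text, so `records[0]["Name"]` gives the first row's name. If some rows have fewer cells than there are headers, `zip` silently drops the missing columns, so a length check may be worth adding for messy pages.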

Solution 2:[2]

I have a similar issue. The script from sytech works. However, with a table of 100 rows, for instance, my output starts at row 15 instead of the first row that appears in the HTML, then displays row 16, row 17 ... row 100, row 1, row 2. Using Clive's code above, I get the following:

[<td> Name-15 </td>, <td> Date-15 </td>, <td> Number-15 </td>, <td> Address-15 </td>]
[<td> Name-16 </td>, <td> Date-16 </td>, <td> Number-16 </td>, <td> Address-16 </td>]
[<td> Name-17 </td>, <td> Date-17 </td>, <td> Number-17 </td>, <td> Address-17 </td>]
etc...
[<td> Name-100 </td>, <td> Date-100 </td>, <td> Number-100 </td>, <td> Address-100 </td>]
[<td> Name-1 </td>, <td> Date-1 </td>, <td> Number-1 </td>, <td> Address-1 </td>]

Any idea why it wouldn't start with the first row? Apologies if this is formatted badly, and thank you for the help!
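
For what it's worth, `find_all` returns tags in the order they appear in the parsed document, so one way to narrow this down is to number the rows as they are parsed and compare that against the HTML. Below is a minimal sketch of that check; the 100-row table is generated in code as a hypothetical stand-in for the real page. If the numbering comes out in order on the real data too, the cause is probably outside the parsing step (for example, a console that only keeps the most recent lines of output, though that is only a guess and not confirmed in this thread).

```python
from bs4 import BeautifulSoup

# Hypothetical 100-row table built in code, standing in for the real page.
rows = "".join(
    f"<tr><td>Name-{i}</td><td>Date-{i}</td></tr>" for i in range(1, 101)
)
soup = BeautifulSoup(f"<table>{rows}</table>", "html.parser")

# Record each row's position in the find_all result next to its first cell.
names = []
for index, row in enumerate(soup.find_all("tr"), start=1):
    cells = row.find_all("td")
    if not cells:  # skip rows without <td> cells
        continue
    names.append((index, cells[0].get_text(strip=True)))

first_index, first_name = names[0]
last_index, last_name = names[-1]
```

Comparing `names[0]` and `names[-1]` with the first and last rows in the page source shows whether the parsed order matches the document order.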

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 sytech
Solution 2 DKilian