How to convert tables from multiple HTML files into a pandas DataFrame?

I have local HTML files. I've been able to parse one of them, get its table, and load it into a DataFrame as follows (this works):

from bs4 import BeautifulSoup
import pandas as pd

# Parse the HTML file (raw string so the backslashes are not read as escape sequences)
with open(r"D:\Projects\AST\Analytics.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")

# Creating list with all tables
tables = soup.find_all('table')

# Load the first table into a DataFrame
df = pd.read_html(str(tables))[0]
df

Each file has the same 18 columns and 100 rows.

I tried looping over the files to read and parse each one and then load its tables into a DataFrame, but the line soup = BeautifulSoup(f, 'html.parser') fails with TypeError: 'module' object is not callable:

import os
folder = r"D:\Projects\AST"
for filename in os.listdir(folder):
    if filename.endswith('.html'):
        fname = os.path.join(folder, filename)
        print('Filename: {}'.format(fname))

        with open(fname, 'r') as f:
            soup = BeautifulSoup(f, 'html.parser')
            tables2 = soup.find_all('table')
df2 = pd.read_html(str(tables2))[0]
df2

Any idea how to fix this? Or is there another approach to parse multiple HTML files and collect their tables into a single DataFrame?



Solution 1:[1]

To do this, append the items you want from find_all to a list, then convert that list into a pandas DataFrame. Here is how I've done it:

import requests
from bs4 import BeautifulSoup as _bs
import pandas as _pd

response = requests.get("https://www.upgrad.com/blog/software-development-project-ideas-topics-for-beginners/")
soup =_bs(response.content, "html.parser")
links_table = []
for links in soup.find_all('a'):
    links_table.append(links.text)

dataframe = _pd.DataFrame(links_table)

dataframe will be your pandas DataFrame. To use find_all correctly, iterate over its results in a for loop and append each item to the list. Make sure the list is created outside the for loop; that list is what you then convert into the DataFrame.
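Applied to the local files in the question, the same collect-in-a-list pattern might look like the sketch below. It is only an illustration under several assumptions: the helper names first_table_rows and folder_to_dataframe and the source_file column are made up for this example, the first row of each table is treated as the header, every row is assumed to have the same number of cells, tables are assumed not to be nested, and the files are assumed to be UTF-8. Cell values stay as strings here.

```python
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup


def first_table_rows(html_text):
    """Return (header, rows) for the first <table> in html_text, or None.

    Hypothetical helper: assumes the first row holds the column names.
    """
    soup = BeautifulSoup(html_text, "html.parser")
    table = soup.find("table")
    if table is None:
        return None
    parsed = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    parsed = [row for row in parsed if row]  # drop <tr> elements without cells
    if not parsed:
        return None
    return parsed[0], parsed[1:]


def folder_to_dataframe(folder):
    """Stack the first table of every .html file in folder into one DataFrame."""
    frames = []  # created once, outside the loop, so nothing is overwritten
    for path in sorted(Path(folder).glob("*.html")):
        result = first_table_rows(path.read_text(encoding="utf-8"))
        if result is None:
            continue  # file without a usable table
        header, rows = result
        frame = pd.DataFrame(rows, columns=header)
        frame["source_file"] = path.name  # assumed extra column for traceability
        frames.append(frame)
    if not frames:
        return pd.DataFrame()
    # Convert the collected list into a single DataFrame after the loop
    return pd.concat(frames, ignore_index=True)
```

A call such as folder_to_dataframe(r"D:\Projects\AST") would then return one frame with the rows of all files stacked. If pd.read_html already works in your environment, as it does in the question, appending pd.read_html(path)[0] inside the loop is a possible shortcut that also infers column types.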

Hope this helps! -VikingOfValhalla

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 VikingOfValhalla