How to convert tables from multiple HTML files into a pandas DataFrame?
I have local HTML files. I've been able to parse one file, get its table, and load it into a DataFrame as follows (this works):
from bs4 import BeautifulSoup
import pandas as pd
# parse the HTML file (raw string so the backslashes are taken literally)
with open(r"D:\Projects\AST\Analytics.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Creating list with all tables
tables = soup.find_all('table')
#tables into df
df = pd.read_html(str(tables))[0]
df
Each file has the same 18 columns and 100 rows.
I tried looping over the files to parse each one and read its tables into a DataFrame, but the line soup = BeautifulSoup(f, 'html.parser') fails with TypeError: 'module' object is not callable
import os
folder = r"D:\Projects\AST"
for filename in os.listdir(folder):
    if filename.endswith('.html'):
        fname = os.path.join(folder, filename)
        print('Filename: {}'.format(fname))
        with open(fname, 'r') as f:
            soup = BeautifulSoup(f, 'html.parser')
        tables2 = soup.find_all('table')
        df2 = pd.read_html(str(tables2))[0]
df2
Any idea how to fix it? Or is there another approach to parse multiple HTML files and get their tables into a single DataFrame?
Solution 1:[1]
To do this, append the information you want from find_all to a list, then convert the list into a pandas DataFrame. Here is how I've done it:
import requests
from bs4 import BeautifulSoup as _bs
import pandas as _pd
response = requests.get("https://www.upgrad.com/blog/software-development-project-ideas-topics-for-beginners/")
soup =_bs(response.content, "html.parser")
links_table = []
for links in soup.find_all('a'):
    links_table.append(links.text)
dataframe = _pd.DataFrame(links_table)
The variable dataframe will be your pandas DataFrame.
To use find_all correctly, put it in a for loop to get all objects and append each one from there. Make sure the list is created outside the for loop; that list is what you then convert into the DataFrame.
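Applied to the local files from the question, the same list-then-convert pattern might look like the sketch below. It is an illustration under assumptions, not a tested fix for the asker's setup: the helper names table_to_frame and load_html_tables, the source_file column, and the UTF-8 encoding are all hypothetical, and the first row of each table is assumed to hold the column headers. A TypeError saying 'module' object is not callable usually means the name BeautifulSoup is bound to a module rather than the class (for example after import bs4 as BeautifulSoup), so the sketch imports the class explicitly.

```python
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup  # import the class itself, not the bs4 module


def table_to_frame(table):
    """Turn one <table> tag into a DataFrame.

    Assumption: the first non-empty row holds the column headers.
    """
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    rows = [row for row in rows if row]  # skip rows that have no cells
    if not rows:
        return pd.DataFrame()
    return pd.DataFrame(rows[1:], columns=rows[0])


def load_html_tables(folder):
    """Read the first table of every .html file in `folder` into one DataFrame."""
    frames = []  # the list is created once, outside the loop
    for path in sorted(Path(folder).glob("*.html")):
        # Assumption: the files are UTF-8 encoded.
        with open(path, "r", encoding="utf-8") as f:
            soup = BeautifulSoup(f, "html.parser")
        table = soup.find("table")
        if table is None:  # no table in this file, nothing to append
            continue
        frame = table_to_frame(table)
        frame["source_file"] = path.name  # hypothetical column to track the origin
        frames.append(frame)
    if not frames:  # pd.concat raises on an empty list
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Calling load_html_tables(r"D:\Projects\AST") would then return a single DataFrame with the rows of all files stacked. Every cell value stays a string in this sketch, so numeric columns would still need converting, for example with pd.to_numeric.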
Hope this helps! -VikingOfValhalla
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | VikingOfValhalla |
