'Read Excel sheet table (Listobject) into python with pandas
There are multiple ways to read excel data into python. Pandas provides aslo an API for writing and reading
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
df = pd.read_excel('File.xlsx', sheetname='Sheet1')
That works fine.
BUT: What is the way to access the tables of every sheet directly into a pandas dataframe??
The above picture shows a sheet including a table SEPARATED THAN CELL (1,1).
Moreover the sheet might include several tables (listobjects in VBA).
I can not find anywhere the way to read them into pandas.
Note1: It is not possible to modify the workbook to bring all the tables towards cell(1,1). Note2: I would like to use just pandas (if it is possible) and minimize the need to import other libraries. But it there is no other way I am ready to use other lybray. In any case I could not manage with xlwings for instance.
here it looks like its possible to parse the excel file, but no soilution is provided for tables, just for complete sheets.
The documentation of pandas does not seem to offer that possibility.
Thanks.
Solution 1:[1]
I understand that this question has been marked solved already, but I found an article that provides a much more robust solution: Full Post
I suppose a newer version of this library supports better visibility of the workbook structure. Here is a summary:
- Load the workbook using the
load_workbookfunction fromopenpyxl - Then, you are able to access the sheets within, which contains collection of List-Objects (Tables) in excel.
- Once you gain access to the tables, you are able to get to the range addresses of those tables.
- Finally they loop through the ranges and create a pandas data-frame from it.
This is a nicer solution as it gives us the ability to loop through all the sheets and tables in a workbook.
Solution 2:[2]
Here is a way to parse one table, howver it's need you to know some informations on the seet parsed.
df = pd.read_excel("file.xlsx", usecols="B:I", index_col=3)
print(df)
Not elegant and work only if one table is present inside the sheet, but that a first step:
import pandas as pd
import string
letter = list(string.ascii_uppercase)
df1 = pd.read_excel("file.xlsx")
def get_start_column(df):
for i, column in enumerate(df.columns):
if df[column].first_valid_index():
return letter[i]
def get_last_column(df):
columns = df.columns
len_column = len(columns)
for i, column in enumerate(columns):
if df[column].first_valid_index():
return letter[len_column - i]
def get_first_row(df):
for index, row in df.iterrows():
if not row.isnull().values.all():
return index + 1
def usecols(df):
start = get_start_column(df)
end = get_last_column(df)
return f"{start}:{end}"
df = pd.read_excel("file.xlsx", usecols=usecols(df1), header=get_first_row(df1))
print(df)
Solution 3:[3]
You can use xlwings, great package for working with excel files in python.
This is for a single table, but it is pretty trivial to use xlwings collections (App>books>sheets>tables) to iterate over all tables. Tables are ofcourse listobjects.
import xlwings
import pandas
with xlwings.App() as App:
_ = App.books.open('my.xlsx')
rng = App.books['my.xlsx'].sheets['mysheet'].tables['mytablename'].range
df: pandas.DataFrame = rng.expand().options(pandas.DataFrame).value
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dharman |
| Solution 2 | |
| Solution 3 | Graham Monkman |

