Unexpected indexing of tables in python-docx

I have a task to collect data from tables inside a very large number of *.docx files. I'm using the Python "docx" module (python-docx) to do this. I have written a script that works for 95% of the tables in all documents. However, I'm struggling with one particular table in the remaining 5% of the documents. My script gives very strange results when collecting data from these tables.

What I have noticed is that the indexing of cells in these tables is very strange: it seems to go somewhat diagonally.

To illustrate this, here is the script. It should just print the first cell of each row in the table.

from docx import Document as dc

doc_path = 'u:/Documents/Samples/Sample_document1.docx'
doc = dc(doc_path)
tables = doc.tables

# Print the text of the first cell in every row of the first table
for i, row in enumerate(tables[0].rows):
    print(i, row.cells[0].text)

It works just fine with "Sample_document1.docx" but breaks and gives unexpected results with "Sample_document2.docx". Links to both files on my Google Drive are below.

Can you please tell me what causes this effect and how I can bypass it?

https://docs.google.com/document/d/1TzkJB4OlrBy1jIVdf3HdKqkkoBpvIuBB/edit?usp=sharing&rtpof=true&sd=true

https://docs.google.com/document/d/13_3pCFp3sPCn6nNHmkq5j9ClLPhKG5VB/edit?usp=sharing&ouid=115832391196959770902&rtpof=true&sd=true



Solution 1:[1]

Compare the output of the code below for the first and the second sample document:

for row in tables[0].rows:
    print(row.cells)

It seems your code breaks when Python tries to access row.cells[0] on a row whose row.cells tuple contains empty elements. You should check whether the cells are empty before using them.
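
As a minimal sketch of the check suggested above, the helper below returns the text of the first non-blank cell in a row instead of indexing row.cells[0] directly. The name first_cell_text and its default parameter are hypothetical and not part of python-docx. The only assumption about a row is that it exposes a cells sequence whose items have a text attribute, which the question's script already relies on. The stand-in rows built with SimpleNamespace are invented data, included only so the sketch runs without a .docx file.

```python
from types import SimpleNamespace


def first_cell_text(row, default=""):
    """Return the text of the first non-blank cell in `row`, or `default`.

    `row` is assumed to expose `.cells`, a sequence of objects with a `.text`
    attribute (as used in the question's script). Hypothetical helper.
    """
    for cell in row.cells:
        text = cell.text.strip()
        if text:  # skip cells that are empty or whitespace-only
            return text
    return default  # also reached when row.cells is empty


# Stand-in rows (hypothetical data) so the sketch runs without a .docx file.
demo_rows = [
    SimpleNamespace(cells=(SimpleNamespace(text="Name"), SimpleNamespace(text="Value"))),
    SimpleNamespace(cells=(SimpleNamespace(text=""), SimpleNamespace(text=" Total "))),
    SimpleNamespace(cells=()),
]

for i, row in enumerate(demo_rows):
    print(i, repr(first_cell_text(row, default="<no text>")))
```

With a real document, the same call would take the place of row.cells[0].text in the loop over tables[0].rows. Whether skipping blank cells is the right fix for Sample_document2.docx depends on what the printed row.cells output shows for that file.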

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 aestet