Apache POI: get a table from a text box

I'm using Apache POI to iterate over the tables in a docx file. Everything works fine, but if a table is inside a text box, my code doesn't see it: table.size() is 0.


    XWPFDocument doc = new XWPFDocument(new FileInputStream(fileName));

    List<XWPFTable> table = doc.getTables();

    for (XWPFTable xwpfTable : table) {
        List<XWPFTableRow> row = xwpfTable.getRows();
        for (XWPFTableRow xwpfTableRow : row) {
            List<XWPFTableCell> cell = xwpfTableRow.getTableCells();
            for (XWPFTableCell xwpfTableCell : cell) {
                if (xwpfTableCell != null) {
                    List<XWPFTable> itable = xwpfTableCell.getTables();
                    if (itable.size() != 0) {
                        for (XWPFTable xwpfiTable : itable) {
                            List<XWPFTableRow> irow = xwpfiTable.getRows();
                            for (XWPFTableRow xwpfiTableRow : irow) {
                                List<XWPFTableCell> icell = xwpfiTableRow.getTableCells();
                                for (XWPFTableCell xwpfiTableCell : icell) {
                                    if (xwpfiTableCell != null) {
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }


Solution 1:[1]

The following code parses a *.docx document at a low level and collects all tables in its document body.

The approach uses an org.apache.xmlbeans.XmlCursor to search for all w:tbl elements in document.xml. Each one found is added to a List<CTTbl>.

Because a text box rectangle shape provides fall-back content in document.xml, we need to skip the mc:Fallback elements. Otherwise the tables within text boxes would be collected twice.

Finally, we go through the List<CTTbl> and get the contents of all the tables.

import java.io.*;
import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTbl;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTc;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTText;

import org.apache.xmlbeans.impl.values.XmlAnyTypeImpl;
import org.apache.xmlbeans.XmlCursor;

import javax.xml.namespace.QName;

import java.util.List;
import java.util.ArrayList;

public class WordReadAllTables {

 public static void main(String[] args) throws Exception {

  XWPFDocument document = new XWPFDocument(new FileInputStream("22.docx"));

  CTBody ctbody = document.getDocument().getBody();

  XmlCursor xmlcursor = ctbody.newCursor();

  QName qnameTbl = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "tbl", "w");
  QName qnameFallback = new QName("http://schemas.openxmlformats.org/markup-compatibility/2006", "Fallback", "mc");

  List<CTTbl> allCTTbls = new ArrayList<CTTbl>();

  while (xmlcursor.hasNextToken()) {
   XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
   if (tokentype.isStart()) {
    if (qnameTbl.equals(xmlcursor.getName())) {
     if (xmlcursor.getObject() instanceof CTTbl) {
      allCTTbls.add((CTTbl)xmlcursor.getObject());
     } else if (xmlcursor.getObject() instanceof XmlAnyTypeImpl) {
      allCTTbls.add(CTTbl.Factory.parse(xmlcursor.getObject().newInputStream()));
     }
    } else if (qnameFallback.equals(xmlcursor.getName())) {
     xmlcursor.toEndToken();
    }
   } 
  }

  for (CTTbl cTTbl : allCTTbls) {
   StringBuilder tableHTML = new StringBuilder();
   tableHTML.append("<table>\n");
   for (CTRow cTRow : cTTbl.getTrList()) {
    tableHTML.append(" <tr>\n");
    for (CTTc cTTc : cTRow.getTcList()) {
     tableHTML.append("  <td>");
     for (CTP cTP : cTTc.getPList()) {
      for (CTR cTR : cTP.getRList()) {
       for (CTText cTText : cTR.getTList()) {
        tableHTML.append(cTText.getStringValue());
       }
      }
     }
     tableHTML.append("</td>");
    }
    tableHTML.append("\n </tr>\n");
   }
   tableHTML.append("</table>");

   System.out.println(tableHTML);

  }

  document.close();

 }
}

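This code needs the full jar of all of the schemas, ooxml-schemas-1.3.jar, as mentioned in the Apache POI FAQ (faq-N10025). In Apache POI 5.0 and later, the equivalent artifact should be poi-ooxml-full.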
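
To see in isolation why the mc:Fallback elements have to be skipped, here is a minimal JDK-only sketch. It does not use Apache POI or XmlBeans; it swaps the XmlCursor for a StAX reader and runs over a hand-written, heavily simplified XML fragment (the SAMPLE string, the class name FallbackSkipDemo and the method countTables are all made up for illustration). A real document.xml wraps the text box content in considerably more drawing markup.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class FallbackSkipDemo {

    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006";

    // Assumed, simplified fragment: one table directly in the body and one table
    // in a text box, which appears once under mc:Choice and once under mc:Fallback.
    static final String SAMPLE =
          "<w:body xmlns:w=\"" + W_NS + "\" xmlns:mc=\"" + MC_NS + "\">"
        + "<w:tbl><w:tr><w:tc><w:p><w:r><w:t>body</w:t></w:r></w:p></w:tc></w:tr></w:tbl>"
        + "<w:p><w:r><mc:AlternateContent>"
        + "<mc:Choice Requires=\"wps\"><w:txbxContent>"
        + "<w:tbl><w:tr><w:tc><w:p><w:r><w:t>boxed</w:t></w:r></w:p></w:tc></w:tr></w:tbl>"
        + "</w:txbxContent></mc:Choice>"
        + "<mc:Fallback><w:txbxContent>"
        + "<w:tbl><w:tr><w:tc><w:p><w:r><w:t>boxed</w:t></w:r></w:p></w:tc></w:tr></w:tbl>"
        + "</w:txbxContent></mc:Fallback>"
        + "</mc:AlternateContent></w:r></w:p>"
        + "</w:body>";

    // Counts every w:tbl start element, optionally ignoring those inside mc:Fallback.
    static int countTables(String xml, boolean skipFallback) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            int tables = 0;
            int fallbackDepth = 0; // greater than 0 while inside an mc:Fallback element
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (isFallback(reader)) {
                        fallbackDepth++;
                    } else if (W_NS.equals(reader.getNamespaceURI())
                            && "tbl".equals(reader.getLocalName())
                            && (!skipFallback || fallbackDepth == 0)) {
                        tables++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && isFallback(reader)) {
                    fallbackDepth--;
                }
            }
            reader.close();
            return tables;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    // True when the reader is positioned on an mc:Fallback start or end element.
    static boolean isFallback(XMLStreamReader reader) {
        return MC_NS.equals(reader.getNamespaceURI())
            && "Fallback".equals(reader.getLocalName());
    }

    public static void main(String[] args) {
        System.out.println("Fallback skipped: " + countTables(SAMPLE, true));     // prints 2
        System.out.println("Fallback not skipped: " + countTables(SAMPLE, false)); // prints 3
    }
}
```

With the fallback skipped, the sample yields two tables (the body table and the text box table). Without the skip it yields three, because the text box table is counted a second time from its fallback copy.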

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1