'Apache POI word table giving OOM exception on large set of data

I am using apache poi to generate Word from jdbc query . It works fine for 4000 records but once we inserted 17000 records in the table, I started getting Java Out of Memory space . I am getting all the data in one shot as shown below , is there a way to insert rows in word table in batches or any other optimized way which I am missing.

This is my driver code which gets in table name and HttpServlet and passes it to class which implements ResultSetExtractor.

public void generateWord(String tableName, HttpServletResponse response, String[] filter) throws DataAccessException, IOException {
        //Table validation logic
        String sql = "select * from table";
        mainJdbcTemplate.query(sql, paramMap,
                new StreamingWordResultSetExtractor(response.getOutputStream(), tableName));
    }

Custom class which implements Result Set extractor . I get in all the result set data and use poi to generate Word.

public class StreamingWordResultSetExtractor implements ResultSetExtractor<Void> {

    private final OutputStream os;
    private String tableName;

    /**
     * @param os the OutputStream to stream the Excel to
     */
    public StreamingWordResultSetExtractor(final OutputStream os, final String tableName) {
        this.os = os;
        this.tableName = tableName;
    }

    // Set Custom Font Size
    static void setFontSize(XWPFTableCell cell, int size) {
        for (XWPFParagraph paragraph : cell.getParagraphs()) {
            for (XWPFRun run : paragraph.getRuns()) {
                run.setFontSize(size);
            }
        }
    }

    // Set table cell as link
    static void setLink(XWPFTableCell cell, String text, String link) {
        XWPFParagraph paragraph = cell.addParagraph();
        XWPFRun run = paragraph.createHyperlinkRun(link);
        run.setText(text);
        run.setUnderline(UnderlinePatterns.SINGLE);
        run.setColor("0000FF");

    }

    // Set table style
    static XWPFStyle createTableStyle(XWPFStyles styles, String styleId) {
        //omitting code
        return style;
    }

    @Override
    public Void extractData(ResultSet rs) throws SQLException, DataAccessException {

        ResultSetMetaData rsmd = rs.getMetaData();
        int columnsNumber = rsmd.getColumnCount();

        log.info("Generating Word");
        XWPFDocument document = new XWPFDocument();

        // Set page orientation
        CTDocument1 ctDocument = document.getDocument();
        CTBody ctBody = ctDocument.getBody();
        CTSectPr ctSectPr = (ctBody.isSetSectPr()) ? ctBody.getSectPr() : ctBody.addNewSectPr();
        CTPageSz ctPageSz = (ctSectPr.isSetPgSz()) ? ctSectPr.getPgSz() : ctSectPr.addNewPgSz();
        ctPageSz.setOrient(STPageOrientation.LANDSCAPE);

        // set paper size A3
        ctPageSz.setW(java.math.BigInteger.valueOf(Math.round(23.4 * 1440))); // 16.5 inches
        ctPageSz.setH(java.math.BigInteger.valueOf(Math.round(16.5 * 1440))); // 11.7 inches

        XWPFParagraph paragraph = document.createParagraph();
        XWPFRun run = paragraph.createRun();
        run.setText(tableName);

        // Creating Table with 1 row and as many columns as in the result set
        XWPFTable table = document.createTable(1, columnsNumber);

        // Get header Row
        XWPFTableRow header = table.getRow(0);
        // Set header columns
        for (int col = 0; col < columnsNumber; col++) {
            header.getCell(col).setText(rsmd.getColumnLabel(col + 1));
            // header.getCell(col).getCTTc().addNewTcPr().addNewShd().setFill("0394fc");
            setFontSize(header.getCell(col), 8);

        }
        // Repeat header on every page
        header.setRepeatHeader(true);
        // Set data rows
        while (rs.next()) {
            XWPFTableRow desRow = table.createRow();
            for (int col = 0; col < columnsNumber; col++) {
                final var value = rs.getObject(col + 1);
                String v = value == null ? "" : value.toString();

                if (v.startsWith("<a")) {
                    Document doc = Jsoup.parse(v);
                    Elements rows = doc.getElementsByTag("a");
                    String text = "";
                    for (Element element : rows) {
                        text = element.text();
                        String href = element.attr("href").trim();
                        String encURL = "";
                        encURL = UriUtils.encodePath(href, StandardCharsets.UTF_8);
                        setLink(desRow.getCell(col), text, encURL);
                        setFontSize(desRow.getCell(col), 7);
                    }

                } else {
                    desRow.getCell(col).setText(v);
                    setFontSize(desRow.getCell(col), 7);
                }
            }
        }

        table.removeBorders();
        XWPFStyles styles = document.createStyles();
        XWPFStyle style;
        style = createTableStyle(styles, "ListTableStyle");
        table.setStyleID(style.getStyleId());
        try {
            document.write(os);
            document.close();
            log.info("Word generation complete");
        } catch (IOException e) {
            log.error("Error occurred: {0}", e);
        }
        return null;
    }

}


Solution 1:[1]

First make sure that ResultSet rs fetches the data lazily (depending on your configuration the whole query will be in memory and/or will not be freed when doing rs.next). Read more about this jdbc behavior.

With this, you can be sure that your jdbc query consumes very little memory.

On the other hand, the XWPFDocument class store all intermediate data in memory:

public class XWPFDocument extends POIXMLDocument implements Document, IBody {
    private static final Logger LOG = LogManager.getLogger(XWPFDocument.class);

    protected List<XWPFFooter> footers = new ArrayList<>();
    protected List<XWPFHeader> headers = new ArrayList<>();
    protected List<XWPFHyperlink> hyperlinks = new ArrayList<>();
    ...

then, you cannot use that class to generate your file in a streaming fashion way.

Now your processing limit is the memory used by this class.

If it consumes a lot of memory (which is likely), a dirty way is generate multiple output files and join these using some other utility or (since they are xml) join ciertain xml elements directly.

If you can generate an spreadsheet instead doc, you can use the streaming version SXSSFSheet.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1