PDFBox - How to change encoding from WinAnsiEncoding to Unicode?

I am trying to find a way to change a font's encoding from WinAnsiEncoding to Unicode. I've tried setting the font like this:

PDDocument doc = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
doc.addPage(page);

File unicodeFileLocation = new File(getServletContext().getRealPath("/lib/ARIALUNI.TTF"));
PDTrueTypeFont unicodeFont = PDTrueTypeFont.loadTTF(doc, unicodeFileLocation);

...

// Create Table using boxable API
BaseTable table = new BaseTable(yStart, yStartNewPage, bottomMargin, tableWidth, margin, doc, page, true, drawContent);
// Title Field
Row<PDPage> titleRow = table.createRow(rowHeight);
Cell<PDPage> cell = titleRow.createCell(30, "Title");
cell = titleRow.createCell(70, TitleText);
cell.setFont(unicodeFont);

table.draw();

For simple text this works fine and I can see the font change from Helvetica, but if the text contains certain Unicode characters (e.g., U+0083), the following exception is thrown:

java.lang.IllegalArgumentException: U+0083 is not available in this font's encoding: WinAnsiEncoding
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.encode(PDTrueTypeFont.java:371)
    at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:316)
    at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:345)
    at be.quodlibet.boxable.text.PipelineLayer.push(PipelineLayer.java:65)
    at be.quodlibet.boxable.Paragraph.getLines(Paragraph.java:341)
    at be.quodlibet.boxable.Paragraph.getHeight(Paragraph.java:465)
    at be.quodlibet.boxable.Cell.getTextHeight(Cell.java:392)
    at be.quodlibet.boxable.Cell.getCellHeight(Cell.java:367)
    at be.quodlibet.boxable.Row.getHeight(Row.java:166)
    at be.quodlibet.boxable.Table.isEndOfPage(Table.java:728)
    at be.quodlibet.boxable.Table.drawRow(Table.java:224)
    at be.quodlibet.boxable.Table.draw(Table.java:200)
    at com.ssl.pew.controller.ExportPEW.processRequest(ExportPEW.java:498)
    at com.ssl.pew.controller.ExportPEW.doPost(ExportPEW.java:792)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:648)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)

When I check the font's encoding, it is always WinAnsiEncoding, which is not what I need:

Encoding encoding = unicodeFont.getEncoding();
String encodingName = encoding.getEncodingName();

This gives me WinAnsiEncoding. Is there any way I could change this?

It seems to me that WinAnsiEncoding is the cause, and that if I could change it I might be able to solve this issue.

It seems that most people decided to move to iText, which is not an option for me.



Solution 1:[1]

The FAQ says:

Font Handling

I’m getting java.lang.IllegalArgumentException: … is not available in this font’s encoding: WinAnsiEncoding

Check whether the character is available in WinAnsiEncoding by looking at the PDF Specification Appendix D. If not, but if it is available in this font (in windows, have a look with charmap.exe), then load the font with PDType0Font.load(), see also in the EmbeddedFonts.java example in the source code download.

It works for me with, for example:

PDType0Font.load(document, new ClassPathResource("fonts/OpenSans-Regular.ttf").getFile());
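The first step the FAQ describes, checking whether a character is covered by WinAnsiEncoding, can be approximated without PDFBox. The sketch below assumes that Java's windows-1252 charset is a close enough stand-in for WinAnsiEncoding; the two are closely related but not guaranteed to be identical, so the table in the PDF specification remains the authoritative source. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class WinAnsiCheck {
    // Assumption: windows-1252 approximates PDF's WinAnsiEncoding.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Returns the index of the first char windows-1252 cannot encode, or -1 if all chars fit. */
    static int firstUnencodable(String text) {
        CharsetEncoder encoder = CP1252.newEncoder();
        for (int i = 0; i < text.length(); i++) {
            if (!encoder.canEncode(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String sample = "Title\u0083"; // same code point as in the exception above
        int index = firstUnencodable(sample);
        if (index >= 0) {
            System.out.printf("U+%04X at index %d is not in windows-1252%n",
                    (int) sample.charAt(index), index);
        } else {
            System.out.println("All characters fit in windows-1252");
        }
    }
}
```

If the check reports a character, a font loaded with WinAnsiEncoding will likely reject it as well, and loading the font through PDType0Font.load() as described above is the route the FAQ suggests.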

Solution 2:[2]

Try this:

PDFont font = PDTrueTypeFont.load(document, new File(fontPath), WinAnsiEncoding.INSTANCE);
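Note that U+0083 is a C1 control character, so many fonts are unlikely to contain a glyph for it regardless of how the font is loaded. One possible workaround, sketched below, is to strip control characters from the text before it is passed to the cell. The class and method names are hypothetical, and keeping newline and tab is an assumption about what the layout code should still receive.

```java
final class TextSanitizer {
    /**
     * Removes ISO control characters (U+0000..U+001F and U+007F..U+009F),
     * keeping newline and tab, which are assumed to matter for layout.
     */
    static String stripControlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> cp == '\n' || cp == '\t' || !Character.isISOControl(cp))
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "Title\u0083 text";
        String clean = stripControlChars(raw);
        // The cleaned string is one char shorter because U+0083 was dropped.
        System.out.println(raw.length() + " -> " + clean.length());
    }
}
```

In the question's code this would mean passing stripControlChars(TitleText) to createCell instead of TitleText. Whether dropping the character is acceptable depends on where it came from; it often points to text that was decoded with the wrong charset upstream.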

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Community
Solution 2 kautilya hari