EDI to XML huge file conversions

I am converting an EDI file to XML. However, my input file, which is in BIF format and approximately 100 MB in size, gives me a Java out-of-memory error.

I consulted the Smooks documentation on huge file conversion, but it only covers conversion from XML to EDI.

Below is the output I get when running my main method:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
        at java.lang.StringBuffer.append(StringBuffer.java:367)
        at java.io.StringWriter.write(StringWriter.java:94)
        at java.io.Writer.write(Writer.java:127)
        at freemarker.core.TextBlock.accept(TextBlock.java:56)
        at freemarker.core.Environment.visit(Environment.java:257)
        at freemarker.core.MixedContent.accept(MixedContent.java:57)
        at freemarker.core.Environment.visitByHiddingParent(Environment.java:278)
        at freemarker.core.IteratorBlock$Context.runLoop(IteratorBlock.java:157)
        at freemarker.core.Environment.visitIteratorBlock(Environment.java:501)
        at freemarker.core.IteratorBlock.accept(IteratorBlock.java:67)
        at freemarker.core.Environment.visit(Environment.java:257)
        at freemarker.core.Macro$Context.runMacro(Macro.java:173)
        at freemarker.core.Environment.visit(Environment.java:686)
        at freemarker.core.UnifiedCall.accept(UnifiedCall.java:80)
        at freemarker.core.Environment.visit(Environment.java:257)
        at freemarker.core.MixedContent.accept(MixedContent.java:57)
        at freemarker.core.Environment.visit(Environment.java:257)
        at freemarker.core.Environment.process(Environment.java:235)
        at freemarker.template.Template.process(Template.java:262)
        at org.milyn.util.FreeMarkerTemplate.apply(FreeMarkerTemplate.java:92)
        at org.milyn.util.FreeMarkerTemplate.apply(FreeMarkerTemplate.java:86)
        at org.milyn.event.report.HtmlReportGenerator.applyTemplate(HtmlReportGenerator.java:76)
        at org.milyn.event.report.AbstractReportGenerator.processFinishEvent(AbstractReportGenerator.java:197)
        at org.milyn.event.report.AbstractReportGenerator.processLifecycleEvent(AbstractReportGenerator.java:157)
        at org.milyn.event.report.AbstractReportGenerator.onEvent(AbstractReportGenerator.java:92)
        at org.milyn.Smooks._filter(Smooks.java:558)
        at org.milyn.Smooks.filterSource(Smooks.java:482)
        at com.***.xfunctional.EdiToXml.runSmooksTransform(EdiToXml.java:40)
        at com.***.xfunctional.EdiToXml.main(EdiToXml.java:57)

import java.io.*;
import java.util.Arrays;
import java.util.Locale;
import javax.xml.transform.stream.StreamSource;
import org.milyn.Smooks;
import org.milyn.SmooksException;
import org.milyn.container.ExecutionContext;
import org.milyn.event.report.HtmlReportGenerator;
import org.milyn.io.StreamUtils;
import org.milyn.payload.StringResult;
import org.milyn.payload.SystemOutResult;
import org.xml.sax.SAXException;

public class EdiToXml {

  private static byte[] messageIn = readInputMessage();

  protected static String runSmooksTransform() throws IOException, SAXException, SmooksException {

    Locale defaultLocale = Locale.getDefault();
    Locale.setDefault(new Locale("en", "EN"));

    // Instantiate Smooks with the config...
    Smooks smooks = new Smooks("smooks-config.xml");
    try {
      // Create an exec context - no profiles....
      ExecutionContext executionContext = smooks.createExecutionContext();

      StringResult result = new StringResult();

      // Configure the execution context to generate a report...
      executionContext.setEventListener(new HtmlReportGenerator("target/report/report.html"));

      // Filter the input message to the outputWriter, using the execution context...
      smooks.filterSource(executionContext, new StreamSource(new ByteArrayInputStream(messageIn)),result);

      Locale.setDefault(defaultLocale);

      return result.getResult();
    } finally {
      smooks.close();
    }
  }

  public static void main(String[] args) throws IOException, SAXException, SmooksException {
    System.out.println("\n\n==============Message In==============");
    System.out.println("======================================\n");

    pause(
        "The EDI input stream can be seen above.  Press 'enter' to see this stream transformed into XML...");

    String messageOut = EdiToXml.runSmooksTransform();

    System.out.println("==============Message Out=============");
    System.out.println(messageOut);
    System.out.println("======================================\n\n");

    pause("And that's it!  Press 'enter' to finish...");
  }

  private static byte[] readInputMessage() {
    try {
      InputStream input = new BufferedInputStream(new FileInputStream("/home/****/Downloads/BifInputFile.DATA"));
      return StreamUtils.readStream(input);
    } catch (IOException e) {
      e.printStackTrace();
      return "<no-message/>".getBytes();
    }
  }

  private static void pause(String message) {
    try {
      BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
      System.out.print("> " + message);
      in.readLine();
    } catch (IOException e) {
    }
    System.out.println("\n");
  }

}

<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" xmlns:edi="http://www.milyn.org/xsd/smooks/edi-1.4.xsd">
  <!--
     Configure the EDI Reader to parse the message stream into a stream of SAX events.
     -->
  <edi:reader mappingModel="edi-to-xml-bif-mapping.xml" validate="false"/>
</smooks-resource-list>

I edited this line of the code to use a stream:

smooks.filterSource(executionContext, new StreamSource(new FileInputStream("/home/***/Downloads/sample-text-file.txt")), result);

However, I now get the error below. Does anyone know what the best approach is?

Exception in thread "main" org.milyn.SmooksException: Failed to filter source.
    at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:97)
    at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:64)
    at org.milyn.Smooks._filter(Smooks.java:526)
    at org.milyn.Smooks.filterSource(Smooks.java:482)
    at ****.EdiToXml.runSmooksTransform(EdiToXml.java:41)
    at com.***.***.EdiToXml.main(EdiToXml.java:58)
Caused by: org.milyn.edisax.EDIParseException: EDI message processing failed [EDIFACT-BIF-TO-XML][1.0].  Must be a minimum of 1 instances of segment [UNH].  Currently at segment number 1.
    at org.milyn.edisax.EDIParser.mapSegments(EDIParser.java:504)
    at org.milyn.edisax.EDIParser.mapSegments(EDIParser.java:453)
    at org.milyn.edisax.EDIParser.parse(EDIParser.java:428)
    at org.milyn.edisax.EDIParser.parse(EDIParser.java:386)
    at org.milyn.smooks.edi.EDIReader.parse(EDIReader.java:111)
    at org.milyn.delivery.sax.SAXParser.parse(SAXParser.java:76)
    at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:86)
    ... 5 more


Solution 1:[1]

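The message was valid and the XML mapping was good. I was just not using the optimal method for reading and writing the message.

I came to realize that the filterSource method of Smooks can be fed directly with an InputStream and an OutputStream, wrapped in a StreamSource and a StreamResult. The code below let the program run efficiently without hitting the Java out-of-memory error. Note that it also no longer uses the HtmlReportGenerator or the StringResult from the question; the stack trace there shows the OutOfMemoryError being thrown while the report generator was writing its report into an in-memory StringWriter.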

// Besides the imports in the question, this needs:
// import javax.xml.transform.stream.StreamResult;

// Open the input file as a stream
FileInputStream inputStream = new FileInputStream(inputFileName);

// Open the output file as a stream
FileOutputStream outputStream = new FileOutputStream(outputFileName);

try {
  // Filter the input stream straight to the output stream...
  smooks.filterSource(new StreamSource(inputStream), new StreamResult(outputStream));
} finally {
  // Restore the locale and release resources even if filtering fails
  Locale.setDefault(defaultLocale);
  smooks.close();
  inputStream.close();
  outputStream.close();
}

Thanks to the community.

Regards.

Solution 2:[2]

I'm the original author of Smooks and that Edifact parsing stuff. Jason emailed me asking for advice on this but I haven't been involved in it for a number of years now, so not sure how helpful I’d be.

Smooks doesn’t read the full message into memory. It streams it through a parser that converts it to a stream of SAX events, making it “look like” XML to anything downstream of it. If those events are then used to build a big Java object model in memory, that might result in OOM errors etc.
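To make the "stream of SAX events" idea concrete, here is a minimal sketch using only the JDK. It is not the Smooks EDIParser: ToySegmentReader and SaxStreamingSketch are hypothetical names, and the format is an assumption (segments end with ', fields are separated by +, no release characters or composite elements). The reader holds one segment at a time and pushes SAX events to whatever handler is attached, here an identity Transformer writing to the output.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical driver: feeds the toy reader's SAX events to an identity transform.
class SaxStreamingSketch {
    static void convert(Reader in, Writer out) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new SAXSource(new ToySegmentReader(), new InputSource(in)),
                           new StreamResult(out));
    }

    public static void main(String[] args) throws Exception {
        StringWriter xml = new StringWriter(); // small demo input only; use a file Writer for big data
        convert(new StringReader("UNH+1+ORDERS'BGM+220+42'"), xml);
        System.out.println(xml);
    }
}

// Toy reader (an assumption, not Smooks): turns "TAG+field+field'" segments into SAX events.
class ToySegmentReader extends XMLFilterImpl {
    private static final AttributesImpl NO_ATTRS = new AttributesImpl();

    // There is no parent parser, so accept and ignore anything the transformer configures.
    @Override public void setFeature(String name, boolean value) { }
    @Override public void setProperty(String name, Object value) { }

    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        Reader in = input.getCharacterStream();
        ContentHandler out = getContentHandler();
        out.startDocument();
        out.startElement("", "message", "message", NO_ATTRS);
        StringBuilder segment = new StringBuilder(); // only the current segment is buffered
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\'') {                          // assumed segment terminator
                emitSegment(out, segment.toString().trim());
                segment.setLength(0);
            } else {
                segment.append((char) c);
            }
        }
        out.endElement("", "message", "message");
        out.endDocument();
    }

    private void emitSegment(ContentHandler out, String segment) throws SAXException {
        if (segment.isEmpty()) return;
        String[] fields = segment.split("\\+", -1);   // assumed field separator
        String tag = fields[0];
        out.startElement("", tag, tag, NO_ATTRS);
        for (int i = 1; i < fields.length; i++) {
            out.startElement("", "field", "field", NO_ATTRS);
            out.characters(fields[i].toCharArray(), 0, fields[i].length());
            out.endElement("", "field", "field");
        }
        out.endElement("", tag, tag);
    }
}
```

Memory use in the reader depends on the longest segment rather than the file size. Whether the whole pipeline stays small then depends on the consumer: a StringWriter or an object model built from the events grows with the input, while a file-backed Writer does not.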

Looking at the Exception message, it simply looks like the EDIFACT input doesn’t match the definition file being used.

Caused by: org.milyn.edisax.EDIParseException: EDI message processing failed [EDIFACT-BIF-TO-XML][1.0].  Must be a minimum of 1 instances of segment [UNH].  Currently at segment number 1.

Those EDIFACT definition files were originally generated directly from the definitions published by the EDIFACT group, but I do remember that many people “tweak” the message formats, which seems like what might be happening here (and hence the above error). One solution to that would be to tweak the pre-generated definitions to match.
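Before tweaking a definition, it can help to see which segment tags the input actually starts with, since the error complains about [UNH] at segment number 1. The sketch below is an illustration with assumptions, not part of Smooks: SegmentPeek is a hypothetical helper, it assumes ' as the segment terminator, and it ignores release characters. It reads only as far as needed, so it is safe to run on a very large file.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper: lists the tags of the first few segments.
class SegmentPeek {

    // Returns up to 'limit' segment tags, stopping as soon as enough were read.
    static List<String> firstSegmentTags(Reader in, int limit) throws IOException {
        List<String> tags = new ArrayList<>();
        StringBuilder segment = new StringBuilder();
        int c;
        while (tags.size() < limit && (c = in.read()) != -1) {
            if (c == '\'') {                       // assumed segment terminator
                String s = segment.toString().trim();
                if (!s.isEmpty()) {
                    tags.add(leadingTag(s));
                }
                segment.setLength(0);
            } else {
                segment.append((char) c);
            }
        }
        return tags;
    }

    // The tag is taken to be the leading run of letters and digits (e.g. "UNH").
    private static String leadingTag(String segment) {
        int end = 0;
        while (end < segment.length() && Character.isLetterOrDigit(segment.charAt(end))) {
            end++;
        }
        return segment.substring(0, end);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SegmentPeek <edi-file>");
            return;
        }
        // ISO-8859-1 maps every byte to a character, so odd bytes cannot abort the read.
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            System.out.println(firstSegmentTags(in, 5));
        }
    }
}
```

If the first tag printed is not the one the mapping model expects as its first segment, the input and the definition disagree at the very start of the message, which is the kind of mismatch described above.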

I know that a lot of changes have been made in Smooks in this area in the last year or two (using Apache Daffodil for the definitions) but I wouldn’t be the best person to talk about that. You can try the Smooks mailing list for help on that.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: Jason Mootoo
[2] Solution 2: Tom Fennelly