Efficiently change structure of JSON data

I have a file containing one JSON object per line:

{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
...

I want to parse this content and reformat it as valid JSON with the following structure:

{
 "entries":[
  {"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
  {"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
  {"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
  {"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
  ....
]} 

And then write that to a file in Java.

How can I achieve that with Gson (primarily) or another library in an efficient way (accounting for large input files)?

Here is what I have tried to convert the structure:

    ....
    File jsonFile = new File("pathToJSONFile");
    FileReader fileReader = new FileReader(jsonFile);
    // Wrap the FileReader in a BufferedReader
    BufferedReader buffReader = new BufferedReader(fileReader);
    String textToAppend = null;
    String line;
    textToAppend = '{' + "\"entries\":" + '[';
    line = buffReader.readLine();
    textToAppend += line;

    while ((line = buffReader.readLine()) != null) {
        textToAppend += ',';
        textToAppend += line;
    }

    textToAppend += ']';
    textToAppend += '}';
    // then FileWrite textToAppend to the output file.

But my solution is too slow for large JSON input files.



Solution 1:[1]

I don't know how far you'll get trying to parse the file with a JSON library, since as a whole it isn't valid JSON (each line is a separate object). Even if you get it to parse by using the GsonBuilder.setLenient() method, you'll still end up with a data structure that doesn't match what you want.


Alternatively, you can make your current text manipulation approach more efficient by incrementally writing your output file as you process your input file instead of accumulating all the data to be written out.

Something like this:

import java.io.File;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;

public class FixJson {
    public static void main (String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java FixJson <input-file> <output-file>");
            System.exit(1);
        }

        File inputFile = new File(args[0]);
        File outputFile = new File(args[1]);

        try (
            BufferedReader input = Files.newBufferedReader(inputFile.toPath());
            BufferedWriter output = Files.newBufferedWriter(outputFile.toPath());
        ) {
            output.write("{\n \"entries\":[\n");

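            String prevLine;
            String line;

            // Hold back one line so that only the final entry is written without a trailing comma.
            prevLine = input.readLine();
            while ((line = input.readLine()) != null) {
                output.write("   " + prevLine + ",\n");
                prevLine = line;
            }
            // prevLine is null only when the input file is empty.
            if (prevLine != null) {
                output.write("   " + prevLine + "\n");
            }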

            output.write(" ]\n}\n");
        } catch (IOException ex) {
            System.err.println("file IO error: " + ex);
            System.exit(1);
        }
    }
}

In use, it looks like:

small input

$ /usr/bin/time -f "%E time elapsed, %M kB max resident memory" java FixJson input-small.txt output-small.txt
0:00.24 time elapsed, 31132 kB max resident memory

$ cat input-small.txt
{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}

$ cat output-small.txt
{
 "entries":[
   {"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
   {"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
   {"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
   {"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
 ]
}

$ ls -lh *-small.txt
-rw-r--r-- 1 chuckx chuckx 300 Apr 25 19:01 input-small.txt
-rw-r--r-- 1 chuckx chuckx 335 Apr 25 23:31 output-small.txt

large input (just the small input repeated to create a 444000 line file)

$ /usr/bin/time -f "%E time elapsed, %M kB max resident memory" java FixJson input-large.txt output-large.txt
0:01.05 time elapsed, 130148 kB max resident memory

$ head -5 input-large.txt ; echo ... ; tail -5 input-large.txt
{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
...
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}

$ head -5 output-large.txt ; echo ... ; tail -5 output-large.txt
{
 "entries":[
   {"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
   {"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
   {"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
...
   {"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
   {"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
   {"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
 ]
}

$ ls -lh *-large.txt
-rw-r--r-- 1 chuckx chuckx 32M Apr 25 22:12 input-large.txt
-rw-r--r-- 1 chuckx chuckx 34M Apr 25 23:31 output-large.txt
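
As a variation on the same streaming idea, you could write the comma before every entry except the first, which avoids holding back the previous line. The following is only a minimal sketch: the class name WrapEntries and the helper wrap are hypothetical names chosen for illustration, and skipping blank lines is an assumption about the input rather than something the original program does. It takes a Reader and a Writer so it can be tried on in-memory strings as well as files.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class WrapEntries {
    // Hypothetical helper: streams one-JSON-object-per-line input from `in`
    // into a single {"entries":[...]} document on `out`, one line at a time.
    static void wrap(Reader in, Writer out) throws IOException {
        BufferedReader input = new BufferedReader(in);
        BufferedWriter output = new BufferedWriter(out);

        output.write("{\n \"entries\":[");
        boolean first = true;
        String line;
        while ((line = input.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines carry no entry
            }
            // The separator goes before every entry except the first.
            output.write(first ? "\n   " : ",\n   ");
            output.write(line);
            first = false;
        }
        output.write("\n ]\n}\n");
        output.flush(); // flush our buffer; closing is left to the caller
    }

    public static void main(String[] args) throws IOException {
        StringWriter result = new StringWriter();
        wrap(new StringReader("{\"c0\":\"1\"}\n{\"c0\":\"2\"}\n"), result);
        System.out.print(result);
    }
}
```

For real files you would pass Files.newBufferedReader(...) and Files.newBufferedWriter(...) inside a try-with-resources block, as in FixJson above. Like FixJson, this copies each line verbatim and does not check that it is valid JSON.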

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 chuckx