'converting json based log into column format, i.e., one file per column

example for the log file:

{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}

It will generate 5 files:

  1. timestamp.column
  2. Field1.column
  3. Field_Doc.f1.column
  4. Field_Doc.f2.column
  5. Field_Doc.f3.column

The column file format is as follows:

  • string fields are separated by a new line '\n' character. Assume that no string value has new line characters, so no need to worry about escaping them
  • double, integer & boolean fields are represented as a single value per line
  • null, undefined & empty strings are represented as an empty line

Example content of timestamp.column:

2022-01-14T00:12:21.000
2022-01-18T00:15:51.000

Note: The fields in the log will be dynamic, do not assume that these are the expected properties

can someone tell me how to do this,

the size of the log file is about 4GB to 48GB



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source