Jaeger with ElasticSearch: date-time parsing

We have an index in ElasticSearch that receives logs from both FluentD and Jaeger. The date-time field gets mangled because the two apps use different formats: FluentD sends ISO8601 timestamps, whereas Jaeger sends epoch-millis. As a consequence, we have no logging in Kibana.

In the Helm values file my colleagues used to install the EFK stack, there is a stanza for FluentD but nothing for Jaeger, which makes sense, as the creator of this chart only had FluentD in mind.

The index is recreated at midnight every 24 hours and uses dynamic mapping, so the first document indexed determines the field's type. If the first log entry happens to be from FluentD, all is fine. But if the first entry is from Jaeger, we get no logging at all.
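
For illustration, this is roughly what happens (the index and field names here are hypothetical; the behaviour is Elasticsearch's default dynamic mapping):

# A Jaeger entry arrives first: dynamic mapping infers type `long`
# for the epoch-millis value.
POST logs-2022.02.20/_doc
{
  "time": 1645367903000
}

# A later FluentD entry with an ISO8601 string is then rejected with a
# mapper_parsing_exception, because `time` is already mapped as `long`.
POST logs-2022.02.20/_doc
{
  "time": "2022-02-20T14:38:23.000Z"
}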

My questions are:

  1. Is it supported to have an index with two different sources?
  2. If yes, how can we ensure that ES receives and parses the two date-time formats properly?

Thanks for any clues or pointers.



Solution 1:

Q.1:

Is it supported to have an index with two different sources?

Absolutely, you can send data coming from source_a and source_b to the same index index_1.

Whatever sources you are using, just configure each one to send its data to the target index.

I would also recommend using the Elastic Common Schema, a.k.a. ECS, and mapping the fields from all your sources to the ECS fields.
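
For instance, a couple of rename processors in an ingest pipeline can map custom fields onto ECS ones (a minimal sketch; the pipeline name, `msg`, and `severity` are hypothetical source names, while `message` and `log.level` are standard ECS fields):

# Map source-specific fields onto their ECS equivalents
PUT /_ingest/pipeline/ecs-normalize
{
  "processors": [
    { "rename": { "field": "msg",      "target_field": "message",   "ignore_missing": true } },
    { "rename": { "field": "severity", "target_field": "log.level", "ignore_missing": true } }
  ]
}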

Q.2:

How can we ensure that ES receives and parses the two date-time formats properly?

There are many ways to achieve this:

  • Ingest Pipeline
  • Beats
  • Logstash

As you mention neither Beats nor Logstash, I expect you will find it most convenient to use an ingest pipeline.

  1. Create an ingest pipeline [doc]
  2. Test the ingest pipeline [doc]
  3. Use the ingest pipeline on indexing [doc]

Toy project

It is quite easy to set up, actually:

# Create a pipeline
PUT /_ingest/pipeline/71189349
{
  "processors": [
    {
      "date": {
        "field": "datems",
        "formats": ["UNIX"]
      }
    }
  ]
}

# Test the pipeline with some data of yours
POST /_ingest/pipeline/71189349/_simulate
{
  "docs": [{
    "_source":{
      "datems": "1645367903"
    }
  }]
}

For my sample I get this result. Update your pipeline until you get the proper parsing:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "@timestamp" : "2022-02-20T14:38:23.000Z",
          "datems" : "1645367903"
        },
        "_ingest" : {
          "timestamp" : "2022-02-20T15:07:16.361424083Z"
        }
      }
    }
  ]
}
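
Note that the `UNIX` format parses epoch-seconds, which is what the toy sample above uses. Since your Jaeger entries carry epoch-millis, your real pipeline will most likely need `UNIX_MS` instead:

# Variant of the pipeline for epoch-millis values, as sent by Jaeger
PUT /_ingest/pipeline/71189349
{
  "processors": [
    {
      "date": {
        "field": "datems",
        "formats": ["UNIX_MS"]
      }
    }
  ]
}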

Once you are satisfied, just use this pipeline at indexing time.

POST <Your index>/_doc?pipeline=71189349
{
   ...
   <Your data>
   ...
}
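
If you cannot easily add that query parameter to the requests FluentD and Jaeger issue, an alternative (a sketch, assuming you are able to change the index settings or its template) is to attach the pipeline as the index default, so it runs on every indexing request without client changes:

# Run the pipeline automatically for every document indexed
PUT <Your index>/_settings
{
  "index.default_pipeline": "71189349"
}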

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
