Elasticsearch application latency investigation

We have an Elasticsearch setup with [data, master, client] nodes. The client nodes receive only query traffic, pass each query to the data nodes, get the response, and send it back to the caller (based on my general understanding).

We are seeing that the 'took' latency in the query response is around ~16ms, but the latency our application measures when calling into the client nodes is around ~90ms. Here are some numbers on our setup:

  1. ES setup: 3 client nodes (60GB / 3 CPU / 30GB heap each), 3 data nodes (80GB / 16 CPU / 30GB heap each), 3 master nodes. It's a Kubernetes-based Helm chart deployment.
  2. Client and data nodes have enough CPU/memory (based on Kubernetes pod-level CPU/memory usage).
  3. QPS: 20 req/sec.
  4. Shard size ~24GB, 0 replicas. Each shard is on a separate data node. Indices use mmapfs with preload "*".
  5. Query type: a bool query with 3 match clauses and 3 should clauses for boosting on a few fields. We set "_source": true (see the query sketch after this list).
  6. Our documents are fairly large, with (mean, p90, p99) sizes of (200KB, 400KB, 800KB).
  7. Response sizes are on the order of (mean, p99) = (164KB, 840KB). We also observed that latency for larger responses is much higher than the baseline.
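For reference, here is a minimal sketch of the query shape described in item 5, sent over the REST API with Python's requests; the endpoint, index, and field names are hypothetical placeholders:

```python
import requests

# Shape only: 3 match clauses, 3 should clauses for boosting, full _source.
# Field names (title, body, tags, summary, category) are made up.
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "search terms"}},
                {"match": {"body": "search terms"}},
                {"match": {"tags": "search terms"}},
            ],
            "should": [
                {"match": {"title": {"query": "search terms", "boost": 3.0}}},
                {"match": {"summary": {"query": "search terms", "boost": 2.0}}},
                {"match": {"category": {"query": "search terms", "boost": 1.5}}},
            ],
        }
    },
    # Returning the full _source is what makes responses 164-840KB here.
    "_source": True,
}

resp = requests.post("http://es-client:9200/my-index/_search", json=query)
print(resp.json()["took"], "ms reported by Elasticsearch")
```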

Can someone comment on the following questions:

  1. How can we know more exactly where this extra latency is introduced? From what I've read about "took", it covers the querying and response-forming stages, but something happens after that which makes our application-measured latency jump to ~90ms. Where else can I look to dig into this increase? I have access to the Prometheus ES dashboard and Kubernetes pod usage, but all of them look normal, with no spikes. (See the timing sketch after this list.)
  2. Are there ES settings we can tune to optimize this latency? We suspect it's mostly due to the larger response sizes. Can some compression be introduced in ES to help with this? (See the compression sketch after this list.)
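On question 1: took is measured on the coordinating (client) node and covers query execution on the shards plus result merging, but not the final serialization of the response body, the network transfer, or client-side parsing, so the ~74ms gap most likely lives there. The Profile API ("profile": true) only instruments shard-level query execution, which is already inside took, so it won't explain the gap. A minimal sketch for splitting server time from wire/parse time, with hypothetical endpoint and index names:

```python
import time
import requests

URL = "http://es-client:9200/my-index/_search"  # hypothetical endpoint
query = {"query": {"match_all": {}}, "_source": True}

t0 = time.perf_counter()
resp = requests.post(URL, json=query)  # blocks until the full body arrives
wall_ms = (time.perf_counter() - t0) * 1000

took_ms = resp.json()["took"]
size_kb = len(resp.content) / 1024

# wall - took ~= response serialization + network transfer + HTTP overhead;
# correlating the gap with size_kb tests the "big responses" hypothesis.
print(f"took={took_ms}ms wall={wall_ms:.1f}ms "
      f"gap={wall_ms - took_ms:.1f}ms size={size_kb:.0f}KB")
```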
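On question 2: Elasticsearch supports gzip on the HTTP layer via the http.compression node setting (enabled by default in recent versions), but responses are only compressed when the caller sends an Accept-Encoding: gzip header. Shrinking the payload with _source filtering is a complementary lever. A sketch with hypothetical endpoint and field names:

```python
import requests

resp = requests.post(
    "http://es-client:9200/my-index/_search",  # hypothetical endpoint
    headers={"Accept-Encoding": "gzip"},  # requests also decompresses for you
    json={
        "query": {"match_all": {}},
        # Return only the fields the application needs instead of the
        # full 200-800KB document; field names are made up.
        "_source": ["title", "summary"],
    },
)
# "gzip" here confirms the node actually compressed the response.
print(resp.headers.get("Content-Encoding"))
```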


Solution 1 [1]:

Your question is very broad, with very little information on your deployment, such as the types of search queries, index mapping, cluster/node/index specifications, and QPS. It's generally very difficult to suggest anything without looking at the system that has the performance issues.

Regarding client nodes: yes, they receive the traffic, but they also compute the global result from the local result sets received from each shard involved in the search request. So they do the heavy processing and should have enough capacity, otherwise they become a bottleneck. Even if your data nodes compute the local results fast, processing at the client node would take more time and the overall took time would increase.
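If the client (coordinating) nodes are the suspect, their search thread-pool stats and hot threads can reveal queueing or busy merge/serialization work that pod-level CPU/memory graphs average away. A minimal sketch against a hypothetical endpoint:

```python
import requests

BASE = "http://es-client:9200"  # hypothetical coordinating-node endpoint

# A growing queue or nonzero rejected counts point at thread-pool saturation.
stats = requests.get(f"{BASE}/_nodes/stats/thread_pool").json()
for node in stats["nodes"].values():
    search_tp = node["thread_pool"]["search"]
    print(node["name"], "queue:", search_tp["queue"],
          "rejected:", search_tp["rejected"])

# Hot threads show what each node is busy with (e.g. reducing shard results
# or writing large responses) at the moment of sampling.
print(requests.get(f"{BASE}/_nodes/hot_threads").text[:1000])
```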

You can also check whether you have room to apply some of the suggestions I wrote in this blog post.

Hope this helps.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Amit