AWS OpenSearch: Data Nodes vs Instance Type cost differences

I'm having issues with OpenSearch running at 100% CPU, and I was thinking about two ways to solve this.

The first would be to increase my data nodes in OpenSearch from 1 to 2 and see how it goes. The second would be to increase the instance type from m5.large to m5.xlarge.

I currently have two questions regarding this.

  1. What would the cost difference be between these options? (I know an m5.xlarge costs double an m5.large, but I don't know the cost of an extra data node; see the rough sketch below.)
  2. What do you think is the best way to solve OpenSearch's CPU and RAM running at 100%?
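For context, here is the rough math I'm working with, using a placeholder hourly rate (the actual m5.large.search rate depends on region; check the AWS OpenSearch pricing page). Both options roughly double the instance cost, since an m5.xlarge is priced at about twice an m5.large:

```python
# Rough monthly cost comparison. HOURLY_M5_LARGE is a hypothetical placeholder;
# substitute the real m5.large.search rate for your region.
HOURLY_M5_LARGE = 0.15                   # hypothetical USD/hour
HOURLY_M5_XLARGE = HOURLY_M5_LARGE * 2   # xlarge is roughly twice the large rate
HOURS_PER_MONTH = 730

current   = 1 * HOURLY_M5_LARGE * HOURS_PER_MONTH    # 1x m5.large (today)
scale_out = 2 * HOURLY_M5_LARGE * HOURS_PER_MONTH    # 2x m5.large (more data nodes)
scale_up  = 1 * HOURLY_M5_XLARGE * HOURS_PER_MONTH   # 1x m5.xlarge (bigger node)

print(f"current:   ${current:.2f}/month")
print(f"scale out: ${scale_out:.2f}/month")
print(f"scale up:  ${scale_up:.2f}/month")
```

Under that assumption the instance cost of both options comes out about the same, so for me the decision hinges on which one actually fixes the load.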


Solution 1:[1]

It depends on what exactly is causing the load.

Start by enabling the search and indexing slow logs, and look at the thread pool graphs.
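For example, here is a minimal sketch of enabling slow logs on one index and checking the thread pools over the REST API, assuming a placeholder endpoint, credentials, and index name:

```python
import requests

# Placeholder endpoint, credentials, and index name for your domain.
OPENSEARCH_URL = "https://my-domain.us-east-1.es.amazonaws.com"
AUTH = ("admin", "password")
INDEX = "my-index"

# Lower the slow-log thresholds so slow queries and slow indexing get logged.
slowlog_settings = {
    "index.search.slowlog.threshold.query.warn": "5s",
    "index.search.slowlog.threshold.fetch.warn": "1s",
    "index.indexing.slowlog.threshold.index.warn": "10s",
}
requests.put(f"{OPENSEARCH_URL}/{INDEX}/_settings", json=slowlog_settings, auth=AUTH)

# Check thread pool activity; sustained queueing or rejections on the search or
# write pool point at where the CPU is going.
resp = requests.get(
    f"{OPENSEARCH_URL}/_cat/thread_pool/search,write"
    "?v&h=node_name,name,active,queue,rejected",
    auth=AUTH,
)
print(resp.text)
```

Note that on the managed OpenSearch Service the slow logs also have to be published to CloudWatch Logs (under the domain's log publishing settings) before you can read them there.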

Also, look at the Tune for indexing speed and Tune for search speed docs; they have some good insights.

Generally, scaling horizontally (adding more machines to the cluster) helps with many concurrent requests, while scaling vertically (increasing machine size) helps with rare but heavy queries.
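Whichever you pick, the change is a single domain config update. A minimal boto3 sketch, assuming a hypothetical domain name my-domain (the update may trigger a blue/green deployment while it rolls out):

```python
import boto3

client = boto3.client("opensearch")

# Option 1: scale out - go from 1 to 2 data nodes of the same size.
client.update_domain_config(
    DomainName="my-domain",  # hypothetical domain name
    ClusterConfig={"InstanceType": "m5.large.search", "InstanceCount": 2},
)

# Option 2: scale up - keep a single data node but use a larger instance type.
# client.update_domain_config(
#     DomainName="my-domain",
#     ClusterConfig={"InstanceType": "m5.xlarge.search", "InstanceCount": 1},
# )
```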

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1: ilvar