Elastic Beanstalk NetworkOut auto scaling if you have only 1 instance

We have an Elastic Beanstalk environment, and for many weeks now, almost every morning at about 8:10 or 8:20 am, we get an SNS notification that "Environment health has transitioned from Ok to Degraded", followed 2 minutes later by another saying that "Environment health has transitioned from Degraded to Ok".

When this happens, the running EC2 instance gets killed and a new one is started.

I investigated by downloading the logs of the just-terminated instances and by checking all the CloudWatch metrics. The only thing I found is a CloudWatch NetworkOut < 2000000 alarm that starts about 10 minutes before the instance termination.

So I suppose the problem is that the instance is getting low traffic. I have doubts about this, though, because we have 2 environments running for 2 countries (.it, .ch), and although they are identical in every respect, only the .it one has this problem.

But if the problem is that autoscaling is scaling in an instance because of low use, how can scale-in be handled when you have just 1 instance running and only 1 instance type (t3.micro) configured, so that this single instance doesn't get terminated?

Should we change the metric used for scaling in? Actually, it seems that every metric could have the same problem.



Solution 1:[1]

It turned out to be a misunderstanding. I could delete the question, but it may be useful to somebody experiencing the same thing.

So this is what happens. As an example, I use what really happened on 21/03.

From EB events:

Mon Mar 21 14:01:12 UTC 2022 EB environment health transitioned from Ok to Warning

Mon Mar 21 14:02:18 UTC 2022 Added instance [i-...ba4] to your environment

Mon Mar 21 14:10:18 UTC 2022 Environment health has transitioned from Ok to Degraded. No data received from 1 out of 2 instances.

Mon Mar 21 14:11:18 UTC 2022 Removed instance [i-...a37] from your environment.

From EB email notifications:

Mon Mar 21 14:10:18 UTC 2022 Message: Environment health has transitioned from Ok to Degraded. No data received from 1 out of 2 instances.

Mon Mar 21 14:12:18 UTC 2022 Message: Environment health has transitioned from Degraded to Ok.

So:

  • at 14:01 the app had a peak in the NetworkOut metric that triggered autoscaling (NetworkOut > 6,000,000)
  • at 14:02 the [i-...ba4] instance was launched by autoscaling (now 2 instances)
  • at 14:10, since even 2 instances aren't enough for the NetworkOut peak, I get an email about the degraded environment
  • at 14:11 the NetworkOut peak ends and the previous [i-...a37] instance is removed by autoscaling
  • at 14:12 I get a second email notification saying the environment health is back to Ok
  • after 14:12 the app is still not being used by the users, so when I check CloudWatch I see a NetworkOut < 2,000,000 alarm (the default NetworkOut threshold in the EB config)
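The trigger behaviour in the timeline above can be sketched roughly as follows. This is only a simplified model, not how Elastic Beanstalk or Auto Scaling is actually implemented: the two thresholds come from the values quoted above, while the `Trigger` class, the `desired_capacity` function and the minimum and maximum group sizes are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    # Thresholds as quoted in this answer (bytes of NetworkOut).
    upper_threshold: int = 6_000_000  # scale out above this
    lower_threshold: int = 2_000_000  # scale in below this
    # Group sizes are assumptions for this sketch, not taken from the question.
    min_size: int = 1
    max_size: int = 2


def desired_capacity(trigger: Trigger, current: int, network_out: float) -> int:
    """Return the instance count a simple +1/-1 step trigger would ask for."""
    if network_out > trigger.upper_threshold:
        wanted = current + 1
    elif network_out < trigger.lower_threshold:
        wanted = current - 1
    else:
        wanted = current
    # The group never goes below its minimum or above its maximum size.
    return max(trigger.min_size, min(trigger.max_size, wanted))


trigger = Trigger()
print(desired_capacity(trigger, current=1, network_out=9_000_000))  # peak: 2
print(desired_capacity(trigger, current=2, network_out=500_000))    # peak over: 1
print(desired_capacity(trigger, current=1, network_out=500_000))    # low traffic: 1
```

Under these assumptions, the last call shows why a low-traffic alarm alone should not terminate the only instance: with a minimum size of 1, the alarm can stay active while the capacity remains 1.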

The point is that autoscaling was not triggered by an increase in the number of users but by a big data download from a single user. Launching a new instance therefore couldn't help, and the environment remained degraded for some minutes, which is why I got the health notification email from EB. But when I checked CloudWatch a while later, the alarm I could see was NetworkOut < 2,000,000.

Then, seeing in the EB events that one instance got killed and a new one was launched by autoscaling, I got the wrong idea that this notification was due to the low-traffic alarm and not to the preceding data transfer peak.

And since this peak was not due to an increase in users but to a big data download from a single user, it wasn't easy to work out what had happened from the logs either.

But it became clear when I finally checked the NetworkIn metric in EB monitoring: the downloaded data was coming from RDS, so alongside the NetworkOut peak I could also see a NetworkIn peak.
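As a rough illustration of that check, the sketch below looks for minutes in which both metrics are above a threshold at the same time. The sample values are hypothetical, not real measurements from this environment, and the `peak_minutes` helper and the choice of the same 6,000,000 threshold for NetworkIn are assumptions for the example.

```python
def peak_minutes(samples: dict[str, float], threshold: float) -> set[str]:
    """Return the timestamps whose value is above the threshold."""
    return {minute for minute, value in samples.items() if value > threshold}


# Hypothetical per-minute byte counts, invented for this example.
network_out = {"14:00": 1_200_000, "14:01": 7_500_000, "14:05": 8_100_000, "14:11": 900_000}
network_in = {"14:00": 300_000, "14:01": 6_900_000, "14:05": 7_400_000, "14:11": 250_000}

# Minutes where outbound and inbound traffic peak together, which would point
# to data being pulled in (e.g. from a database) and sent straight out again.
both = sorted(peak_minutes(network_out, 6_000_000) & peak_minutes(network_in, 6_000_000))
print(both)  # ['14:01', '14:05']
```

A NetworkOut peak with no matching NetworkIn peak would suggest a different cause, such as many users fetching content already on the instance.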

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1