AWS SageMaker autoscaling with instance metrics per instance

I am using an AWS SageMaker endpoint for inference. Depending on the amount of traffic, the endpoint should scale up and down by adding or removing instances. I am trying to use the instance metrics (CPUUtilization, MemoryUtilization or DiskUtilization) as the metric for SageMaker endpoint autoscaling. These are the predefined metrics as defined here: https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipeline-logs-metrics.html

The problem is that the instance metrics for a given endpoint are the sum over all running instances within the endpoint. For example, in the following endpoint runtime settings: [image: example of AWS SageMaker endpoint runtime settings]

If there are currently 5 running instances, the value of CPUUtilization can range from 0 to 500%. The maximum value changes with the number of running instances, so the autoscaling policy would have to change with it. The question is: is there any way to get a metric per instance, i.e. CPUUtilizationPerInstance, without explicitly calculating it or going through a custom metric? An autoscaling policy that scales up and down by setting a threshold on per-instance CPUUtilization seems like the right way. Is there any other similar option on AWS?



Solution 1:[1]

There is an InvocationsPerInstance metric that shows the average number of invocations per instance when you use the 'Sum' statistic.

https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html

This blog post details how you would go about load testing your endpoint to find a good target value for InvocationsPerInstance to use in autoscaling: https://aws.amazon.com/blogs/machine-learning/load-test-and-optimize-an-amazon-sagemaker-endpoint-using-automatic-scaling/
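As a rough sketch of how that metric could be wired into a target-tracking policy: the predefined metric type `SageMakerVariantInvocationsPerInstance` is the one Application Auto Scaling exposes for this purpose. The helper names, the policy name, and the target value of 70 are illustrative assumptions, not values from the blog post, and the variant is assumed to be registered as a scalable target already.

```python
def invocations_policy(target_per_instance, scale_in_cooldown=600, scale_out_cooldown=300):
    """Build a target-tracking config on the predefined per-instance invocation metric."""
    return {
        "TargetValue": float(target_per_instance),
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
        # Cooldowns are in seconds; these defaults are illustrative assumptions.
        "ScaleInCooldown": scale_in_cooldown,
        "ScaleOutCooldown": scale_out_cooldown,
    }


def attach_invocations_policy(client, endpoint_name, variant_name, target_per_instance):
    """Attach the policy to one endpoint variant.

    `client` is expected to be an Application Auto Scaling client.
    The policy name below is a hypothetical choice.
    """
    return client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-invocations-per-instance",
        ServiceNamespace="sagemaker",
        ResourceId=f"endpoint/{endpoint_name}/variant/{variant_name}",
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=invocations_policy(target_per_instance),
    )
```

With boto3 installed, the call would look like `attach_invocations_policy(boto3.client("application-autoscaling"), "my-endpoint", "AllTraffic", 70)`, where the target should come from your own load test.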

Solution 2:[2]

This blog post describes how you would define a custom metric to track average cpu utilisation per instance.

tl;dr

    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 90.0,
        'CustomizedMetricSpecification':
        {
            'MetricName': 'CPUUtilization',
            'Namespace': '/aws/sagemaker/Endpoints',
            'Dimensions': [
                {'Name': 'EndpointName', 'Value': endpoint_name},
                {'Name': 'VariantName', 'Value': 'AllTraffic'}
            ],
            'Statistic': 'Average',  # one of 'Average' | 'Minimum' | 'Maximum' | 'SampleCount' | 'Sum'
            'Unit': 'Percent'
        },
        'ScaleInCooldown': 600,
        'ScaleOutCooldown': 300
    }
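For context, the fragment above is the `TargetTrackingScalingPolicyConfiguration` argument of the Application Auto Scaling `put_scaling_policy` call. A minimal sketch of the surrounding call might look like the following. The helper names and the policy name are hypothetical, and the variant is assumed to be registered as a scalable target beforehand.

```python
def build_cpu_policy(endpoint_name, variant_name="AllTraffic", target_percent=90.0):
    """Build a target-tracking config on the average CPUUtilization of a variant."""
    return {
        "TargetValue": float(target_percent),
        "CustomizedMetricSpecification": {
            "MetricName": "CPUUtilization",
            "Namespace": "/aws/sagemaker/Endpoints",
            "Dimensions": [
                {"Name": "EndpointName", "Value": endpoint_name},
                {"Name": "VariantName", "Value": variant_name},
            ],
            # 'Average' rather than 'Sum', so the value does not grow
            # with the number of instances behind the variant.
            "Statistic": "Average",
            "Unit": "Percent",
        },
        "ScaleInCooldown": 600,   # seconds
        "ScaleOutCooldown": 300,  # seconds
    }


def attach_cpu_policy(client, endpoint_name, variant_name="AllTraffic", target_percent=90.0):
    """Attach the CPU policy; `client` should be an Application Auto Scaling client."""
    return client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-cpu-average",  # hypothetical policy name
        ServiceNamespace="sagemaker",
        ResourceId=f"endpoint/{endpoint_name}/variant/{variant_name}",
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=build_cpu_policy(
            endpoint_name, variant_name, target_percent
        ),
    )
```

With boto3 installed, `client` would be `boto3.client("application-autoscaling")`.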

Solution 3:[3]

Yes, there is a way to find out the "Metric per instance" and act upon it.

This is done via auto scaling policies. You have not enabled auto scaling yet, so I suggest enabling it and starting with as low an initial instance count as possible, such as 1.

The AWS documentation on these policies is a good place to start for understanding scaling based on metrics: aws configure model autoscaling

Useful example with code for metrics
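As a sketch of what enabling auto scaling with a low starting point could look like: an endpoint variant is first registered as a scalable target with Application Auto Scaling, and scaling policies are attached afterwards. The helper name and the maximum of 5 instances are illustrative assumptions.

```python
def register_variant(client, endpoint_name, variant_name="AllTraffic",
                     min_instances=1, max_instances=5):
    """Register an endpoint variant as a scalable target.

    `client` should be an Application Auto Scaling client.
    The default capacity bounds are illustrative, not recommendations.
    """
    if min_instances < 1 or max_instances < min_instances:
        raise ValueError("expected 1 <= min_instances <= max_instances")
    return client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=f"endpoint/{endpoint_name}/variant/{variant_name}",
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=min_instances,
        MaxCapacity=max_instances,
    )
```

With boto3 installed, `client` would be `boto3.client("application-autoscaling")`.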

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 fm1ch4
Solution 2 trudolf
Solution 3 zhrist