Invocation timed out using SageMaker to invoke endpoints with pretrained custom PyTorch model [Inference]

I have a pretrained PyTorch model (contextualized_topic_models) and have deployed it to AWS SageMaker using a script-mode PyTorchModel. However, whenever I invoke the endpoint for inference, it returns an "Invocation timed out" error no matter what I try. I have tried different types of input and changed the input_fn() function, but it still doesn't work.

I've run my inference.py script on Colab (without connecting to the AWS server), and each function works fine, returning the expected predictions.
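Roughly, the local check looked like this (a sketch only: model and tp10 were already loaded in the Colab notebook, and the handler functions were copied in from inference.py):

import json

request_body = json.dumps(["Here is a piece of cake."])
data = input_fn(request_body, "application/json")
prediction = predict_fn(data, model)
print(output_fn(prediction, "application/json"))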

I've been trying to debug this for 4 days now, and I've even thought about it in my dreams... I'd be deeply grateful for any help.

Here's my deployment script.

from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(
    model_data=pretrained_model_data,
    entry_point="inference.py",
    role=role,
    framework_version="1.8.1",
    py_version="py36",
    sagemaker_session=sess,
)

endpoint_name = "topic-modeling-inference"

# Deploy
predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    endpoint_name=endpoint_name,
)
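The deploy() call also returns a Predictor, which could be used for a quick smoke test from the notebook. A minimal sketch, assuming the serializer classes from SageMaker Python SDK v2, so the request goes out as JSON and hits the application/json branch of my input_fn:

from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Send and receive JSON through the SDK instead of raw boto3 calls
predictor.serializer = JSONSerializer()
predictor.deserializer = JSONDeserializer()

print(predictor.predict(["Here is a piece of cake."]))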

Endpoint test (prediction) script

# Test the model
import boto3
import json

sm = boto3.client('sagemaker-runtime')
endpoint_name = "topic-modeling-inference"

prompt = [
    "Here is a piece of cake."
]

promptbody = [x.encode('utf-8') for x in prompt]
promptbody = promptbody[0]
#body = bytes(prompt[0], 'utf-8')
#tryout = prompt[0]

response = sm.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Body=promptbody
    #Body=tryout.encode(encoding='UTF-8')
)

print(response)

#result = json.loads(response['Body'].read().decode('utf-8'))
#print(result)
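Among the input variants I tried was sending JSON so that the content type matches the application/json branch of my input_fn, roughly:

# Variant: JSON body with a matching content type
response = sm.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(prompt),
)
print(response['Body'].read().decode('utf-8'))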

Part of my inference.py script

import json

import numpy as np

# tp10 (the fitted TopicModelDataPreparation object) is loaded elsewhere in inference.py

def predict_fn(input_data, model):
    # Featurize the raw text, then compute the document-topic distribution
    input_data_features = tp10.transform(text_for_contextual=input_data)
    topic_prediction = model.get_doc_topic_distribution(input_data_features, n_samples=20)
    # Return the most likely topic ID as a plain Python int
    return int(np.argmax(topic_prediction))
    #prediction = model.get_topic_lists(20)[np.argmax(topic_prediction)]
    #return prediction

def input_fn(request_body, request_content_type):
    if request_content_type == "application/json":
        request = json.loads(request_body)
    else:
        request = request_body
    return request

def output_fn(prediction, response_content_type):
    # json.dumps already returns a str, and both branches serialized
    # identically, so a single return suffices
    return json.dumps(prediction)

Any help or guidance will be wonderful. Thank you in advance.



Solution 1:[1]

I would suggest looking into the CloudWatch logs of the endpoint to see whether any invocations are reaching it.

If they are, check the same log stream to confirm that a response is being sent back without any errors.
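For reference, real-time InvokeEndpoint calls are cut off after roughly 60 seconds, so a timeout with no error in the log usually means the container never finished loading the model or producing a prediction within that window.

The most recent log events can also be pulled programmatically with boto3. A minimal sketch, assuming the default log group name /aws/sagemaker/Endpoints/<endpoint-name> that SageMaker creates for real-time endpoints:

import boto3

logs = boto3.client('logs')
log_group = "/aws/sagemaker/Endpoints/topic-modeling-inference"

# Find the most recently active log stream for the endpoint's container
streams = logs.describe_log_streams(
    logGroupName=log_group,
    orderBy="LastEventTime",
    descending=True,
    limit=1,
)

# Print the tail of that stream to see how far each request got
for stream in streams["logStreams"]:
    events = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=stream["logStreamName"],
        limit=50,
    )
    for event in events["events"]:
        print(event["message"])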

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: CrzyFella