Azure ML model on ACI - multiple request handling?
I want to understand whether an ML model deployed to an Azure Container Instance (ACI) will handle multiple simultaneous incoming requests. We currently need to host 2 ML models for an application that might peak at 20 requests per hour. What I am unsure about is whether the container deployed from the ML Workspace can handle multiple requests at the same time. For example, if it receives 5-10 requests simultaneously, is the container deployed in ACI capable of multithreading and serving them concurrently, or does it queue them up and handle them one at a time? I ask because a single call takes 10-15 seconds, so I am wondering whether subsequent requests arriving while the first request is still being processed get queued in FIFO order, or whether the service can internally spawn more threads to serve the multiple requests, as a web server would.
Thanks in advance!
Solution 1:[1]
We can deploy an Azure Machine Learning model as a web service, which creates a REST API endpoint.
We can send data to this endpoint and receive the prediction returned by the model.
Once the model is deployed as a web service, we can retrieve the URI used to access the service via the Azure ML SDK.
Enable authentication with keys or tokens, then:
- Use the SDK to get the connection information
- Determine the request type expected by the service
- Create and send the calls
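The steps above can be sketched with the Azure ML v1 SDK (`azureml-core`). This is a minimal sketch, not a definitive implementation: the service name `my-model` is a hypothetical placeholder, a local `config.json` workspace configuration is assumed, and key-based authentication is assumed to be enabled on the deployment.

```python
# Sketch: look up a deployed ACI web service and call its REST endpoint.
# Assumptions: service name "my-model" (hypothetical), key auth enabled.
import json


def build_scoring_request(scoring_uri, key, payload):
    """Assemble the URL, headers, and JSON body for a scoring POST call."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",  # only needed if auth is enabled
    }
    body = json.dumps({"data": payload})
    return scoring_uri, headers, body


# The SDK lookup itself requires a workspace config and Azure credentials:
#
#   from azureml.core import Workspace, Webservice
#   import requests
#
#   ws = Workspace.from_config()               # reads config.json
#   service = Webservice(ws, name="my-model")  # hypothetical service name
#   key, _ = service.get_keys()                # primary and secondary keys
#   uri, headers, body = build_scoring_request(
#       service.scoring_uri, key, [[1.0, 2.0, 3.0]])
#   resp = requests.post(uri, data=body, headers=headers)
#   print(resp.json())                         # model prediction
```

The helper only assembles the request pieces, so the same code works whether the call is made with `requests`, `urllib`, or any other HTTP client.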
See the documentation for details.
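The original concurrency question can also be checked empirically: fire several requests at the endpoint at once and compare wall-clock times. If the ~10-15 s calls are serialized, total time grows roughly linearly with the number of requests; if they are handled in parallel, it stays near a single call's latency. A minimal sketch, where `SCORING_URI` and `KEY` are hypothetical placeholders and a `time.sleep` stub stands in for the real endpoint:

```python
# Sketch: measure whether an endpoint serializes concurrent requests.
import time
from concurrent.futures import ThreadPoolExecutor


def time_concurrent_calls(call, n):
    """Fire `call` n times concurrently.

    Returns (total wall-clock time, list of per-call durations).
    """
    def timed(_):
        t0 = time.perf_counter()
        call()
        return time.perf_counter() - t0

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        per_call = list(pool.map(timed, range(n)))
    return time.perf_counter() - start, per_call


# Against the real ACI endpoint the call would look like:
#   import requests
#   call = lambda: requests.post(
#       SCORING_URI, json={"data": [[1, 2, 3]]},
#       headers={"Authorization": f"Bearer {KEY}"})
#
# Demo with a stub that mimics a 0.2 s scoring call:
total, per_call = time_concurrent_calls(lambda: time.sleep(0.2), 5)
# If the server handles requests in parallel, `total` stays close to one
# call's latency; if it queues them FIFO, it approaches n * latency.
```

Running this against the deployed service with its observed 10-15 s latency would directly answer whether requests are queued or served concurrently.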
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | SairamTadepalli-MT |
