How to run inference for a T5 TensorRT model deployed on NVIDIA Triton?
I have deployed a T5 TensorRT model on NVIDIA Triton Inference Server; below is the config.pbtxt file, but I am facing a problem while running inference against it with the Triton client.
According to the config.pbtxt, the TensorRT model expects 4 inputs, including the decoder input IDs. But how can we send the decoder IDs as an input to the model? I think the decoder IDs are supposed to be generated step by step from the model's own output (a client-side sketch of that loop follows the config below).
name: "tensorrt_model"
platform: "tensorrt_plan"
max_batch_size: 0
input [
{
name: "input_ids"
data_type: TYPE_INT32
dims: [ -1, -1 ]
},
{
name: "attention_mask"
data_type: TYPE_INT32
dims: [-1, -1 ]
},
{
name: "decoder_input_ids"
data_type: TYPE_INT32
dims: [ -1, -1]
},
{
name: "decoder_attention_mask"
data_type: TYPE_INT32
dims: [ -1, -1 ]
}
]
output [
{
name: "last_hidden_state"
data_type: TYPE_FP32
dims: [ -1, -1, 768 ]
},
{
name: "input.151"
data_type: TYPE_FP32
dims: [ -1, -1, -1 ]
}
]
instance_group [
{
count: 1
kind: KIND_GPU
}
]
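
For context, T5 decoding is autoregressive: the client typically starts from the decoder start token and feeds every newly predicted token back in as the next decoder_input_ids. Below is a minimal sketch of such a client-side greedy loop using the tritonclient HTTP API. The endpoint localhost:8000, the t5-base tokenizer, the greedy argmax decoding, and the assumption that the output named "input.151" holds the LM logits are all illustrative assumptions, not details confirmed by the post.

# Minimal sketch: greedy autoregressive decoding against the Triton server.
# Assumptions (not from the original post): HTTP endpoint at localhost:8000,
# t5-base tokenizer, decoder start token id 0 (T5's pad token), EOS id 1,
# and that the second output ("input.151") contains the LM logits.
import numpy as np
import tritonclient.http as httpclient
from transformers import T5Tokenizer

client = httpclient.InferenceServerClient(url="localhost:8000")
tokenizer = T5Tokenizer.from_pretrained("t5-base")

enc = tokenizer("translate English to German: Hello world", return_tensors="np")
input_ids = enc["input_ids"].astype(np.int32)
attention_mask = enc["attention_mask"].astype(np.int32)

# Start the decoder sequence with the decoder start token only.
decoder_ids = np.array([[0]], dtype=np.int32)

for _ in range(64):  # maximum generation length
    decoder_mask = np.ones_like(decoder_ids)

    inputs = []
    for name, arr in [
        ("input_ids", input_ids),
        ("attention_mask", attention_mask),
        ("decoder_input_ids", decoder_ids),
        ("decoder_attention_mask", decoder_mask),
    ]:
        t = httpclient.InferInput(name, list(arr.shape), "INT32")
        t.set_data_from_numpy(arr)
        inputs.append(t)

    outputs = [httpclient.InferRequestedOutput("input.151")]
    result = client.infer("tensorrt_model", inputs, outputs=outputs)

    logits = result.as_numpy("input.151")          # assumed shape: [batch, dec_len, vocab]
    next_token = int(np.argmax(logits[0, -1, :]))  # greedy pick for the last position

    # Append the predicted token and feed the extended sequence back in.
    decoder_ids = np.concatenate(
        [decoder_ids, np.array([[next_token]], dtype=np.int32)], axis=1
    )
    if next_token == 1:  # T5 EOS token id
        break

print(tokenizer.decode(decoder_ids[0], skip_special_tokens=True))

Note that this loop re-runs the full encoder on every step; production setups usually export separate encoder and decoder engines (with KV caching) instead, but the single-engine loop above matches the config shown here.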
Sources
Source: Stack Overflow, licensed under CC BY-SA 3.0.