TensorRT with gRPC multi-threading error, how to fix it?

Description

Environment

TensorRT Version: 8.2.3.0
NVIDIA GPU: GTX 1080 Ti
NVIDIA Driver Version: 470.103.01
CUDA Version: 11.4
CUDNN Version: 8.2
Operating System: Linux 18.06
Python Version (if applicable): 3.8.0
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.10
Baremetal or Container (if so, version):

gRPC server code

import grpc
from concurrent import futures

server = grpc.server(
    futures.ThreadPoolExecutor(),
    options=[
        ("grpc.max_send_message_length", -1),     # unlimited send size
        ("grpc.max_receive_message_length", -1),  # unlimited receive size
        ("grpc.so_reuseport", 1),
        ("grpc.use_local_subchannel_pool", 1),
    ],
)
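Note that futures.ThreadPoolExecutor() with no max_workers argument defaults to several worker threads (min(32, cpu_count + 4) on Python 3.8), so RPC handlers run concurrently. As a minimal sketch (not from the original post), capping the pool at one worker reproduces the working max_workers=1 case by serializing every call:

import grpc
from concurrent import futures

# Sketch: a single worker thread serializes all RPCs, so only one
# thread ever touches the CUDA context (at the cost of concurrency).
server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=1),
    options=[
        ("grpc.max_send_message_length", -1),
        ("grpc.max_receive_message_length", -1),
    ],
)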

gRPC stub init

grpcObject(encoder=trt_model, decoder=decoder)

trt_model init code

def __init__(self):
    # assumes: import pycuda.driver as cuda
    # make_context() creates the context and also pushes it onto the
    # calling thread's stack, so the push() below makes the same
    # context current a second time.
    cuda_ctx = cuda.Device(0).make_context()
    self.cuda_ctx = cuda_ctx
    if self.cuda_ctx:
        self.cuda_ctx.push()
    ...
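For reference, a common pycuda pattern (a sketch under my own assumptions, not code from the original post) is to pop the context immediately after creating it, so initialization leaves no context current on its thread, and then push/pop around each CUDA call:

import pycuda.driver as cuda

cuda.init()

class TRTModel:  # hypothetical class name for illustration
    def __init__(self):
        # make_context() pushes the new context onto this thread's
        # stack; pop it so __init__ leaves nothing current.
        self.cuda_ctx = cuda.Device(0).make_context()
        self.cuda_ctx.pop()

    def run(self):
        self.cuda_ctx.push()  # bind the context to the calling thread
        try:
            pass  # CUDA / TensorRT work goes here
        finally:
            self.cuda_ctx.pop()  # always unbind, even on error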

Hello. I'm serving TensorRT inference over gRPC. However, after setting max_workers in gRPC's thread pool, the following error occurs whenever requests come in from multiple clients at once. With max_workers=1, no error occurs. Can you help?

infer method

def infer(self, wav_path):
    input_signal = preprocess_stt(wav_path)

    # Make this engine's CUDA context current on the calling thread
    # (gRPC may invoke infer() from any worker thread).
    if self.cuda_ctx:
        self.cuda_ctx.push()

    self.context.set_binding_shape(0, input_signal.shape)
    assert self.context.all_binding_shapes_specified

    # Page-locked host buffer for the output binding; this is the
    # cuMemHostAlloc call that fails in the traceback below.
    h_output = cuda.pagelocked_empty(
        tuple(self.context.get_binding_shape(1)), dtype=np.float32
    )

    h_input_signal = cuda.register_host_memory(
        np.ascontiguousarray(to_numpy(input_signal))
    )
    cuda.memcpy_htod_async(self.d_input, h_input_signal, self.stream)
    self.context.execute_async(
        bindings=[int(self.d_input), int(self.d_output)],
        stream_handle=self.stream.handle,
    )
    cuda.memcpy_dtoh_async(h_output, self.d_output, self.stream)
    self.stream.synchronize()

    if self.cuda_ctx:
        self.cuda_ctx.pop()
    return h_output
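For what it's worth, here is a minimal sketch of one common workaround (my own assumption about the fix, not code from the original post): a TensorRT IExecutionContext is not thread-safe, so when a single engine, stream, and set of device buffers are shared across gRPC worker threads, the whole push/infer/pop sequence can be guarded with a threading.Lock:

import threading

_infer_lock = threading.Lock()  # hypothetical module-level lock

def infer(self, wav_path):
    # Serialize access: self.context, self.stream, and the device
    # buffers are shared, and IExecutionContext is not thread-safe.
    with _infer_lock:
        if self.cuda_ctx:
            self.cuda_ctx.push()
        try:
            ...  # same body as the infer method above
        finally:
            if self.cuda_ctx:
                self.cuda_ctx.pop()  # pop even if inference raises

Alternatively, each worker thread can be given its own execution context and stream so no two threads share mutable TensorRT state.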

error

pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered
E0228 17:02:30.063214 140249774667520 _server.py:471] Exception iterating responses: cuMemHostAlloc failed: an illegal memory access was encountered
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/grpc/_server.py", line 461, in _take_response_from_response_iterator
    return next(response_iterator), True
  File "/data/grpc/stt_grpc/grpc_class/dummy_grpc_core.py", line 116, in getStream
    stt_result = trt_inference(self.trt_model, 'aaa.wav', self.decoder)
  File "/data/grpc/stt_grpc/stt_package/stt_func.py", line 525, in trt_inference
    model_output = actor.infer('aaa.wav')
  File "/data/grpc/stt_grpc/grpc_class/tensorrt_stt.py", line 153, in infer
    h_output = cuda.pagelocked_empty(tuple(self.context.get_binding_shape(1)), dtype=np.float32)
pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered

