TensorRT with gRPC multi-threading error: how to fix it?
Description
Environment
TensorRT Version: 8.2.3.0
NVIDIA GPU: GTX 1080 Ti
NVIDIA Driver Version: 470.103.01
CUDA Version: 11.4
cuDNN Version: 8.2
Operating System: Linux 18.06
Python Version: 3.8.0
PyTorch Version: 1.10
grpc server code
server = grpc.server(
    futures.ThreadPoolExecutor(),
    options=[
        ("grpc.max_send_message_length", -1),
        ("grpc.max_receive_message_length", -1),
        ("grpc.so_reuseport", 1),
        ("grpc.use_local_subchannel_pool", 1),
    ],
)
grpc stub init
grpcObject(encoder=trt_model, decoder=decoder)
trt_model init code
def __init__(self):
    cuda_ctx = cuda.Device(0).make_context()
    self.cuda_ctx = cuda_ctx
    if self.cuda_ctx:
        self.cuda_ctx.push()
    ...
Hello. I'm using TensorRT via gRPC. After I set max_workers on the gRPC server's thread pool to a value greater than 1, the error shown below occurs when requests come in from multiple clients. With max_workers=1, no error occurs. Can you help?
infer method
def infer(self, wav_path):
    input_signal = preprocess_stt(wav_path)
    if self.cuda_ctx:
        self.cuda_ctx.push()
    self.context.set_binding_shape(0, input_signal.shape)
    assert self.context.all_binding_shapes_specified
    h_output = cuda.pagelocked_empty(tuple(self.context.get_binding_shape(1)), dtype=np.float32)
    h_input_signal = cuda.register_host_memory(np.ascontiguousarray(to_numpy(input_signal)))
    cuda.memcpy_htod_async(self.d_input, h_input_signal, self.stream)
    self.context.execute_async(bindings=[int(self.d_input), int(self.d_output)], stream_handle=self.stream.handle)
    cuda.memcpy_dtoh_async(h_output, self.d_output, self.stream)
    self.stream.synchronize()
    if self.cuda_ctx:
        self.cuda_ctx.pop()
    return h_output
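A possible direction, offered as a sketch rather than a confirmed fix: the `infer` method shares one execution context, one stream, and one pair of device buffers (`self.d_input`, `self.d_output`) across every worker thread, and a TensorRT `IExecutionContext` is not meant to be used from several threads at once. In addition, if anything raises between `push()` and `pop()`, the CUDA context stays on that thread's stack. The example below serializes inference with a lock and pairs push/pop in `try`/`finally`. `FakeCudaContext`, `SerializedModel`, and `run_demo` are hypothetical stand-ins so the pattern runs without a GPU; in the real class the body of `infer` would be the pycuda/TensorRT calls from the question.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class FakeCudaContext:
    """Hypothetical stand-in for a pycuda context; only tracks push/pop depth."""

    def __init__(self):
        self.depth = 0

    def push(self):
        self.depth += 1

    def pop(self):
        self.depth -= 1


class SerializedModel:
    """Allows one inference at a time and always pops what it pushed."""

    def __init__(self, cuda_ctx):
        self.cuda_ctx = cuda_ctx
        self._lock = threading.Lock()
        self.active = 0       # inferences currently inside the critical section
        self.max_active = 0   # highest value `active` ever reached

    @contextmanager
    def _activated(self):
        with self._lock:              # only one thread past this point
            self.cuda_ctx.push()
            try:
                yield
            finally:
                self.cuda_ctx.pop()   # runs even if inference raises

    def infer(self, x):
        with self._activated():
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            try:
                if x < 0:
                    raise ValueError("simulated inference failure")
                return x * 2  # placeholder for the copy/execute/copy steps
            finally:
                self.active -= 1


def run_demo(n_requests=32, max_workers=8):
    model = SerializedModel(FakeCudaContext())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(model.infer, range(n_requests)))
    return model, results
```

The lock trades throughput for safety: requests are still accepted concurrently by the gRPC thread pool, but only one of them touches the shared GPU state at a time.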
error
pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered
E0228 17:02:30.063214 140249774667520 _server.py:471] Exception iterating responses: cuMemHostAlloc failed: an illegal memory access was encountered
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/grpc/_server.py", line 461, in _take_response_from_response_iterator
    return next(response_iterator), True
  File "/data/grpc/stt_grpc/grpc_class/dummy_grpc_core.py", line 116, in getStream
    stt_result = trt_inference(self.trt_model, 'aaa.wav', self.decoder)
  File "/data/grpc/stt_grpc/stt_package/stt_func.py", line 525, in trt_inference
    model_output = actor.infer('aaa.wav')
  File "/data/grpc/stt_grpc/grpc_class/tensorrt_stt.py", line 153, in infer
    h_output = cuda.pagelocked_empty(tuple(self.context.get_binding_shape(1)), dtype=np.float32)
pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered
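A note on reading this traceback: the message names `cuMemHostAlloc`, but CUDA can report a fault from earlier asynchronous work at a later API call, so the faulting operation may be a previous copy or execution rather than the allocation itself. Since `infer` uses `self.context`, `self.stream`, `self.d_input`, and `self.d_output` from every worker thread, one way to avoid sharing them is to give each thread its own set. The sketch below shows only the per-thread bookkeeping with `threading.local`; `PerThreadState`, `make_state`, `handle_request`, and `run_requests` are hypothetical names, and the dictionary stands in for the real TensorRT/pycuda objects.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerThreadState:
    """Lazily builds one state object per thread and reuses it afterwards."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        state = getattr(self._local, "state", None)
        if state is None:
            state = self._factory()
            self._local.state = state
        return state


def make_state():
    # Hypothetical factory. In the real service this is where a thread would
    # create its own execution context, CUDA stream, and device buffers.
    return {"owner": threading.get_ident(), "calls": 0}


states = PerThreadState(make_state)


def handle_request(x):
    state = states.get()
    state["calls"] += 1
    # The flag is True only if this thread is using the state it created.
    return state["owner"] == threading.get_ident(), x * 2


def run_requests(n_requests=20, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle_request, range(n_requests)))
```

Each thread would still need the CUDA context made current before touching its objects, and per-thread device buffers cost extra GPU memory for every worker in the pool.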
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
