How to reduce GStreamer pipeline latency in an "AWS Kinesis Video Streams (KVS) -> GStreamer + OpenCV image processing -> KVS" process?

I created an image-processing pipeline like the one shown below and would like to reduce the end-to-end latency.

I run the following command on a MacBook (vtenc_h264_hw is about 8 seconds faster than x264enc):

gst-launch-1.0 -v avfvideosrc \
! clockoverlay font-desc="Sans bold 60px" \
! videorate \
! video/x-raw,framerate=1/1 \
! vtenc_h264_hw allow-frame-reordering=FALSE realtime=TRUE max-keyframe-interval=2 bitrate=512  \
! h264parse \
! video/x-h264,stream-format=avc,alignment=au \
! kvssink stream-name="test-instream" storage-size=512 \
    access-key="${AWS_ACCESS_KEY_ID}" \
    secret-key="${AWS_SECRET_ACCESS_KEY}" \
    aws-region="${AWS_REGION}" \
    frame-timecodes=true \
    framerate=1

The resulting video in the Kinesis Video Streams (KVS) management console (test-instream) is below:

https://youtu.be/hg13YG7vgHw

There is about a 2-second delay. That is likely dominated by the fragment length on the producer side: at 1 fps with max-keyframe-interval=2, a key frame (and therefore a new KVS fragment) is produced about every 2 seconds, and KVS delivers data fragment by fragment.

On the server side, I process the stream on an AWS g4dn.xlarge instance with the Python code below:


import cv2

GSTREAMER_OUT = ' ! '.join([
    'appsrc',
    'clockoverlay halignment=right valignment=top font-desc="Sans bold 60px"',
    'videoconvert',
    'video/x-raw,format=YV12',
    'x264enc byte-stream=true noise-reduction=10000 speed-preset=ultrafast tune=zerolatency',
    'video/x-h264,stream-format=avc,alignment=au,profile=baseline',
    ' '.join([
        'kvssink',
        'stream-name=test-outstream',
        'storage-size=512',
        f'access-key={access_key}',
        f'secret-key={secret_key}',
        'aws-region=ap-northeast-1',
        'framerate=1',
    ]),
])

# Read the HLS playback of the input stream and push processed frames
# back to KVS through the kvssink pipeline above.
cap = cv2.VideoCapture(hls_stream_url)
out = cv2.VideoWriter(GSTREAMER_OUT, cv2.CAP_GSTREAMER, 0, target_fps,
                      (frame_width, frame_height), True)

use_gpu = True  # toggled between True and False for the two measurements below

while True:
    ret, frame = cap.read()
    if not ret:
        continue
    result = some_image_process(frame, gpu=use_gpu)
    out.write(result)

some_image_process took 0.04 seconds per frame with gpu=True and 0.8 seconds per frame with gpu=False.
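To see where the rest of each iteration goes, it may help to time the stages of the loop separately; a minimal sketch using only the standard library (use_gpu as in the loop above):

import time

# Time each stage of the loop to see whether latency comes from cap.read()
# (network/decode), the image processing itself, or out.write() (encode/upload).
while True:
    t0 = time.perf_counter()
    ret, frame = cap.read()
    t1 = time.perf_counter()
    if not ret:
        continue
    result = some_image_process(frame, gpu=use_gpu)
    t2 = time.perf_counter()
    out.write(result)
    t3 = time.perf_counter()
    print(f'read={t1 - t0:.3f}s process={t2 - t1:.3f}s write={t3 - t2:.3f}s')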

The resulting video in the Kinesis Video Streams management console (test-outstream) is below:

The latency is about 40 seconds with the GPU and about 25 seconds without it. It is strange that the CPU path ended up with lower latency. Maybe this is because cap.read() is called much more often when the GPU is used; the error message "Could not read complete segment." points in that direction.
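If frames really are piling up between processing steps, one way to keep latency bounded is to read frames on a separate thread and only ever process the newest one, so stale frames are dropped instead of queueing inside VideoCapture. A sketch, reusing cap, out, some_image_process, and use_gpu from above:

import threading
import time

# Read frames on a background thread and keep only the newest one, so slow
# processing drops stale frames instead of letting them accumulate.
latest_frame = None
frame_lock = threading.Lock()

def reader():
    global latest_frame
    while True:
        ret, frame = cap.read()
        if ret:
            with frame_lock:
                latest_frame = frame

threading.Thread(target=reader, daemon=True).start()

while True:
    with frame_lock:
        frame = latest_frame
        latest_frame = None          # mark the frame as consumed
    if frame is None:
        time.sleep(0.01)             # wait for the next frame
        continue
    result = some_image_process(frame, gpu=use_gpu)
    out.write(result)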

I would like this pipeline to work in near real time. Is it possible to make it faster?
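One part that typically adds several seconds on the consumer side is reading the stream back via HLS, since KVS has to build segments and playlists before cv2.VideoCapture ever sees them. A lower-latency alternative is to pull fragments with the GetMedia API. A rough sketch using boto3 (stream name, region, and credentials are assumed to match the producer side); note that this only replaces the cv2.VideoCapture(hls_stream_url) input and the payload still has to be demuxed/decoded (e.g. piped into GStreamer or ffmpeg) before OpenCV gets frames:

import boto3

kvs = boto3.client('kinesisvideo', region_name='ap-northeast-1')
endpoint = kvs.get_data_endpoint(
    StreamName='test-instream',
    APIName='GET_MEDIA',
)['DataEndpoint']

media = boto3.client('kinesis-video-media', endpoint_url=endpoint,
                     region_name='ap-northeast-1')
resp = media.get_media(
    StreamName='test-instream',
    StartSelector={'StartSelectorType': 'NOW'},  # start at the live edge
)

# resp['Payload'] is a streaming, fragmented-MKV byte stream delivered shortly
# after ingestion; write it somewhere a decoder can consume it.
with open('live.mkv', 'wb') as f:
    while True:
        chunk = resp['Payload'].read(8192)
        if not chunk:
            break
        f.write(chunk)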

I am trying to reduce the encode/decode time using NVIDIA DeepStream (one NVENC-based option is sketched after the FAQ link below). Are there any other choices? I have read the AWS FAQ entry "Q: How do I think about latency in Amazon Kinesis Video Streams?":

https://aws.amazon.com/kinesis/video-streams/faqs/?nc1=h_ls
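Since g4dn has a T4 GPU, one option short of full DeepStream is to hand the H.264 encode to NVENC via GStreamer's nvh264enc element. This is only a sketch: it assumes the GStreamer build on the instance includes the NVENC plugin (from gst-plugins-bad) and that the NVIDIA driver is installed; the element properties and caps may need adjusting:

# Hypothetical variant of GSTREAMER_OUT that uses the NVENC hardware encoder
# (nvh264enc) instead of x264enc.
GSTREAMER_OUT_NVENC = ' ! '.join([
    'appsrc',
    'clockoverlay halignment=right valignment=top font-desc="Sans bold 60px"',
    'videoconvert',
    'video/x-raw,format=NV12',   # NVENC-friendly pixel format
    'nvh264enc',                 # GPU H.264 encoder from gst-plugins-bad
    'h264parse',
    'video/x-h264,stream-format=avc,alignment=au',
    ' '.join([
        'kvssink',
        'stream-name=test-outstream',
        'storage-size=512',
        f'access-key={access_key}',
        f'secret-key={secret_key}',
        'aws-region=ap-northeast-1',
        'framerate=1',
    ]),
])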

Or should the KVS stream be split into individual images, as in the slide below?

https://speakerdeck.com/toshitanian/amazon-kinesis-video-streams-x-deep-learning?slide=23
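That approach would mean publishing individual frames instead of re-encoding a video stream. As a rough illustration (not necessarily the exact method in the slide), each processed frame could be JPEG-encoded and uploaded to S3; the bucket name and key pattern here are placeholders:

import time
import boto3
import cv2

s3 = boto3.client('s3')

# Fan out processed frames as standalone JPEG objects instead of writing a
# re-encoded video stream back to KVS.
while True:
    ret, frame = cap.read()
    if not ret:
        continue
    result = some_image_process(frame, gpu=use_gpu)
    ok, jpeg = cv2.imencode('.jpg', result)
    if not ok:
        continue
    s3.put_object(
        Bucket='my-processed-frames',                 # placeholder bucket
        Key=f'frames/{int(time.time() * 1000)}.jpg',  # placeholder key pattern
        Body=jpeg.tobytes(),
        ContentType='image/jpeg',
    )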



Source: Stack Overflow, licensed under CC BY-SA 3.0.