How to convert webrtc::VideoFrame to OpenCV Mat in C++
I am trying to display received WebRTC frames using OpenCV imshow(). WebRTC delivers frames as objects of webrtc::VideoFrame and in my case, I can access webrtc::I420Buffer from it. Now my question is how do I convert the data in webrtc::I420Buffer to cv::Mat, so that I can give it to imshow()?
This is what the definition of webrtc::I420Buffer looks like:
namespace webrtc {
// Plain I420 buffer in standard memory.
class RTC_EXPORT I420Buffer : public I420BufferInterface {
public:
...
int width() const override;
int height() const override;
const uint8_t* DataY() const override;
const uint8_t* DataU() const override;
const uint8_t* DataV() const override;
int StrideY() const override;
int StrideU() const override;
int StrideV() const override;
uint8_t* MutableDataY();
uint8_t* MutableDataU();
uint8_t* MutableDataV();
...
private:
const int width_;
const int height_;
const int stride_y_;
const int stride_u_;
const int stride_v_;
const std::unique_ptr<uint8_t, AlignedFreeDeleter> data_;
};
Solution 1:[1]
The main issue is converting from I420 color format to BGR (or BGRA) color format used by OpenCV.
Two good options for color conversion:
- Using sws_scale - part of the C interface libraries of FFmpeg.
- Using an IPP color conversion function like ippiYCbCr420ToBGR_709HDTV_8u_P3C4R.
We may also use cv::cvtColor with the cv::COLOR_YUV2BGR_I420 argument.
This is less recommended, because the Y, U and V color channels must be sequential in memory - in the general case, that requires extra "deep copy" operations (see the sketch below).
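For reference, here is a minimal sketch of that cvtColor route (a hypothetical helper, assuming an I420 buffer with the accessors shown above, and that <opencv2/opencv.hpp> and <cstring> are included). The row-by-row copies are the "deep copy" cost mentioned above:
cv::Mat I420ToBGR(const webrtc::I420BufferInterface& buf)
{
    //cv::cvtColor expects the Y, U and V planes stacked in one contiguous width x (height*3/2) buffer.
    //A freshly allocated cv::Mat is continuous, so advancing dst row by row fills memory sequentially.
    cv::Mat i420(buf.height() * 3 / 2, buf.width(), CV_8UC1);
    uint8_t* dst = i420.data;
    for (int y = 0; y < buf.height(); y++, dst += buf.width())
        memcpy(dst, buf.DataY() + (size_t)y * buf.StrideY(), buf.width()); //Copy Y rows (stride may exceed width).
    for (int y = 0; y < buf.height() / 2; y++, dst += buf.width() / 2)
        memcpy(dst, buf.DataU() + (size_t)y * buf.StrideU(), buf.width() / 2); //Copy U rows.
    for (int y = 0; y < buf.height() / 2; y++, dst += buf.width() / 2)
        memcpy(dst, buf.DataV() + (size_t)y * buf.StrideV(), buf.width() / 2); //Copy V rows.
    cv::Mat bgr;
    cv::cvtColor(i420, bgr, cv::COLOR_YUV2BGR_I420); //Applies BT.601 coefficients.
    return bgr;
}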
After the color conversion we may use the cv::Mat constructor that "wraps" the BGR (or BGRA) memory buffer (without using "deep copy").
Example (the terms "step", "stride" and "linesize" are equivalent):
cv::Mat bgra_img = cv::Mat(height, width, CV_8UC4, pDst, dstStep);
Note that this constructor does not copy the pixel data, so the wrapped buffer must outlive the cv::Mat (use clone() if an owning copy is needed).
Create sample raw image in I420 format for testing:
We may use FFmpeg CLI for creating the input file that is used for testing:
ffmpeg -f lavfi -i testsrc=size=640x480:duration=1:rate=1 -pix_fmt yuv420p -f rawvideo I420.yuv
Note: FFmpeg yuv420p is equivalent to I420 format.
The code sample includes two parts:
The first part uses sws_scale, and the second part uses IPP.
Choose one of them (you don't have to use both).
For testing, I redefined and added some functionality to class I420Buffer.
It may look weird, but it is used only for testing.
Just follow the code sample, and see that it makes sense...
Here is the code sample (please read the comments):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
//Use OpenCV for showing the image
#include <opencv2/opencv.hpp>
#include <opencv2/highgui.hpp>
extern "C" {
//Required for using sws_scale
#include <libavutil/frame.h>
#include <libswscale/swscale.h>
}
//We don't need both IPP and LibAV, the IPP solution is a separate example.
#include <ipp.h>
#include <ippi.h>
//I420 format ('0' marks padding bytes, used when stride > width):
//
//            <------ stride_y_ ----->
//            <------- width ------>
// data_y_ -> yyyyyyyyyyyyyyyyyyyyyy00
//            yyyyyyyyyyyyyyyyyyyyyy00
//            yyyyyyyyyyyyyyyyyyyyyy00
//            yyyyyyyyyyyyyyyyyyyyyy00
//            yyyyyyyyyyyyyyyyyyyyyy00
//            yyyyyyyyyyyyyyyyyyyyyy00
//
//            <stride_u_ >
//            <-width/2->
// data_u_ -> uuuuuuuuuuu0
//            uuuuuuuuuuu0
//            uuuuuuuuuuu0
//
//            <stride_v_ >
//            <-width/2->
// data_v_ -> vvvvvvvvvvv0
//            vvvvvvvvvvv0
//            vvvvvvvvvvv0
// Plain I420 buffer in standard memory.
// Some extra functionality is added for testing
////////////////////////////////////////////////////////////////////////////////
class I420Buffer {
public:
//Constructor (for testing):
//Allocate buffers, and read an I420 image from a binary file.
explicit I420Buffer(int w, int h, const char *input_file_name) : width_(w), height_(h), stride_y_(w), stride_u_(w / 2), stride_v_(w / 2)
{
//The example uses stride = width (but in the general case the stride may be larger than width).
data_y_ = new uint8_t[w*h];
data_u_ = new uint8_t[w*h / 4];
data_v_ = new uint8_t[w*h / 4];
FILE* f = fopen(input_file_name, "rb");
if (f == nullptr)
{
fprintf(stderr, "Failed to open %s\n", input_file_name);
exit(1); //Error!
}
fread(data_y_, 1, w*h, f); //Read Y color channel.
fread(data_u_, 1, w*h/4, f); //Read U color channel.
fread(data_v_, 1, w*h/4, f); //Read V color channel.
fclose(f);
};
//Destructor (for testing):
~I420Buffer()
{
delete[] data_y_;
delete[] data_u_;
delete[] data_v_;
}
int width() const { return width_; };
int height() const { return height_; };
const uint8_t* DataY() const { return data_y_; };
const uint8_t* DataU() const { return data_u_; };
const uint8_t* DataV() const { return data_v_; };
int StrideY() const { return stride_y_; };
int StrideU() const { return stride_u_; };
int StrideV() const { return stride_v_; };
//uint8_t* MutableDataY();
//uint8_t* MutableDataU();
//uint8_t* MutableDataV();
private:
const int width_;
const int height_;
const int stride_y_;
const int stride_u_;
const int stride_v_;
//const std::unique_ptr<uint8_t, AlignedFreeDeleter> data_;
uint8_t* data_y_; //Assume data_ is internally divided into Y, U and V buffers.
uint8_t* data_u_;
uint8_t* data_v_;
};
////////////////////////////////////////////////////////////////////////////////
int main()
{
//Create raw video frame in I420 format using FFmpeg (for testing):
//ffmpeg -f lavfi -i testsrc=size=640x480:duration=1:rate=1 -pix_fmt yuv420p -f rawvideo I420.yuv
int width = 640;
int height = 480;
I420Buffer I(width, height, "I420.yuv");
//Create SWS Context for converting from decode pixel format (like YUV420) to BGR
////////////////////////////////////////////////////////////////////////////
struct SwsContext* sws_ctx = NULL;
sws_ctx = sws_getContext(I.width(),
I.height(),
AV_PIX_FMT_YUV420P, //Input format is yuv420p (equivalent to I420).
I.width(),
I.height(),
AV_PIX_FMT_BGR24, //For OpenCV, we want BGR pixel format.
SWS_FAST_BILINEAR,
NULL,
NULL,
NULL);
if (sws_ctx == nullptr)
{
return -1; //Error!
}
////////////////////////////////////////////////////////////////////////////
//Allocate frame for storing image converted to BGR.
////////////////////////////////////////////////////////////////////////////
AVFrame* pBGRFrame = av_frame_alloc(); //Allocate frame, because it is more convenient than allocating and initializing data buffer and linesize.
pBGRFrame->format = AV_PIX_FMT_BGR24;
pBGRFrame->width = I.width();
pBGRFrame->height = I.height();
int sts = av_frame_get_buffer(pBGRFrame, 0); //Buffers allocation
if (sts < 0)
{
return -1; //Error!
}
////////////////////////////////////////////////////////////////////////////
//Convert from input format (e.g. YUV420) to BGR:
//sws_scale uses the BT.601 conversion formula by default. It is more likely that the input is BT.709 and not BT.601 (read about it in Wikipedia).
//It is possible to select BT.709 using sws_setColorspaceDetails (a sketch follows below).
////////////////////////////////////////////////////////////////////////////
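//A possible way to request BT.709 coefficients (a sketch; sws_getCoefficients and
//sws_setColorspaceDetails are part of libswscale, and range 0 means limited/MPEG range):
//sws_setColorspaceDetails(sws_ctx,
//                         sws_getCoefficients(SWS_CS_ITU709), 0, //Source table and range.
//                         sws_getCoefficients(SWS_CS_ITU709), 0, //Destination table and range (mostly ignored for RGB output).
//                         0, 1 << 16, 1 << 16); //Default brightness, contrast, saturation.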
const uint8_t* const src_data[] = { I.DataY(), I.DataU(), I.DataV() };
const int src_stride[] = { I.StrideY(), I.StrideU(), I.StrideV() };
sts = sws_scale(sws_ctx, //struct SwsContext* c,
src_data, //const uint8_t* const srcSlice[],
src_stride, //const int srcStride[],
0, //int srcSliceY,
I.height(), //int srcSliceH,
pBGRFrame->data, //uint8_t* const dst[],
pBGRFrame->linesize); //const int dstStride[]);
if (sts != I.height())
{
return -1; //Error!
}
//Use OpenCV for showing the image (and save the image in PNG format):
////////////////////////////////////////////////////////////////////////////
cv::Mat img = cv::Mat(pBGRFrame->height, pBGRFrame->width, CV_8UC3, pBGRFrame->data[0], pBGRFrame->linesize[0]); //cv::Mat is OpenCV "thin image wrapper".
cv::imshow("img", img);
//cv::waitKey();
//Save the image in PNG format using OpenCV
cv::imwrite("rgb.png", img);
////////////////////////////////////////////////////////////////////////////
//Free
sws_freeContext(sws_ctx);
av_frame_free(&pBGRFrame);
// Solution using IPP:
// The IPP sample uses the BT.709 conversion formula, and converts to BGRA (not BGR).
// It is more likely that the input is BT.709 and not BT.601 (read about it in Wikipedia).
// Using color conversion function: ippiYCbCr420ToBGR_709HDTV_8u_P3C4R
//https://www.intel.com/content/www/us/en/develop/documentation/ipp-dev-reference/top/volume-2-image-processing/image-color-conversion/color-model-conversion/ycbcr420tobgr-709hdtv.html
////////////////////////////////////////////////////////////////////////////
IppStatus ipp_sts = ippInit();
if (ipp_sts < ippStsNoErr)
{
return -1; //Error.
}
const Ipp8u* pSrc[3] = { I.DataY(), I.DataU(), I.DataV() };
int srcStep[3] = { I.StrideY(), I.StrideU(), I.StrideV() };
Ipp8u* pDst = new uint8_t[I.width() * I.height() * 4];
int dstStep = I.width() * 4;
IppiSize roiSize = { I.width(), I.height() };
ipp_sts = ippiYCbCr420ToBGR_709HDTV_8u_P3C4R(pSrc, //const Ipp8u* pSrc[3],
srcStep, //int srcStep[3],
pDst, //Ipp8u* pDst,
dstStep, //int dstStep,
roiSize, //IppiSize roiSize,
255); //Ipp8u aval)
if (ipp_sts < ippStsNoErr)
{
return -1; //Error.
}
cv::Mat bgra_img = cv::Mat(I.height(), I.width(), CV_8UC4, pDst, dstStep); //cv::Mat is OpenCV "thin image wrapper".
cv::imshow("bgra_img", bgra_img);
cv::waitKey();
delete[] pDst;
////////////////////////////////////////////////////////////////////////////
return 0;
}
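For reference, a plausible Linux build line for the sws_scale part (assuming pkg-config files for OpenCV 4 and FFmpeg are installed; the IPP part additionally needs Intel's IPP include and link flags):
g++ -o i420_demo main.cpp $(pkg-config --cflags --libs opencv4 libswscale libavutil)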
Note:
It seems too difficult to actually wire up a WebRTC input stream for this example.
The assumption is that the decoded raw video frame already exists in an I420Buffer as defined in your post.
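For completeness, here is a rough sketch of how the planes might be obtained from a live stream (assuming a reasonably recent libwebrtc checkout; header paths and names may differ between versions):
#include <api/video/video_frame.h>
#include <api/video/video_sink_interface.h>
//A sink that receives decoded frames from a remote video track.
class FrameSink : public rtc::VideoSinkInterface<webrtc::VideoFrame> {
public:
    void OnFrame(const webrtc::VideoFrame& frame) override {
        //ToI420() returns the frame buffer as I420, converting only if it is stored in another format.
        rtc::scoped_refptr<webrtc::I420BufferInterface> i420 =
            frame.video_frame_buffer()->ToI420();
        //i420->DataY()/DataU()/DataV() and the matching strides can now be fed
        //to sws_scale or IPP exactly like the test I420Buffer above.
    }
};
Such a sink would typically be registered on the remote webrtc::VideoTrackInterface with AddOrUpdateSink.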
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rotem |

