Projection of a 3D LiDAR point into the i-th camera image (KITTI Dataset)

I am working on an object classification problem, using LiDAR and camera data from the KITTI dataset. In this article: http://www.cvlibs.net/publications/Geiger2013IJRR.pdf , they provide the formulas for projecting the 3D point cloud onto the i-th camera image plane, but there are some things I don't understand:

Following equation (3):

If the 3D point X is in Velodyne coordinates and Y in the i-th camera image, why does X have four coordinates and Y three? Shouldn't it be 3 and 2?

The formula (equation (3) of the paper):

Y = P_rect^(i) * R_rect^(0) * T_velo^cam * X

I need to project the 3D point cloud onto the camera image plane so I can create LiDAR images and use them as a channel for the CNN. Does anyone have ideas on how to do this?

Thank you in advance.



Solution 1:[1]

For your first query regarding the dimensions of x and y, there are two explanations.

Reason 1.

  • For image re-projection the pinhole camera model is used, which works in perspective (homogeneous) coordinates. Perspective projection uses the camera origin as the centre of projection, and points are mapped onto the plane z = 1. A 3D point [x y z] is represented by [xw yw zw w], and the point it maps to on the plane is represented by [xw yw zw]; normalising by w recovers the original coordinates. This gives (see the numeric sketch after this list):

    So (x, y) -> [x y 1]^T : Homogeneous Image Coordinates

    and (x, y, z) -> [x y z 1]^T : Homogeneous Scene Coordinates
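
A minimal NumPy sketch of that conversion (the helper names are my own):

    import numpy as np

    def to_homogeneous(points):
        """Append a 1 to each point: (N, d) -> (N, d+1)."""
        return np.hstack([points, np.ones((points.shape[0], 1))])

    def from_homogeneous(points_h):
        """Divide by the last coordinate and drop it: (N, d+1) -> (N, d)."""
        return points_h[:, :-1] / points_h[:, -1:]

    # A 3D scene point becomes a 4-vector; a homogeneous image point is
    # a 3-vector that normalises back to 2D pixel coordinates.
    X = to_homogeneous(np.array([[1.0, 2.0, 5.0]]))      # [[1. 2. 5. 1.]]
    x = from_homogeneous(np.array([[10.0, 20.0, 5.0]]))  # [[2. 4.]]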

Reason 2.

  • With respect to the paper you have attached, consider equations (4) and (5):

    Equation (4) defines the rectified projection matrix

        P_rect^(i) = [ f_u^(i)   0         c_u^(i)   -f_u^(i) * b_x^(i) ;
                       0         f_v^(i)   c_v^(i)    0 ;
                       0         0         1          0 ]

    and equation (5) uses R_rect^(0), the rectifying rotation of the reference camera, expanded to a 4x4 matrix by appending a fourth zero row and column and setting R_rect^(0)(4,4) = 1.

    It is clear that P is of dimension 3x4 and R is expanded to 4x4. Also, x is of dimension 4x1 (a column vector). As per the matrix multiplication rule, the number of columns of the first matrix must equal the number of rows of the second matrix. So for P of 3x4 and R of 4x4, x has to be 4x1, and the product P * R * x is then 3x1, i.e. the three-coordinate image point y. A NumPy sketch of this chain follows below.
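
To make the shapes concrete, here is a hedged NumPy sketch of the paper's projection chain y = P_rect^(i) * R_rect^(0) * T_velo^cam * x. The matrix values below are placeholders; the real ones are read from the KITTI calibration files (calib_cam_to_cam.txt and calib_velo_to_cam.txt):

    import numpy as np

    # Placeholder calibration; real values come from the KITTI calib files.
    P_rect = np.array([[721.5, 0.0, 609.6, 44.9],   # 3x4 rectified projection
                       [0.0, 721.5, 172.9, 0.0],
                       [0.0, 0.0, 1.0, 0.0]])
    R_rect = np.eye(4)         # 4x4 rectifying rotation, expanded with a
                               # fourth zero row/column and R_rect[3, 3] = 1
    T_velo_to_cam = np.eye(4)  # 4x4 extrinsics: [R | t] from Velodyne to
                               # camera, with a [0 0 0 1] row appended

    def project_velo_to_image(pts_velo):
        """Project (N, 3) Velodyne points to (N, 2) pixel coordinates."""
        pts_h = np.hstack([pts_velo, np.ones((pts_velo.shape[0], 1))])  # (N, 4)
        y = (P_rect @ R_rect @ T_velo_to_cam @ pts_h.T).T               # (N, 3)
        return y[:, :2] / y[:, 2:3]  # normalise by the third coordinate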

Now coming to your second question about LiDAR-image fusion: it requires the intrinsic and extrinsic parameters (relative rotation and translation) and the camera matrix. The rotation and translation together form a 3x4 matrix called the transformation matrix, so the point fusion equation becomes

[x y 1]^T = Camera Matrix * Transformation Matrix * [X Y Z 1]^T
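
A generic sketch of that equation in NumPy (not KITTI-specific; the K, R and t values are made-up placeholders):

    import numpy as np

    K = np.array([[700.0, 0.0, 640.0],   # camera (intrinsic) matrix, 3x3
                  [0.0, 700.0, 360.0],
                  [0.0, 0.0, 1.0]])
    R = np.eye(3)                        # relative rotation, 3x3
    t = np.array([[0.2], [0.0], [0.1]])  # relative translation, 3x1

    Rt = np.hstack([R, t])               # 3x4 transformation matrix [R | t]
    X = np.array([1.0, 2.0, 10.0, 1.0])  # homogeneous scene point [X Y Z 1]^T

    xw = K @ Rt @ X                      # [x*w, y*w, w]
    u, v = xw[0] / xw[2], xw[1] / xw[2]  # pixel coordinates (x, y)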

You can also refer to: Lidar Image Fusion KITTI

Once your LiDAR-image fusion is done, you can feed the fused image to your CNN model. I am not aware of ready-made DNN modules for LiDAR-fused images.
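
For building that CNN input, one possible approach (my own sketch, not from the original answer) is to rasterise the projected points into a sparse depth channel the same size as the camera image:

    import numpy as np

    def make_depth_channel(pts_velo, project_fn, height, width):
        """Rasterise LiDAR points into an (H, W) depth channel.
        project_fn maps (N, 3) points to (N, 2) pixels, e.g. the
        project_velo_to_image sketch above."""
        # Keep points roughly in front of the camera (x points forward
        # in the Velodyne frame); points behind would also project.
        pts = pts_velo[pts_velo[:, 0] > 0]
        uv = np.round(project_fn(pts)).astype(int)
        depth = np.linalg.norm(pts, axis=1)

        # Discard projections that fall outside the image bounds.
        ok = ((uv[:, 0] >= 0) & (uv[:, 0] < width) &
              (uv[:, 1] >= 0) & (uv[:, 1] < height))
        uv, depth = uv[ok], depth[ok]

        channel = np.zeros((height, width), dtype=np.float32)
        channel[uv[:, 1], uv[:, 0]] = depth  # pixels with no return stay 0
        return channel

Stacking this channel with the RGB image (or with intensity/height channels built the same way) gives a multi-channel input tensor for the CNN.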

Hope this helps.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Ritesh