'Image point to point matching using intrinsics, extrinsics and third-party depth

I want to reopen a similar question to one which somebody posted a while ago with some major difference.

The previous post is https://stackoverflow.com/questions/52536520/image-matching-using-intrinsic-and-extrinsic-camera-parameters]

and my question is can I do the matching if I do have the depth? If it is possible can some describe a set of formulas which I have to solve to get the desirable matching ?

Here there is also some correspondence on slide 16/43: Depth from Stereo Lecture

In what units all the variables here, can some one clarify please ? Will this formula help me to calculate the desirable point to point correspondence ? I know the Z (mm, cm, m, whatever unit it is) and the x_l (I guess this is y coordinate of the pixel, so both x_l and x_r are on the same horizontal line, correct if I'm wrong), I'm not sure if T is in mm (or cm, m, i.e distance unit) and f is in pixels/mm (distance unit) or is it something else ?

Thank you in advance.

EDIT:

So as it was said by @fana, the solution is indeed a projection.

For my understanding it is P(v) = K (Rv+t), where R is 3 x 3 rotation matrix (calculated for example from calibration), t is the 3 x 1 translation vector and K is the 3 x 3 intrinsics matrix.

from the following video:

It can be seen that there is translation only in one dimension (because the situation is where the images are parallel so the translation takes place only on X-axis) but in other situation, as much as I understand if the cameras are not on the same parallel line, there is also translation on Y-axis. What is the translation on the Z-axis which I get through the calibration, is it some rescale factor due to different image resolutions for example ? Did I wrote the projection formula correctly in the general case?

I also want to ask about the whole idea.

Suppose I have 3 cameras, one with large FOV which gives me color and depth for each pixel, lets call it the first (3d tensor, color stacked with depth correspondingly), and two with which I want to do stereo, lets call them second and third.

Instead of calibrating the two cameras, my idea is to use the depth from the first camera to calculate the xyz of pixel u,v of its correspondent color frame, that can be done easily and now to project it on the second and the third image using the R,t found by calibration between the first camera and the second and the third, and using the K intrinsics matrices so the projection matrix seems to be full known, am I right ?

Assume for the case that FOV of color is big enough to include all that can be seen from the second and the third cameras.

That way, by projection each x,y,z of the first camera I can know where is the corresponding pixels on the two other cameras, is that correct ?

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source

'Image point to point matching using intrinsics, extrinsics and third-party depth

Sources

Related Questions