Creating augmented reality using Python, OpenCV and SLAM

I was wondering if I could build an augmented reality system in Python using OpenCV and SLAM. If so, do you have any tutorials or documentation that you could recommend? I've been scratching my head for a while now trying to find resources to start with; any help would be greatly appreciated!

To be a bit more specific: how would I integrate SLAM and AR together, with SLAM providing the mapping so that the AR system knows where to place virtual objects?



Solution 1:[1]

To be honest, Python is not fast enough for a real-time monocular SLAM system, so you should first consider writing your SLAM system in C++, which is highly recommended for real-time systems. Second, you can look at some open-source SLAM systems (Stella-VSLAM, ORB-SLAM3, PTAM). Keep in mind that developing a SLAM system requires knowledge across a wide range of computer-science topics; the main reason ARCore and ARKit work so well is their efficient SLAM systems. You can also read this resource for more information on SLAM systems. If you have more questions, please don't hesitate to ask!
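To get a feel for where Python stands before committing to C++, a rough benchmark like the sketch below can help. It times ORB feature extraction alone, which is only one stage of a SLAM front end, so the measured rate is an optimistic upper bound for a pure-Python pipeline. The webcam index and the feature count of 2000 are arbitrary assumptions.

    import time
    import cv2

    # Rough benchmark: time ORB feature extraction per frame.
    # Assumes a webcam is available at index 0.
    orb = cv2.ORB_create(nfeatures=2000)
    cap = cv2.VideoCapture(0)

    frames, start = 0, time.time()
    while frames < 100:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        keypoints, descriptors = orb.detectAndCompute(gray, None)
        frames += 1

    cap.release()
    elapsed = time.time() - start
    print(f"{frames / elapsed:.1f} frames/s for ORB extraction alone")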

Solution 2:[2]

Where to Start

To gain some background on SLAM and computer vision, I would recommend watching Cyrill Stachniss' SLAM course and reading the ORB-SLAM, ORB-SLAM2, ORB-SLAM3, and DSO papers. For computer vision in general, I recommend R. Szeliski's book.

Which Language to Use

I wrote my thesis on SLAM and AR systems, and the outcome is the following: state-of-the-art SLAM systems that achieve the best accuracy still rely on classical, hand-crafted computer-vision techniques: SURF and ORB descriptors, Bag of Words (BoW), etc. All of these systems (ORB-SLAM3, DM-VIO, DSO) are written in C++.

I always use C++ for programming SLAM itself, and only occasionally use Python for small scripts, for example to fix the recovered trajectory.
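As an illustration of that kind of script, the sketch below rescales the translation of a trajectory stored in the TUM format (one "timestamp tx ty tz qx qy qz qw" line per pose); rescaling is a common fix for the scale ambiguity of monocular SLAM. The file names and scale factor are hypothetical.

    import numpy as np

    SCALE = 2.5  # assumed metric scale, e.g. recovered from a known baseline

    # Each row: timestamp tx ty tz qx qy qz qw (TUM trajectory format).
    data = np.loadtxt("trajectory.txt")   # hypothetical input file
    data[:, 1:4] *= SCALE                 # scale only the translation columns
    np.savetxt("trajectory_scaled.txt", data, fmt="%.6f")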

SLAM + AR

There aren't many resources on this subject, although the idea is simple. The SLAM system has to give you the camera pose, usually as a 4x4 transformation matrix, where the top-left 3x3 block is the rotation matrix and the last 3x1 column is the translation; an example of such a matrix is sketched below.
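A minimal sketch of that layout in Python (the rotation and translation values are made up for illustration):

    import numpy as np

    # 3x3 rotation (here: 90 degrees about the z-axis) and a translation.
    R = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    t = np.array([0.5, 0.0, 1.2])

    # Assemble the 4x4 camera-pose matrix: R in the top-left 3x3 block,
    # t in the last column, and a [0 0 0 1] bottom row.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    print(T)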

Having the camera pose, you can use projective geometry to project AR objects onto the camera frame. ORB-SLAM2 has a nice AR demo to study; basically, it displays the 2D camera image and renders the 3D content on top of it.
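As a rough sketch of that projection step in Python with OpenCV (the intrinsics and pose values below are made up; in a real system they come from camera calibration and from the SLAM tracker):

    import numpy as np
    import cv2

    # Hypothetical camera intrinsics (fx, fy, cx, cy).
    K = np.array([[525.0,   0.0, 320.0],
                  [  0.0, 525.0, 240.0],
                  [  0.0,   0.0,   1.0]])

    # Hypothetical world-to-camera pose from SLAM: rotation as a
    # Rodrigues vector, plus a translation 2 m along the optical axis.
    rvec = np.zeros(3)
    tvec = np.array([0.0, 0.0, 2.0])

    # A virtual AR object: a single 3D point at the world origin.
    object_points = np.array([[0.0, 0.0, 0.0]])

    pixels, _ = cv2.projectPoints(object_points, rvec, tvec, K, None)
    print(pixels.reshape(-1, 2))  # pixel coordinates to draw the object at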

They use Pangolin for rendering, so you need to know how to use OpenGL and Pangolin. I recommend studying Pangolin through its examples, as it is mostly documented through them.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: amirhossein.razlighi
Solution 2: Ivan Podmogilniy