Camera projection issues

help

#1

I’m trying to verify my Camera implementation (extrinsics + intrinsics) by comparing the back-projections of matching 2D points in different frames, but the results seem to be wrong.

Brief context:

Dataset: ICL-NUIM (synthetic), video lr-kt0 [1]
The two frames used: 1 and 600
Point to match: bottom left corner of the wall frame:
- In frame 1 : x = 386, y = 278, depth = 16820 / 5000
- In frame 600: x = 14, y = 102, depth = 10670 / 5000

The ground-truth camera extrinsics are given in the TUM format (translation, then rotation as a quaternion) [2].

I have implemented a Camera type using nalgebra [3]. I’m using it in the example file [4]. I was thinking that both back projections should give the same 3D point, but that’s not what I get.
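
For reference, the intrinsic part of the back-projection I have in mind is the standard pinhole one. Here is a minimal sketch, assuming the depth value is measured along the optical axis. The `Intrinsics` struct and the function names are hypothetical, and it uses plain arrays rather than the nalgebra types from my module:

```rust
/// Pinhole intrinsics (hypothetical struct, for illustration only).
#[derive(Clone, Copy, Debug)]
struct Intrinsics {
    fx: f64, // focal scaling along x, in pixels
    fy: f64, // focal scaling along y, in pixels (may be negative)
    cx: f64, // principal point x
    cy: f64, // principal point y
}

/// Back-project pixel (u, v) with depth z (along the optical axis)
/// into a 3D point expressed in the camera frame.
fn back_project(k: &Intrinsics, u: f64, v: f64, z: f64) -> [f64; 3] {
    [z * (u - k.cx) / k.fx, z * (v - k.cy) / k.fy, z]
}

/// Project a camera-frame point onto the image plane (inverse of the above).
fn project(k: &Intrinsics, p: [f64; 3]) -> [f64; 2] {
    [k.fx * p[0] / p[2] + k.cx, k.fy * p[1] / p[2] + k.cy]
}
```

For the frame 1 point this would be called as `back_project(&k, 386.0, 278.0, 16820.0 / 5000.0)`, with `k` filled in from the intrinsics listed on the dataset page. The resulting point is still in camera coordinates, so the extrinsics must be applied before the two frames can be compared.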

Am I doing / assuming something fundamentally wrong, or did I make an implementation mistake? I am hoping some of you with more experience with nalgebra and projective geometry might know the answer.

[1] ICL-NUIM dataset: https://www.doc.ic.ac.uk/~ahanda/VaFRIC/iclnuim.html
[2] TUM dataset format: https://vision.in.tum.de/data/datasets/rgbd-dataset/file_formats
[3] My camera module: github mpizenberg/computer-vision-rs/blob/master/src/camera/mod.rs
[4] Short example: github mpizenberg/computer-vision-rs/blob/master/examples/camera.rs

Sorry for the github links, I have a new user restriction of two links.


#2

Here is the explanation of the Camera code:


#3

Turns out, I had two issues, one in the example, one in the camera module.

  • In the example, I had put 480 instead of -480 for one of the focal scaling values of the intrinsic parameters.
  • In the camera module, I had swapped the extrinsic projection and back-projection. The translation and rotation given by the dataset are the pose of the camera expressed in the world frame, so the world-to-camera projection matrix is not P = [ R | t ] = t * R (in homogeneous coordinates) but P^(-1), which is the one I had been using for back-projection.

Fixing those two errors now gives me the same 3D world coordinates for the two matching points in the different frames.
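
To make the extrinsics point concrete, here is a minimal sketch of the two directions. It is not the actual code from my camera module: the `Pose` type and helper names are hypothetical, plain arrays are used instead of nalgebra to keep it dependency-free, and the quaternion is assumed to be a unit quaternion stored in TUM order (qx, qy, qz, qw).

```rust
/// Camera pose expressed in the world frame, as in a TUM trajectory line:
/// X_world = R(q) * X_cam + t. (Hypothetical type, for illustration only.)
struct Pose {
    t: [f64; 3],
    q: [f64; 4], // unit quaternion, order (qx, qy, qz, qw)
}

fn cross(a: [f64; 3], b: [f64; 3]) -> [f64; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

/// Rotate v by the unit quaternion q = (x, y, z, w),
/// using v' = v + 2w (u x v) + 2 u x (u x v) with u = (x, y, z).
fn rotate(q: [f64; 4], v: [f64; 3]) -> [f64; 3] {
    let u = [q[0], q[1], q[2]];
    let w = q[3];
    let uv = cross(u, v);
    let uuv = cross(u, uv);
    [
        v[0] + 2.0 * (w * uv[0] + uuv[0]),
        v[1] + 2.0 * (w * uv[1] + uuv[1]),
        v[2] + 2.0 * (w * uv[2] + uuv[2]),
    ]
}

/// The conjugate of a unit quaternion is its inverse rotation.
fn conjugate(q: [f64; 4]) -> [f64; 4] {
    [-q[0], -q[1], -q[2], q[3]]
}

impl Pose {
    /// Back-projection side: camera frame -> world frame (the pose itself).
    fn camera_to_world(&self, p: [f64; 3]) -> [f64; 3] {
        let r = rotate(self.q, p);
        [r[0] + self.t[0], r[1] + self.t[1], r[2] + self.t[2]]
    }

    /// Projection side: world frame -> camera frame (the inverse pose),
    /// X_cam = R^T * (X_world - t).
    fn world_to_camera(&self, p: [f64; 3]) -> [f64; 3] {
        let d = [p[0] - self.t[0], p[1] - self.t[1], p[2] - self.t[2]];
        rotate(conjugate(self.q), d)
    }
}
```

With this convention, a point back-projected through the intrinsics in each frame is passed through that frame's `camera_to_world`, and the two results should agree up to pixel and depth quantization.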

:tada: