Evaluating rosify_difodo on RGBD-SLAM TUM dataset

4 minute read

After getting a good view (see previous post) of DIFODO working on the TUM RGBD SLAM datasets, now it's time to measure the accuracy, or better said the error, that this algorithm makes against the groundtruth that comes with each sequence of the dataset.

New version of rosify difodo v1.5.0

But before doing this, a few changes were made to rosify difodo and a new version, v1.5.0, was released.

  • This version includes new parameters to configure the algorithm (through the launch files) that allow filtering out values from the depth images with upper and lower bounds.

  • Also, the camera_fps parameter is not required anymore, since the algorithm now computes the time between depth images from the timestamps of the ROS messages and uses it dynamically for the odometry calculation.

  • In order to evaluate the algorithm with real metrics like the RMSE, rosify difodo now creates a file on each execution with the odometry information, ready to be used by the TUM evaluation tools and also by slam-testbed (a minimal sketch of this output format is shown below).
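For reference, the file follows the TUM trajectory format: one pose per line as "timestamp tx ty tz qx qy qz qw", with the timestamp in seconds and the orientation as a unit quaternion. A minimal sketch of writing such a file from a ROS odometry message (the topic name and output path here are placeholders, not the actual rosify difodo ones):

# Sketch: dump nav_msgs/Odometry poses to a TUM-format trajectory file.
# Topic name and output path are placeholders, not the real rosify_difodo ones.
import rospy
from nav_msgs.msg import Odometry

out_file = open("difodo_estimate.txt", "w")

def odom_callback(msg):
    t = msg.header.stamp.to_sec()  # timestamp in seconds
    p = msg.pose.pose.position
    q = msg.pose.pose.orientation
    # TUM format: timestamp tx ty tz qx qy qz qw
    out_file.write("%.6f %f %f %f %f %f %f %f\n" %
                   (t, p.x, p.y, p.z, q.x, q.y, q.z, q.w))

rospy.init_node("tum_trajectory_writer")
rospy.Subscriber("/odom", Odometry, odom_callback)
rospy.spin()
out_file.close()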

Evaluation of rosify difodo against the TUM RGBD dataset

For some reason the rosbags from the TUM dataset seem to come with the first 4 seconds without depth and color images. Using the freiburg1_xyz sequence as an example, if we compare the groundtruth duration with an odometry estimation example obtained with rgbd-slam, we see that the rgbd-slam estimation is missing 4 seconds. By downloading the dataset we can see that the number of depth images is lower than expected, again missing 4 seconds.

I think this may be done on purpose to allow the topics to initialize before the images are actually published, so no images are lost.

The best way to visualize this is to load the rosbag in rqt_bag:

[Screenshot: the rosbag loaded in rqt_bag, showing that depth and RGB messages only start a few seconds into the bag]

Also using:

rosbag info rgbd_dataset_freiburg1_rpy.bag

This gives 798 messages each for the depth and RGB information. If we assume a frame rate of 30 FPS, 798 * 33.3 ms / 1000 = 26.57 s, while the rosbag duration is 30.4 s, so almost 4 seconds of the rosbag are missing depth and RGB values.
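To double-check this without opening rqt_bag, the same numbers can be pulled out with the rosbag Python API. A small sketch (the depth topic name is the one I assume the freiburg1 bags use; adjust it if it differs):

# Sketch: compare the rosbag duration with the time span covered by depth messages.
import rosbag

bag = rosbag.Bag("rgbd_dataset_freiburg1_rpy.bag")
depth_topic = "/camera/depth/image"  # assumed topic name, adjust if needed

n_depth = bag.get_message_count(topic_filters=[depth_topic])
bag_duration = bag.get_end_time() - bag.get_start_time()
covered = n_depth * (1.0 / 30.0)  # assuming ~30 FPS

print("depth msgs: %d" % n_depth)
print("bag: %.2f s, covered by depth: %.2f s, gap: %.2f s"
      % (bag_duration, covered, bag_duration - covered))
bag.close()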

Still, this won't be a problem, since the estimation file is created using the timestamps from the depth images, and this way it can easily be synchronised with the groundtruth.
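TUM also provides an associate.py script for this, but the idea is simply to match every estimated pose to the groundtruth pose with the closest timestamp within a small tolerance. A rough sketch, assuming both files are in the TUM trajectory format:

# Sketch: associate estimated poses with groundtruth by nearest timestamp.
def read_trajectory(path):
    poses = {}
    with open(path) as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue
            values = [float(v) for v in line.split()]
            poses[values[0]] = values[1:]  # timestamp -> [tx ty tz qx qy qz qw]
    return poses

def associate(est, gt, max_diff=0.02):
    matches = []
    gt_stamps = sorted(gt.keys())
    for t_est in sorted(est.keys()):
        t_gt = min(gt_stamps, key=lambda t: abs(t - t_est))
        if abs(t_gt - t_est) <= max_diff:
            matches.append((t_est, t_gt))
    return matches

estimate = read_trajectory("difodo_estimate.txt")
groundtruth = read_trajectory("groundtruth.txt")
print("%d matched pairs" % len(associate(estimate, groundtruth)))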

Option 1: Online evaluation tool from TUM

By uploading the file to the online evaluation tool from TUM and using the relative pose error option, I could measure the error that rosify difodo makes against this dataset.

Option 2: Python script tool for evaluation from TUM

These are the same scripts used in their online tool, but you can run them locally. They can be downloaded from here; the one we will focus on is the RPE version, since it is the recommended metric for odometry. Results from the online tool and this script should be the same.
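For reference, the RPE script takes the groundtruth and the estimation file plus options to fix the evaluation interval; the invocation looks something like this (option names may differ slightly depending on the script version):

python evaluate_rpe.py groundtruth.txt difodo_estimate.txt --fixed_delta --delta 1 --delta_unit s --verbose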

Option 3: slam-testbed

slam-testbed is a tool created by a fellow JdeRobot student. This tool also allows getting the error by providing the groundtruth and the estimation in the same format used by TUM.

Results:

I could only use the tools from TUM, since slam-testbed crashed without giving a result. After using these tools to get the RMSE, the results were not satisfying: up to 0.4-0.6 m of RMSE, when rgbd-slam gets 0.02-0.03 m on the same sequences. Also, the results claimed in the DIFODO paper are around 0.04 m.

The strange thing is that, using rviz and the DifOdometry-Datasets app from mrpt, I could visually see that both trajectories (in rviz the result from rosify-difodo, and the one obtained by the original DIFODO) were the same. So the error has to be somewhere else.

After manually inspecting the data (the groundtruth and the output from rosify difodo) and after using DifOdometry-Datasets, I came to the conclusion that the error was due to the coordinate reference system used. While the groundtruth starts from its own reference system, rosify difodo was not using it at all. This means that a linear movement along one axis in rosify difodo was a movement along two axes in the groundtruth, leading to the high error. Also, since rosify difodo starts its movement at the origin of the world coordinate system (0,0,0,0,0,0), the groundtruth starts with an offset in translation and rotation.

So in order to get this to work, I only have two options:

  • Use the tf values provided by the rosbag datasets from TUM. I don't like this option, since I think it is something that will only work when using the TUM dataset.
  • Get slam-testbed to work, since it does a spatial registration (scale, translation and rotation) before measuring the RMSE, as sketched below. If both results were correctly registered, the problem would be solved.
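The registration that slam-testbed does can be approximated with a standard rigid alignment (Horn's / Umeyama's method): find the rotation and translation that best map the matched estimated positions onto the groundtruth positions before computing the error. A rough numpy sketch of that alignment (positions only, no scale), assuming the trajectories are already associated:

# Sketch: rigid alignment (rotation + translation, no scale) of the estimated
# trajectory onto the groundtruth, Horn/Umeyama style via SVD.
import numpy as np

def align(est_xyz, gt_xyz):
    # est_xyz, gt_xyz: Nx3 arrays of matched positions
    est_mean = est_xyz.mean(axis=0)
    gt_mean = gt_xyz.mean(axis=0)
    est_c = est_xyz - est_mean
    gt_c = gt_xyz - gt_mean

    H = est_c.T.dot(gt_c)                       # cross-covariance
    U, S, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T.dot(U.T)))])
    R = Vt.T.dot(D).dot(U.T)                    # rotation mapping est -> gt
    t = gt_mean - R.dot(est_mean)               # translation

    aligned = est_xyz.dot(R.T) + t
    rmse = np.sqrt(np.mean(np.sum((aligned - gt_xyz) ** 2, axis=1)))
    return R, t, rmse

This is essentially the alignment that the ATE script from TUM also performs before computing the RMSE.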

Example of the tf from TUM rosbag sequences:

transforms: 
  - 
    header: 
      seq: 0
      stamp: 
        secs: 1305031098
        nsecs: 425826548
      frame_id: "/kinect"
    child_frame_id: "/openni_camera"
    transform: 
      translation: 
        x: -0.0112272525515
        y: 0.0502554918469
        z: -0.0574041954071
      rotation: 
        x: 0.129908557896
        y: -0.141388223098
        z: 0.681948549539
        w: 0.705747343414
  - 
    header: 
      seq: 0
      stamp: 
        secs: 1305031098
        nsecs: 425826548
      frame_id: "/world"
    child_frame_id: "/kinect"
    transform: 
      translation: 
        x: 1.36557455444
        y: 0.594936477661
        z: 1.73326101685
      rotation: 
        x: -0.290218127334
        y: -0.303852355187
        z: 0.61973983294
        w: 0.662849699616
---
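If I went with the first option, the idea would be to take the initial /world -> /kinect transform from the bag and pre-multiply every odometry pose with it, so that the estimation is expressed in the same frame as the groundtruth. A rough sketch of that composition with tf.transformations, using the values from the message above (it ignores the /kinect -> /openni_camera transform and any axis convention differences, which would also need to be handled):

# Sketch: express an odometry pose (relative to the camera's starting pose) in
# the /world frame by pre-multiplying with the initial /world -> /kinect transform.
import numpy as np
from tf import transformations as tft

# Initial /world -> /kinect transform, values taken from the tf message above
T_world_kinect = tft.quaternion_matrix(
    [-0.290218127334, -0.303852355187, 0.61973983294, 0.662849699616])
T_world_kinect[0:3, 3] = [1.36557455444, 0.594936477661, 1.73326101685]

def to_world(odom_xyz, odom_quat):
    # odom pose relative to the first camera frame -> pose in /world
    T_odom = tft.quaternion_matrix(odom_quat)
    T_odom[0:3, 3] = odom_xyz
    T = T_world_kinect.dot(T_odom)
    return T[0:3, 3], tft.quaternion_from_matrix(T)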