Evaluating the new SD-SLAM

12 minute read

In this entry, Ill use the same datasets that I have been using until this point, the RGBD TUM dataset. The only problem is that this dataset doesnt have mixed sequences, with texture and non texture in a same sequence. I will create a way to simulate the lack of texture on these sequences so the DIFODO tracking from the new SD-SLAM version can be used.

TUM dataset

There dont seem to be international datasets that had the right events to test this new SD-SLAM. Datasets are either RGB or Depth. In TUM there are sequences without texture or without structure but there is no such a mixed sequence that loses texture somewhere in the middle and then recovers it.

To test this new algorithm TUM dataset will be used but absense of texture will be simulated by painting completely black the RGB images that comes into the algorithm to be processed. This way a simulation of a non texture, no light environment is recreated and allows to test the new SD-SLAM against the previous version.

This simulation has been included in the code of SD-SLAM and can be found in the branch TUM_experiments.

Two new parameters has been added in the configuration file to simulate time intervals where RGB texture is not desired:

#--------------------------------------------------------------------------------------------
# TUM experimental
#--------------------------------------------------------------------------------------------
# A matrix of any rows and 2 columns. Each pair of values from a row mean the starting second
# and end second where the rgb images will go completely black (simulating an environment without light)
Experiments.noRGBTimeIntervals: !!opencv-matrix
  rows: 3
  cols: 2
  dt: i
  data: [ 1, 2, 4, 5, 7, 12 ]

# Set to 0 to use the new SD-SLAM integrated with DIFODO. 1 to use the old SD-SLAM.
Experiments.UseOriginalRGBDSLAM: 0

Using any of the launch files such as sdslam_TUM1_evaluation_file_EXPERIMENTS.launch will allow to execute and test the algorithms.

How to run the tests

Ive created several launch and configuration files (yaml) that will launch all the necessary ROS nodes in order to run and test the desired algorithm. The following launch files were used:

roslaunch SD-SLAM sdslam_TUM1_evaluation_file_EXPERIMENTS_v1.launch

or

roslaunch SD-SLAM sdslam_TUM1_evaluation_file_EXPERIMENTS_v2.launch

Experiments

DIFODO supports SD-SLAM

In this experiment we want to test that DIFODO does indeed help SD-SLAM on the tracking when the texture is lost. For this, a sequence where both DIFODO and SD-SLAM had a similar performance and a good performance in general(check this previous entry), the sequence is rgbd_dataset_freiburg1_desk.bag.

alt

In this sequence the camera follows a semicircular trajectory around a desktop. The sequence is a round trip, so SLAM algorithms can relocalize due to loop closures. The total length sequence 23.8 seconds although only since the 4th second of this sequence the image topics start publishing. So the real lenght would be around 20 seconds. Also almost half of the time is one way and the other half of the time is return time to the position it started.

A set of test with this sequence has been done to test the new SD-SLAM performance.

Test 1

Intervals without texture (seconds): [4,5]

Description: Ive simulate a lost of light on the interval from the 4th to 5th second in the sequence.

APE Comparison

SD-SLAM Version rmse mean median std min max sse
Original Version 1.252861486742955 0.8894136295371773 0.14154747895750924 0.8823861402794543 0.035010855459414306 2.165339155882976 866.4533715399999
New Version 0.18634107746451184 0.1673215851092585 0.11993585785743979 0.08201514681545333 0.0352727940486716 0.3153194094882204 19.13237143

The new version does indeed much better just by taking a look at the APE.SD-SLAMv1 gets lost from the 4th second to the 5th since there is no texture information. The algorithm keeps in lost state until the end. Since on the end part of the scene has been seen before, the algorithm recovers and start tracking again. SD-SLAMv2 doesnt have this problem since from the 4th second to the 5th DIFODO is used, then once the texture is available again the algorithm keeps tracking and mapping using ORB tracking. For comparison here are the two trajectories:

SD-SLAM Old SD-SLAM New
alt alt

While for the new version the full trajectory is created, on the old version only the first part can be estimated.

Test 2

Intervals without texture (seconds): [4,5, 8,9, 12,13, 16,17]

Description: Same sequence as the test 1. Now there are more times simulated when the texture is lost. With this I want to check the influence of using DIFODO tracking in the results. Also how the algorithm is affected by several changes from one tracking to the other.

APE Comparison

SD-SLAM Version rmse mean median std min max sse
Original Version 1.252861486742955 0.8894136295371773 0.14154747895750924 0.8823861402794543 0.035010855459414306 2.165339155882976 866.4533715399999
New Version 0.3109334973422831 0.2845629077559233 0.2816500293250801 0.12531397088512292 0.035010855459414306 0.48306729344885274 54.527316830000004

The results in this case are the same for the original version since all the intervals without texture are in the part where it could not track. Once the original algorithm is lost unless it sees a known place it wont recover the tracking. Once again the newer version does much better as expected since the tracking is never stopped.

Test Number rmse mean median std min max sse
Test 1 0.18634107746451184 0.1673215851092585 0.11993585785743979 0.08201514681545333 0.0352727940486716 0.3153194094882204 19.13237143
Test 2 0.3109334973422831 0.2845629077559233 0.2816500293250801 0.12531397088512292 0.035010855459414306 0.48306729344885274 54.527316830000004

In the previous table the APE is compared from test to test for the new version of the algorithm. What can be seen is that the more DIFODO is used the more the error is accumulated. This is expected since we already know from the previous comparison between DIFODO and SD-SLAM that SD-SLAM has better estimations when there is enough visual information like in this case.

Test 3

Intervals without texture (seconds): [4,8]

Description: Same sequence as the experiment 1. Instead of having multiple small intervals, like in the test 2, where the texture is lost in this experiment Ill test the effect of losing texture for longer time intervals.

SD-SLAM Version rmse mean median std min max sse
Original Version 1.252861486742955 0.8894136295371773 0.14154747895750924 0.8823861402794543 0.035010855459414306 2.165339155882976 866.4533715399999
New Version 0.12891601105092543 0.11170181880008777 0.0895983258772172 0.0643586946887108 0.033064936110629335 0.33897206374567207 9.124016509999999

Surprisingly, in this test the effect of a long tracking with DIFODO appears to be beneficial. The results for the New version are even better than when tracking only with ORB (from a previous entry where DIFODO was compared against SD-SLAM). Although this is not common, in this particular dataset DIFODO and SD-SLAM (original version) seem to be pretty equal in terms of estimation based on the APE error. So in this particular case is possible that in that interval DIFODO outperforms SD-SLAM.

Testing the new SD-SLAM on hard environments

In this experiment we present a dataset without structure, where DIFODO dont perform well. The goal is test how the new SD-SLAM handles a difficult scenario, where there is low texture and also in some intervals the texture is also lost. The sequence used for the following test is rgbd_dataset_freiburg3_nostructure_texture_far.bag.

alt

The sequence is a linear trajectory where there are some papers on the floor with some text on it. This papers create enough texture for SD-SLAM to work well, but there is almost no structure. By the end on the sequence the papers go out of the view of the camera and only a cable is there, these doesnt present enought texture for SD-SLAM to track:

alt

The performance of DIFODO on his own (check this previous entry) in these sequence was bad while SD-SLAM get a good median APE value but not so well mean APE value due to not being able to track in the end, because of the absense of texture in the dataset itself. This comparison was previously discussed when comparing DIFODO against SD-SLAM here. The total length sequence is 16.1s seconds and the last 2-3 seconds dont have texture.

A set of test with this sequence has been done to test the new SD-SLAM performance.

Test 4

Intervals without texture (seconds): -

Description: The original sequence goes over the floor with some papers on it. It has texture but there is almost no structure, just the floor. Towards the end there is almost no texture which makes the original SD-SLAM to get lost and get a high APE. With the new algorithm having DIFODO we expect to track in this final area where there is no texture even if its difficult for DIFODO due to the absense of structure also.

SD-SLAM Version rmse mean median std min max sse
Original Version 1.3445785595856554 0.5417117519195889 0.08905994196229185 1.2306258085745017 0.01364441277593148 3.9141900119948185 705.07768613
New Version 0.12738512593808796 0.10729214299405194 0.09413654975619196 0.06866852526453764 0.01028056418685284 0.30382922835040077 6.799100560000001

On the original algorithm it can be seen how only the median value represents a good value, this is because at the end where there is no texture, no tracking is being made. This creates big errors affecting the mean value and rmse. Also the standard deviation is higher and the max value is really high. The introduction of DIFODO in the new version of SD-SLAM brings more stable values. The mean remains practically the same but now the deviation and max APE values are lowered. Also the rmse and mean are closer to the median which means that tracking is being performen in all the trajectory. The following images shows the trajectories for each version compare with the groundtruth (green):

SD-SLAM Old SD-SLAM New
alt alt

At the end more points can be seem in the newer version, although the estimation doesnt seem to be good as expected due to the absense of structure.

An image where the yellow frame indicates that DIFODO is being used for the tracking bby the end of the sequence: alt

Test 5

Intervals without texture (seconds): [4,5]

Description: As done with the first experiment, now Ill increase the usage of DIFODO by adding some intervals where there is no image.

SD-SLAM Version rmse mean median std min max sse
Original Version 2.5749071627717384 2.080248682137874 2.257702402582676 1.5174690498843273 0.014015348729160847 4.33860309546748 2731.62052152
New Version 0.3109334973422831 0.2845629077559233 0.2816500293250801 0.12531397088512292 0.035010855459414306 0.48306729344885274 54.527316830000004

When SD-SLAMv1 reach the simulated non light at the 4th second it gets lost, stops the tracking and doesnt recover since this is a non loop closure sequence, due to this the high errors in the APE. SD-SLAMv2 can manage this non texture and presents much lower APE values. Both trajectories can be seen in the following images:

SD-SLAM Old SD-SLAM New
alt alt

Still, since this sequence environment has low structure, it can be seen how the APE from the previous experiment to this has increase, the error increases because DIFODO estimations are bad in nonstructure environments. The following images shows the trajectory and also the 3D reconstruction on each case:

SD-SLAM Old SD-SLAM New
alt alt

The old version fails to reconstruct from the first time the texture is lost while the new version can keep reconstructing.

Test Number rmse mean median std min max sse
Test 4 0.12738512593808796 0.10729214299405194 0.09413654975619196 0.06866852526453764 0.01028056418685284 0.30382922835040077 6.799100560000001
Test 5 0.3109334973422831 0.2845629077559233 0.2816500293250801 0.12531397088512292 0.035010855459414306 0.48306729344885274 54.527316830000004

Looking at the error when using the original sequence (test 4) and the test 5 (with a non texture interval of 1 second) it can be observe how the error in this case grow quickly compared to the first experiment. This is because compare to the sequence from the 1st experiment, these sequence doesnt have structure, making more difficult the tracking of DIFODO and increasing the error of the estimations.

Test 6

Intervals without texture (seconds): [4,5, 6,7, 8,9]

Description: In this test, Ive simulate a lost of light in 3 intervals of 1 second. Since this is a non structure environment the expected results is a higher APE compared to the two previous test, the more that DIFODO is used on a challenging environment the more error we are likely to expect.

SD-SLAM Version rmse mean median std min max sse
Original Version 2.5749071627717384 2.080248682137874 2.257702402582676 1.5174690498843273 0.014015348729160847 4.3380309546748 2731.62052152
New Version 0.5854459104320145 0.4761017053295302 0.5655812320082766 0.34069646347428323 0.00980255068846882 1.0465368412053158 148.40941378

The results for the Original version are the same, the new version still has a lower error than the old version but its clear that is starting to have a bigger errror the more DIFODO tracking is used, as expected. The results for the 3D reconstaction and trajectory can be seen here for the new version:

alt alt

As can be seen the more time that DIFODO tracking take place the less that the trakectory advances forward. This is because the estimations from DIFODO on this non structure environment finds hard to estimate the forward trajectory and when DIFODO is tracking it doesnt advanced as it should.

Test Number rmse mean median std min max sse
Test 4 0.12738512593808796 0.10729214299405194 0.09413654975619196 0.06866852526453764 0.01028056418685284 0.30382922835040077 6.799100560000001
Test 5 0.3109334973422831 0.2845629077559233 0.2816500293250801 0.12531397088512292 0.035010855459414306 0.48306729344885274 54.527316830000004
Test 6 0.5854459104320145 0.4761017053295302 0.5655812320082766 0.34069646347428323 0.00980255068846882 1.0465368412053158 148.40941378

As can be seen in the previous table, the more that DIFODO tracking is needed the more error that is added to the final estimation. Expected results since SD-SLAM tracking is better in most of the cases when there is texture and structure and also this environment is hard or difficult to DIFODO tracking.

Test 7

Intervals without texture (seconds): [4, 9]

Description: In this test, Ive simulate a lost of light in a long interval. Is expected to see the APE increase even more in this case, since the total amount of DIFODO tracking is 5 seconds compare to the previous 3 seconds from test 6.

SD-SLAM Version rmse mean median std min max sse
Original Version 2.5749071627717384 2.080248682137874 2.257702402582676 1.5174690498843273 0.014015348729160847 4.3380309546748 2731.62052152
New Version 0.9999865044532059 0.7783609961085416 0.7649560080923203 0.6277954832789583 0.01315294643796582 1.6505809583295208 383.98963549

The results are the same for the original algorithm as expected and there is an increase in the new version. Lets compare it against the other test and intervals.

Test Number rmse mean median std min max sse
Test 4 0.12738512593808796 0.10729214299405194 0.09413654975619196 0.06866852526453764 0.01028056418685284 0.30382922835040077 6.799100560000001
Test 5 0.3109334973422831 0.2845629077559233 0.2816500293250801 0.12531397088512292 0.035010855459414306 0.48306729344885274 54.527316830000004
Test 6 0.5854459104320145 0.4761017053295302 0.5655812320082766 0.34069646347428323 0.00980255068846882 1.0465368412053158 148.40941378
Test 7 0.9999865044532059 0.7783609961085416 0.7649560080923203 0.6277954832789583 0.01315294643796582 1.6505809583295208 383.98963549

As expected since the tracking of DIFODO is longer the error has go close the the 1 meter APE, still is better that the original algorithm but is a very high error for SLAM systems. In any case this is a extremely difficult environment where there is no strcuture and also no texture during a 5 seconds interval. The following images show visually the result of the tracking:

alt alt

Conclusions

The new SD-SLAM is much more robust that the previous version since it can track in more environments, in particular those who doesnt have texture. The new SD-SLAM can track on non texture environments, and is less likely to get lost due to fast movements where the RGB image is blurry and keypoints cannot be found. In the environments with enough texture SD-SLAM will perform as always since the modifications only are visible when there is no texture in the environment due to any reason like not enough ligth, the environment doesnt have texture at all and fast movements among others.

As a final conclusion even though DIFODO does a good job to keep the tracking, it needs enough structure to give good estimations. Its shown that the performance of DIFODO is not as good as SD-SLAM as seen in a previous entry so this should be only used as a last resort. When there is texture available is preferred to use the original ORB tracking from SD-SLAM.

The odometry from DIFODO can be improved by adding more mechanisms like ICP to reduce the drift in the DIFODO odometry from time to time. Also a full SLAM algorithm could be create for depth only that would help SD-SLAM when using DIFODO, reducing the drift also by using previous known positions.