Summary and range flow: a long but interesting read


It's been a while since I last wrote an entry about the progress of the project, so this is going to be a long one. I'd like to use this post as a point to remind you what I'm trying to achieve and to summarize everything I have learned from the beginning up to this day. Also, some research has been going on, and some things are clearer now.

What are we trying to achieve?

Let's start from the beginning:

So, the objective here is to create a SLAM (Simultaneous Localization and Mapping) algorithm based on depth only. A lot of work has been done on the visual SLAM side (RGB as input), and with the increase in popularity of RGB-D sensors that the Kinect brought to the market, SLAM algorithms using RGB-D have been developed as well.

Not as much research and development has been done on depth-only SLAM algorithms, but it has started to take form in the past years. The advantage of a depth-only SLAM algorithm over visual SLAM algorithms is that it doesn't need light at all (if based on a ToF sensor) and also works well in low-texture environments. On the other hand, depth-only methods rely exclusively on geometric information, so they won't work properly in planar environments, and these sensors usually don't have a long range.

The RealSense D435 is the chosen sensor to work with. It has given me a few problems (most due to my computer, I guess), but at least it is working. What the D435 provides are “depth images”, at up to 90 FPS at lower resolutions and 30 FPS at its maximum depth resolution of 1280x720. That's a huge amount of information per second.
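Just to make the input concrete, here is a minimal sketch of how depth frames can be grabbed with librealsense2 (the stream mode is one example configuration, not the only one the D435 supports):

```cpp
// Minimal sketch: grab depth frames from a D435 with librealsense2.
#include <librealsense2/rs.hpp>
#include <iostream>

int main()
{
    rs2::pipeline pipe;
    rs2::config cfg;
    // Ask for the depth stream: 848x480 @ 90 FPS, 16-bit depth values.
    cfg.enable_stream(RS2_STREAM_DEPTH, 848, 480, RS2_FORMAT_Z16, 90);
    pipe.start(cfg);

    while (true)
    {
        rs2::frameset frames = pipe.wait_for_frames();
        rs2::depth_frame depth = frames.get_depth_frame();
        // Distance in meters at the center pixel of the "depth image".
        std::cout << depth.get_distance(424, 240) << " m\n";
    }
}
```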

The idea is to mount the D435 on a drone and use the SLAM algorithm so that it can navigate autonomously.

Researching the state of the art in depth SLAM

Not a lot of SLAM algorithms based on depth only are available, in comparison with RGB and RGB-D SLAM.

Depth SLAM papers

I read a lot of papers; some of them were not useful, some were, and some led to the ones I list here. I also found out about a lot of good techniques and interesting terms in the SLAM world, like ICP, GICP, odometry, loop closure, bundle adjustment, graph-based maps, g2o… Some of these are related to visual SLAM only, others to SLAM in general, we could say.

These are the papers that I found most interesting:

1. Depth Camera Based Indoor Mobile Robot Localization and Navigation

This paper is really interesting since it builds a SLAM algorithm based on range images. Easy to read, good idea; the only problem is that it is designed for a ground robot (so no 6DOF) and the map it creates is 2D. Still, I recommend reading it as a start in the SLAM world.

2. 3D Pose Estimation and Mapping with Time-of-Flight Cameras

First of all, it is not a SLAM algorithm, although it talks about the problem of pose estimation and mapping. The topic here is always RGB-D, not depth only, but it is still a good starting point for getting into SLAM. Last but not least, it is a great source to learn about RGB-D sensors: their problems, characteristics…

3. Direct Depth SLAM

Direct Depth SLAM is exactly what we were looking for: a depth-only SLAM algorithm, 6DOF, using range images (2D images). I could talk a lot about this algorithm, but this post is going to be long enough, so I'll only say that, to solve the pose estimation between two consecutive range images, they used range flow. Although the authors don't say it directly, they probably used the DIFODO algorithm to retrieve the egomotion of the camera. They say that their range flow implementation for getting the camera motion is “quite” similar to the one in Fast Visual Odometry for 3-D Range Sensors. DIFODO is a depth-only odometry algorithm that uses range flow (among other techniques) to build a complete odometry algorithm.

Odometry

Odometry is the concept of estimating motion (translation and rotation) from data to determine the position and orientation of a vehicle. A concept well known in robotics due to autonomous navigation, it is the first problem we face when developing a SLAM algorithm: it is the “L” in SLAM, Localization. Visual odometry is the term used when the input data are images. In this case, range flow (plus some ICP from time to time) is the odometry method used in DDS (Direct Depth SLAM). Concretely, they appear to use DIFODO, from Fast Visual Odometry for 3-D Range Sensors, which relies mainly on range flow. NOTE: the first time I researched this I didn't realize that the authors of DDS may have used or implemented DIFODO, so I started looking at range flow until I finally got to DIFODO again later.

While DIFODO seems the way to go, there are some other depth-only odometry algorithms that are worth checking out in the future:

From Direct Depth SLAM we find:

  • Sparse Depth Odometry (SDO) [1]: uses sparse geometric features (something like SIFT, SURF or ORB, but with geometric information)
  • SDF Tracker [2]
  • DIFferential ODOmetry (DIFODO) [3]
  • Kinect Fusion [4]: there are a lot of derivative works of KinectFusion. KinectFusion implements TSDF [36] and ICP, and needs a GPU to run in real time

Other references:

  • Yousif [37]: real-time 3D registration and mapping with a low-cost RGB-D sensor
  • Taguchi [38]: real-time SLAM for (if I understood correctly) a hand-held 3D sensor
  • Renato [39]: Dense Planar SLAM (it seems to work with geometric planes)

Point cloud registration:

  • ICP (Iterative Closest Points)
  • GICP (Generalized ICP)
  • NDT (Normal Distributions Transform)

Point cloud registration can also be used for pose estimation. The bad thing about this kind of solution is that it depends on a good initialization due to its iterative nature. These methods also don't work well with large displacements, and they tend to be slow, again because they are iterative (see the ICP sketch below).
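As a concrete example of this family, here is a minimal ICP alignment using PCL; this is a sketch assuming PCL is available, with the cloud loading omitted. Note how the result depends on the initial guess: align() starts from the identity unless you pass it a better one.

```cpp
// Minimal ICP alignment with PCL: estimate the rigid transform between two
// point clouds. A bad initialization or a large displacement makes it fail.
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/registration/icp.h>

void alignClouds(pcl::PointCloud<pcl::PointXYZ>::Ptr source,
                 pcl::PointCloud<pcl::PointXYZ>::Ptr target)
{
    pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
    icp.setInputSource(source);
    icp.setInputTarget(target);
    icp.setMaximumIterations(50);   // iterative: more iterations = slower
    pcl::PointCloud<pcl::PointXYZ> aligned;
    icp.align(aligned);             // starts from identity by default
    if (icp.hasConverged())
    {
        // 4x4 rigid transform mapping source onto target: the pose change.
        Eigen::Matrix4f T = icp.getFinalTransformation();
        (void)T;
    }
}
```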

From Fast Visual Odometry for 3-D Range Sensors (the DIFODO paper):

  • GICP[1]
  • RDVO[2]

Range Flow

Range flow is a concept that can be used to retrieve movement from consecutive range images or range sequences. It is the idea of optical flow applied to range images instead of RGB images.

In order to understand range flow better, I recommend starting with the optical flow concept, which I had to refresh after having learned it in the Computer Vision Master's I'm currently enrolled in.

I recommend two articles that I found interesting about optical flow:

  1. Image motion Constraint Equation -> http://www.cim.mcgill.ca/~langer/546/8-imagemotion-2-notes.pdf
  2. Evaluating the Range Flow Motion Constraint by Hagen Spies, John L. Barron

I already talked briefly about these articles in previous entries.

So optical flow allows us to recover either the movement of objects or the movement of the camera. If our camera is not moving, the differences in light intensity in our image will be due to the movement of the objects in it (under certain restrictions, like constant light sources…). On the other hand, if we consider a rigid environment and the camera is the one that is moving, then the changes of intensity in the image are due to the movement of the camera. This leads to “pose estimation”.

Range flow is the same concept applied to depth images instead of intensity images. As with optical flow, it can be used either to get the movement of objects (with a static camera) or to retrieve the motion of the camera (with a rigid environment).
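To put both constraints side by side (notation roughly follows the Spies and Barron papers; subscripts denote partial derivatives), the brightness change constraint of optical flow is

$$I_x\,u + I_y\,v + I_t = 0$$

while the range flow motion constraint is

$$Z_x\,U + Z_y\,V + Z_t = W$$

where $(u, v)$ is the image velocity and $(U, V, W)$ the 3D velocity of the observed surface point. The extra unknown $W$ is the rate of change of the depth itself, which the sensor measures directly, and that is what makes range flow suitable for recovering full 3D motion.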

There is a lot of information on how to apply range flow to estimate the movement of objects, but not so much on how to estimate the camera motion.

The fathers of the range flow concept (at least the ones with the most papers on the subject) are:

  • John L. Barron: http://www.csd.uwo.ca/faculty/barron/
  • Hagen Spies

If you search for range flow, you will find that almost 50% of the papers belong to them (although all the papers are quite similar, in my opinion).

Some useful range flow reads are:

  1. Evaluating the Range Flow Motion Constraint
  2. Range Flow Estimation
  3. Range Flow: The 3D Movement of Deformable Surfaces (like a summary or condensed read of Range Flow Estimation)
  4. Odometry for Autonomous Navigation in GPS-denied Environments (thesis): this one redirected me to Fast Visual Odometry for 3-D Range Sensors again.

These previous readings are good for getting familiar with the range flow concept, but not from an odometry point of view. All of them are about retrieving the movement of objects with a static camera (all but the 4th one, which is depth-only SLAM, but for LIDAR, and doesn't use range flow, although it is really interesting). The following papers deal with the pose estimation or odometry problem using range flow:

Fast Visual Odometry for 3-D Range Sensors (DIFODO)

This is probably the odometry algorithm used by DDS, or DDS implements a variant of it. As I said before, the range flow concept diverted my attention from this paper, but I was eventually redirected here again, and by rereading DDS I found that this paper was one of its references and that DDS is based on or implements this algorithm.

[DIFODO demo video]


Before digging into this algorithm, which is based on range flow, I'd like to point to some resources of interest about DIFODO:

First, this is an algorithm developed by the MAPIR group at the University of Malaga.

The author: Link

The paper: DIFODO paper

They also have a robotics platform developed in C++ where they seem to have the DIFODO algorithm implemented (to be checked):

MRPT Platform

DIFODO in MRPT

Datasets to evaluate DIFODO

Github code of DIFODO

MRPT in ROS
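From my first look at the MRPT documentation, DIFODO seems to be exposed as an abstract class that you subclass to feed depth frames into. The sketch below is only my guess at the minimal usage; the class and member names come from a quick read of the docs, so treat them as assumptions until I actually try it:

```cpp
// Unverified sketch of using MRPT's DIFODO implementation: the names below
// (CDifodo, loadFrame, odometryCalculation, depth_wf) come from my reading
// of the MRPT docs and still have to be checked against a real build.
#include <mrpt/vision/CDifodo.h>

class MyDifodo : public mrpt::vision::CDifodo
{
public:
    void loadFrame() override
    {
        // Fill depth_wf (the incoming depth image, in meters) with the
        // newest frame, e.g. copied out of a RealSense depth_frame.
    }
};

int main()
{
    MyDifodo odo;
    for (;;)
    {
        odo.loadFrame();            // bring in the new range image
        odo.odometryCalculation();  // coarse-to-fine range flow estimation
        // The accumulated camera pose is kept by the class after each call.
    }
}
```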


Well, after all this, it is now time to dive deep into the range flow implementation of DIFODO. This paper is based on “Rigid body motion from range image sequences” from 1990. In that paper, Horn and Harris applied the range flow idea for the first time (at least as far as I know). They don't even call it range flow; they just apply optical flow to sequences of depth images from a low-resolution sensor (64x256 pixels; remember, this is 1990). They used it to estimate the 6DOF motion of a sensor mounted on a land vehicle driving through a terrain with elevation. Here they proposed the elevation rate constraint equation, better known today as the range flow equation.

[image: the elevation rate constraint (range flow) equation]

To follow the equations and understand everything better, I recommend reading both papers. In my opinion, the DIFODO paper is easier to follow and read regarding this part, except for one step.

In the DIFODO paper, equation (5) seems to appear out of nowhere:

[image: equation (5) of the DIFODO paper]

So in this particular case we have to go back to “Rigid body motion from range image sequences”. On page 5 of that paper, the range flow equations appear together with equation (5) of DIFODO, but with a different notation. The authors describe the movement of a point “R” on the surface in terms of the sensor movement. The movement of this point with respect to the camera is the opposite of the movement of the camera, as can be seen in the following equations:

[image: velocity of a surface point with respect to the moving sensor]
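As far as I can reconstruct it (my notation, so it may differ from the 1990 paper), this is just the rigid-body relation: if the camera moves with linear velocity $\mathbf{t}$ and angular velocity $\boldsymbol{\omega}$, a static point $\mathbf{R} = (X, Y, Z)$ appears to move with the opposite motion,

$$\dot{\mathbf{R}} = -\mathbf{t} - \boldsymbol{\omega} \times \mathbf{R}$$

which, written out per component, gives

$$\dot{X} = -t_x - \omega_y Z + \omega_z Y$$

$$\dot{Y} = -t_y - \omega_z X + \omega_x Z$$

$$\dot{Z} = -t_z - \omega_x Y + \omega_y X$$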

Still, I don't know how to get to the last equation, but at this point they recommend the paper “Direct Methods for Recovering Motion”, from 1988: Horn again, using optical flow to recover motion in a static environment (I haven't read this paper in detail, so I may be wrong about some things; go check it out yourself just in case).

On the second page they propose the same equation as in the previous image. They develop this equation, but they don't get to the (X, Y, Z) equation.

I'll keep researching this part.

The next step is substituting that equation into the range flow constraint equation to get the following:

[image: range flow constraint with the rigid-body motion substituted in]
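My current reading of this step (take it with a grain of salt, I'm still working through it): once the rigid-body velocities above are projected back into the image with the sensor model and substituted into the range flow constraint, the per-pixel flow disappears and each pixel $i$ contributes a single equation that is linear in the six motion parameters,

$$\mathbf{a}_i^{\top}\boldsymbol{\xi} = b_i, \qquad \boldsymbol{\xi} = (t_x, t_y, t_z, \omega_x, \omega_y, \omega_z)^{\top}$$

where $\mathbf{a}_i$ and $b_i$ are built from the depth values and derivatives at that pixel. Stacking all pixels gives an overdetermined linear system $A\boldsymbol{\xi} = \mathbf{b}$, which is exactly what the least squares step below solves.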

In “Rigid body motion from range image sequences” they use this equation for each pixel of the image and apply a least squares minimization to obtain the translation and rotation, i.e. the motion of the camera. They also apply some linear algebra to solve the linear system obtained from the minimization. In the range flow equation they impose two restrictions:

  1. They use an analogy of the brightness change constraint equation from optical flow, called the range rate constraint equation (see A.2), which can be read as requiring smooth surfaces or planes.
  2. The environment is static or rigid, so all the movement between frames in the range images is assumed to be due to the camera movement (read Direct Methods for Recovering Motion; go to pages 5-6 of the paper for the details; the appendixes also contain good information).

That is, they estimate the motion of a camera with a low-resolution sensor in a rigid environment. But, as with everything in this world, the solution is far from perfect; that's where the DIFODO paper comes in again and improves this approach by going further.

Going back to DIFODO: after the equation in the previous image, they relate the pixel coordinates to the spatial coordinates (X, Y, Z) in the camera frame using the pinhole model.

[images: pinhole model equations]
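For reference, the standard pinhole relations (written in my notation; the paper's axis convention may differ): a point $(X, Y, Z)$ in the camera frame projects to the pixel

$$u = f_x \frac{X}{Z} + c_x, \qquad v = f_y \frac{Y}{Z} + c_y$$

with focal lengths $(f_x, f_y)$ and principal point $(c_x, c_y)$. Inverting them, $X = Z\,(u - c_x)/f_x$ and $Y = Z\,(v - c_y)/f_y$, is what allows the per-pixel constraint to be expressed purely in terms of the measured depth and the pixel coordinates.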

Finally, they substitute these equations into equation (6) to get the final equation:

[image: the final per-pixel constraint on the camera motion]

In this case they also apply a least squares solution, but a weighted one. They also add the use of a Gaussian pyramid to solve the problem at different resolutions, which avoids the trade-off between small and large displacements.
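To make that last step concrete, here is a minimal weighted least squares solve with Eigen (my own sketch, not DIFODO's actual code; the per-pixel weights would come from the paper's uncertainty model, which I haven't looked at in detail yet):

```cpp
// Minimal weighted least-squares solve for the 6-DOF motion, using Eigen.
// A (N x 6) stacks one row per pixel constraint, b (N) the right-hand sides,
// w (N) the per-pixel weights. A sketch, not DIFODO's actual implementation.
#include <Eigen/Dense>

Eigen::Matrix<float, 6, 1> solveMotionWLS(const Eigen::MatrixXf& A,
                                          const Eigen::VectorXf& b,
                                          const Eigen::VectorXf& w)
{
    // Normal equations of the weighted problem: (A^T W A) xi = A^T W b.
    const Eigen::MatrixXf AtW = A.transpose() * w.asDiagonal();
    const Eigen::Matrix<float, 6, 6> AtWA = AtW * A;
    const Eigen::Matrix<float, 6, 1> AtWb = AtW * b;
    // LDLT is fine here: AtWA is symmetric positive semi-definite.
    return AtWA.ldlt().solve(AtWb);
}
```

The coarse-to-fine pyramid then repeats this solve at each resolution level, using the motion estimated at the coarser level as the starting point for the finer one.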

I will stop here; this is already too long, and I still have to investigate all this in detail. I'll try to use the DIFODO implementation from the MRPT platform, and I'll also try to reproduce the approach from “Rigid body motion from range image sequences”, which seems not too difficult. I also still have to work out how to get to the (X, Y, Z)' equation, to fully understand the range flow process.

See you next time with more interesting info.