Week 19: Extended TensorFlow implementation
A few weeks ago we found another repo with a CPMs’ implementation in TensorFlow:
- convolutional-pose-machines-tensorflow by @timctho.
What makes this repo interesting is that it extends the official release with new visualizations, models trained for hand pose estimation and Kalman filters. It’s important to note that it doesn’t include any model for human detection, which force the user to give as input images with properly centered and sized humans (or hands).
Visualizations
The test scripts available in the repo let you choose between three different visualization modes during live execution:
- Single: just input image with limbs drawn over it.
- HM: it displays the heatmap of each joint after the last stage, We’ve already got very similar pics.
- MULTI: combined heatmaps for each body joint between stages are shown. Displaying these heatmaps is really useful to understand how CPMs get a more refined estimation after each stage. An example can be seen in the image below.
Hand pose estimation
CPMs architecture can be applied to estimate the joints of any articulated object, as long as its parts are spatially dependent between them. Following that line of thought, a model trained with hands and a script to test it are provided within this repo. Live hand pose estimation is shown in the following picture:
Kalman filters
Probably the main improvement with respect to the original release is the introduction of Kalman filtering in the output of the pose estimation model. Kalman filtering is an algorithm that fuses information about the state of a system and (maybe) its environment and produces an estimation for the next state. It basically combines system expected and estimated measurements (e.g. position and velocity) with its uncertainties. That fusion of information allows to infere current state in a more reliable way than simply using measurements obtained if they’re noisy. Kalman filtering has been applied for smoothing output from laptop trackpads, positioning systems, etc. It makes sense to apply this technique to CPMs output as it is usually very noisy, with trembling joint locations. When Kalman filtering is applied, that effect is removed to a large extent. Besides that, the algorithm for Kalman filtering is really fast, so it can be used in real time applications like ours.
Performance
Inference time when 192 px boxsizes are evaluated is about 60 ms, although it’s only doing pose estimation (not human detection) and that it evaluates one human per frame. Inference times for previous Caffe and TensorFlow implementations, under the same restrictions, are 40 ms and 100 ms, respectively. That means the new repo is a bit slower than Caffe original release, but significantly improves the TensorFlow implemented that we’re already using. After taking a look at both TensorFlow repos code I’ve not been able to spot yet what makes the difference.