Week 1: Literature review


Convolutional Pose Machines (Wei et al.)

Convolutional pose machines (CPMs) address the human pose estimation problem using convolutional neural networks (CNNs) and 2D images. They inherit their architecture from the earlier pose machines. A CPM is a sequence of CNNs, each of which produces belief maps for the location of each part of the human body (ankle, elbow, head…). The architecture is multistage: the image features, together with the belief maps generated by the previous stage, are used as input to the next one. With each stage, the location estimates for each part become more refined.

Image extracted from: https://github.com/shihenw/convolutional-pose-machines-release
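The stage-to-stage data flow described above can be sketched in a few lines of Python. This only illustrates how inputs are wired between stages; `run_stages`, `toy_first_stage` and `toy_later_stage` are hypothetical names, and the toy functions are simple stand-ins for the real CNNs, not the networks from the paper.

```python
def run_stages(image_features, first_stage, later_stages):
    """Chain stages: every later stage sees the image features and the previous beliefs."""
    beliefs = first_stage(image_features)
    history = [beliefs]
    for stage in later_stages:
        beliefs = stage(image_features, beliefs)
        history.append(beliefs)
    return history  # one set of beliefs per stage


# Toy stand-ins (assumptions): a "belief map" is just a flat list of scores here.
def toy_first_stage(features):
    return [0.5 * f for f in features]


def toy_later_stage(features, previous_beliefs):
    # Combine image evidence with the previous stage's beliefs.
    return [(f + b) / 2 for f, b in zip(features, previous_beliefs)]


history = run_stages([1.0, 0.0], toy_first_stage, [toy_later_stage, toy_later_stage])
for t, beliefs in enumerate(history, start=1):
    print(f"stage {t}: {beliefs}")
```

Keeping the beliefs of every stage, rather than only the last one, matters because each stage's output can then be supervised during training.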

The first stage of a CPM predicts part beliefs from local image evidence only, because its receptive field is just a small patch of the original image. In subsequent stages, the effective receptive field grows, allowing the CPM to learn complex, long-range dependencies between parts. As a result, challenging parts become easier to detect thanks to the belief maps of the easier ones. Large receptive fields can be achieved by:

  • Pooling, at the cost of lower precision.
  • Using larger kernels, which increases the number of parameters of the model.
  • Adding more convolutional layers, at the risk of facing the vanishing gradients problem.
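To make the trade-offs in this list concrete, here is a minimal sketch that computes the receptive field of a stack of layers with the standard recurrence (each layer adds `(kernel - 1)` times the current spacing between output units). The layer stacks are made-up examples, not the configuration used in the CPM paper, and `receptive_field` is a hypothetical helper.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of a stack of (kernel_size, stride) layers."""
    rf = 1    # a single input pixel sees itself
    jump = 1  # spacing, in input pixels, between neighbouring output units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks, chosen only for illustration.
three_convs = [(3, 1), (3, 1), (3, 1)]                        # more layers
convs_with_pooling = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]  # 2x2 pooling, stride 2
one_large_kernel = [(9, 1)]                                    # larger kernel

print(receptive_field(three_convs))         # 7
print(receptive_field(convs_with_pooling))  # 18
print(receptive_field(one_large_kernel))    # 9
```

Pooling enlarges the receptive field fastest because it multiplies the spacing seen by every later layer, which is also why it costs spatial precision.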

The CPM paper takes this last approach. To counter the vanishing gradients problem, intermediate supervision is applied after every stage, using an L2 loss between the predicted belief maps and the ideal ones. The ideal belief maps are built synthetically by placing Gaussian peaks at the ground-truth location of each part. The overall objective function, the sum of the per-stage losses, is minimized using stochastic gradient descent so that all stages are trained jointly.
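As a rough illustration of the supervision described above, the sketch below builds an ideal belief map with a Gaussian peak and sums the squared L2 distance over every stage and part. It uses plain Python lists so that it stays dependency-free; the function names, the map size and the value of `sigma` are assumptions, not values taken from the paper or its code.

```python
import math


def gaussian_belief_map(height, width, center, sigma=1.0):
    """Ideal belief map: a Gaussian peak at the ground-truth (row, col) location."""
    cy, cx = center
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def squared_l2(predicted, target):
    """Squared L2 distance between two maps of the same size."""
    return sum(
        (p - t) ** 2
        for p_row, t_row in zip(predicted, target)
        for p, t in zip(p_row, t_row)
    )


def total_loss(stage_predictions, targets):
    """Sum the loss over all stages and parts (intermediate supervision).

    stage_predictions[t][p] is the predicted map for part p at stage t;
    targets[p] is the ideal map for part p.
    """
    return sum(
        squared_l2(predicted, target)
        for stage in stage_predictions
        for predicted, target in zip(stage, targets)
    )


# One hypothetical part on a 5x5 map, ground truth at the centre.
target = gaussian_belief_map(5, 5, center=(2, 2))
blank = [[0.0] * 5 for _ in range(5)]

# Stage 1 predicts nothing, stage 2 predicts the ideal map.
print(total_loss([[blank], [target]], [target]))
```

Because every stage contributes its own term, gradients reach the early layers directly instead of only through the final stage.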

The source code is available at convolutional-pose-machines-release. CPMs were originally implemented in Caffe, but they have since been ported to TensorFlow.

Other papers & repos
