Abstract
This week I read "Stacked Hourglass Networks for Human Pose Estimation", in which the authors propose an hourglass-shaped network architecture for human pose estimation. Features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body. The authors show that repeated bottom-up, top-down processing, used together with intermediate supervision, is critical to improving the network's performance. They call the architecture a "stacked hourglass" because the network performs successive steps of pooling and upsampling to produce a final set of predictions. At the time of publication, the method achieved state-of-the-art results on the FLIC and MPII datasets.
Introduction
The network splits and produces a set of heatmaps (outlined in blue) where a loss can be applied. A 1x1 convolution remaps the heatmaps to match the number of channels of the intermediate features. These are added together along with the features from the preceding hourglass.
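The merge step in the quoted caption can be sketched in plain Python. This is only an illustration under stated assumptions: the names `conv1x1`, `add_maps` and `merge_for_next_hourglass` are hypothetical, tensors are nested lists shaped `[channels][height][width]`, and the toy weights stand in for learned parameters.

```python
def conv1x1(x, weight, bias):
    """Apply a 1x1 convolution to x, a nested list shaped [C_in][H][W]."""
    height, width = len(x[0]), len(x[0][0])
    out = []
    for w_row, b in zip(weight, bias):  # one output channel per weight row
        out.append([
            [b + sum(w * x[c][i][j] for c, w in enumerate(w_row))
             for j in range(width)]
            for i in range(height)
        ])
    return out


def add_maps(*tensors):
    """Element-wise sum of equally shaped [C][H][W] nested lists."""
    return [
        [[sum(vals) for vals in zip(*rows)] for rows in zip(*chans)]
        for chans in zip(*tensors)
    ]


def merge_for_next_hourglass(prev_features, inter_features, heatmaps,
                             remap_w, remap_b):
    """Remap heatmaps to the feature channel count, then add all three."""
    remapped = conv1x1(heatmaps, remap_w, remap_b)
    return add_maps(prev_features, inter_features, remapped)


# Toy sizes (assumed): 2 joint heatmaps, 3 feature channels, 2x2 spatial grid.
heatmaps = [[[1.0, 0.0], [0.0, 0.0]],
            [[0.0, 0.0], [0.0, 2.0]]]
remap_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 output x 2 input channels
remap_b = [0.0, 0.0, 0.0]
prev_features = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(3)]
inter_features = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(3)]

next_input = merge_for_next_hourglass(
    prev_features, inter_features, heatmaps, remap_w, remap_b)
```

The point of the 1x1 remap is that the heatmaps have one channel per joint, while the features have a different channel count, so the two cannot be added until the channel counts match.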
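The full network uses 8 hourglass modules. Note that the weights are not shared across these modules, and that a loss is applied to every module's prediction using the same ground truth. The training details are described below.
On how the intermediate supervision loss is computed, the paper says: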
Predictions are generated after passing through each hourglass where the network has had an opportunity to process features at both local and global contexts. Subsequent hourglass modules allow these high level features to be processed again to further evaluate and reassess higher order spatial relationships.
The loss for each hourglass module is therefore computed separately, which lets the later hourglass modules re-evaluate the features more effectively.
Training details
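As a training-side illustration of the intermediate supervision described above, the sketch below computes one loss per hourglass output against the same target and then adds them up. Summing the per-module losses with equal weight is an assumption made here for illustration, and the function names and toy values are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two [J][H][W] nested lists of heatmaps."""
    sq_errors = [
        (p - t) ** 2
        for p_map, t_map in zip(pred, target)
        for p_row, t_row in zip(p_map, t_map)
        for p, t in zip(p_row, t_row)
    ]
    return sum(sq_errors) / len(sq_errors)


def intermediate_supervision_loss(stack_outputs, target):
    """Compute one loss per hourglass output, all against the same target.

    Returns (total, per_stack). The unweighted sum is an assumption.
    """
    per_stack = [mse(pred, target) for pred in stack_outputs]
    return sum(per_stack), per_stack


# Toy example: 1 joint, 2x2 heatmaps, 3 stacked modules instead of 8.
target = [[[0.0, 1.0], [0.0, 0.0]]]
stack_outputs = [
    [[[0.0, 0.0], [0.0, 0.0]]],  # first module: misses the peak
    [[[0.0, 0.5], [0.0, 0.0]]],  # second module: closer
    [[[0.0, 1.0], [0.0, 0.0]]],  # third module: matches the target
]
total, per_stack = intermediate_supervision_loss(stack_outputs, target)
```

Keeping the per-module values next to the total makes it easy to check whether later modules actually refine the earlier predictions.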
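The authors trained and evaluated the network on the FLIC and MPII Human Pose datasets. The method handles only single-person pose estimation, but an image often contains several people; the workaround is to train only on the person at the center of the image. The target person is cropped to the center, and the input image is then resized to 256×256. For data augmentation, the images are rotated (±30 degrees) and scaled (0.75 to 1.25).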
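The augmentation ranges above can be sketched as a random rotation and scale about the crop center, applied here to joint coordinates only. Uniform sampling, the choice of (128, 128) as the center, and all function names are assumptions for illustration. The image warp itself is left out; in practice the same transform would be applied to both the image and its joint annotations.

```python
import math
import random


def sample_augmentation(rng, max_rot_deg=30.0, scale_range=(0.75, 1.25)):
    """Draw a rotation angle (degrees) and a scale factor.

    Uniform sampling is an assumption; the ranges come from the text above.
    """
    angle = rng.uniform(-max_rot_deg, max_rot_deg)
    scale = rng.uniform(*scale_range)
    return angle, scale


def affine_about_center(angle_deg, scale, size=256):
    """2x3 matrix that rotates and scales around the center of the crop."""
    c = size / 2.0  # assumed center convention
    cos_a = scale * math.cos(math.radians(angle_deg))
    sin_a = scale * math.sin(math.radians(angle_deg))
    return [
        [cos_a, -sin_a, c - cos_a * c + sin_a * c],
        [sin_a, cos_a, c - sin_a * c - cos_a * c],
    ]


def transform_point(matrix, x, y):
    """Apply the 2x3 affine matrix to a single (x, y) joint coordinate."""
    (a, b, tx), (c, d, ty) = matrix
    return a * x + b * y + tx, c * x + d * y + ty


rng = random.Random(0)  # fixed seed so the sketch is repeatable
angle, scale = sample_augmentation(rng)
matrix = affine_about_center(angle, scale)
center_x, center_y = transform_point(matrix, 128.0, 128.0)  # center is fixed
```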
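The network is optimized with RMSprop at a learning rate of 2.5e-4. At test time, predictions are made on both the original image and its flipped version, and the results are averaged. The predicted position of each joint is the location of the maximum activation in its heatmap. The loss function is the mean squared error (MSE) between the predicted heatmap and the ground-truth heatmap, which is a 2D Gaussian centered on the joint location with a standard deviation of 1 pixel.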
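The ground-truth heatmap, the argmax decoding and the flip averaging described above can be sketched as follows. The 64×64 heatmap size, the unnormalized Gaussian with a peak of 1, and the function names are assumptions for illustration. A real flip test would also have to swap the left and right joint channels, which this single-joint sketch omits.

```python
import math


def gaussian_heatmap(cx, cy, height=64, width=64, sigma=1.0):
    """2D Gaussian centered on (cx, cy); peak value 1 (assumed scaling)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def argmax_2d(heatmap):
    """Return (x, y) of the maximum activation in a [H][W] heatmap."""
    best = max(
        ((value, x, y)
         for y, row in enumerate(heatmap)
         for x, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]


def hflip(heatmap):
    """Mirror a heatmap horizontally."""
    return [row[::-1] for row in heatmap]


def average_with_flip(pred, pred_on_flipped):
    """Un-flip the second prediction and average it with the first."""
    unflipped = hflip(pred_on_flipped)
    return [[(a + b) / 2.0 for a, b in zip(r1, r2)]
            for r1, r2 in zip(pred, unflipped)]


# Ground truth for a joint at x=20, y=30.
target = gaussian_heatmap(20, 30)
decoded = argmax_2d(target)

# Simulated predictions: the flipped pass peaks 2 px to the right once undone.
pred = gaussian_heatmap(20, 30)
pred_on_flipped = hflip(gaussian_heatmap(22, 30))
averaged = average_with_flip(pred, pred_on_flipped)
decoded_avg = argmax_2d(averaged)
```

In this toy case the two passes disagree by two pixels, and averaging the heatmaps before taking the argmax places the joint between them.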