Forward Propagation, Backward Regression and Pose Association for Hand Tracking in the Wild

1Stony Brook University 2University at Buffalo, 3Tulip Interfaces 4VinAI

HandLer. Tracking hands in unconstrained environments.


We propose HandLer, a novel convolutional architecture that can jointly detect and track hands online in unconstrained videos. HandLer is based on Cascade-RCNN with additional three novel stages. The first stage is Forward Propagation, where the features from frame t-1 are propagated to frame t based on previously detected hands and their estimated motion. The second stage is the Detection and Backward Regression, which uses outputs from the forward propagation to detect hands for frame t and their relative offset in frame t-1. The third stage uses an off-the-shelf human pose method to link any fragmented hand tracklets. We train the forward propagation and backward regression and detection stages end-to-end together with the other Cascade-RCNN components.To train and evaluate HandLer, we also contribute YouTube-Hand, the first challenging large-scale dataset of unconstrained videos annotated with hand locations and their trajectories. Experiments on this dataset and other benchmarks show that HandLer outperforms the existing state-of-the-art tracking algorithms by a large margin.



      title={Forward Propagation, Backward Regression and Pose Association for Hand Tracking in the Wild},
      author={Mingzhen Huang and Supreeth Narasimhaswamy and Saif Vazir and Haibin Ling and Minh Hoai},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},