APPTracker: Improving Tracking Multiple Objects in Low-Frame-Rate Videos


Tao Zhou1
Wenhan Luo2
Zhiguo Shi1
Jiming Chen1
Qi Ye1

1Zhejiang University, 2Sun Yat-sen University

ACM MM 2022

[Paper]
[Video (On YouTube)]

Abstract

Multi-object tracking (MOT) in the scenario of low-frame-rate videos is a promising solution for deploying MOT methods on edge devices with limited computing, storage, power, and transmitting bandwidth. Tracking with a low frame rate poses particular challenges in the association stage as objects in two successive frames typically exhibit much quicker variations in locations, velocities, appearances, and visibilities than those in normal frame rates. In this paper, we observe severe performance degeneration of many existing association strategies caused by such variations. Though optical-flow-based methods like CenterTrack can handle the large displacement to some extent due to their large receptive field, the temporally local nature makes them fail to give correct displacement estimations of objects whose visibility flip within adjacent frames. To overcome the local nature of optical-flow-based methods, we propose an online tracking method by extending the CenterTrack architecture with a new head, named APP, to recognize unreliable displacement estimations. Then we design a two-stage association policy where displacement estimations or historical motion cues are leveraged in the corresponding stage according to APP predictions. Our method, with little additional computational overhead, shows robustness in preserving identities in low-frame-rate video sequences. Experimental results on public datasets in various low-frame-rate settings demonstrate the advantages of the proposed method.

Overview of our method.

Comparison between our method and several state-of-the-art methods on the MOT17 validation set with different frame rate inputs. We generate low-frame-rate videos by extracting frames from original videos through fixed intervals. A higher sampling intervals leads to a lower frame rate.

Qualitative results of our method. Each row shows three consecutive frames. Bounding boxes with different colors represent different identities. We highlight recognized appearing objects by indicating the predicted APP scores on the right top of corresponding bounding boxes. Pink arrows indicate displacement estimations, which are not reliable for appearing objects. White arrows indicate the updated velocity estimations. It can be observed that emerging objects are accurately recognized and the identities of objects involved in occlusions are preserved.




BibTeX

@inproceedings{zhou2022apptracker,
title={APPTracker: Improving Tracking Multiple Objects in Low-Frame-Rate Videos.},
author={Tao Zhou, Wenhan Luo, Zhiguo Shi, Jiming Chen, Qi Ye},
booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
year={2022}}

Contact: Tao Zhou, Qi Ye

Page template borrowed from gkioxari