Visual tracking is an important task in computer vision and has been integrated into many applications, such as autonomous vehicles. Single object tracking (SOT) is a fundamental problem in visual tracking, in which one specific object is tracked throughout a test video. The ground-truth bounding box is given only in the first frame, and the tracker must locate the object in every subsequent frame. Common challenges include occlusion, fast motion, viewpoint change, and background clutter. SOT has been studied for a long time, and supervised trackers trained offline on large-scale labeled data dominate the leaderboards. However, the model complexity and computational cost of these large deep neural networks are too high for edge devices. In recent years, unsupervised learning from unlabeled data and lightweight architectures have begun to attract researchers' attention.

Aiming at lightweight tracking with low computational complexity, we propose a tracker that learns from the test sequence itself and requires no offline training. It pushes further the limits of tracking with low-level vision features, achieving performance comparable to recent state-of-the-art unsupervised deep trackers, while its number of parameters and its inference-time FLOPs are much smaller than those of deep trackers. Some sample results are provided in the figures above. Our tracker produces flexible bounding boxes and re-locates the object after occlusion or disappearance. We hope this work contributes to innovation in lightweight tracking and to a better understanding of the roles of feature representations and offline training.

–Zhiruo Zhou