Congratulations to Weihao Gan for passing his qualifying exam on January 11, 2016. The title of his Ph.D. thesis proposal is “Advanced Online Object Tracking Techniques by Exploiting Spatial and Temporal Information”. His qualifying exam committee consisted of Jay Kuo, Antonio Ortega, Keith Chugg, Panayiotis Georgiou and Ulrich Neumann.

Abstract of thesis proposal:

Online object tracking is one of the fundamental problems in computer vision. It is commonly used in real-world applications such as traffic control in video surveillance, autonomous vehicles, robotic navigation, and medical imaging. It is a very challenging problem because multiple attributes of a video sequence vary over time. In this research, we attempt to achieve online object tracking using both spatial and temporal cues with two novel methods.

First, we develop a new method, called the “temporal prediction and spatial refinement (TPSR)” tracker, to integrate spatial and temporal cues effectively. The TPSR tracking system consists of three cascaded modules: pre-processing (PP), temporal prediction (TP) and spatial refinement (SR). Illumination variation and camera shake are two challenging factors in a tracking problem, and both are compensated for in the PP module. Then, a joint scheme of region-based template matching (TM) and pixel-wise optical flow (OF) is adopted in the TP module, where the switch between TM and OF is made automatically. These two modes work in a complementary manner to handle different foreground and background situations. Finally, to overcome the drifting error arising from the TP module, the bounding box location and size are fine-tuned in the SR module using the local spatial information of the new frame.
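
As a rough illustration of the region-based template matching idea used in the TP module, the sketch below slides a small template over a frame and keeps the position with the lowest matching cost. It is not the TPSR implementation: the function name `match_template`, the exhaustive search, and the sum of squared differences (SSD) cost are all assumptions made for this example, and the abstract does not state which matching criterion or switching rule the tracker uses.

```python
def match_template(frame, template):
    """Return ((row, col), cost) for the best top-left placement of template.

    Hypothetical helper: exhaustive search with an SSD cost, chosen only
    for illustration. The cost used by the TPSR tracker is not given.
    """
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best_cost, best_pos = None, None
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            # Sum of squared differences between template and frame patch.
            cost = 0
            for i in range(th):
                for j in range(tw):
                    diff = frame[r + i][c + j] - template[i][j]
                    cost += diff * diff
            if best_cost is None or cost < best_cost:
                best_cost, best_pos = cost, (r, c)
    return best_pos, best_cost


# Toy 5x5 grayscale frame with a bright 2x2 target at rows 2-3, cols 1-2.
frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 8],
    [7, 9],
]
position, cost = match_template(frame, template)
print(position, cost)
```

In a tracker, the search would normally be restricted to a window around the previous bounding box rather than the whole frame, which keeps the cost of the search manageable.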

Next, we apply a deep neural network architecture to the online object tracking problem. We have made several major improvements to the state-of-the-art multi-domain network (MDNet) tracker. The enhanced MDNet (EMDNet) tracker not only keeps the advantages of the original MDNet in exploiting spatial cues but also makes full use of temporal cues. The EMDNet is a multi-domain learning framework trained with the stochastic gradient descent (SGD) method. To exploit the spatial cues fully, techniques such as bounding box regression and hard minibatch mining are adopted. Furthermore, a temporal optical flow map is generated to provide valuable cues for target motion prediction and segmentation. With the proposed EMDNet, we can handle very challenging tracking cases, such as articulated motion and fast motion, which the original MDNet cannot handle satisfactorily. Extensive experimental results are given to demonstrate the advantages of the EMDNet over the MDNet.
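
One common reading of hard minibatch mining, sketched below under stated assumptions, is to score a pool of negative (background) samples with the current network and keep only those it most confidently mistakes for the target, so that each training minibatch concentrates on difficult examples. The function name `mine_hard_negatives`, the score values, and the choice of `k` are hypothetical; the abstract does not describe the exact procedure used in EMDNet.

```python
def mine_hard_negatives(negative_scores, k):
    """Return indices of the k negatives with the highest target scores.

    Hypothetical helper: negative_scores[i] is assumed to be the network's
    target-class score for background sample i, so a high score marks a
    hard (easily confused) negative.
    """
    ranked = sorted(
        range(len(negative_scores)),
        key=lambda i: negative_scores[i],
        reverse=True,
    )
    return ranked[:k]


# Made-up scores for five background samples.
scores = [0.05, 0.80, 0.10, 0.65, 0.30]
hard = mine_hard_negatives(scores, k=2)
print(hard)
```

Only the selected indices would be combined with positive samples to form the next minibatch; the remaining easy negatives are skipped for that update.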