Congratulations to Shangwen Li on passing his defense on December 1, 2016. His Ph.D. thesis is entitled “Multimodal Image Retrieval and Object Classification Using Deep Learning Features”.

Abstract of thesis:

Computer vision has achieved a major breakthrough in recent years with the advancement of deep learning based methods. However, its performance cannot yet be considered robust enough for practical applications, and more advanced methods built on top of deep learning architectures are needed. This work uses deep learning features to tackle two major computer vision problems: Multimodal Image Retrieval and Object Classification.

Multimodal Image Retrieval (MIR) aims to build an alignment between the visual and textual modalities, thereby reducing the well-known “semantic gap” in image retrieval. As the most widely available form of textual information associated with images, tags play an important semantic role in the MIR framework. However, treating all tags in an image as equally important may cause misalignment between the visual and textual domains, leading to poor retrieval performance. To address this problem and build a robust retrieval system, we propose an MIR framework that embeds tag importance as the textual feature. In the first part, we propose an MIR system, called Multimodal Image Retrieval with Tag Importance Prediction (MIR/TIP), which embeds automatically predicted object tag importance in image retrieval. To achieve this goal, a discounted probability metric is first presented to measure object tag importance from human sentence descriptions. Using this as ground truth, a structured object tag importance prediction model is proposed. The proposed model integrates visual, semantic, and context cues to achieve robust object tag importance prediction. Our experimental results demonstrate that embedding the predicted object tag importance yields significant performance gains in both objective and subjective evaluations. In the second part, the MIR/TIP system is extended to account for “scene”, another important aspect of an image. To jointly measure scene and object tag importance, the discounted probability metric is modified to consider the grammatical role of the scene tag in human-annotated sentences. The structured model is likewise modified to predict scene and object tag importance at the same time. Our experimental results demonstrate that the predicted scene and object tag importance greatly enhances the robustness of the MIR system.

Object classification is a long-standing problem in the computer vision field and serves as the foundation for other problems such as object detection, scene classification, and image annotation. As the number of object categories continues to increase, some categories inevitably become more confusing than others because their samples lie close together in the feature space. In the third part, we conduct a detailed analysis of confusing categories and propose a confusing categories identification and resolution (CCIR) scheme, which can be applied to any CNN-based object classification baseline to further improve its performance. In the CCIR scheme, we first present a procedure that automatically clusters confusing object categories together to form a confusion set. Then, a binary-tree-structured (BTS) clustering method is adopted to split a confusion set into multiple subsets. A classifier is subsequently learned within each subset to improve classification performance on it. Experimental results on the ImageNet ILSVRC2012 dataset show that the proposed CCIR scheme offers a significant performance gain over the AlexNet and VGG16 baselines.

We are very glad that he has shared his Ph.D. experience with us. Here is what he had to say.

Ph.D. experience:

First, I would like to thank Professor Kuo for offering me this valuable PhD experience. I have learned many critical thinking skills, along with lifelong wisdom, from Professor Kuo’s pre-seminar sharing. The weekly report mechanism also taught me the importance of self-discipline. I admire Professor Kuo’s ability to manage such a large and diverse research group. The alumni network is a truly great asset for all members of the MCL lab. Last, his dedication to research is something I will keep learning from throughout my life.

A PhD is definitely a rewarding experience. It not only prepares you technically for your future career, but also strengthens your mind. In my opinion, self-motivation, persistence, endurance, and consideration are the key factors leading to a successful PhD life. PhD life is surely full of frustration, but the satisfaction of crossing the finish line is indescribable. I would also like to thank my labmates for their insightful discussions and encouragement. The friendship with them is an indispensable part of PhD life.

Congratulations again to Shangwen and we wish him all the best in his future career.