Congratulations to Shangwen Li for passing his defense on December 1, 2016. His Ph.D. thesis is entitled “Multimodal Image Retrieval and Object Classification Using Deep Learning Features”.
Abstract of thesis:
Computer vision has achieved a major breakthrough in recent years with the advancement of deep learning based methods. However, its performance is not yet robust enough for practical applications, and more advanced methods built on top of deep learning architectures are needed. This work aims to use deep learning features to tackle two major computer vision problems: Multimodal Image Retrieval and Object Classification.
Multimodal Image Retrieval (MIR) aims at building the alignment between the visual and textual modalities, thereby reducing the well-known “semantic gap” in image retrieval. Tags, the most widely available textual information for images, play an important semantic role in the MIR framework. However, treating all tags of an image as equally important may result in misalignment between the visual and textual domains, leading to poor retrieval performance. To address this problem and build a robust retrieval system, we propose an MIR framework that embeds tag importance as the textual feature. In the first part, we propose an MIR system, called Multimodal Image Retrieval with Tag Importance Prediction (MIR/TIP), to embed the automatically predicted object tag importance in image retrieval. To achieve this goal, a discounted probability metric is first presented to measure object tag importance from human sentence descriptions. Using this as ground truth, a structured object tag importance prediction model is proposed. The proposed model integrates visual, semantic, and context cues to achieve robust object tag importance prediction performance. Our experimental results demonstrate that, by embedding the predicted object tag importance, a significant performance gain can be obtained in terms of [...]