Abstract of thesis:

The world around us is highly structured. Images contain not only complex scenes with objects of various categories but also relationships between objects and between humans and objects. In recent years, deep learning has driven substantial progress in computer vision, in both visual recognition and image generation tasks. In this thesis, we leverage structural information to improve both visual generation and visual understanding.

On the visual generation side, image inpainting is the task of reconstructing the missing region of an image with plausible content based on its surrounding context. To overcome the difficulty of directly learning the distribution of high-dimensional image data, we first divide the task into two separate steps, inference and translation, and leverage semantic information to help refine the textures. Second, we introduce semantic segmentation information, which disentangles inter-class differences from intra-class variation, to improve the quality of the generated images. On the visual understanding side, we study the problem of novel human-object interaction (HOI) detection, which aims to recognize the relationships between humans and objects in images. We formulate it as a domain generalization problem and propose a unified domain generalization framework that learns object-invariant features for predicate prediction, with the aim of improving the model's ability to generalize to unseen scenarios. Finally, we outline several research directions that can be addressed in future work.


Ph.D. experience:

I would like to express my gratitude to my advisor, Professor C.-C. Jay Kuo, for his continuous support of my Ph.D. study over these years. He gave me the freedom to pursue various projects without objection and offered insightful discussions about my research. His enthusiasm and persistence in research have encouraged me to conquer challenges throughout my life.

I am fortunate to have joined the Media Communication Lab (MCL). Our large network of alumni provides us with great resources in both academia and industry, and our regular meetings and reports help us improve our soft skills in communication and collaboration. I would like to thank my labmates for their help and encouragement in my research and daily life. Their friendship is a valuable gift from my Ph.D. experience.