Let us hear what he has to say about his defense, along with an abstract of his thesis.
The success of machine learning algorithms depends heavily on data representation techniques. Natural language is an unstructured data source that has no natural numerical representation in its original form, so most natural language processing tasks require a distributed representation of text entities. In this dissertation, we investigate and propose representation learning techniques to learn embeddings for 1) words, 2) sentences, and 3) knowledge graph entities.
We will first give an overview of our embedding research, including word embedding evaluation and enhancement, and sentence embeddings built from both non-contextualized and contextualized word embeddings. Then, we will focus on two recent works: InductivE and DomainWE. InductivE addresses the inductive learning problem in commonsense knowledge graph completion: entity embeddings are computed directly from their textual descriptions, and an iterative training framework enriches unseen entities with more structural information. DomainWE distills knowledge from large pre-trained language models and synthesizes it into static word embeddings. As an efficient and robust alternative to large pre-trained language models, DomainWE has a much smaller model size, which is particularly important for deployment, yet demonstrates better performance compared with generic word embeddings.
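The abstract does not spell out either pipeline, so the sketch below only illustrates the two underlying ideas in a minimal form: computing an entity vector from its textual description by mean-pooling a pre-trained encoder's token states (the starting point for inductive entity embeddings), and distilling a static word vector by averaging a word's contextual embeddings over many sentences. The model name, pooling choices, and function names are illustrative assumptions, not the thesis's actual implementation.

```python
# Illustrative sketch only; not the thesis's actual pipeline.
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed encoder; the actual works may use a different model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()


def embed_entity_description(description: str) -> torch.Tensor:
    """Embed an unseen entity from its textual description by
    mean-pooling the encoder's token states (InductivE-style idea)."""
    enc = tokenizer(description, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    return hidden.mean(dim=0)


def distill_static_embedding(word: str, contexts: list[str]) -> torch.Tensor:
    """Distill one static vector for `word` by averaging its contextual
    embeddings over many context sentences (DomainWE-style idea).
    Assumes `word` actually occurs in the given contexts."""
    target = tokenizer(word, add_special_tokens=False)["input_ids"]
    pooled = []
    for sentence in contexts:
        enc = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
        ids = enc["input_ids"][0].tolist()
        # Locate each occurrence of the word's subword sequence and
        # mean-pool the hidden states over that span.
        n = len(target)
        for i in range(len(ids) - n + 1):
            if ids[i : i + n] == target:
                pooled.append(hidden[i : i + n].mean(dim=0))
    # The static embedding is the average over all observed contexts.
    return torch.stack(pooled).mean(dim=0)


# Usage: build one static vector for "bank" from a few sentences.
contexts = [
    "She deposited the check at the bank.",
    "The bank approved the loan application.",
]
vec = distill_static_embedding("bank", contexts)
print(vec.shape)  # torch.Size([768]) for bert-base-uncased
```

In the actual works, the encoder choice, pooling strategy, and training objectives would matter a great deal; this sketch only captures the text-to-vector idea the two methods share.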
I would like to thank Professor Kuo, who has guided me step by step along the path from a fresh graduate to an academic researcher. Over the past years, I have learned a lot from Professor Kuo and our MCL members, not only academic skills but also life principles. One of the most important things in a Ph.D. study is to stay self-motivated and try your best to achieve your initial dreams. Finally, I wish all our MCL family members the best and a great future in both their studies and their careers.