Congratulations to Yuanhang Su for Passing His Defense

Congratulations to Yuanhang Su for passing his PhD defense on April 16, 2019. His PhD thesis is entitled “Theory of Memory-enhanced Neural Systems and Image-assisted Neural Machine Translation”.

Abstract:

My research focuses on sequence learning systems (whose input can be language, speech, video, etc.) and on answering the following questions: What is memory, and how can we build a system that learns efficiently by remembering? Can visual imagination help, and if so, how can we build a system that handles both language and vision? The foundation we built for the former question consists of two computing architectures: one is the Extended Long Short-term Memory (ELSTM) for Recurrent Neural Network (RNN) modeling, and the other is Tree-structured Multi-stage Principal Component Analysis (TMPCA) for language embedding. They are derived from two perspectives: memory as a system function, and memory as a compact information representation (or dimension reduction (DR)), respectively. From the first perspective, we carried out a detailed analysis of RNN cell models, demystified their model properties, and concluded that all existing RNN cells (vanilla RNN, LSTM, and GRU) suffer from memory decay. The newly proposed ELSTM does not have this limitation and achieves outstanding performance on complex language tasks. From the second perspective, a PCA-based technique is used for sequence embedding, for the sake of maximizing input/output mutual information and enabling explainable machine learning. The proposed TMPCA computes much faster than ordinary PCA while retaining most of its other merits. To answer the latter question, we argued that visual information can benefit language learning tasks by increasing the system’s mutual information, and we deployed a Transformer-based multi-modal NMT system that is trained/fine-tuned in an unsupervised manner on an image captioning dataset. It is one of the first such systems ever developed for unsupervised MT, and the new UMNMT system shows for the first time that a multi-modal solution can significantly outperform text-only ones.
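For readers curious about the tree-structured idea behind TMPCA, the sketch below is a minimal illustration under simplifying assumptions (a sequence length that is a power of two, pairwise merging, and scikit-learn’s PCA as the per-stage transform); the function name and shapes are hypothetical and this is not the thesis implementation. At each stage, adjacent pairs of d-dimensional vectors are concatenated into 2d-dimensional vectors and projected back to d dimensions, halving the sequence length until a single embedding remains.

```python
# Hypothetical sketch of tree-structured multi-stage PCA (TMPCA) for
# sequence embedding. At every stage, adjacent token vectors are paired
# (d -> 2d by concatenation) and a small PCA maps each pair back to d
# dimensions, halving the sequence length. Illustration only.
import numpy as np
from sklearn.decomposition import PCA

def tmpca_embed(sequences: np.ndarray) -> np.ndarray:
    """sequences: (num_samples, seq_len, d), with seq_len a power of two.
    Returns one d-dimensional embedding per sample."""
    x = sequences
    while x.shape[1] > 1:
        n, length, d = x.shape
        # Concatenate adjacent pairs: (n, length // 2, 2 * d).
        pairs = x.reshape(n, length // 2, 2 * d)
        # One small PCA (2d -> d) per stage, fit on all pairs of all samples.
        pca = PCA(n_components=d)
        flat = pairs.reshape(-1, 2 * d)
        x = pca.fit_transform(flat).reshape(n, length // 2, d)
    return x[:, 0, :]  # (num_samples, d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(100, 8, 16))  # 100 sentences, 8 tokens, 16-dim
    print(tmpca_embed(demo).shape)        # (100, 16)
```

Because each stage only ever factorizes 2d-dimensional pair vectors rather than the full Nd-dimensional concatenation of the whole sequence, the per-stage PCAs stay small, which is consistent with the speedup over ordinary PCA mentioned in the abstract.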
Ph.D. experience:
The process of asking questions, finding the problem, and then finding the solution is food for a curious brain. Such a brain needs to think deep and big to grow, and it takes a Ph.D. program for it to grow into maturity. Throughout the journey, each neuron in the brain will “stretch” its dendrites to receive impulses from other, bigger brains. The impulses then need to be “non-linearly activated” to be internalized as nourishment. A healthy brain always stays hungry, fearless in taking on established “axons” and meticulous in creating new ones. The lifetime of this brain may not be long, like that of an octopus, whose brain perishes within 365 days. When the final day comes, the only wish of this brain is that the impulses it ever emitted can have a life of their own.
By Yuanhang Su