News

Congratulations to Yuhang Song for Passing His Qualifying Exam

Congratulations to Yuhang Song for passing his Qualifying Exam on January 10, 2019! Yuhang’s thesis proposal is titled “High-Quality Image Inpainting with Deep Generative Models”. His qualifying exam committee consisted of Jay Kuo (Chair), Antonio Ortega, Alexander Sawchuk, Panayiotis Georgiou, and Ulrich Neumann (Outside Member).

We invited Yuhang to talk about his thesis proposal:

Image inpainting is the task of reconstructing the missing region of an image with plausible content based on its surrounding context, a common topic in low-level computer vision. Recent developments in deep generative models enable efficient end-to-end frameworks for image synthesis and inpainting. However, existing methods are limited to filling small holes in low-resolution images, and they often generate unsatisfying results containing easily detectable flaws. In this thesis proposal, we study two image inpainting problems: 1) fine-tuning the textures of generated images; 2) exploiting semantic segmentation information for higher-quality image inpainting.

To overcome the difficulty of directly learning the distribution of high-dimensional image data, we divide the task into two separate steps, inference and translation, and model each step with a deep neural network. We also use simple heuristics to guide the propagation of local textures from the boundary to the hole. We show that, with these techniques, inpainting reduces to learning two image-feature translation functions in a much smaller space, which is hence easier to train. We evaluate our method on several public datasets and show that it generates results of better visual quality than previous state-of-the-art methods.
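
To make the two-step decomposition concrete, here is a minimal structural sketch in PyTorch: an inference network maps the masked image into a feature space, and a translation network maps the features back to pixels. All module names and shapes here are hypothetical illustrations, not the actual architecture of the proposal.

```python
# Minimal structural sketch of the two-step inpainting idea (hypothetical
# module names and sizes; not the actual architecture from the proposal).
import torch
import torch.nn as nn

class InferenceNet(nn.Module):
    """Step 1: infer a feature map for the masked image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 64, 3, stride=2, padding=1), nn.ReLU(),  # RGB + mask
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, image, mask):
        x = torch.cat([image * (1 - mask), mask], dim=1)  # zero out the hole
        return self.encoder(x)

class TranslationNet(nn.Module):
    """Step 2: translate the completed feature map back to an image."""
    def __init__(self):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, features):
        return self.decoder(features)

image = torch.randn(1, 3, 256, 256)          # stand-in input image
mask = torch.zeros(1, 1, 256, 256)
mask[..., 96:160, 96:160] = 1                # square hole to fill
features = InferenceNet()(image, mask)       # learning happens in feature space
output = TranslationNet()(features)          # translate features back to pixels
```

Because both networks operate on the smaller feature space rather than raw pixel distributions, each translation function is easier to train, which is the point of the decomposition.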

The second research idea is motivated by the fact that existing methods based on generative models do not exploit segmentation information to constrain object shapes, which usually leads to blurry [...]

By |January 21st, 2019|News|

MCL Research on Graph Embedding

Research on graph representation learning has gained increasing attention among researchers because many kinds of data, such as social networks, linguistic (word co-occurrence) networks, biological networks, and other multimedia domain-specific data, can be well represented by graphs. Graph representations allow relational knowledge about interacting entities to be stored and accessed efficiently. Analyzing graph data can provide significant insights into community detection, behavior analysis, and many other useful applications such as node classification, link prediction, and clustering. To analyze graph data, the first step is to find an accurate and efficient graph representation. The steps of graph embedding are shown in Figure 1. The input is a graph represented by an adjacency matrix. Graph representation learning aims to embed this graph into a latent space that captures the intrinsic characteristics of the original graph. Each node u in the network is embedded into a d-dimensional vector that represents the features of that node, as shown in Figure 2.

Obtaining an accurate representation of a graph is challenging for several reasons. Finding the optimal dimension of the representation is not an easy task: a higher-dimensional representation might preserve more information about the original graph, but at the cost of more space and time. The choice of dimension can also be domain-specific and depends on the type of input graph. Choosing which property of the graph to embed is also challenging, given the plethora of properties graphs have.

In our research, we first focus on the node prediction task with deep learning models. Specifically, we explore node classification using tree-structured recursive neural networks. We then shift our goal to improving the accuracy and efficiency of DeepWalk-based matrix factorization methods.
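
To make the embedding pipeline concrete, below is a minimal DeepWalk-style sketch (random walks fed to a skip-gram model), in which each node of a toy graph gets a d-dimensional vector. It illustrates the general idea only, not our specific method, and assumes the networkx and gensim packages.

```python
# Minimal DeepWalk-style sketch: random walks + skip-gram give each node
# a d-dimensional embedding (illustrative only, not the method in this post).
import random
import networkx as nx
from gensim.models import Word2Vec

G = nx.karate_club_graph()  # toy input graph (adjacency structure)

def random_walk(G, start, length=10):
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(G.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(n) for n in walk]  # Word2Vec expects string tokens

# Generate several walks per node, then train skip-gram on the "sentences".
walks = [random_walk(G, n) for n in G.nodes() for _ in range(20)]
model = Word2Vec(walks, vector_size=16, window=5, sg=1, min_count=0)

print(model.wv["0"])  # the 16-dimensional embedding of node 0
```

Nodes that co-occur on many walks end up with nearby vectors, which is what makes such embeddings useful for downstream node classification, link prediction, and clustering.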

 

— By Fenxiao (Jessica) [...]

By |January 14th, 2019|News|

MCL Research on Word Embedding

Word embedding has gained popularity in various NLP tasks, including sentiment analysis [1], information retrieval [2], and machine translation [3]. The goal of word embedding is to map each word to a vector representation that embeds both syntactic and semantic information. Relationships between words can then be distinguished through measurements on the corresponding word vectors.

Although embedded vector representations of words offer impressive performance on many natural language processing (NLP) applications, the information in ordered input sequences is lost to some extent if only context-based samples are used in training. For further performance improvement, two new post-processing techniques, called post-processing via variance normalization (PVN) and post-processing via dynamic embedding (PDE), are proposed in this work. The PVN method normalizes the variance of the principal components of word vectors, while the PDE method learns orthogonal latent variables from ordered input sequences [4]. Our post-processing techniques improve performance on both intrinsic evaluation tasks, including word similarity, word analogy, and outlier detection, and extrinsic evaluation tasks, including sentiment analysis and machine translation.
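
As a rough illustration of the variance-normalization idea, the sketch below equalizes the variance of the leading principal components of an embedding matrix. The component count and the normalization rule are simplified assumptions for illustration, not the exact PVN algorithm of [4].

```python
# Simplified sketch of variance normalization over principal components:
# shrink the dominant components so their variances match a reference level.
# This is a rough reading of the PVN idea, not the exact algorithm of [4].
import numpy as np

def variance_normalize(vectors, num_components=5):
    """vectors: (vocab_size, dim) matrix of word embeddings."""
    centered = vectors - vectors.mean(axis=0)
    # Principal directions via SVD of the centered embedding matrix.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    std = s / np.sqrt(len(vectors))          # per-component standard deviation
    ref = std[num_components]                # reference: first untouched component
    out = centered.copy()
    for i in range(num_components):
        coeff = centered @ vt[i]             # projection onto component i
        # Rescale component i so its variance equals the reference variance.
        out += np.outer(coeff * (ref / std[i] - 1.0), vt[i])
    return out

embeddings = np.random.randn(10000, 300)     # stand-in for trained word vectors
normalized = variance_normalize(embeddings)
```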

In the meantime, we are also interested in word embedding evaluation, which can be divided into two categories: intrinsic evaluation and extrinsic evaluation. We are trying to better understand the properties of word embeddings as well as their evaluation methods. This is still an on-going project.

Further developments, including contextualized word embeddings [5] and pre-trained language models [6], became quite popular last year. Much exciting work can be done along this direction, and the performance is much better than that of previous models. Bilingual or multi-lingual word embedding could also be an interesting research area.

 

–By Bin Wang, working with Fenxiao (Jessica) Chen, Angela Wang and Yunchen (Joe) Wang

 

Reference:

[1] Shin, B.; Lee, T.; and Choi, J. D. [...]

By |January 7th, 2019|News|

Happy New Year 2019!

In 2018, several students graduated from MCL with impressive work and started a new journey in life. Meanwhile, many new members joined our group and enjoyed a wonderful time exploring their research areas. Throughout the year, MCL members kept moving forward in research and published high-quality papers in top journals and conferences. 2018 has been a fruitful year for us.

Now we are standing at the end of 2018. We wish all members a happy new year and an even more wonderful 2019!

 

Image credits:

Image 1: http://www.traderstrustedacademy.com/category/happy-new-year-2019-hd-images/, cropped and resized with white padding; Image 2: http://www.hdnicewallpapers.com/Wallpaper-Download/New-Year/Happy-New-Year-Red-Rose, cropped and resized with white padding.

By |December 31st, 2018|News|

Merry Christmas!

May your Christmas sparkle with moments of love, laughter and goodwill. And may the year ahead be full of contentment and joy. Wish all our fellows a Merry Christmas!

By |December 24th, 2018|News|

MCL Research on Domain Adaptation

Trained deep learning models do not generalize well if the testing data has a different distribution from the training data. For instance, in medical image segmentation, the MRI and CT scans of the same object look very different. If we simply train a model on MRI scans, it is very likely that the model will not work on CT scans. However, it is very expensive and time-consuming to manually label different data sets. Therefore, we wish to transfer knowledge from a labeled training set to unlabeled testing data with a different distribution. Domain adaptation can help us achieve this purpose.
Domain adaptation can be categorized into three types based on the availability of target domain data: supervised, semi-supervised, and unsupervised [1]. In supervised domain adaptation, a limited amount of labeled target domain data is available. In the semi-supervised setting, unlabeled target domain data as well as a small amount of labeled target domain data is available. In the unsupervised setting, only unlabeled target domain data is available. Unsupervised domain adaptation is an ill-posed problem, since we do not have labels for the target domain data; proper assumptions on the target domain data are important for performing it. In our research, we focus on unsupervised domain adaptation. It can be applied to many computer vision problems, including classification, segmentation, and detection. Currently, we focus our experiments on classification.
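
As one concrete example of unsupervised domain adaptation (a common baseline approach, not necessarily the method we adopt), the sketch below implements domain-adversarial training: a gradient-reversal layer trains the feature extractor to fool a source-vs-target domain classifier, so the learned features become domain-invariant while remaining discriminative on the labeled source data.

```python
# Minimal sketch of domain-adversarial (DANN-style) training for
# unsupervised domain adaptation; illustrative, not our specific method.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad  # reverse gradients flowing into the feature extractor

features = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
label_clf = nn.Linear(128, 10)   # trained on labeled source data only
domain_clf = nn.Linear(128, 2)   # source-vs-target discriminator

src_x, src_y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
tgt_x = torch.randn(32, 1, 28, 28)   # unlabeled target-domain data

f_src, f_tgt = features(src_x), features(tgt_x)
cls_loss = nn.functional.cross_entropy(label_clf(f_src), src_y)
dom_logits = domain_clf(GradReverse.apply(torch.cat([f_src, f_tgt])))
dom_y = torch.cat([torch.zeros(32), torch.ones(32)]).long()
dom_loss = nn.functional.cross_entropy(dom_logits, dom_y)
(cls_loss + dom_loss).backward()     # one combined adversarial step
```
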
–By Ruiyuan Lin

 

Reference:
[1] M. Wang and W. Deng, “Deep visual domain adaptation: A survey,” Neurocomputing, 2018.
Image Credits:
Anon, (2018). Available at: http://ai.bu.edu/visda-2018/assets/images/domain-adaptation.png [Accessed 16 Dec. 2018].
X. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Wang, and K. Saenko, “VisDA: The visual domain adaptation challenge,” 2017.

By |December 16th, 2018|News|

MCL Research on Visual Dialogues

The task of Visual Dialogue involves a conversation between an agent system and a human end-user regarding presented visual information. The conversation consists of multiple rounds of questions and answers. The key challenge for the agent system is to answer the human user’s questions with meaningful information while keeping the conversation flow contiguous and natural. Current visual dialogue systems can be divided into two tracks: generative models and discriminative models. Discriminative models cannot directly generate a response, but instead choose one from a pool of candidate responses. Although discriminative models have achieved impressive results, they are usually not applicable in real scenarios, where no candidate response pool is available. Generative models, on the other hand, can directly generate a response based on the input information. However, most generative models based on the maximum likelihood estimation (MLE) approach suffer from a tendency to generate generic responses.

We present a novel approach that incorporates a multi-modal recurrently guided attention mechanism with a simple yet effective training scheme to generate high-quality responses in the Visual Dialogue interaction model. Our attention mechanism combines attentions globally from multiple modalities (e.g., image, text question, and dialogue history), and refines them locally and simultaneously for each modality. Generators using typical MLE-based methods only learn from good answers and consequently tend to generate safe or generic responses. Our new training scheme with weighted likelihood estimation (WLE) penalizes generic responses for unpaired questions during training, enabling the generator to learn from poor answers as well.
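
The following is a hypothetical sketch of how a weighted likelihood objective can be structured: each token’s MLE loss is reweighted by how much more likely the token is under the paired question than under an unpaired one, so that generic tokens, which are about equally likely either way, get down-weighted. The weighting rule here is illustrative only, not the exact scheme of our WLE.

```python
# Hypothetical sketch of a weighted likelihood estimation (WLE) loss:
# standard MLE token losses reweighted so that tokens equally likely under
# an unpaired question (i.e., generic tokens) count less. Illustrative only.
import torch
import torch.nn.functional as F

def wle_loss(logits_paired, logits_unpaired, targets):
    """logits_*: (batch, seq, vocab); targets: (batch, seq) token ids."""
    nll_paired = F.cross_entropy(
        logits_paired.flatten(0, 1), targets.flatten(), reduction="none"
    ).view_as(targets)
    nll_unpaired = F.cross_entropy(
        logits_unpaired.flatten(0, 1), targets.flatten(), reduction="none"
    ).view_as(targets)
    # Distinctive tokens (large nll gap) get weight near 1; generic tokens
    # that are as likely under the wrong question get weight near 0.5 or less.
    weights = torch.sigmoid(nll_unpaired - nll_paired).detach()
    return (weights * nll_paired).mean()

vocab, B, T = 1000, 4, 12
targets = torch.randint(0, vocab, (B, T))
loss = wle_loss(torch.randn(B, T, vocab), torch.randn(B, T, vocab), targets)
```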

On the benchmark dataset, our proposed Visual Dialogue system demonstrates state-of-the-art performance, with improvements of 5.81% and 5.28 on recall@10 and mean rank, respectively.

–By Heming Zhang

 

By |December 9th, 2018|News|

MCL Research on Reinforcement Learning

Imagine a robot navigating across rooms following human instructions: “Turn left and take a right at the table. Take a left at the painting and then take your first right. Wait next to the exercise equipment.” The agent is expected to first execute the action “turn left” and then locate “the table” before “taking a right”. In practice, however, the agent might well turn right in the middle of the trajectory before a table is observed, in which case the follow-up navigation would certainly fail. Humans, on the other hand, have the ability to relate visual input to language semantics. In this example, a human would locate visual landmarks such as the table, painting, and exercise equipment before making a decision (turn right, turn left, or stop). We endow our agent with similar reasoning ability by equipping it with a synthesizer module that implicitly aligns language semantics with visual observations. The poster is available online at https://davidsonic.github.io/summary/Poster_3d_indoor.pdf and a demonstration video is available at https://sites.google.com/view/submission-2019.
–By Jiali Duan

By |December 2nd, 2018|News|

MCL Research on Incremental Learning

Humans can accumulate and maintain the knowledge learned from previous tasks and use it seamlessly in learning new tasks and solving new problems — learning new concepts over time is a core characteristic of human learning. Therefore, it is desirable to have a computer vision system that can learn incrementally about new classes when training data for them becomes available, as this is a necessary step towards the ultimate goal of building real intelligent machines that learn like humans.

Despite the recent success of deep learning in computer vision on a broad range of tasks, the classical training paradigm of deep models is ill-equipped for incremental learning (IL). Most deep neural networks can only be trained in batch mode, in which the complete dataset is given and all classes are known prior to training. However, the real world is dynamic, and new categories of interest can emerge over time. Re-training a model from scratch whenever a new class is encountered is prohibitively expensive due to the huge data storage requirements and computational cost. Directly fine-tuning the existing model on only the data of new classes using stochastic gradient descent (SGD) optimization is not a good approach either, as this can lead to the notorious catastrophic forgetting effect, i.e., severe performance degradation on old tasks.
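
For context, a common ingredient used by IL methods to fight catastrophic forgetting is a distillation term that keeps the new model’s old-class outputs close to those of the frozen old model, in the spirit of Learning without Forgetting (DMC’s double distillation differs in its details). A minimal sketch, with hypothetical class counts and shapes:

```python
# Sketch of a distillation term often used against catastrophic forgetting
# (Learning-without-Forgetting style; DMC's double distillation differs).
# Class counts, temperature, and shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def il_loss(new_logits, old_logits, labels, num_old_classes, T=2.0):
    """new_logits: (batch, old+new classes); old_logits: (batch, old classes)."""
    # Supervised loss on the new-class data, over all classes seen so far.
    ce = F.cross_entropy(new_logits, labels)
    # Distillation: keep old-class outputs close to the frozen old model.
    p_old = F.softmax(old_logits / T, dim=1)
    log_p_new = F.log_softmax(new_logits[:, :num_old_classes] / T, dim=1)
    distill = F.kl_div(log_p_new, p_old, reduction="batchmean") * T * T
    return ce + distill

old_logits = torch.randn(8, 10)              # frozen model, 10 old classes
new_logits = torch.randn(8, 15)              # new model, 10 old + 5 new classes
labels = torch.randint(10, 15, (8,))         # only new-class labels available
loss = il_loss(new_logits, old_logits, labels, num_old_classes=10)
```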

Existing IL approaches attempting to overcome catastrophic forgetting tend to produce a model that is biased towards either the old classes or the new classes, unless aided by exemplars of the old data. To address this issue, we propose a class-incremental learning paradigm called Deep Model Consolidation (DMC), which works well even when the original training data is not available. The idea is to train a model on the [...]

By |November 25th, 2018|News|

MCL Celebrated Thanksgiving Holiday

On November 22, 2018, MCL members participated in the annual Thanksgiving Luncheon at Kirin Buffet. There was a wide variety of dishes, and the food was delicious. All the members enjoyed the buffet and had a wonderful time chatting with each other.

The Thanksgiving Luncheon has been an MCL tradition for about 20 years. The whole group is like a warm and happy family gathering together. It is also a good chance for us to rest after a busy semester. Thanks to Professor Kuo for hosting this event and to Yuhang for organizing it. Happy Thanksgiving to everyone!

 

By |November 22nd, 2018|News|