MCL Research on Multi-modal Neural Machine Translation
Our long-term goal is to build intelligent systems that can perceive their visual environment, understand the associated linguistic information, and make accurate translation inferences into another language. However, most multi-modal translation algorithms are not significantly better than an off-the-shelf text-only machine translation (MT) model. How translation models should take advantage of visual context therefore remains an open question. From the perspective of information theory, the mutual information I(X; Y) of two random variables is never greater than I(X, Z; Y), where Z is the additional visual input; this suggests that visual content can, in principle, help translation systems.
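The inequality above follows from the chain rule of mutual information; a short derivation (in standard notation, not drawn from the original text) is:

```latex
I(X, Z; Y) = I(X; Y) + I(Z; Y \mid X) \ge I(X; Y),
```

since the conditional mutual information $I(Z; Y \mid X)$ is always non-negative. The extra visual input $Z$ can thus never reduce the information available about the target $Y$, though the inequality alone does not say how to exploit it.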
Since the standard paradigm of multi-modal translation treats the problem as a supervised learning task, the parallel corpus is usually sufficient to train a good translation model, and the gain from the extra image input is very limited. We argue, however, that text-only unsupervised machine translation (UMT) is fundamentally an ill-posed problem, since there are potentially many ways to associate target with source sentences. Intuitively, since visual content and language are closely related, an image can play the role of a pivot “language” that bridges the two languages without a parallel corpus, making the problem “more well-defined” by reducing it to supervised learning.
We tackle unsupervised translation with a multi-modal framework that comprises two sequence-to-sequence encoder-decoder models and one shared image feature extractor. We employ the Transformer in both the text encoder and decoder of our model and design a novel joint attention mechanism to model the relationships between the language and visual domains.
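One plausible form of the joint attention described above is scaled dot-product attention over the concatenation of text-token and image-region features, so that a single attention distribution spans both modalities. The sketch below illustrates that idea only; the function name, shapes, and the concatenation strategy are our own assumptions, not the paper's exact mechanism.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention(queries, text_keys, image_keys, d):
    """Attend jointly over text and image features (illustrative sketch).

    queries:    (n_q, d) decoder states
    text_keys:  (n_t, d) text-encoder outputs
    image_keys: (n_i, d) projected image-region features
    Returns a (n_q, d) context vector mixing both modalities.
    """
    # Concatenate both modalities so one softmax covers text and image slots.
    keys = np.concatenate([text_keys, image_keys], axis=0)   # (n_t + n_i, d)
    scores = queries @ keys.T / np.sqrt(d)                   # scaled dot-product
    weights = softmax(scores, axis=-1)                       # rows sum to 1
    return weights @ keys                                    # convex mix of keys

# Toy usage: 2 decoder states attending over 5 text tokens and 3 image regions.
rng = np.random.default_rng(0)
ctx = joint_attention(rng.standard_normal((2, 8)),
                      rng.standard_normal((5, 8)),
                      rng.standard_normal((3, 8)), 8)
print(ctx.shape)  # (2, 8)
```

Because each output row is a convex combination of the key rows, the context vector stays within the range of the encoder features, which is one reason a shared attention over both modalities is a natural way to fuse them.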
Succinctly, our contributions are three-fold:
We formulate the multi-modal MT problem in an unsupervised setting that fits the real [...]