MCL Members Attended the PhD Hooding Ceremony

Nine MCL members attended the Viterbi PhD hooding ceremony on Thursday, May 9, 2019, from 8:30 to 11:00 a.m. in Bovard Auditorium. They were Fenxiao Chen, Yueru Chen, Ronald Salloum, Yuhang Song, Yuanhang Su, Ye Wang, Chao Yang, Heming Zhang, and Junting Zhang. Congratulations to them on completing their PhD programs at USC!

Fenxiao (Jessica) Chen received her B.S. degree in General Engineering from Harvey Mudd College, Claremont, CA in 2014. She began her PhD studies in the Media Communications Lab at USC in 2017. Her research interests include natural language processing and deep learning.

Yueru Chen received her Bachelor’s degree in Physics from the University of Science and Technology of China in June 2014. She joined MCL in 2015 for her PhD study. Her thesis topic is “Object Classification based on Neural-network-inspired Image Transforms”, in which she focuses on solving the image classification problem with the neural-network-inspired Saak and Saab transforms.

Ronald Salloum received his B.S. degree in Electrical Engineering from California State Polytechnic University, Pomona, and his Ph.D. degree in Electrical Engineering from the University of Southern California (USC). The title of his dissertation is “A Data-Driven Approach to Image Splicing Localization.” His research interests include multimedia forensics, machine learning, and biometrics.

Yuhang Song received his Bachelor’s degree in Electronic Engineering from Tsinghua University, Beijing in 2014. He joined MCL in 2015 to pursue a Ph.D. degree in Electrical Engineering at USC. His research interests include deep generative models, image generation, visual relationship detection, and visual understanding.

Yuanhang Su received his Ph.D. from the University of Southern California (USC), working in computer vision, natural language processing, and machine learning. He received his M.S. degree from USC in 2010 and the dual B.S. degree from the University [...]

By |May 19th, 2019|News|Comments Off on MCL Members Attended the PhD Hooding Ceremony|

Congratulations to Harry Yang for Passing His Defense!

Congratulations to Harry Yang for passing his defense on May 7, 2019! Let us hear what he would like to share about his defense, along with the abstract of his thesis.

“In this thesis, we tackle the problem of translating faces and bodies between different identities without paired training data: we cannot directly train a translation module using supervised signals in this case. Instead, we propose to train a conditional variational auto-encoder (CVAE) to disentangle different latent factors, such as identity and expression. To achieve effective disentanglement, we further use multi-view information, such as keypoints and facial landmarks, to train multiple CVAEs. By relying on these simplified representations of the data, we use a more easily disentangled representation to guide the disentanglement of the image itself. Experiments demonstrate the effectiveness of our method on multiple face and body datasets. We also show that our model is a more robust image classifier and adversarial example detector compared with traditional multi-class neural networks.

“To address the issue of scaling to new identities and to generate better-quality results, we further propose an alternative approach that uses self-supervised learning based on StyleGAN to factor out different attributes of face images, such as hair color, facial expression, and skin color. Using a pre-trained StyleGAN combined with iterative style inference, we can easily manipulate the facial expressions of any person, or combine the facial expressions of any two people, without training a new model for each identity involved. This is one of the first scalable, high-quality approaches for generating DeepFake data, which serves as a critical first step toward learning a more robust and general classifier against adversarial examples.”
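The conditional VAE machinery mentioned in the abstract can be illustrated with a minimal numpy sketch. Random weights stand in for trained encoder/decoder layers, and all names and dimensions here are hypothetical, not taken from the actual thesis code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the thesis)
x_dim, c_dim, z_dim = 16, 4, 2

# Random affine maps stand in for trained encoder/decoder layers.
W_mu  = rng.normal(size=(x_dim + c_dim, z_dim))
W_lv  = rng.normal(size=(x_dim + c_dim, z_dim))
W_dec = rng.normal(size=(z_dim + c_dim, x_dim))

def cvae_forward(x, c):
    """One CVAE pass: encode (x, c) -> (mu, logvar), sample z, decode (z, c)."""
    h = np.concatenate([x, c])
    mu, logvar = h @ W_mu, h @ W_lv
    eps = rng.normal(size=z_dim)
    z = mu + np.exp(0.5 * logvar) * eps      # reparameterization trick
    x_rec = np.concatenate([z, c]) @ W_dec   # decoder sees z and the condition
    return mu, logvar, z, x_rec

x = rng.normal(size=x_dim)    # stand-in image features
c = np.eye(c_dim)[1]          # one-hot condition, e.g. an identity label
mu, logvar, z, x_rec = cvae_forward(x, c)
```

Because the condition `c` is fed to both encoder and decoder, the latent `z` is encouraged to capture only the factors not explained by `c` — the disentanglement idea the abstract describes.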

Harry also shared his Ph.D. experience:

“Firstly, I would [...]

By |May 12th, 2019|News|Comments Off on Congratulations to Harry Yang for Passing His Defense!|

Welcome New MCL Member Dr. Na Li

We are so glad to welcome our new MCL member, Dr. Na Li!

Dr. Li is an Assistant Professor at the Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS). Currently, she is a visiting scholar at MCL at USC, under the supervision of Prof. C.-C. Jay Kuo. Here is a short interview with Dr. Li:

1. Could you briefly introduce yourself and your research interests?

I’m Na Li, an Assistant Professor at the Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS). In 2009, I graduated from Hunan University, Changsha, with a Bachelor’s degree in Computer Science and Technology. In 2014, I received my Ph.D. degree from the Institute of Automation (IA), Chinese Academy of Sciences, Beijing, and joined SIAT, CAS, in Shenzhen the same year. I also interned as a research associate in the High Performance Department of SAP China (Beijing) from September to November 2013. I am a big fan of the Möbius band. I mainly focus on intelligent video processing and analysis. My research interests include video coding, crowd behavior analysis, reinforcement learning, optimization, scheduling algorithms, and related fields.

2. What is your impression about MCL and USC?

MCL is a big family guided by a great mentor, and it consists of a group of people with great minds. I very much like the seminar held every Friday, which is full of shared food, knowledge, and experience. Everyone in MCL tries their best to be excellent and to contribute to both academia and industry. I am impressed by how well organized MCL and USC are. The education system of USC is strong enough to build up great Trojans.

3. What is your future expectation and plan in MCL?

I am looking forward to [...]

By |May 5th, 2019|News|Comments Off on Welcome New MCL Member Dr. Na Li|

MCL Research on Multi-modal Neural Machine Translation

Our long-term goal is to build intelligent systems that can perceive their visual environment, understand the accompanying linguistic information, and make accurate translation inferences into another language. However, most multi-modal translation algorithms are not significantly better than an off-the-shelf text-only machine translation (MT) model, so how translation models should take advantage of visual context remains an open question. From the perspective of information theory, the mutual information of two random variables I(X; Y) can never exceed I(X, Z; Y), where Z is the additional visual input. This makes us believe that visual content can, in principle, help translation systems.
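The inequality I(X; Y) ≤ I(X, Z; Y) can be checked numerically. The sketch below uses a made-up toy joint distribution (purely illustrative) in which X alone tells nothing about Y, but X together with a visual cue Z does:

```python
import numpy as np

def mutual_info(joint):
    """I(A;B) in bits from a joint probability table p(a, b)."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

# Toy joint distribution p(x, z, y): X a source word, Z a visual cue,
# Y a target word (binary each; the numbers are made up).
p_xzy = np.array([[[0.20, 0.05],
                   [0.05, 0.20]],
                  [[0.05, 0.20],
                   [0.20, 0.05]]])

p_xy   = p_xzy.sum(axis=1)     # marginalize out Z -> p(x, y)
p_xz_y = p_xzy.reshape(4, 2)   # treat the pair (X, Z) as one variable

i_x_y  = mutual_info(p_xy)     # here X alone carries no information about Y
i_xz_y = mutual_info(p_xz_y)   # but (X, Z) jointly does
assert i_xz_y >= i_x_y         # I(X,Z;Y) >= I(X;Y) always holds
```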

Since the standard paradigm of multi-modal translation treats the problem as a supervised learning task, the parallel corpus alone is usually sufficient to train a good translation model, and the gain from the extra image input is very limited. We argue, however, that text-only unsupervised machine translation (UMT) is fundamentally an ill-posed problem, since there are potentially many ways to associate target with source sentences. Intuitively, since visual content and language are closely related, the image can play the role of a pivot “language” that bridges the two languages without a parallel corpus, making the problem “more well-defined” by reducing it to supervised learning.

We tackle unsupervised translation with a multi-modal framework that includes two sequence-to-sequence encoder-decoder models and one shared image feature extractor. We employ the Transformer in both the text encoder and decoder of our model and design a novel joint attention mechanism to model the relationships among the language and visual domains.

Succinctly, our contributions are three-fold:

We formulate the multi-modal MT problem in an unsupervised setting that fits the real [...]

By |April 28th, 2019|News|Comments Off on MCL Research on Multi-modal Neural Machine Translation|

Congratulations to Yuanhang Su for Passing His Defense

Congratulations to Yuanhang Su for passing his PhD defense on April 16, 2019. His PhD thesis is entitled “Theory of Memory-enhanced Neural Systems and Image-assisted Neural Machine Translation”.


My research focuses on sequence learning systems (whose input can be language, speech, video, etc.) and on answering the following questions: what is memory, and how can we build a system that learns efficiently by remembering? Can visual imagination help, and if so, how can we build a system that handles both language and vision? The foundation we built for the former question consists of two computing architectures: the Extended Long Short-term Memory (ELSTM) for Recurrent Neural Network (RNN) modeling, and the Tree-structured Multi-stage Principal Component Analysis (TMPCA) for language embedding. They are derived from the perspectives of memory as a system function and memory as a compact information representation (or dimension reduction (DR)), respectively. From the first perspective, we carried out a detailed analysis of RNN cell models, demystified their properties, and concluded that all existing RNN cells (RNN, LSTM and GRU) suffer from memory decay. The newly proposed ELSTM does not have this limitation and shows outstanding performance on complex language tasks. From the second perspective, a PCA-based technique is utilized for sequence embedding, for the sake of maximizing input/output mutual information and of explainable machine learning. The proposed TMPCA computes much faster than ordinary PCA while retaining much of its merit. To answer the latter question, we argued that visual information can benefit the language learning task by increasing the system’s mutual information, and successfully deployed a Transformer-based multi-modal NMT system that is trained/fine-tuned unsupervisedly on an image captioning dataset. It is one of the first such systems ever developed for unsupervised MT and the [...]
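The TMPCA idea — merging adjacent sequence elements pairwise and reducing each merged pair back to the embedding dimension with PCA, stage by stage, until one vector remains — can be sketched as follows. This is a hypothetical numpy illustration, not the thesis implementation, and it assumes the sequence length is a power of two:

```python
import numpy as np

def pca_project(X, k):
    """Project rows of X onto the top-k principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def tmpca(seq_batch, dim):
    """Tree-structured multi-stage PCA: halve the sequence length at each
    stage by concatenating adjacent pairs and reducing each pair to `dim`."""
    X = seq_batch                                    # (n, seq_len, dim)
    while X.shape[1] > 1:
        n, L, d = X.shape
        pairs = X.reshape(n, L // 2, 2 * d)          # merge adjacent pairs
        X = np.stack([pca_project(pairs[:, i, :], dim)
                      for i in range(L // 2)], axis=1)
    return X[:, 0, :]                                # one `dim`-vector per sequence

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 8, 4))   # 100 sequences, length 8, 4-dim embeddings
codes = tmpca(emb, dim=4)            # log2(8) = 3 PCA stages
```

Each stage is a small PCA on a 2×dim input, which is why the tree structure is much cheaper than one PCA over the full flattened sequence.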

By |April 22nd, 2019|News|Comments Off on Congratulations to Yuanhang Su for Passing His Defense|

Congratulations to Ron Salloum for Passing His Defense

Congratulations to Ron Salloum for passing his PhD defense on April 10, 2019. His PhD thesis is entitled “A Data-Driven Approach to Image Splicing Localization”.


The availability of low-cost and user-friendly editing software has made it significantly easier to manipulate images. Thus, there has been an increasing interest in developing forensic techniques to detect and localize image manipulations or forgeries. Splicing, which is one of the most common types of image forgery, involves copying a region from one image (referred to as the donor image) and pasting it onto another image (referred to as the host image). Forgers often use splicing to give a false impression that there is an additional object present in the image, or to remove an object from the image.

Many of the current splicing detection methods only determine whether a given image has been spliced and do not attempt to localize the spliced region. Relatively few methods attempt to tackle the splicing localization problem, which refers to the problem of determining which pixels in an image have been manipulated as a result of a splicing operation.

In my dissertation, I present two different splicing localization methods that we have developed. The first is the Multi-task Fully Convolutional Network (MFCN), which is a neural-network-based method that outperforms previous methods on many datasets. The second proposed method is based on cPCA++ (where cPCA stands for contrastive Principal Component Analysis), which is a new data visualization and clustering technique that we have developed. The cPCA++ method is more efficient than the MFCN and achieves comparable performance.
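For intuition, the original contrastive PCA idea that cPCA++ builds on can be sketched in a few lines: find directions with high variance in a target dataset but low variance in a background dataset. Note this illustrates plain cPCA with a contrast parameter alpha, whereas cPCA++ differs in how it removes that parameter; the data here is synthetic:

```python
import numpy as np

def contrastive_pca(target, background, alpha=1.0, k=2):
    """Top-k eigenvectors of C_target - alpha * C_background: directions
    along which the target data varies but the background data does not."""
    ct = np.cov(target, rowvar=False)
    cb = np.cov(background, rowvar=False)
    vals, vecs = np.linalg.eigh(ct - alpha * cb)
    return vecs[:, np.argsort(vals)[::-1][:k]]   # sort by descending eigenvalue

rng = np.random.default_rng(0)
background = rng.normal(size=(500, 10))          # "authentic" statistics
target = rng.normal(size=(500, 10))
target[:, 0] *= 3.0                              # extra variance along feature 0
dirs = contrastive_pca(target, background, alpha=1.0, k=2)
# The top contrastive direction recovers the target-specific axis (feature 0).
```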

PhD Experience:

Pursuing my PhD degree was a very challenging but rewarding experience. I really enjoyed my time in the Media Communications Laboratory and had the opportunity to work on exciting research projects. [...]

By |April 16th, 2019|News|Comments Off on Congratulations to Ron Salloum for Passing His Defense|

MCL Research on Fake Image Detection

With the rapid development of image processing technology, generating an image without obvious visual artifacts has become much easier. Progressive GANs can generate high-resolution images that almost fool human eyes. Fake image detection is therefore a must. Currently, many researchers use convolutional neural network (CNN) based methods for GAN image detection, building deeper and deeper networks, such as XceptionNet, to better distinguish real from fake. These CNN-based methods have achieved very high accuracy, above 99%.
We want to build a method that is interpretable, uses no back-propagation, and aims to achieve similar accuracy. In our method, we first detect 68 facial landmarks in both real and fake images. We then extract 32x32 patches centered at the 68 facial landmarks. These patches, together with their labels, are fed into a two-layer Saab architecture. After two fully connected layers, the probability of a patch being fake or real is stored in a 2-by-1 output vector. We train one model per landmark, 68 in total. The 68 resulting 2-by-1 vectors are fed into an SVM classifier, which outputs the decision of whether the whole image is fake or real.
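The front end of this pipeline — cropping one 32x32 patch per landmark and aggregating the 68 per-landmark score vectors into one feature vector for the final classifier — might be sketched like this. The random image, landmarks, and logits below are stand-ins for the real face detector, Saab features, and trained models:

```python
import numpy as np

PATCH = 32  # patch size centered at each landmark

def extract_patches(img, landmarks):
    """Crop a 32x32 patch around each (x, y) facial landmark."""
    h, w = img.shape[:2]
    half = PATCH // 2
    patches = []
    for x, y in landmarks:
        x = int(np.clip(x, half, w - half))   # keep patches inside the image
        y = int(np.clip(y, half, h - half))
        patches.append(img[y - half:y + half, x - half:x + half])
    return np.stack(patches)                  # (n_landmarks, 32, 32)

rng = np.random.default_rng(0)
img = rng.random((256, 256))                       # stand-in face image
landmarks = rng.integers(0, 256, size=(68, 2))     # stand-in 68 landmarks
patches = extract_patches(img, landmarks)

# One 2-vector of (real, fake) scores per landmark model, then a final
# classifier over the concatenated 68 x 2 scores.  Softmax of random
# logits stubs the per-landmark Saab models here.
logits = rng.normal(size=(68, 2))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
meta_features = probs.reshape(-1)                  # 136-dim input to the SVM
```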
Author: Yao Zhu

By |April 8th, 2019|News|Comments Off on MCL Research on Fake Image Detection|

MCL Research on Point-cloud Analysis

With the rise of visualization, animation, and autonomous driving applications, the demand for 3D point cloud analysis and understanding has rapidly increased. A point cloud is a kind of data obtained from lidar scanning that contains abundant 3D information. Our research directions for point clouds in autonomous driving are object detection, segmentation, and classification.

Due to their unstructured and unordered nature, point clouds are usually converted into other data types such as meshes, voxels, or multi-view images, but the conversion inevitably causes information loss. Recently, several deep-learning solutions tailored to point clouds, such as PointNet/PointNet++ [1, 2], have provided a more efficient and flexible way to handle 3D data, with successful results demonstrated for object classification and for part and semantic scene segmentation. However, object and scene understanding with Convolutional Neural Networks (CNNs) on 3D volumetric data is still limited by its high memory requirement and computational cost. This is a challenge for autonomous driving, which requires real-time and concise processing of the observed scenes and objects.
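The property that lets PointNet-style networks consume unordered point sets directly — a shared per-point transform followed by a symmetric aggregation such as max-pooling — can be illustrated in a few lines (random weights, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A PointNet-style block: a per-point MLP shared across all points,
# then a symmetric max-pool, so the output is invariant to point order.
W1 = rng.normal(size=(3, 16))
W2 = rng.normal(size=(16, 32))

def global_feature(points):
    """points: (n, 3) unordered xyz -> one 32-dim order-invariant feature."""
    h = np.maximum(points @ W1, 0)    # shared per-point MLP (ReLU)
    h = np.maximum(h @ W2, 0)
    return h.max(axis=0)              # symmetric aggregation over points

cloud = rng.normal(size=(1024, 3))
f1 = global_feature(cloud)
f2 = global_feature(cloud[rng.permutation(1024)])  # same points, shuffled
assert np.allclose(f1, f2)            # ordering does not change the feature
```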

An interpretable CNN design based on the feedforward (FF) methodology [3], without any backpropagation (BP), was recently proposed by the Media Communications Lab at USC. The FF design offers a complementary approach to CNN filter weight selection. We are now designing a feedforward network for both object classification and indoor scene segmentation. The advantages of the FF design methodology are manifold: it is completely interpretable, and it demands much less training complexity and training data. Furthermore, it can be generalized to weakly supervised or unsupervised learning scenarios in a straightforward manner. The latter is extremely important in real-world applications, since data labeling is very tedious and expensive.


R. Qi, H. Su, K. Mo, and L. J. Guibas. [...]

By |April 1st, 2019|News|Comments Off on MCL Research on Point-cloud Analysis|

MCL Research on Domain Adaptation

Domain adaptation is a form of transfer learning that aims to learn a model from a source data distribution and apply it to target data drawn from a different distribution. Basically, the tasks in the source and target domains are the same, e.g., both are image classification or both are image segmentation. There are three types of domain adaptation, differing in how many target samples are labeled with ground-truth labels: in supervised and semi-supervised domain adaptation, all or part of the target data is labeled, respectively, while in unsupervised domain adaptation all target data is unlabeled.


Several classical methods address domain shift via feature alignment in unsupervised domain adaptation. [1] maps source- and target-domain data into one subspace learned by reducing the distribution distance measured by maximum mean discrepancy (MMD). [2] aligns the eigenvectors of the two domains by learning a linear mapping function. [3] exploits geometric and statistical changes between the source and target domains to build an infinite number of subspaces and integrates them together. With the increasing popularity of deep learning, plenty of methods [4, 5, 6] utilize CNNs or GANs for domain adaptation. But those methods demand a high computation cost due to back-propagation, and GAN-related methods are unstable in training. Besides, the generalizability from one domain to another is weak in deep-learning-based methods.
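The maximum mean discrepancy used by [1] as a distribution distance can be sketched directly — an illustrative numpy version with an RBF kernel, not the paper's implementation:

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Squared maximum mean discrepancy with an RBF kernel:
    MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)] (biased estimator)."""
    def k(A, B):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
        return np.exp(-gamma * d)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
source  = rng.normal(size=(200, 5))
aligned = rng.normal(size=(200, 5))          # same distribution as source
shifted = rng.normal(size=(200, 5)) + 2.0    # mean-shifted "target" domain
assert mmd_rbf(source, shifted) > mmd_rbf(source, aligned)
```

Feature-alignment methods learn a mapping that makes the MMD between mapped source and target features small, so a classifier trained on the source transfers better.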


Professor Kuo has proposed several explanations of deep learning since 2014. The Saak and Saab transforms give a way to extract feature representations of images, and the original images can be reconstructed from these representations through the inverse transform. This gives us a new way to handle the domain adaptation task. We are now working on aligning Saab features [...]

By |March 25th, 2019|News|Comments Off on MCL Research on Domain Adaptation|

MCL Research on Active Learning

Deep learning has shown its effectiveness in various computer vision tasks. However, a large amount of labeled data is usually needed for deep learning approaches. Active learning can help reduce the labeling effort by choosing the most informative samples to label, thus achieving comparable performance with less labeled data.

There are two major types of active learning strategies: uncertainty-based and diversity-based.

The core idea of uncertainty-based methods is to label the samples that are most uncertain to the existing model trained on the current labeled set. For example, an image predicted to be a cat with 50 percent confidence is empirically considered more valuable than one predicted with 99 percent confidence, since the former has larger uncertainty. Besides uncertainty metrics from information theory, such as entropy, Beluch et al. [1] propose to use an ensemble to estimate the uncertainty of unlabeled images and achieve good results on the ImageNet dataset.

In contrast, diversity-based methods rely on the assumption that a more diverse set of images chosen as the training set leads to better performance. Sener et al. [2] formalize the active learning problem as a core-set problem and achieve competitive performance on the CIFAR-10 dataset; mixed-integer programming is used to solve their objective function.
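Both selection strategies can be sketched in a few lines of numpy: entropy-based uncertainty sampling, and the greedy k-center heuristic commonly used to approximate the core-set objective. The predictions and features below are mock data, and the code is not the cited implementations:

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy_select(probs, budget):
    """Uncertainty sampling: pick the `budget` samples whose predicted
    class distribution has the highest entropy."""
    p = np.clip(probs, 1e-12, 1.0)
    ent = -(p * np.log(p)).sum(axis=1)
    return np.argsort(ent)[::-1][:budget]

def kcenter_greedy(features, labeled_idx, budget):
    """Diversity sampling (greedy core-set): repeatedly add the point
    farthest from the current labeled set."""
    dists = np.linalg.norm(
        features[:, None] - features[labeled_idx][None], axis=2).min(axis=1)
    chosen = []
    for _ in range(budget):
        i = int(dists.argmax())                  # farthest remaining point
        chosen.append(i)
        dists = np.minimum(dists, np.linalg.norm(features - features[i], axis=1))
    return chosen

probs = rng.dirichlet(np.ones(10), size=1000)    # mock softmax outputs
feats = rng.normal(size=(1000, 32))              # mock feature embeddings
uncertain = entropy_select(probs, budget=20)
diverse = kcenter_greedy(feats, labeled_idx=[0, 1, 2], budget=20)
```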

Our current research focuses on balancing the two factors (uncertainty and diversity) in an explainable way.

[1] Beluch, William H., et al. “The power of ensembles for active learning in image classification.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[2] Sener, Ozan, and Silvio Savarese. “Active learning for convolutional neural networks: A core-set approach.” International Conference on Learning Representations. 2018.

Author: Yeji Shen

By |March 18th, 2019|News|Comments Off on MCL Research on Active Learning|