Congratulations to Yuanhang Su for Passing His Defense

Congratulations to Yuanhang Su for passing his PhD defense on April 16, 2019. His PhD thesis is entitled “Theory of Memory-enhanced Neural Systems and Image-assisted Neural Machine Translation”.


My research focuses on sequence learning systems (whose input can be language, speech, video, etc.) and on answering the following questions: What is memory, and how can we build a system that learns efficiently by remembering? Can visual imagination help and, if so, how can we build a system that handles both language and vision? To address the former question, we built two computing architectures: the Extended Long Short-term Memory (ELSTM) for Recurrent Neural Network (RNN) modeling, and the Tree-structured Multi-stage Principal Component Analysis (TMPCA) for language embedding. They are derived from the perspectives of memory as a system function and memory as a compact information representation (or dimension reduction (DR)), respectively. From the first perspective, we analyzed RNN cell models in detail, demystified their properties, and concluded that all existing RNN cells (RNN, LSTM and GRU) suffer from memory decay. The proposed ELSTM does not have this limitation and performs outstandingly on complex language tasks. From the second perspective, a PCA-based technique is used for sequence embedding, with the aims of maximizing input/output mutual information and enabling explainable machine learning. The proposed TMPCA computes much faster than ordinary PCA while retaining much of its other merits. To answer the latter question, we argued that visual information can benefit the language learning task by increasing the system’s mutual information, and we successfully deployed a Transformer-based multi-modal NMT system that is trained/fine-tuned in an unsupervised manner on an image captioning dataset. It is one of the first such systems ever developed for unsupervised MT and the [...]

April 22nd, 2019 | News

Congratulations to Ron Salloum for Passing His Defense

Congratulations to Ron Salloum for passing his PhD defense on April 10, 2019. His PhD thesis is entitled “A Data-Driven Approach to Image Splicing Localization”.


The availability of low-cost and user-friendly editing software has made it significantly easier to manipulate images. Thus, there has been an increasing interest in developing forensic techniques to detect and localize image manipulations or forgeries. Splicing, which is one of the most common types of image forgery, involves copying a region from one image (referred to as the donor image) and pasting it onto another image (referred to as the host image). Forgers often use splicing to give a false impression that there is an additional object present in the image, or to remove an object from the image.

Many of the current splicing detection methods only determine whether a given image has been spliced and do not attempt to localize the spliced region. Relatively few methods attempt to tackle the splicing localization problem, which refers to the problem of determining which pixels in an image have been manipulated as a result of a splicing operation.

In my dissertation, I present two different splicing localization methods that we have developed. The first is the Multi-task Fully Convolutional Network (MFCN), which is a neural-network-based method that outperforms previous methods on many datasets. The second proposed method is based on cPCA++ (where cPCA stands for contrastive Principal Component Analysis), which is a new data visualization and clustering technique that we have developed. The cPCA++ method is more efficient than the MFCN and achieves comparable performance.

PhD Experience:

Pursuing my PhD degree was a very challenging but rewarding experience. I really enjoyed my time in the Media Communications Laboratory and had the opportunity to work on exciting research projects. [...]

April 16th, 2019 | News

MCL Research on Fake Image Detection

With the rapid development of image processing technology, generating an image without obvious visual artifacts has become much easier. Progressive GANs have generated high-resolution images that can almost fool human eyes. Fake image detection is therefore a must. Currently, many researchers use convolutional neural network (CNN) based methods for GAN image detection. They build deeper and deeper networks, such as XceptionNet, to improve the ability to distinguish real images from fake ones. These CNN-based methods have achieved very high accuracy of more than 99%.
We want to build an interpretable method that uses no back-propagation and aims to achieve similar accuracy. In our method, we first detect 68 facial landmarks in both real and fake images. We then extract 32×32 patches centered on the 68 facial landmarks. Those patches, together with their labels, are fed into a two-layer Saab architecture. After two fully connected layers, the probability of a patch being fake or real is stored in a 2-by-1 output vector. We train 68 models, one for each of the 68 facial landmarks. The 68 output 2-by-1 vectors are then fed into an SVM classifier, which outputs the decision of whether the whole image is fake or real.
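The patch extraction step described above can be sketched as follows. This is only an illustrative sketch, not the lab's implementation: the function names `extract_patch` and `extract_landmark_patches` are hypothetical, the image is assumed to be a plain list of pixel rows, and shifting the window inward near the image border is an assumption, since the post does not say how border landmarks are handled. Landmark detection, the Saab layers and the SVM are not shown.

```python
def extract_patch(image, center, size=32):
    """Return a size x size patch of `image` centered on `center` (row, col).

    `image` is a list of pixel rows. Assumption: when the landmark lies near
    the border, the window is shifted inward so the patch keeps its full size.
    """
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the requested patch")
    half = size // 2
    row, col = center
    # Clamp the top-left corner so the window stays inside the image.
    top = min(max(row - half, 0), height - size)
    left = min(max(col - half, 0), width - size)
    return [r[left:left + size] for r in image[top:top + size]]


def extract_landmark_patches(image, landmarks, size=32):
    """Extract one patch per landmark; with 68 landmarks this gives 68 patches."""
    return [extract_patch(image, landmark, size) for landmark in landmarks]
```

Each of the resulting patches would then go to the model trained for its landmark, so the patch order must match the landmark order.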
Author: Yao Zhu

April 8th, 2019 | News

MCL Research on Point-cloud Analysis

With the rise of visualization, animation and autonomous driving applications, the demand for 3D point cloud analysis and understanding has rapidly increased. A point cloud is a kind of data obtained from lidar scanning that contains abundant 3D information. Our research directions for point clouds in autonomous driving are object detection, segmentation and classification.

Because point clouds are unstructured and unordered, they are usually converted into other data types such as meshes, voxels and multi-view images. However, the conversion inevitably causes information loss. Recently, several deep learning solutions tailored to point clouds, such as PointNet/PointNet++ [1, 2], have provided a more efficient and flexible way to handle 3D data. Some successful results for object classification, part segmentation and semantic scene segmentation have been demonstrated. However, object and scene understanding with Convolutional Neural Networks (CNNs) on 3D volumetric data is still limited due to its high memory requirement and computational cost. This poses a challenge for autonomous driving, since it requires real-time and concise processing of the observed scenes and objects.

An interpretable CNN design based on the feedforward (FF) methodology [3], without any backpropagation (BP), was recently proposed by the Media Communications Lab at USC. The FF design offers a complementary approach to CNN filter weight selection. We are now designing a feedforward (FF) network for both object classification and indoor scene segmentation. The advantages of the FF design methodology are manifold. It is completely interpretable. It demands much lower training complexity and much less training data. Furthermore, it can be generalized to weakly supervised or unsupervised learning scenarios in a straightforward manner. The latter is extremely important in real-world application scenarios, since data labeling is very tedious and expensive.


R. Qi, H. Su, K. Mo, and L. J. Guibas. [...]

April 1st, 2019 | News

MCL Research on Domain Adaptation

Domain adaptation is a form of transfer learning that aims to learn a model from a source data distribution and apply it to target data with a different distribution. Typically, the tasks in the source and target domains are the same; for example, both are image classification tasks or both are image segmentation tasks. There are three types of domain adaptation, which differ in how many target samples are labeled with ground truth. In supervised and semi-supervised domain adaptation, all or part of the target data is labeled, respectively, while in unsupervised domain adaptation all target data is unlabeled.


Several classical methods address the domain shift problem through feature alignment in unsupervised domain adaptation. [1] maps data from the source and target domains into one subspace learned by reducing the distribution distance measured by the maximum mean discrepancy. [2] aligns the eigenvectors of the two domains by learning a linear mapping function. [3] utilizes geometric and statistical changes between the source and target domains to build an infinite number of subspaces and integrates them. With the increasing popularity of deep learning, plenty of methods [4,5,6] utilize CNNs or GANs for domain adaptation. However, those methods incur a high computational cost due to back-propagation, and GAN-related methods are unstable in training. Besides, deep learning based methods generalize weakly from one domain to another.
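To make the distribution distance mentioned above concrete, here is a minimal sketch of a (biased) empirical estimate of the squared maximum mean discrepancy between two sample sets. It only computes the distance; it is not the subspace-learning method of [1]. The function names, the RBF kernel and the `gamma` bandwidth are assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of squared MMD between two lists of feature vectors.

    MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)],
    with each expectation replaced by an average over all sample pairs.
    """
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))
```

A value near zero suggests the two sample sets look alike under the chosen kernel, while a larger value indicates a larger domain shift; the result depends on `gamma`.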


Professor Kuo has proposed several approaches to explainable deep learning since 2014. The Saak and Saab transforms provide a way to extract feature representations of images, and the original images can be reconstructed from those representations through the inverse transform. This gives us a new way to handle the domain adaptation task. We are now working on aligning Saab features [...]

March 25th, 2019 | News

MCL Research on Active Learning

Deep learning has shown its effectiveness in various computer vision tasks. However, deep learning approaches usually need a large amount of labeled data. Active learning can help reduce the labeling effort by choosing the most informative samples to label, and thus achieve comparable performance with less labeled data.

There are two major types of active learning strategy: uncertainty based and diversity based.

The core idea of uncertainty based methods is to label the samples about which the existing model, trained on the current labeled set, is most uncertain. For example, an image with a prediction of 50 percent cat is empirically considered more valuable than an image with a prediction of 99 percent cat, because the former has larger uncertainty. Besides uncertainty metrics from information theory such as entropy, Beluch et al. [1] propose using an ensemble to estimate the uncertainty of unlabeled images and achieve good results on the ImageNet dataset.
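As a small illustration of the entropy metric mentioned above, the sketch below ranks unlabeled samples by the entropy of their predicted class probabilities and returns the most uncertain ones. The function names and the `budget` parameter are hypothetical, and this is plain entropy sampling, not the ensemble method of [1].

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def most_uncertain(predictions, budget):
    """Return the indices of the `budget` samples with the highest entropy.

    `predictions` holds one probability vector per unlabeled sample.
    """
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]),
                    reverse=True)
    return ranked[:budget]
```

With the cat example from the text, a sample predicted as `[0.5, 0.5]` is selected before one predicted as `[0.99, 0.01]`.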

In contrast, diversity based methods rely on the assumption that choosing a more diverse set of images as the training set leads to better performance. Sener et al. [2] formalize the active learning problem as a core-set problem and achieve competitive performance on the CIFAR-10 dataset. Mixed-integer programming is used to solve their objective function.

Our current research focuses on balancing the two factors (uncertainty and diversity) in an explainable way.
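To give a feel for diversity based selection, the sketch below uses a greedy farthest-point (k-center) heuristic: it repeatedly picks the sample that lies farthest from everything already chosen. This is a simple stand-in for illustration, not the mixed-integer program described above; the function names and parameters are hypothetical, and the samples are assumed to be given as feature vectors.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def k_center_greedy(points, labeled, budget):
    """Greedily pick up to `budget` indices that are far from the chosen set.

    `labeled` lists the indices of already labeled points and must be non-empty.
    """
    if not labeled:
        raise ValueError("at least one labeled point is required")
    # Distance from every point to its nearest already-chosen point.
    min_dist = [min(squared_distance(p, points[j]) for j in labeled)
                for p in points]
    chosen = []
    for _ in range(min(budget, len(points) - len(set(labeled)))):
        idx = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(idx)
        # The new center may now be the nearest one for some points.
        min_dist = [min(d, squared_distance(p, points[idx]))
                    for d, p in zip(min_dist, points)]
    return chosen
```

Because every pick is the point least covered by the current set, near-duplicates of already chosen samples are selected last.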

[1] Beluch, William H., et al. “The power of ensembles for active learning in image classification.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[2] Sener, Ozan, and Silvio Savarese. “Active learning for convolutional neural networks: A core-set approach.” (2018).

Author: Yeji Shen

March 18th, 2019 | News

Professor Kuo received IEEE Computer Society 2019 Edward J. McCluskey Technical Achievement Award

Dr. C.-C. Jay Kuo, Distinguished Professor of Electrical Engineering and Computer Science, and the Director of the Media Communications Laboratory at USC, has been selected to receive the IEEE Computer Society 2019 Edward J. McCluskey Technical Achievement Award, for “outstanding contributions to multimedia computing technologies and their applications.”

Professor Kuo is a world-renowned technical leader in multimedia computing technologies, systems and applications, with an enduring impact on both academia and industry over the last three decades. He has made seminal contributions to video coding technologies in three areas: fast motion search, H.264 rate control, and perceptual coding. Professor Kuo’s deblocking filter and rate control technologies are widely used in video capturing devices such as smartphone cameras. Furthermore, he has conducted extensive work on applying wavelets to image processing, including texture analysis, curve representation, fractal analysis, watermarking and data hiding. Recently, he has focused on machine learning, artificial intelligence, and computer vision, and has developed a mathematical model that sheds light on the mysterious behavior of deep learning networks.

Professor Kuo said, “It is a great honor to be named as the recipient of the IEEE Computer Society 2019 Edward J. McCluskey Technical Achievement Award. There are many outstanding researchers in the field, and I am truly humbled for this recognition.”

For more, please click https://www.computer.org/press-room/2019-news/2019-edward-j-mccluskey-technical-achievement-award-c-c-jay-cuo

March 11th, 2019 | News

MCL Research on Explainable Deep Learning

Deep learning technologies such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have had a great impact on modern machine learning due to their impressive performance in many application fields that involve learning, modeling, and processing of complex sensing data. Yet, the working principle of deep learning remains mysterious. Furthermore, it has several well-known weaknesses: 1) vulnerability to adversarial attacks, 2) a need for heavy supervision, and 3) poor generalizability from one domain to another. Professor Kuo and his PhD students at the Media Communications Lab (MCL) have been working on explainable deep learning since 2014 and have published a sequence of pioneering papers on this topic.

Explanation of nonlinear activation, convolutional filters and discriminability of trained features of CNNs [1]-[3]. The role of the CNN’s nonlinear activation function is well explained in [1] for the first time. That is, the nonlinear activation operation is used to resolve the sign confusion problem caused by the cascade of convolutional operations in multiple layers. This work received the 2018 best paper award from the Journal of Visual Communication and Image Representation. The convolutional filters are viewed as rectified correlations on a sphere (RECOS), and the CNN’s operation is interpreted as a multi-layer RECOS transform in [2]. The discriminability of trained features of a CNN at different convolution layers is analyzed in [3] using two quantitative metrics: the Gaussian confusion measure (GCM) and the cluster purity measure (CPM). The analysis is validated by experimental results.

Saak transform and its application to adversarial attacks [4]-[5]. Inspired by deep learning, we developed a new mathematical transform called the Saak (Subspace approximation with augmented kernels) transform in [4]. The Saak and inverse Saak transforms provide signal analysis and synthesis tools, respectively. CNNs are known to [...]

March 4th, 2019 | News

Farewell to Dr. Xinfeng Zhang and Dr. Chao Yang

Dr. Xinfeng Zhang and Dr. Chao Yang are currently Postdoctoral Research Fellows at the MCL. They will complete their one-year stay and go back to China at the end of October.

Dr. Xinfeng Zhang received his PhD degree from the Institute of Computing Technology, Chinese Academy of Sciences, while Dr. Chao Yang received his PhD degree from Shanghai University. They are both experts in video coding. They joined the MCL in November 2017 and participated in two industrial projects: “Perceptual Video Coding based on Visual Attention Mechanism” (sponsored by Huawei) and “Joint Image Coding and Visual Understanding” (sponsored by Netflix, Tencent and Mediatek). They have done an excellent job in both projects, which has led to two journal papers under review.

MCL Director, Dr. C.-C. Jay Kuo, said, “It is our great pleasure to have had Dr. Zhang and Dr. Yang in our lab for the last year. They have made very important contributions. I do wish them the very best in their future career development.”

Dr. Xinfeng Zhang said, “It is a wonderful year for me in MCL, which is a prestigious research lab but also a family with love. I appreciate Prof. Kuo very much for the professional advice in my research, the strong support for my faculty job applications and the sincere guidance for life and career. Moreover, I am very pleased to know all of the MCL members and become friends with you. Especially, I thank Prof. Li, Dr. Yang and Haiqiang for the good research cooperation, and thank Bing Li and Bin Wang for the help in my life at USC. Thanks, and best wishes for our MCL members!”

And Dr. Chao Yang said that “It’s a great honor to be here working [...]

October 21st, 2018 | News

MCL Director, Dr. C.-C. Jay Kuo, Delivered Viterbi Special Guest Speech at Technion, Israel Institute of Technology

After a short stay in Athens for ICIP 2018, Professor Kuo flew to Israel and visited Technion, Israel Institute of Technology. He delivered the Viterbi Special Guest Lecture, titled “Unveil Convolutional Neural Networks and Go Beyond”, on October 11. The talk was very well received.

The Technion, Israel Institute of Technology, is a public research university in Haifa, Israel. The university was established in 1912 during the Ottoman Empire, more than 35 years before the establishment of the State of Israel. The Technion is the oldest university in Israel. It is ranked the best university in Israel and in the whole of the Middle East.

There is a close tie between USC and Technion through Dr. Andrew J. Viterbi. Dr. Viterbi received a Technion Honorary Doctorate in 2000. He has been a Distinguished Visiting Professor of Electrical Engineering at the Technion since then. In 2015, Dr. Viterbi announced a $50 million gift to secure and enhance the Technion-Israel Institute of Technology’s leadership position in electrical and computer engineering in Israel and globally. He is a member of the Technion Board of Governors.

Professor Kuo said, “It was my great honor to have this opportunity to be a bridge between the USC Viterbi School of Engineering and the Technion Viterbi Faculty of Electrical Engineering. It is very meaningful to have more interactions and faculty/student exchanges between these two world top universities.” Professor Kuo’s visit was sponsored by the Technion Rubiner/Viterbi Fund. He used Dr. Andrew Viterbi’s office during his stay. Professor Kuo’s visit was hosted by Professor Josh Zeevi, who is a world-renowned expert in vision and image sciences.

October 14th, 2018 | News