MCL Work Won APSIPA Sadaoki Furui Paper Award

Dr. Sachin Chachada, an MCL alumnus, and Professor C.-C. Jay Kuo received the 2019 Sadaoki Furui Paper Award at the opening ceremony of the 2019 APSIPA ASC, held in Lanzhou, China, on November 19, for the paper below:
Sachin Chachada and C.-C. Jay Kuo, “Environmental sound recognition: a survey,” APSIPA Transactions on Signal and Information Processing, e14, published online December 15, 2014.
The paper has been cited 107 times on Google Scholar, and it was downloaded 1,523 times in 2019 (through the end of September). The abstract of the paper is given below.
Although research in audio recognition has traditionally focused on speech and music signals, the problem of environmental sound recognition (ESR) has received more attention in recent years. Research on ESR has significantly increased in the past decade. Recent work has focused on the appraisal of non-stationary aspects of environmental sounds, and several new features predicated on non-stationary characteristics have been proposed. These features strive to maximize their information content pertaining to the signal’s temporal and spectral characteristics. Furthermore, sequential learning methods have been used to capture the long-term variation of environmental sounds. In this survey, we will offer a qualitative and elucidatory survey on recent developments. It includes four parts: (i) basic environmental sound-processing schemes, (ii) stationary ESR techniques, (iii) non-stationary ESR techniques, and (iv) performance comparison of selected methods. Finally, concluding remarks and future research and development trends in the ESR field will be given.
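As an illustration of the kind of stationary, frame-level features such surveys cover, here is a minimal numpy sketch of two classic ones, short-time energy and zero-crossing rate. The frame length, hop size, and test tone are illustrative choices, not values from the paper.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def short_time_energy(frames):
    """Mean squared amplitude per frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of sample-to-sample sign changes per frame."""
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

# Toy input: a 440 Hz tone sampled at 8 kHz for one second.
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
frames = frame_signal(x)
ste = short_time_energy(frames)    # near 0.5 for a unit-amplitude sine
zcr = zero_crossing_rate(frames)   # near 2 * 440 / 8000
```

Non-stationary methods differ mainly in replacing such fixed per-frame statistics with time-varying representations.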

By |November 24th, 2019|News|Comments Off on MCL Work Won APSIPA Sadaoki Furui Paper Award|

Congratulations to Ye Wang for Passing His PhD Defense

Congratulations to Ye Wang for passing his defense on November 12, 2019. His Ph.D. thesis is entitled “Video Object Segmentation and Tracking with Deep Learning Techniques”.

Abstract of the thesis:

Video object segmentation (VOS) aims to segment foreground objects from complex background scenes in video sequences. It is a challenging problem because of the complex nature of videos: occlusions, motion blur, deforming shapes, truncations, etc. Existing VOS methods fall into two main categories: semi-supervised and unsupervised. Semi-supervised VOS algorithms require manually annotated object regions in the first frame and then automatically segment the specified object in the remaining frames throughout the video sequence. Unsupervised VOS algorithms segment the most conspicuous and eye-attracting objects without prior knowledge of these objects in the video.

This thesis describes how we build an intelligent system to perform video object segmentation and tracking. We discuss the challenges in this field and then propose deep learning algorithms that significantly improve performance. First, the thesis addresses unsupervised video object segmentation by designing pseudo ground truth and online adaptation. Second, a novel unsupervised video object segmentation approach via distractor-aware online adaptation is proposed to handle challenging videos in which multiple objects appear and interact in a given clip. Finally, the thesis presents a two-stage approach, track and then segment, that performs semi-supervised video object segmentation with only bounding box annotations. We also outline some interesting open problems for future work.
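A basic building block of any track-then-segment pipeline is scoring the overlap between a tracked box and a reference box. The sketch below shows the standard intersection-over-union computation; it illustrates the general tracking primitive, not code from the thesis.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A tracked box compared with the previous-frame box:
score = box_iou((0, 0, 10, 10), (5, 5, 15, 15))  # 25 / 175
```

In a tracker, a low IoU with the previous frame typically signals drift or occlusion and triggers re-detection before the segmentation stage runs.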

Ph.D. experience:

I was very fortunate to join the Media Communications Lab (MCL), led by Professor C.-C. Jay Kuo, in Fall 2015. He gave me the chance and inspiration to explore new directions and guided me through my whole Ph.D. study. His immense knowledge has greatly [...]

By |November 17th, 2019|News|Comments Off on Congratulations to Ye Wang for Passing His PhD Defense|

MCL Research on Point Cloud Classification and Segmentation

Recently, Professor Kuo and his students at MCL proposed a new machine learning methodology called successive subspace learning (SSL). The methodology has been widely adopted in MCL to solve image processing and computer vision problems. In the 3D domain, we have observed great success on the point cloud classification task. In the PointHop paper, we develop an explainable machine learning method for point cloud classification. The classification baseline is composed of four PointHop units; we construct a local-to-global attribute-building process and use the Saab transform to control the dimension growth in each unit. Comparing test performance on ModelNet40 with state-of-the-art methods, our method obtains comparable accuracy while demanding much less training time. For instance, PointNet takes about five hours to train, while ours takes only 20 minutes on the same dataset. The advantages of the methodology are clear: it is interpretable and has much lower computational complexity.

The success in point cloud classification encourages us to go deeper into the 3D domain. We therefore turn to the segmentation task, which requires assigning a label to each point in the point cloud. Following the design of common image segmentation networks, we use the point cloud classification baseline as an encoder and add a decoder to complete the segmentation. After building local neighborhoods and extracting local attributes from neighboring points in the encoder, the features are interpolated back to the finest scale layer by layer, with skip connections between matching scales in the decoder. The Saab transform is again adopted between layers as a feedforward convolution to control the rapid growth of the feature dimension.
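The encoder's two core steps, gathering each point's neighborhood attributes and then controlling the dimension growth, can be sketched with plain numpy. This is a simplified illustration of the idea, not the PointHop implementation: the neighborhood size and the use of PCA as a stand-in for the Saab transform's energy-ordered kernels are assumptions for clarity.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of each point (brute force)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, 1:k + 1]   # skip the point itself

def gather_local_attributes(points, feats, k=8):
    """Concatenate each point's neighbor features (local attribute building)."""
    idx = knn_indices(points, k)
    return feats[idx].reshape(len(points), -1)   # (N, k * D)

def pca_reduce(attrs, dim):
    """PCA stand-in for the Saab transform's dimension control."""
    centered = attrs - attrs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

rng = np.random.default_rng(0)
pts = rng.normal(size=(64, 3))                       # toy point cloud
attrs = gather_local_attributes(pts, pts, k=8)       # (64, 24)
reduced = pca_reduce(attrs, dim=8)                   # dimension kept in check
```

Stacking several such gather-then-reduce units gives the local-to-global attribute hierarchy the post describes; the decoder then interpolates the reduced features back to the full point set.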

Our method is also task-agnostic. Specifically, by learning the parameters in a one-pass manner, our [...]

By |November 11th, 2019|News|Comments Off on MCL Research on Point Cloud Classification and Segmentation|

Welcome New MCL Member – Hong-Shuo Chen

We are so happy to welcome a new undergraduate member of MCL, Hong-Shuo Chen. Here is an interview with Hong-Shuo:

1. Could you briefly introduce yourself and your research interests?

I am Hong-Shuo Chen, and my English name is Max. Before coming to USC, I received my bachelor's degree in Electrical Engineering and Computer Science from National Chiao Tung University in Taiwan. I like coding, math, and everything about engineering. It gives me great pleasure to come to MCL as a Ph.D. student. My research interests are image segmentation and texture analysis. The world of computer vision is so broad that I want to explore more of this field.

2. What is your impression about MCL and USC?

USC is a top university in the world, and MCL is a cooperative and strong lab. I like this beautiful campus and really enjoy doing research in MCL. Professor Kuo is a great mentor: he takes care of all the members in the lab and has profound knowledge and experience. I really appreciate having this opportunity to learn and study in MCL.

3. What is your future expectation and plan in MCL?

With the guidance of Professor Kuo, I believe I can dive into the world of computer vision deeply and thoroughly and become an expert in signal processing and machine learning. In the future, I hope to become a professional engineer, solve real-world problems independently, and make a contribution to society.

By |October 6th, 2019|News|Comments Off on Welcome New MCL Member – Hong-Shuo Chen|

Welcome New MCL Member – Hongyu Fu

We are so happy to welcome a new graduate member of MCL, Hongyu Fu. Here is an interview with Hongyu:

1. Could you briefly introduce yourself and your research interests?
My name is Hongyu Fu. Before becoming a Ph.D. student at USC, I received my bachelor's degree in electrical engineering from Peking University. My past research experience focused mostly on semiconductor device physics and circuits. While exploring device- and circuit-based neuromorphic computing and machine learning topics, I heard more and more about computer vision, AI, and machine learning, which have long interested me but which I had not had the chance to study before. I really appreciate Prof. Kuo giving me this opportunity to study in MCL and explore this exciting area.
2. What is your impression about MCL and USC?
I really love USC's beautiful campus and friendly atmosphere, and I enjoy its convenient facilities. As Trojans we share a fight-on spirit, which makes USC a great network and keeps everyone in it passionate and motivated. MCL is a large and efficient group of hard-working, intelligent students with solid skills and deep knowledge, under the supervision of Prof. Kuo, a very kind and responsible advisor with profound knowledge and research experience.

3. What is your future expectation and plan in MCL?
I will definitely work hard, build solid skills in math, problem solving, and research, and I hope to contribute to MCL in the future. I also hope that in the near future both the MCL group and I will have a solid standing in the machine learning and computer vision community and contribute to the advancement of the field.

By |September 29th, 2019|News|Comments Off on Welcome New MCL Member – Hongyu Fu|

MCL Research on Robot Learning

Title: Robot Learning via Human Adversarial Games
Author: Jiali Duan

Much work in robotics has focused on “human-in-the-loop” learning techniques that improve the efficiency of the learning process. However, these algorithms make the strong assumption of a cooperative human supervisor who assists the robot; in reality, human observers may also act adversarially toward deployed robotic systems. We show that this can in fact improve the robustness of the learned models. We propose a physical framework that leverages perturbations applied by a human adversary to guide the robot toward more robust models. In a manipulation task, we show that grasping success improves significantly when the robot trains with a human adversary, compared with training in a self-supervised manner. We validate our approach in an in-house simulator for human-robot interaction. Our work was selected as a Best Paper Finalist at IROS 2019; more details can be found at: https://arxiv.org/abs/1903.00636.

Before training:


After training:

By |September 22nd, 2019|News|Comments Off on MCL Research on Robot Learning|

Congratulations to Yueru Chen for Passing Her PhD Defense

Congratulations to Yueru Chen for passing her defense. Her Ph.D. thesis is entitled “Object Classification Based on Neural-Network-Inspired Image Transforms”.

Abstract of the thesis:

Convolutional neural networks (CNNs) have recently demonstrated impressive performance in image classification and have changed the way feature extractors are built, from careful handcrafted design to automatic learning from a large labeled dataset. However, the great majority of the current CNN literature is application-oriented, and there is no clear understanding or theoretical foundation that explains the outstanding performance or indicates ways to improve it. In this thesis, we focus on solving the image classification problem based on neural-network-inspired transforms.

Motivated by the multilayer RECOS (REctified-COrrelations on a Sphere) transform, two data-driven signal transforms are proposed: the “subspace approximation with augmented kernels” (Saak) transform and the “subspace approximation with adjusted bias” (Saab) transform, each corresponding to a convolutional layer in a CNN. Based on the Saak transform, we first propose an efficient, scalable, and robust approach to the handwritten digit recognition problem. Next, we develop an ensemble method using the Saab transform to solve the image classification problem. The ensemble method fuses the output decision vectors of Saab-transform-based decision systems. To enhance the performance of the ensemble system, it is critical to increase the diversity of the FF-CNN models. To achieve this objective, we introduce diversity by adopting three strategies: 1) different parameter settings in the convolutional layers, 2) flexible feature subsets fed into the fully connected (FC) layers, and 3) multiple image embeddings of the same input source. We also extend our ensemble method to semi-supervised learning. Since unlabeled data may not always enhance semi-supervised learning, we define an effective quality score and use it to select a subset of the unlabeled data for the training process. Finally, we propose a unified framework called successive subspace learning [...]
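The Saab transform's "adjusted bias" idea can be sketched in a few lines: the first kernel is the constant (DC) vector, the remaining kernels come from a PCA of the DC-removed patches, and a shared bias shifts all responses non-negative so a ReLU-like stage loses nothing. This is a minimal one-stage sketch under those assumptions, not the thesis code.

```python
import numpy as np

def saab_fit(patches, num_kernels):
    """Fit one Saab stage: DC kernel + PCA-based AC kernels + shared bias."""
    d = patches.shape[1]
    dc = np.ones(d) / np.sqrt(d)                       # constant (DC) kernel
    ac_input = patches - (patches @ dc)[:, None] * dc  # remove DC component
    ac_input = ac_input - ac_input.mean(axis=0)
    _, _, vt = np.linalg.svd(ac_input, full_matrices=False)
    kernels = np.vstack([dc, vt[:num_kernels - 1]])    # (num_kernels, d)
    resp = patches @ kernels.T
    bias = max(0.0, -resp.min())                       # shift responses >= 0
    return kernels, bias

def saab_transform(patches, kernels, bias):
    """Apply the fitted kernels plus the non-negativity bias."""
    return patches @ kernels.T + bias

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))            # e.g. flattened 4x4 patches
kernels, bias = saab_fit(X, num_kernels=6)
Y = saab_transform(X, kernels, bias)      # non-negative on the training set
```

Because the kernels are computed in one pass from data statistics, no back-propagation is needed, which is the key difference from a trained convolutional layer.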

By |September 15th, 2019|News|Comments Off on Congratulations to Yueru Chen for Passing Her PhD Defense|

Congratulations to Yuanhang Su for Passing His Defense

Congratulations to Yuanhang Su for passing his PhD defense on April 16, 2019. His PhD thesis is entitled “Theory of Memory-enhanced Neural Systems and Image-assisted Neural Machine Translation”.


My research focuses on sequence learning systems (whose input can be language, speech, video, etc.) and on answering the following questions: what is memory, and how can we build a system that learns efficiently by remembering? Can visual imagination help, and if so, how can we build a system that handles both language and vision? The foundation we built for the former question consists of two computing architectures: the Extended Long Short-Term Memory (ELSTM) for recurrent neural network (RNN) modeling, and the Tree-structured Multi-stage Principal Component Analysis (TMPCA) for language embedding. They are derived, respectively, from the perspectives of memory as a system function and memory as a compact information representation (i.e., dimension reduction). From the first perspective, we carried out a detailed analysis of RNN cell models, demystified their properties, and concluded that all existing RNN cells (RNN, LSTM, and GRU) suffer from memory decay. The proposed ELSTM does not have this limitation and achieves outstanding performance on complex language tasks. From the second perspective, a PCA-based technique is used for sequence embedding, with the goals of maximizing input/output mutual information and of explainable machine learning. The proposed TMPCA computes much faster than ordinary PCA while retaining most of its other merits. To answer the latter question, we argued that visual information can benefit language learning by increasing the system's mutual information, and we deployed a Transformer-based multi-modal NMT system that is trained and fine-tuned without supervision on an image captioning dataset. It is one of the first such systems ever developed for unsupervised MT, and the [...]
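The tree structure behind TMPCA can be sketched as repeatedly merging adjacent pairs of word embeddings and PCA-reducing each merged pair back to the original dimension, halving the sequence length at every stage. This is an illustrative sketch of the idea under simplifying assumptions (power-of-two sequence length, plain PCA per stage), not the published implementation.

```python
import numpy as np

def pca_project(X, dim):
    """Project rows of X onto their top principal directions."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

def tmpca(seq_batch, dim):
    """Tree-structured multi-stage PCA over a batch of sequences.

    seq_batch: (n_samples, seq_len, dim), seq_len a power of two.
    Each stage concatenates adjacent pairs (2*dim) and reduces back to dim,
    halving the length until one embedding per sequence remains.
    """
    x = seq_batch
    while x.shape[1] > 1:
        n, L, d = x.shape
        pairs = x.reshape(n * (L // 2), 2 * d)   # adjacent pairs, row-major
        x = pca_project(pairs, dim).reshape(n, L // 2, dim)
    return x[:, 0, :]

rng = np.random.default_rng(2)
emb = rng.normal(size=(32, 8, 4))   # 32 sequences, length 8, dim 4
out = tmpca(emb, dim=4)             # one 4-D embedding per sequence
```

The per-stage PCA operates on short 2*dim vectors rather than the full flattened sequence, which is where the speedup over ordinary PCA comes from.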

By |April 22nd, 2019|News|Comments Off on Congratulations to Yuanhang Su for Passing His Defense|

Congratulations to Ron Salloum for Passing His Defense

Congratulations to Ron Salloum for passing his PhD defense on April 10, 2019. His PhD thesis is entitled “A Data-Driven Approach to Image Splicing Localization”.


The availability of low-cost and user-friendly editing software has made it significantly easier to manipulate images. Thus, there has been an increasing interest in developing forensic techniques to detect and localize image manipulations or forgeries. Splicing, which is one of the most common types of image forgery, involves copying a region from one image (referred to as the donor image) and pasting it onto another image (referred to as the host image). Forgers often use splicing to give a false impression that there is an additional object present in the image, or to remove an object from the image.

Many of the current splicing detection methods only determine whether a given image has been spliced and do not attempt to localize the spliced region. Relatively few methods attempt to tackle the splicing localization problem, which refers to the problem of determining which pixels in an image have been manipulated as a result of a splicing operation.

In my dissertation, I present two different splicing localization methods that we have developed. The first is the Multi-task Fully Convolutional Network (MFCN), which is a neural-network-based method that outperforms previous methods on many datasets. The second proposed method is based on cPCA++ (where cPCA stands for contrastive Principal Component Analysis), which is a new data visualization and clustering technique that we have developed. The cPCA++ method is more efficient than the MFCN and achieves comparable performance.
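The contrastive-PCA idea underlying cPCA++ can be sketched in a few lines: find directions with high variance in a target set but low variance in a background set. The sketch below implements the original cPCA formulation (eigendecomposition of the covariance difference with a trade-off parameter alpha) that cPCA++ builds on; the synthetic data and alpha value are illustrative assumptions, not the dissertation's method or data.

```python
import numpy as np

def contrastive_pca(target, background, alpha=1.0, dim=2):
    """Directions with high target variance but low background variance.

    Eigendecomposes C_target - alpha * C_background; alpha trades off
    maximizing target variance against suppressing background variance.
    """
    ct = np.cov(target, rowvar=False)
    cb = np.cov(background, rowvar=False)
    w, v = np.linalg.eigh(ct - alpha * cb)
    order = np.argsort(w)[::-1]          # largest contrastive variance first
    return v[:, order[:dim]]             # (d, dim) projection matrix

rng = np.random.default_rng(3)
# Background varies mostly in dims 0-1; the target varies in dims 3-4.
background = rng.normal(size=(500, 5)) * np.array([3, 3, 1, 1, 1])
target = rng.normal(size=(500, 5)) * np.array([1, 1, 1, 3, 3])
V = contrastive_pca(target, background, alpha=1.0, dim=2)
```

Plain PCA on the target alone would also pick up any variance shared with the background; the subtraction is what isolates the splicing-specific structure.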

PhD Experience:

Pursuing my PhD degree was a very challenging but rewarding experience. I really enjoyed my time in the Media Communications Laboratory and had the opportunity to work on exciting research projects. [...]

By |April 16th, 2019|News|Comments Off on Congratulations to Ron Salloum for Passing His Defense|

MCL Research on Fake Image Detection

With the rapid development of image processing technology, generating an image without obvious visual artifacts has become much easier. Progressive GANs have generated high-resolution images that can almost fool human eyes. Fake image detection has therefore become a must. Currently, convolutional neural network (CNN) based methods are being explored by many researchers for GAN image detection. They build deeper and deeper networks, such as XceptionNet, to better distinguish real from fake. These CNN-based methods have achieved very high accuracy, above 99%.
We want to build an interpretable method, with no back-propagation, that aims to achieve similar accuracy. In our method, we first detect 68 facial landmarks in both real and fake images. We then extract 32x32 patches centered on the 68 facial landmarks. Those patches, together with their labels, are fed into a two-layer Saab architecture. After two fully connected layers, the probability of a patch being fake or real is stored in a 2-by-1 output vector. We train one model per landmark, for 68 models in total. The 68 output 2-by-1 vectors are fed into an SVM classifier, which outputs the decision of whether the whole image is fake or real.
Author: Yao Zhu
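Two mechanical steps of the pipeline above, cropping landmark-centered patches and stacking the per-landmark probability vectors into one feature for the final classifier, can be sketched as follows. The landmark detector, Saab stages, and SVM are omitted; the random image and landmark coordinates are placeholders for illustration.

```python
import numpy as np

def extract_patches(image, landmarks, size=32):
    """Crop size x size patches centered at each landmark (clipped at edges)."""
    h, w = image.shape[:2]
    half = size // 2
    patches = []
    for (x, y) in landmarks:
        x0 = int(np.clip(x - half, 0, w - size))
        y0 = int(np.clip(y - half, 0, h - size))
        patches.append(image[y0:y0 + size, x0:x0 + size])
    return np.stack(patches)

def fuse_landmark_scores(probs):
    """Stack the per-landmark (fake, real) probability vectors into one
    feature vector for the final classifier (an SVM in the pipeline)."""
    return np.concatenate([p.ravel() for p in probs])

rng = np.random.default_rng(4)
img = rng.random((128, 128))                    # stand-in face image
pts = rng.integers(0, 128, size=(68, 2))        # stand-in landmark (x, y)
patches = extract_patches(img, pts)             # (68, 32, 32)
feat = fuse_landmark_scores(rng.random((68, 2, 1)))   # 136-D SVM input
```

Because each landmark gets its own model, the SVM can learn which facial regions (eyes, mouth corners, jawline) are most discriminative for a given generator.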

By |April 8th, 2019|News|Comments Off on MCL Research on Fake Image Detection|