MCL Research on Explainable Deep Learning
Deep learning technologies such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have had a great impact on modern machine learning due to their impressive performance in many application fields that involve learning, modeling, and processing of complex sensing data. Yet, the working principle of deep learning remains mysterious. Furthermore, it has several well-known weaknesses: 1) vulnerability to adversarial attacks, 2) a heavy demand for supervision, and 3) limited generalizability from one domain to another. Professor Kuo and his PhD students at the Media Communications Lab (MCL) have been working on explainable deep learning since 2014 and have published a sequence of pioneering papers on the topic.
Explanation of nonlinear activation, convolutional filters, and the discriminability of trained CNN features [1]-[3]. The role of the CNN's nonlinear activation function was first explained in [1]: the nonlinear activation resolves the sign confusion problem that arises from cascading convolutional operations across multiple layers. This work received the 2018 Best Paper Award from the Journal of Visual Communication and Image Representation. In [2], convolutional filters are viewed as rectified correlations on a sphere (RECOS), and the CNN's operation is interpreted as a multi-layer RECOS transform. In [3], the discriminability of trained CNN features at different convolutional layers is analyzed using two quantitative metrics, the Gaussian confusion measure (GCM) and the cluster purity measure (CPM); the analysis is validated by experimental results.
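To make the two metrics concrete, here is an illustrative sketch of how feature discriminability can be quantified along these lines. The specific formulas below (a Bhattacharyya-coefficient overlap as the Gaussian confusion proxy, and a simple 1-D k-means purity score) are assumptions made for the example, not the exact definitions used in [3]:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D features extracted for two classes; well separated on purpose.
f0 = rng.normal(-2.0, 1.0, 500)
f1 = rng.normal(+2.0, 1.0, 500)

def gaussian_confusion(a, b):
    """Proxy for a Gaussian confusion measure: the Bhattacharyya
    coefficient of the two fitted 1-D Gaussians (1 = identical
    distributions, 0 = perfectly separated)."""
    m1, s1 = a.mean(), a.std()
    m2, s2 = b.mean(), b.std()
    bd = 0.25 * (m1 - m2) ** 2 / (s1 ** 2 + s2 ** 2) \
        + 0.5 * np.log((s1 ** 2 + s2 ** 2) / (2 * s1 * s2))
    return np.exp(-bd)

def cluster_purity(features, labels, k=2, iters=25):
    """Purity of a 1-D k-means clustering: the fraction of samples
    that carry the majority label of their assigned cluster."""
    x = features.reshape(-1, 1)
    centers = np.linspace(x.min(), x.max(), k).reshape(-1, 1)
    for _ in range(iters):
        assign = np.argmin(np.abs(x - centers.T), axis=1)
        centers = np.array([[x[assign == j].mean()] for j in range(k)])
    majority = sum(np.bincount(labels[assign == j]).max()
                   for j in range(k))
    return majority / len(x)

features = np.concatenate([f0, f1])
labels = np.concatenate([np.zeros(500, int), np.ones(500, int)])
print(gaussian_confusion(f0, f1))        # low -> discriminative feature
print(cluster_purity(features, labels))  # near 1 -> pure clusters
```

A discriminative feature dimension yields low confusion between the per-class Gaussians and high cluster purity; confused features score the opposite way.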
Saak transform and its application to adversarial attacks [4]-[5]. Inspired by deep learning, we developed a new mathematical transform, the Saak (subspace approximation with augmented kernels) transform, in [4]. The Saak and inverse Saak transforms provide signal analysis and synthesis tools, respectively. CNNs are known to [...]
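A minimal one-stage sketch of the Saak idea, under the assumption of KLT (PCA) kernels fitted to toy data: augmenting each kernel with its negative means the subsequent ReLU discards no information (for each pair, exactly one coefficient survives), which is what makes the inverse transform exact:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1000 flattened 2x2 patches (4-dim vectors), zero-mean.
X = rng.normal(size=(1000, 4))
X -= X.mean(axis=0)

# 1) KLT/PCA kernels: orthonormal eigenvectors of the patch covariance.
cov = X.T @ X / len(X)
_, eigvecs = np.linalg.eigh(cov)
K = eigvecs.T  # rows are orthonormal kernels

# 2) Kernel augmentation: pair each kernel a with its negative -a.
A = np.vstack([K, -K])

# 3) Forward Saak transform: project onto augmented kernels, then ReLU.
#    ReLU resolves sign confusion without losing information, since for
#    each (a, -a) pair exactly one coefficient is nonzero.
C = np.maximum(A @ X.T, 0.0)  # shape (8, 1000)

# 4) Inverse Saak transform: subtract paired coefficients to undo the
#    ReLU, then undo the orthonormal KLT projection.
X_rec = (K.T @ (C[:4] - C[4:])).T

print(np.allclose(X, X_rec))  # True: the transform pair is lossless
```

Cascading such stages over spatial blocks gives the multi-stage transform of [4]; the point of the sketch is only the augmentation-plus-ReLU mechanism and its exact invertibility.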