USC Media Communications Lab

Congratulations to Jiali Duan for Passing His Defense

Let us hear what he has to say about his defense and an abstract of his thesis.

Deep learning has brought impressive improvements in many fields, thanks to end-to-end data-driven optimization. However, people have little control over the system during training and limited understanding of the structure of the knowledge being learned. In this thesis, we study the theory and applications of adversarial and structured knowledge learning: 1) learning adversarial knowledge through human interaction or by incorporating a human in the loop; 2) learning structured knowledge by modelling contexts and users’ preferences via distance metric learning.

In the first part, we teach a robotic arm to learn robust manipulation grasps that can withstand perturbations, through end-to-end optimization with a human adversary. Specifically, we formulate the problem as a two-player game with incomplete information, played by a human and a robot, where the human’s goal is to minimize the reward the robot can get. We then extend this idea to improve the sample efficiency of deep reinforcement learning by incorporating a human in the training loop. We present a portable, interactive, and parallel platform for a human-agent curriculum learning experience.

In the second part, we present two works that address different aspects of structured representation learning. First, we propose a self-training framework to improve distance metric learning. The challenge is the noise in pseudo labels, which prevents exploiting additional unlabeled data. Therefore, we introduce a new feature basis learning component for the teacher-student network, which better measures pairwise similarity and selects high-confidence pairs. Second, we address image-attribute queries, which allow a user to customize image-to-image retrieval by designating desired attributes in target images. We achieve this by adopting a composition module, which enforces object-attribute factorization, and an attribute-set synthesis module, which deals with sample insufficiency.

Looking back, Prof. Kuo is [...]

By Zhiruo Zhou|March 7th, 2021|News|Comments Off|

MCL Research on PixelHop with Attention

In the human visual system, a given image is processed and important information is distilled in order to recognize objects. Salient regions are more informative and draw more human attention than other parts of the image, such as the background. Similarly, in computer vision, attention maps can be used to identify and take advantage of the effective spatial support of visual information when making image classification decisions. They can also help improve the separability of different classes. Other applications of attention include weakly supervised semantic segmentation, adversarial robustness, weakly supervised object localization, domain shift, etc.

Studies of attention can be categorized into two types: 1) post-hoc network analysis and 2) trainable attention generation. The former type (such as CAM [1]) analyzes CNN models after they have been trained on image-level labels, as a network reasoning process. In contrast, trainable attention mechanisms (e.g. [2], [3]) use learning targets related to attention in order to generate separable and discriminative attention maps. All of these related works are built on CNNs trained in an end-to-end manner, which have high time and computational complexity.
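As an illustration of the post-hoc type, a class activation map can be computed as a weighted sum of the last convolutional feature maps, using the classifier weights of one class. The sketch below assumes a network whose last convolutional layer is followed by global average pooling and a single linear layer; the function name and array shapes are hypothetical and chosen only for this example.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps for one class, rescaled to [0, 1].

    feature_maps:  (K, H, W) activations of the last convolutional layer.
    class_weights: (K,) linear-layer weights of the class of interest.
    """
    # Contract the channel axis: result has shape (H, W).
    cam = np.tensordot(class_weights, feature_maps, axes=1)
    # Shift and scale so the map can be read as relative importance.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The map is produced only after training has finished, which is what distinguishes this family from trainable attention mechanisms.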

In our research, we try to extract attention maps in a feedforward way, based on features extracted by the channel-wise Saab transform proposed by Chen et al. in PixelHop++ [4]. Features from shallow to deep Hops are considered together as a representation for each pixel, since they represent different receptive fields. By putting more weight on the important regions indicated by the generated attention maps, we expect our model to achieve better recognition performance, because regions with irrelevant information, which are confusing or shared among different classes, will be suppressed. This will also make the classification system more [...]

By Wei Wang|February 28th, 2021|News|Comments Off|

MCL Research on SSL-based Object Tracking

Object tracking is a fundamental computer vision problem with a wide range of applications, such as video surveillance, smart traffic systems, autonomous driving, and so on. Nowadays, most state-of-the-art object trackers adopt deep neural networks for high tracking performance at the expense of huge computational resources and heavy memory use. Here, we seek a more lightweight solution that requires fewer resources for training and inference and has a much smaller model size, thus making real-time tracking possible on small devices such as mobile phones and autonomous drones.

The proposed object tracker is built upon the PixelHop framework and is therefore called OTHop (Object Tracking PixelHop). The term “hop” denotes the neighborhood of a pixel. OTHop conducts spectral analysis using the Saab transform on neighborhoods of various sizes centered on a pixel through a sequence of cascaded dimension reduction units. This naturally forms a multi-resolution feature extraction scheme, which helps capture unusual patterns that deserve more attention during tracking. We then adopt an XGBoost classifier as the binary predictor to differentiate foreground pixels from background pixels. The classifier is pre-trained on an offline dataset and then updated online using either the initial frame or preceding frames, with Saab coefficients as the input. Based on the classification results, we derive the object bounding box.

To sum up, OTHop has the following main steps:

1. Extract joint spatial-spectral features based on the PixelHop framework;
2. Predict the probability that a spatial region, which can be of various sizes, is a foreground object or a background region with a trained XGBoost binary classifier;
3. Fuse the results obtained at different hops in Step 2 to obtain the final object bounding boxes.
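The last two steps above can be pictured with a small sketch that turns per-hop foreground probabilities into one box. The post does not say how OTHop fuses the hops or derives the box, so the averaging rule, the 0.5 threshold, and the function names `fuse` and `bounding_box` are assumptions made only for illustration.

```python
def fuse(prob_maps):
    """Average per-pixel foreground probabilities from several hops.

    Assumes all maps have been resized to the same height and width.
    """
    n = len(prob_maps)
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    return [
        [sum(m[r][c] for m in prob_maps) / n for c in range(width)]
        for r in range(height)
    ]

def bounding_box(prob_map, threshold=0.5):
    """Return (top, left, bottom, right) of pixels at or above threshold."""
    hits = [
        (r, c)
        for r, row in enumerate(prob_map)
        for c, p in enumerate(row)
        if p >= threshold
    ]
    if not hits:
        return None  # no foreground pixel found
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))

# Toy probability maps standing in for classifier outputs at two hops.
hop1 = [[0.1, 0.2, 0.1, 0.0],
        [0.1, 0.9, 0.8, 0.1],
        [0.0, 0.7, 0.9, 0.2],
        [0.0, 0.1, 0.1, 0.0]]
hop2 = [[0.0, 0.1, 0.0, 0.0],
        [0.2, 0.8, 0.6, 0.0],
        [0.1, 0.9, 0.7, 0.6],
        [0.0, 0.2, 0.0, 0.0]]
box = bounding_box(fuse([hop1, hop2]))  # (1, 1, 2, 2)
```

Averaging suppresses the isolated high response at row 2, column 3 of the second map, so the box stays tight around the region both hops agree on.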

The tracker is tested on the [...]

By Wei Wang|February 22nd, 2021|News|Comments Off|

MCL Research on SSL-based Object Proposal

Object proposal algorithms find bounding boxes for salient class-agnostic objects. Object proposal is an important pre-processing step for object detection in the wild. While most state-of-the-art object detection methods adopt end-to-end deep neural networks, we aim at an independent object proposal unit that has low complexity and high performance. The proposed lightweight object proposal unit can be combined with any classification process to reduce model and computational complexity.

Our current method is built upon the PixelHop framework and is called OPPHop (Object Proposal PixelHop). The term “hop” denotes the neighborhood of a pixel. OPPHop conducts spectral analysis on neighborhoods of different sizes centered on a pixel through a sequence of cascaded dimension reduction units. The neighborhoods of an object contain salient contours and, as a result, they have distinctive spectral signatures at a certain scale that matches the object size. The distinctive regions can be predicted based on supervised learning with Saab coefficients as the input.
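To give a feel for spectral analysis on pixel neighborhoods, the sketch below projects every neighborhood of a grayscale image onto its leading principal components. Plain PCA is used here as a simplified stand-in for the Saab transform, it covers a single dimension reduction unit rather than a cascade, and the function name and parameters are hypothetical.

```python
import numpy as np

def neighborhood_pca(image, size=3, n_components=4):
    """Project each size x size neighborhood onto its top principal components.

    image: 2-D array. Returns (coefficients, kernels), where coefficients has
    shape (H - size + 1, W - size + 1, n_components).
    """
    h, w = image.shape
    out_h, out_w = h - size + 1, w - size + 1
    # One flattened neighborhood per valid pixel position.
    patches = np.array([
        image[r:r + size, c:c + size].ravel()
        for r in range(out_h)
        for c in range(out_w)
    ])
    centered = patches - patches.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    kernels = vt[:n_components]
    coeffs = centered @ kernels.T
    return coeffs.reshape(out_h, out_w, n_components), kernels
```

The resulting per-pixel coefficients play the role that Saab coefficients play in the text: they are the input features a supervised classifier would consume.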

— by Hongyu Fu

By Wei Wang|February 15th, 2021|News|Comments Off|

MCL Research on Fake Video Detection

As the amount of Deepfake video content grows rapidly, automatic Deepfake detection has received a lot of attention in the digital forensics community. Deepfake videos can be potentially harmful to society, ranging from non-consensual explicit content creation to forged media used by foreign adversaries in disinformation campaigns.

A lightweight, high-performance Deepfake detection method, called DefakeHop, is proposed in this work. State-of-the-art Deepfake detection methods are built upon deep neural networks. In contrast, DefakeHop extracts features automatically from various parts of face images using the successive subspace learning (SSL) principle. DefakeHop consists of three main modules: 1) PixelHop++, 2) feature distillation and 3) ensemble classification. To derive a rich feature representation of faces, DefakeHop extracts features using PixelHop++ units from various parts of face images. The theory of PixelHop++ has been developed by Kuo et al. using SSL. PixelHop++ has recently been used for feature learning from low-resolution face images but, to the best of our knowledge, this is the first time that it is used for feature learning from patches extracted from high-resolution color face images. Since features extracted by PixelHop++ are still not concise enough for classification, we also propose an effective feature distillation module to further reduce the feature dimension and derive a more concise description of the face. Our feature distillation module uses spatial dimension reduction to remove spatial correlation in a face and a soft classifier to include semantic meaning for each channel. Using this module, the feature dimension is significantly reduced and only the most important information is kept. With a small model size of 42,845 parameters, DefakeHop achieves state-of-the-art performance with the area under the ROC curve (AUC) of 100%, 94.95%, and 90.56% on UADFV, Celeb-DF v1 and Celeb-DF v2 datasets, [...]

By Wei Wang|February 8th, 2021|News|Comments Off|

Professor Kuo Received Technology and Engineering Emmy Award

The National Awards Committee’s Technology & Engineering Achievement Committee of the National Academy of Television Arts and Sciences (NATAS) announced the recipients of the 2020 Technology and Engineering Emmy Award on January 25, 2021. MCL Director, Professor C.-C. Jay Kuo, was one of the recipients for his work on “Development of Perceptual Metrics for Video Encoding Optimization.” Professor Kuo gave a brief interview about this prestigious recognition.

Q: You are honored for your work in Development of Perceptual Metrics for Video Encoding Optimization. Can you explain in very basic terms, what exactly this technology is and what it is used for?

A: This Technology and Engineering Emmy award is the outcome of my research collaboration with Netflix. We developed a new video quality assessment method called VMAF (Video Multimethod Assessment Fusion). VMAF is used by Netflix not only for video quality assessment but also for video encoding optimization. VMAF contributes to high quality streaming video from Netflix as well as other video streaming service providers.

Q: In just one or two lines, can you share why it’s so important?

A: Netflix has made VMAF an open-source tool to maximize its impact. It is the de facto standard in video quality assessment for premium video content in the video streaming industry.

Q: If possible, can you share some well-known shows/movies/streaming services that use this technology?

A: VMAF-optimized encodes cover the majority of Netflix’s streaming hours today, and VMAF has been used as a quality monitoring tool for almost all of Netflix’s streaming hours. Outside of Netflix, many companies use VMAF as well. The list includes Twitch, Hotstar, Crunchyroll, Tencent Cloud, and Bilibili, among others.

By Wei Wang|February 1st, 2021|News|Comments Off|

Welcome MCL New Member Jiahui Zhang

We have a new member, Jiahui Zhang, joining MCL in Spring 2021. Here is a short interview with Jiahui, along with our warm welcome.

1. Could you briefly introduce yourself and your research interests?

My name is Jiahui Zhang, and I am a second-year master’s student in the Department of Electronic Engineering at USC. I got my bachelor’s degree from Beijing University of Technology. I am a sports fan. In my spare time, I like playing sports and watching sports games. I also like traveling to see beautiful scenery. My research interests include deep learning and computer vision, especially representation learning.

2. What is your impression about MCL and USC?

USC is a great school that provides students with an enjoyable environment for living, communicating, and studying.

MCL is a wonderful lab filled with intelligent researchers. Everyone is an expert in their research field. Besides, people in MCL, from Professor Kuo to every lab member, are very kind and friendly. People help each other with living, studying, and research, which builds a warm environment in the lab.

3. What is your future expectation and plan in MCL?

MCL has many excellent researchers, and I want to learn from them and make friends with them. Academically, I want to complete some projects to build up my research experience and contribute to the lab.

By Wei Wang|January 24th, 2021|News|Comments Off|

MCL Research on New Interpretation of MLP

Our work on new MLP interpretation includes:

Interpretable MLP design [1]:

A closed-form solution exists in two-class linear discriminant analysis (LDA), which discriminates two Gaussian-distributed classes in a multi-dimensional feature space. In this work, we interpret the multilayer perceptron (MLP) as a generalization of a two-class LDA system so that it can handle an input composed of multiple Gaussian modalities belonging to multiple classes. Besides the input layer l_in and the output layer l_out, the MLP of interest consists of two intermediate layers, l_1 and l_2. We propose a feedforward design that has three stages: 1) from l_in to l_1: half-space partitionings accomplished by multiple parallel LDAs, 2) from l_1 to l_2: subspace isolation, where one Gaussian modality is represented by one neuron, 3) from l_2 to l_out: class-wise subspace mergence, where each Gaussian modality is connected to its target class. Through this process, we present an automatic MLP design that can specify the network architecture (i.e., the number of layers and the number of neurons at each layer) and all filter weights in a feedforward one-pass fashion. This design can be generalized to an arbitrary distribution by leveraging the Gaussian mixture model (GMM). Experiments are conducted to compare the performance of the traditional backpropagation-based MLP (BP-MLP) and the new feedforward MLP (FF-MLP).
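The closed-form solution mentioned at the start of this summary can be written down directly. The sketch below is the textbook two-class LDA under a shared-covariance Gaussian assumption, with class priors estimated from sample counts. It shows only how one separating hyperplane is obtained without backpropagation and is not the FF-MLP construction itself; the function name is hypothetical.

```python
import numpy as np

def lda_two_class(x0, x1):
    """Closed-form two-class LDA.

    x0, x1: (n0, d) and (n1, d) sample arrays for class 0 and class 1.
    Returns (w, b) such that w @ x + b > 0 predicts class 1.
    """
    mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
    n0, n1 = len(x0), len(x1)
    # Pooled within-class covariance shared by both classes.
    scatter = (x0 - mu0).T @ (x0 - mu0) + (x1 - mu1).T @ (x1 - mu1)
    cov = scatter / (n0 + n1 - 2)
    # w = cov^{-1} (mu1 - mu0), solved without forming the inverse.
    w = np.linalg.solve(cov, mu1 - mu0)
    # The boundary passes through the midpoint, shifted by the log prior ratio.
    b = -0.5 * w @ (mu0 + mu1) + np.log(n1 / n0)
    return w, b
```

Each such hyperplane splits the feature space into two half-spaces, which is the role the parallel LDAs play in the first stage described above.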

MLP as a piecewise low-order polynomial approximator [2]:

The construction of a multilayer perceptron (MLP) as a piecewise low-order polynomial approximator using a signal processing approach is presented in this work. The constructed MLP contains one input layer, one intermediate layer, and one output layer. Its construction includes the specification of neuron numbers and all filter weights. Through the construction, a one-to-one correspondence between the approximation of an MLP and that of a piecewise low-order polynomial is established. Comparison [...]

By Zhiruo Zhou|January 17th, 2021|News|Comments Off|

MCL Research on AI for Health Care

Research related to the future development of health care systems is always a significant endeavor, as it touches many people’s lives. AI advancements in the last decade have given rise to new applications, with the key aim of increasing the automation level of different tasks currently carried out by experts. In particular, medical image analysis is a fast-growing area that has also been revolutionized by modern AI algorithms for visual content understanding. Magnetic Resonance Imaging (MRI) is widely used by radiologists to shed more light on a patient’s health situation. It can provide useful cues to experts, thus assisting them in deciding on the appropriate treatment plan, while also causing less discomfort for the patient and incurring less economic risk in the treatment process.

The question arises of how modern AI could help automate the diagnosis process and provide a second, more objective assessment to the experts. Many research ideas from the visual understanding area adopt the deep learning (DL) paradigm, training Deep Neural Networks (DNNs) to learn end-to-end representations for tumor classification, lesion area detection, specific organ segmentation, survival prediction, etc. Yet, one can identify some limitations of using DNNs in medical image analysis. It is well known that it is often hard to collect sufficient real samples for training DL models. Furthermore, decisions made by machines need to be transparent to physicians, who especially need to be aware of the factors that led to those decisions, so that the decisions are more trustworthy. DNNs are often perceived as “black-box” models, since their feature representations and decision paths are hard to interpret.

In MCL, we consider a new line of research on AI for medical image analysis, by adopting the Green Learning (GL) approach to address [...]

By Zhiruo Zhou|January 10th, 2021|News|Comments Off|

MCL Research on Scalable Weakly-Supervised Graph Learning

The success of deep learning and neural networks often comes at the price of a large amount of labeled data. Weakly-supervised learning (WSL) is an important paradigm that leverages a large amount of unlabeled data to address this limitation. The need for WSL has arisen in many machine learning problems, and WSL has found wide application in computer vision, natural language processing, and graph-based modeling, where getting labeled data is expensive and a large amount of unlabeled data exists.

Among weakly-supervised graph learning methods, label propagation (LP) has demonstrated good adaptability, scalability, and efficiency for node classification. However, LP-based methods are limited in their capability of integrating multiple data modalities for effective learning. Due to the recent success of neural networks, there has been an effort to apply neural networks to graph-structured data. One pioneering technique, known as graph convolutional networks (GCNs), has achieved impressive node classification performance for citation networks. However, GCNs fail to exploit the label distribution in the graph structure and are difficult to scale to large graphs.
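For readers unfamiliar with label propagation, the sketch below shows one common formulation: label scores are repeatedly averaged over neighbors through a symmetrically normalized adjacency matrix and mixed back with the known labels. This is classical LP, not GraphHop, and the function name, the mixing weight `alpha`, and the iteration count are assumptions chosen for illustration.

```python
import numpy as np

def label_propagation(adj, labels, n_classes, alpha=0.9, n_iter=50):
    """Classical label propagation on an undirected graph.

    adj:    (n, n) symmetric adjacency matrix.
    labels: length-n integer array, with -1 marking unlabeled nodes.
    Returns the predicted class index for every node.
    """
    n = adj.shape[0]
    # One-hot matrix of known labels; unlabeled rows stay zero.
    y = np.zeros((n, n_classes))
    known = labels >= 0
    y[known, labels[known]] = 1.0
    # Symmetric normalization: S = D^{-1/2} A D^{-1/2}.
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    s = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    f = y.copy()
    for _ in range(n_iter):
        # Spread scores to neighbors, then pull back toward known labels.
        f = alpha * (s @ f) + (1 - alpha) * y
    return f.argmax(axis=1)
```

Note that the iteration uses only the graph structure and the labels; node attributes never enter, which reflects the limitation of LP-based methods noted above.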

In this work, we propose a scalable weakly-supervised node classification method on graph-structured data, called GraphHop, where the underlying graph contains attributes of all nodes but labels of only a few nodes. Our method is an iterative algorithm that overcomes the deficiencies in LP and GCNs. With proper initial label vector embeddings, each iteration contains two steps: 1) label aggregation and 2) label update. In Step 1, each node aggregates its neighbors’ label vectors obtained in the previous iteration. In Step 2, a new label vector is predicted for each node based on the label of the node itself and the aggregated label information obtained in Step 1. This iterative procedure exploits the neighborhood information and enables GraphHop to [...]

By Zhiruo Zhou|January 3rd, 2021|News|Comments Off|
