USC Media Communications Lab

MCL Research on Green Progressive Learning

Image classification has been studied for many years as a fundamental problem in computer vision. With the development of convolutional neural networks (CNNs) and the availability of larger-scale datasets, deep-learning-based classification has advanced rapidly for both low- and high-resolution images. Although deep learning is effective, one major challenge is that its underlying mechanism is not transparent. Inspired by deep learning, the successive subspace learning (SSL) methodology was proposed by Kuo et al. in a sequence of papers. Unlike deep learning, SSL-based methods learn feature representations in an unsupervised, feedforward manner using multi-stage principal component analysis (PCA). Joint spatial-spectral representations are obtained at different scales through multi-stage transforms.

Existing SSL-based classification models use all of the training data at once, i.e., they take a single-round approach. The dataset typically contains a high proportion of easy samples and a smaller portion of hard samples. Easy samples can achieve quite high conditional accuracy, while hard samples need further attention because their distribution is masked by that of the easy samples. This motivates the design of Green Progressive Learning, which adds further rounds of training that progressively zoom in on smaller and smaller subspaces of hard samples. The selection of training samples in each round is critical to the performance gain: in each learning round, the hard training samples are re-selected to represent the subspace. Experiments on MNIST and Fashion-MNIST show the potential of progressive learning to boost performance on difficult cases.
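The round-by-round idea can be sketched as follows. This is only a toy illustration, not the actual Green Progressive Learning pipeline: a nearest-centroid classifier stands in for the SSL-based model, and the "hard sample" test (a small margin between the two closest centroids), the `threshold` parameter and all function names are assumptions made for the example.

```python
from collections import defaultdict

def fit_centroids(samples, labels):
    """Toy stand-in classifier (assumption): one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_with_margin(centroids, x):
    """Return (label, margin); a small margin marks a hard sample."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)), y) for y, c in centroids.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def fit_progressive(samples, labels, threshold, rounds=2):
    """Train round by round, re-selecting the hard samples each time."""
    models = []
    for _ in range(rounds):
        model = fit_centroids(samples, labels)
        models.append(model)
        hard = [(x, y) for x, y in zip(samples, labels)
                if predict_with_margin(model, x)[1] < threshold]
        if len({y for _, y in hard}) < 2:
            break  # fewer than two classes left, nothing to separate
        samples, labels = [x for x, _ in hard], [y for _, y in hard]
    return models

def predict_progressive(models, x, threshold):
    """Use the first round that is confident; fall back to the last one."""
    for model in models:
        label, margin = predict_with_margin(model, x)
        if margin >= threshold:
            return label
    return label
```

In this toy setting, a sample that the first round misclassifies with a small margin can be handled correctly by the second round, which is trained only on the hard subset.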

— By Yijing Yang

Reference:

Chen and C.-C. J. Kuo, “Pixelhop: A successive subspace learning (ssl) method for object recognition,” Journal [...]

By Zhiruo Zhou | March 13th, 2022 | News

MCL Research on Subspace Learning Machine

Classification-oriented machine learning models have been well studied over the past decades. The focus has shifted to deep learning (DL) in recent years. Feature learning and classification are handled jointly in DL models. Although the best performance on classification tasks is often achieved by DL through backpropagation (BP), DL models suffer from a lack of interpretability, high computational cost and high model complexity. In classical machine learning, feature extraction and classification are treated as separate modules. We focus on the classical learning paradigm and propose a new high-performance classifier that takes features as its input. Examples of classical classifiers include the support vector machine (SVM), the decision tree (DT), the multilayer perceptron (MLP), the feedforward multilayer perceptron (FF-MLP) and the extreme learning machine (ELM). SVM, DT and FF-MLP share one common idea, namely feature space partitioning. Inspired by the MLP, the DT and the ELM, a new classification model called the subspace learning machine (SLM) is proposed for general classification tasks.

The SLM attempts to efficiently partition the input feature space into multiple discriminant subspaces in a hierarchical manner. It works as follows. First, SLM identifies a discriminant subspace by examining the discriminant power of the input features. Then, it applies random projections to the features of the discriminant subspace to yield p 1D subspaces and finds the optimal partition in each of them. This is equivalent to partitioning the input space with p hyperplanes whose orientations and biases are determined by the random projections and the partitions, respectively. Among the p projections, a criterion is developed to choose the best q partitions, which yield 2q partitioned subspaces. The subspace partitioning process is repeated at each child node. When the samples at a child node are sufficiently pure, the partitioning process stops and SLM makes final predictions. SLM offers [...]
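The projection-and-partition step at a single node can be sketched as below. This is a minimal illustration under stated assumptions rather than the SLM implementation: the directions are drawn from a Gaussian and normalized, the partition cost is the weighted entropy of the two sides, and the function names (`best_split_1d`, `top_q_partitions`) are hypothetical. The excerpt does not specify the selection criterion, so ranking by cost alone is an assumption.

```python
import math
import random

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def best_split_1d(values, labels):
    """Scan thresholds on one 1D projection; return (cost, threshold)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best

def top_q_partitions(samples, labels, p, q, seed=0):
    """Try p random directions and keep the q lowest-cost hyperplanes."""
    rng = random.Random(seed)
    dim = len(samples[0])
    candidates = []
    for _ in range(p):
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]  # orientation of the hyperplane
        projected = [sum(a * b for a, b in zip(w, x)) for x in samples]
        cost, threshold = best_split_1d(projected, labels)  # threshold acts as the bias
        if threshold is not None:
            candidates.append((cost, w, threshold))
    candidates.sort(key=lambda c: c[0])
    return candidates[:q]
```

Each returned tuple describes one hyperplane: the unit vector `w` gives its orientation and `threshold` its bias, matching the description above. Repeating the call on the samples that fall on each side would give the hierarchical structure.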

By Zhiruo Zhou | March 6th, 2022 | News

MCL Research on Unsupervised Nuclei Segmentation

Nuclei segmentation is an important task in biological image analysis that supports the reading of histology images. Attributes such as shape, population, cluster formation and density play a significant role in clinical practice for cancer diagnosis and the assessment of its aggressiveness level. The annotation of this data is carried out by expert pathologists, who reportedly [2] need to spend 120-150 hours on average to annotate 50 image patches (about 12M pixels), so annotated data are scarce. This is a big impediment for supervised methods, particularly for DL-based solutions, which need massive amounts of annotated data to learn generalizable representations. Moreover, the annotations have a high inter-observer variation that depends on the experience of the annotator [1]. On top of that, variations in nuclei color and texture across images from different laboratories and multiple organs further widen the gap between the training and test domains.

Given these limitations, a natural way to address the problem is to pursue an unsupervised line of research. Also, given the limited amount of annotated data, our proposed method departs from the DL paradigm and uses conceptually simpler techniques that make the pipeline more transparent in terms of segmentation decision making. It is mainly based on prior knowledge about the nuclei segmentation problem. The CBM [3] pipeline starts with a data-driven Color (C) transform that highlights the nuclei regions over the background, followed by an adaptive Binarization (B) process built on a bi-modal assumption in each local region. That process is run in a patch-wise manner to leverage the local distribution assumptions between background and foreground. The final part of the pipeline uses Morphological (M) transformations that refine the segmented output based on certain [...]
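The patch-wise binarization step can be illustrated with a small sketch. It is not the CBM implementation: Otsu's method is swapped in here as one standard way to threshold a bi-modal histogram, the input is assumed to be a single-channel 8-bit image (as a list of rows) in which the color transform has already made nuclei brighter than the background, and the function names and the `patch` parameter are hypothetical. The color and morphological stages are not covered.

```python
def otsu_threshold(values, levels=256):
    """Threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one side is empty, no valid split at this level
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

def binarize_patchwise(image, patch):
    """Apply one Otsu threshold per patch; 1 marks pixels above it."""
    h, w = len(image), len(image[0])
    mask = [[0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            rows = range(top, min(top + patch, h))
            cols = range(left, min(left + patch, w))
            t = otsu_threshold([image[r][c] for r in rows for c in cols])
            for r in rows:
                for c in cols:
                    mask[r][c] = 1 if image[r][c] > t else 0
    return mask
```

Because each patch gets its own threshold, a dim foreground in one region is not suppressed by a bright region elsewhere. A patch with a single intensity value violates the bi-modal assumption and would need separate handling in practice.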

By Zhiruo Zhou | February 27th, 2022 | News

MCL Research on Learning-based Image Coding

Traditional image coding has achieved great success over the past four decades. Image coding standards such as JPEG and JPEG-2000 have been developed and are widely used today. Furthermore, the intra coding schemes of modern video coding standards also provide very effective image coding solutions. Several powerful tools have been used to de-correlate the pixel values:
1. Block transform coding, which is used in the majority of codecs: images are partitioned into blocks of different sizes, and the pixel values in each block are transformed from the spatial domain to the spectral domain for energy compaction before quantization and entropy coding.
2. Intra prediction, another powerful tool, which reduces pixel correlation at low cost using pixel values from neighboring blocks. Residuals after intra prediction are still coded by block transform coding.
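The energy compaction achieved by a block transform can be seen in a short sketch. The example below uses the orthonormal DCT-II, the transform used in JPEG, on a single square block; it is a plain, unoptimized illustration, and the helper names are assumptions made here.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def dct2(block):
    """Separable 2D DCT of a square block: C * block * C^T."""
    c = dct_matrix(len(block))
    ct = [list(col) for col in zip(*c)]
    return matmul(matmul(c, block), ct)

def energy(m):
    """Sum of squared entries."""
    return sum(v * v for row in m for v in row)

# A smooth 8x8 block: intensity rises slowly along rows and columns.
block = [[100 + 2 * r + 3 * c for c in range(8)] for r in range(8)]
coeffs = dct2(block)
low = sum(coeffs[r][c] ** 2 for r in range(2) for c in range(2))
share = low / energy(coeffs)  # fraction of energy in the 2x2 low-frequency corner
```

Since the transform is orthonormal, the total energy is unchanged, but for a smooth block nearly all of it ends up in a few low-frequency coefficients. The remaining coefficients are small and cheap to quantize and entropy code.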

Recently, deep-learning-based compression methods have attracted a lot of attention due to their superior rate-distortion performance. Compared with traditional codecs, learning-based codecs have the following characteristics:
1. Inter correlations: Traditional image codecs only explore correlation in the same image while learning-based image codecs can exploit correlation from other images (i.e., inter-image correlation).
2. Multi-scale representation: Traditional image codecs only capture the representation with variable block size while learning-based image codecs can exploit the multi-scale representation based on pooling. In other words, traditional image codecs primarily explore correlation at the block level while learning-based image codecs can exploit short, middle, and long-range correlations using the multi-scale representation.
3. Advanced loss functions: Different loss functions can be easily designed in learning-based schemes to fit the human visual system (HVS), and attention can be conveniently introduced into learning-based schemes.

To achieve low-complexity learning-based image coding, we propose a multi-grid multi-block-size vector quantization (MGBVQ) method based on [...]

By Zhiruo Zhou | February 20th, 2022 | News

MCL Research on Point Cloud Object Retrieval and Pose Estimation

Object pose estimation is an important problem in 3D scene understanding. Given a 3D point cloud object, the goal is to estimate its 6-DOF pose, comprising rotation and translation, with respect to a chosen coordinate system. The pose information can then be used for downstream tasks commonly encountered in robotics, such as object grasping, obstacle avoidance and path planning. In a complete scene understanding system, pose estimation usually comes after a 3D detection algorithm has localized and classified the object.

The pose estimation problem is similar to the problem of point cloud object registration, which has been previously studied at MCL. In particular, the R-PointHop [1] method was proposed, which successfully registers a source point cloud with a template. In the most recent work, we present a method termed PCRP that modifies R-PointHop for object pose estimation when a similar template object is unavailable. PCRP assumes a gallery set of pre-aligned point cloud objects and reuses the R-PointHop features to retrieve a similar object from the gallery. To do so, the pointwise features obtained using R-PointHop are aggregated into a global feature vector for nearest neighbor retrieval using the Vector of Locally Aggregated Descriptors (VLAD) [2]. Then, the input object’s pose is estimated by registering it with the retrieved object.
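The aggregation and retrieval steps can be sketched as follows. This is a generic VLAD illustration, not the PCRP code: the pointwise descriptors stand in for R-PointHop features, the codebook is assumed to have been learned beforehand (for example with k-means), and the function names are hypothetical.

```python
import math

def vlad(descriptors, codebook):
    """Aggregate local descriptors into one L2-normalized VLAD vector."""
    dim = len(codebook[0])
    residuals = [[0.0] * dim for _ in codebook]
    for d in descriptors:
        # Assign the descriptor to its nearest codeword.
        k = min(range(len(codebook)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(d, codebook[j])))
        # Accumulate the residual between the descriptor and that codeword.
        for i in range(dim):
            residuals[k][i] += d[i] - codebook[k][i]
    flat = [v for r in residuals for v in r]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

def retrieve(query, gallery):
    """Index of the gallery vector closest to the query (Euclidean)."""
    return min(range(len(gallery)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(query, gallery[j])))
```

The global vector has a fixed length (number of codewords times descriptor dimension) regardless of how many points the object has, which is what makes nearest neighbor search over the gallery straightforward.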

Though point cloud retrieval has been extensively studied in contexts such as shape retrieval and place recognition, retrieval in the presence of different object poses has received less attention. In this work, we show how a similar object can be retrieved even in the presence of different object poses. This is made possible by the rotation-invariant features learned by R-PointHop. Another improvement over R-PointHop is the replacement of conventional eight-octant-partitioning-based point attributes with more [...]

By Zhiruo Zhou | February 13th, 2022 | News

MCL Research on Point Cloud Compression

Point Cloud Compression (PCC) has received a lot of attention in recent years due to its wide range of applications, such as virtual reality (VR), augmented reality (AR), and mixed reality (MR). Video-based PCC (V-PCC) and geometry-based PCC (G-PCC) are two distinct technologies developed by MPEG 3DG [1][2]. Deep-learning-based (DL-based) PCC is a strong competitor to them. Most DL methods generalize the DL-based image coding pipeline to point cloud data [3][4]. They outperform G-PCC in the current MPEG 3DG standard for dense point cloud compression. Yet, their performance is still inferior to that of V-PCC in the coding of dynamic point clouds.

We propose to design a learning-based PCC solution that could outperform those DL-based methods with lower complexity and less memory consumption. Our method uses geometry projection to generate 2D images and applies a vector quantization-based 2D image codec to compress the projected maps. For a point cloud sequence, the projection is done in three steps. First, split the sequence into blocks using octree partitioning. Second, project each 3D block onto a plane and pack all the planes into a map. Third, encode/decode the 2D map and reconstruct the 3D point cloud sequence. The steps are shown in Fig. 1. We apply non-uniform sampling to the projected planes and pack all the planes to generate one depth map and one texture map in the reconstruction process. The two maps are shown in Fig. 2.
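The second step, projecting one 3D block onto a plane, can be sketched as below. This is a simplified illustration and not the proposed codec: it assumes integer voxel coordinates inside a cubic block, an axis-aligned orthographic projection and uniform sampling, and it leaves out the octree partitioning, packing, texture map and non-uniform sampling. The function names are hypothetical.

```python
def project_block(points, size, axis=2):
    """Orthographic projection of one block onto a size x size depth map.

    points are integer (x, y, z) voxel coordinates inside the block, and
    None marks an empty pixel. When several points share a pixel, only the
    nearest one (smallest depth) survives, so the projection is lossy.
    """
    u_axis, v_axis = [a for a in range(3) if a != axis]
    depth = [[None] * size for _ in range(size)]
    for p in points:
        u, v, d = p[u_axis], p[v_axis], p[axis]
        if depth[u][v] is None or d < depth[u][v]:
            depth[u][v] = d
    return depth

def reconstruct_block(depth, axis=2):
    """Invert project_block: one 3D point per occupied pixel."""
    u_axis, v_axis = [a for a in range(3) if a != axis]
    points = []
    for u, row in enumerate(depth):
        for v, d in enumerate(row):
            if d is not None:
                p = [0, 0, 0]
                p[u_axis], p[v_axis], p[axis] = u, v, d
                points.append(tuple(p))
    return points
```

Once each block is a small 2D depth image, the packed map can be handed to any 2D image or video codec, which is the motivation for the projection.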

Presently, we utilize the x264/x265 codec to code the maps. In the future, we will adopt a vector quantization-based image codec to compress the two maps.

— Qingyang Zhou

Reference

[1] S. Schwarz, M. Preda, V. Baroncini, M. Budagavi, P. Cesar, P. A. Chou, R. A. Cohen, M. Krivokuća, S. Lasserre, Z. Li et [...]

By Zhiruo Zhou | February 6th, 2022 | News

MCL Research Interest in Blind Video Quality Assessment

Blind Video Quality Assessment (BVQA) aims to predict perceptual quality based solely on the received videos. BVQA is essential to applications where source videos are unavailable, such as assessing the quality of user-generated content and video conferencing. Early BVQA models were distortion-specific and mainly focused on transmission- and compression-related artifacts. More recent work considered spatial and temporal distortions jointly and trained a regression model accordingly. Although these models can achieve good performance on datasets with synthetic distortions, they do not work well on user-generated content datasets. DL-based BVQA solutions were proposed recently, and they outperform all previous BVQA solutions.

We propose to design a lightweight and interpretable BVQA solution that is suitable for mobile and edge devices while remaining competitive with DL models in performance. We first need to select a basic processing unit for quality assessment. A full video sequence can be decomposed into smaller units in three ways. First, crop out a fixed spatial location to generate a spatial video (sv) patch. Second, crop out a specific temporal duration with full spatial information as a temporal video (tv) patch. Third, crop out a partial spatial region as well as a small number of frames as a spatial-temporal video (stv) patch. They are illustrated in Fig. 1. We will adopt units of the third kind, referred to as STCs, as the basic units for the proposed BVQA method. We will give each STC a BVQA score and then ensemble the individual scores to generate the final score of the full video. The diagram is shown in Fig. 2.
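The three ways of cutting a video into units amount to different index ranges over time, height and width, as the sketch below shows. It assumes a grayscale video stored as nested lists indexed [frame][row][col]; the function names, the square spatial window and the plain mean used to combine scores are all assumptions made for illustration.

```python
def crop(video, t0, t1, y0, y1, x0, x1):
    """Slice a video stored as nested lists indexed [frame][row][col]."""
    return [[row[x0:x1] for row in frame[y0:y1]] for frame in video[t0:t1]]

def shape(video):
    """(frames, height, width) of a nested-list video."""
    return len(video), len(video[0]), len(video[0][0])

def spatial_patch(video, y0, x0, size):
    """sv patch: fixed spatial window, all frames."""
    return crop(video, 0, len(video), y0, y0 + size, x0, x0 + size)

def temporal_patch(video, t0, length):
    """tv patch: full frames, limited duration."""
    h, w = len(video[0]), len(video[0][0])
    return crop(video, t0, t0 + length, 0, h, 0, w)

def spatial_temporal_patch(video, t0, length, y0, x0, size):
    """stv patch: partial spatial region over a few frames."""
    return crop(video, t0, t0 + length, y0, y0 + size, x0, x0 + size)

def ensemble(scores):
    """Combine per-unit scores; a plain mean is assumed here."""
    return sum(scores) / len(scores)
```

The spatial-temporal unit is the smallest of the three, so many of them can be sampled from one video and scored independently before their scores are combined.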

After the STC features are extracted, we will train a classifier for each output response and then ensemble their decision scores to yield the BVQA score for one STC. For the model training, we [...]

By Wei Wang | January 31st, 2022 | News

Professor Kuo Appointed as EiC for APSIPA Trans. on Signal and Information Processing

MCL Director, Professor C.-C. Jay Kuo, has been appointed as the Editor-in-Chief of the APSIPA Transactions on Signal and Information Processing (ATSIP) by the APSIPA Board of Governors. His two-year term starts on January 1, 2022.
ATSIP was established in 2014, and this is the 9th year of the journal. Professor Antonio Ortega of the University of Southern California served as its inaugural EiC from 2014 to 2017, and Professor Tatsuya Kawahara of Kyoto University was its second EiC from 2018 to 2021. Professor Kuo expressed his deep gratitude to both Professor Ortega and Professor Kawahara for their contributions in laying an excellent foundation for the journal. The photo was taken on Dec. 19, 2019, when Professor Kuo and his wife visited Professor Tatsuya Kawahara at Kyoto University.
ATSIP is an open-access e-only journal in partnership with the NOW Publisher. It serves as an international forum for signal and information processing researchers across a broad spectrum of research, ranging from traditional modalities of signal processing to emerging areas where either (i) processing reaches higher semantic levels (e.g., from speech/image recognition to multimodal human behavior recognition) or (ii) processing is meant to extract information from datasets that are not traditionally considered signals (e.g., mining of Internet or sensor information). Papers published in ATSIP are indexed by Scopus, EI and ESCI, searchable on the Web of Science, and included in the IEEE Xplore database.

By Wei Wang | January 17th, 2022 | News

MCL Research Interest in Syntactic Structure Aware Sentence Similarity Modeling

Text similarity modeling plays an important role in a variety of Natural Language Processing (NLP) applications, such as information retrieval, text clustering, and plagiarism detection. Moreover, it can serve as an automatic evaluation metric in natural language generation tasks such as machine translation and image captioning, saving expensive and time-consuming human labeling.

Word Mover’s Distance (WMD) [1] is an efficient model for measuring the semantic distance between two texts. In WMD, word embeddings, which learn semantically meaningful representations of words, are incorporated into the earth mover’s distance. The distance between two texts A and B is the minimum cumulative distance that all words from text A need to travel to exactly match text B.
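Written out, the definition above is a small transport problem. The notation here is an assumption chosen for illustration: text A has n distinct words with embeddings x_i and normalized weights d_i, text B has m distinct words with embeddings x'_j and weights d'_j, and T_{ij} is the amount of flow sent from word i of A to word j of B.

```latex
\[
\mathrm{WMD}(A, B) = \min_{T \ge 0} \sum_{i=1}^{n} \sum_{j=1}^{m} T_{ij}\, c(i, j)
\quad \text{s.t.} \quad
\sum_{j=1}^{m} T_{ij} = d_i \;\; \forall i, \qquad
\sum_{i=1}^{n} T_{ij} = d'_j \;\; \forall j,
\]
\[
c(i, j) = \lVert x_i - x'_j \rVert_2, \qquad
\sum_{i=1}^{n} d_i = \sum_{j=1}^{m} d'_j = 1 .
\]
```

In this form, the two quantities that shape the result are the cost matrix c(i, j) and the per-word weights d_i and d'_j, which fix how much flow each word sends or receives.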

In our work, we try to incorporate syntactic parsing, which brings meaningful structural information, into WMD. Two components mainly control the flow in WMD: the distance matrix and the flow of each word. First, to compute the distance matrix, the original WMD only compares an individual pair of word embeddings to measure the distance between words and does not consider other information in the sentence. To better measure the distance between words, we first form sub-tree structures from the dependency parsing tree. Instead of only comparing the similarity of the word embeddings, we also compare the similarity of the sub-trees that contain the words. Second, a word’s flow can be regarded as the word’s importance. If more flow is given to important words, most of the flow is transported between important words, so the total transportation cost is mainly decided by the similarity of important words. We currently use each word’s dependency relation in the parsing tree to assign importance weights to words. In the future, we [...]

By Wei Wang | January 10th, 2022 | News

Happy New Year!

At the beginning of 2022, we wish all MCL members an even more wonderful year with everlasting passion and courage!

Image credit:

https://thenewspocket.com/100-best-happy-new-year-wishes-messages-quotes-2022/

By Wei Wang | January 2nd, 2022 | News
