
MCL Research on Advanced Image Generation

A progressive edge-guided generative model, called ProGenHop, is presented in this work.

The majority of existing generative models use neural networks to represent the underlying data distribution. ProGenHop instead offers a novel method based on Successive Subspace Learning (SSL) for feature extraction and image generation. Its benefits are interpretability and significantly lower computational complexity compared with computationally complex, black-box neural networks. ProGenHop maintains generation quality with a small training set. Moreover, ProGenHop is easily extendable to further generative model applications, such as attribute-guided image generation, super resolution, and high-resolution image generation.

A generative model learns a distribution for the underlying dataset during the training phase. During the generation phase, new data are drawn as samples from that distribution. Most prior work, such as GAN-based, VAE-based, and diffusion-based generative models, uses neural networks to learn complex non-linear transformations. In our work, we use the SSL pipeline for feature extraction. An SSL pipeline consists of consecutive Saab transformations. In essence, the SSL pipeline receives an RGB image and converts it into a feature vector. Since the Saab transformation is a variant of Principal Component Analysis (PCA), it inherits the properties of PCA. One of these properties is that the Saab transformation generates feature vectors with uncorrelated components, which facilitates the use of Gaussian priors for generative model training.
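The decorrelation property can be illustrated with plain PCA on toy 2-D data. The sketch below is not the Saab transform itself; the function names and the synthetic data are assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def pca_2d(points):
    """Project 2-D points onto their principal axes (plain PCA, not Saab)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    # Rotation angle that diagonalizes the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * y, -s * x + c * y) for x, y in centered]


def covariance(points):
    """Covariance between the two coordinates of a list of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    return sum((x - mx) * (y - my) for x, y in points) / n


# Synthetic, strongly correlated data (illustrative assumption).
rng = random.Random(0)
raw = []
for _ in range(500):
    x = rng.gauss(0.0, 1.0)
    raw.append((x, 0.8 * x + 0.3 * rng.gauss(0.0, 1.0)))

projected = pca_2d(raw)
```

Comparing `covariance(raw)` with `covariance(projected)` shows the cross-covariance dropping to numerical zero after the projection, which is what makes an independent Gaussian prior per component a reasonable modeling choice.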

ProGenHop is an unconditional generative model that generates images progressively: it starts the unconditional generation in a low-resolution regime, then sequentially increases the resolution via a cascade of conditional generation modules. ProGenHop has three modules, namely, Generation Core, Resolution Enhancer, and Quality Booster. The first module learns the distribution of low-resolution images using a Gaussian mixture model and performs unconditional image [...]

MCL Research on MRI Imaging of Lung Ventilation

Chronic diseases like chronic obstructive pulmonary disease (COPD) and asthma have high prevalence and reduce the compliance of the lung, thereby impeding normal ventilation. Functional lung imaging is of vital importance for the diagnosis and evaluation of these lung diseases. In recent years, high-performance low-field systems have shown great advantages for lung MRI due to reduced susceptibility effects and improved vessel conspicuity. These MRI configurations provide improved field homogeneity compared with conventional field strengths (1.5T, 3.0T). Lower field strengths, such as 0.55T, give researchers more opportunities to detect regional volume changes throughout the respiratory cycle.
Recently, through a collaboration between the Dynamic Imaging Science Center (DISC) and MCL, an approach for regional functional lung ventilation mapping using real-time MRI has been developed. It leverages the improved lung imaging and real-time imaging capability at 0.55T, without requiring contrast agents, repetition, or breath holds. During image acquisition, a time series of MR images covering several consecutive respiratory cycles is captured. To resolve the regional lung ventilation, an unsupervised non-rigid image registration is applied to register the lungs from different respiratory states to the end-of-exhalation state. The deformation field is then extracted to study regional ventilation. Specifically, a data-driven binarization algorithm is first applied to segment the lung parenchyma area and the vessels separately. Salient points are then extracted frame by frame and matched between adjacent frames to form pairs of landmarks. Finally, Jacobian determinant (JD) maps are generated from the deformation fields computed by a landmark-based B-spline registration.
In the study, regional lung ventilation is analyzed for three breathing patterns. Posture-related ventilation differences are also demonstrated. It reveals that real-time image acquisition [...]
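As a rough illustration of the last step, a JD map can be computed from a dense 2-D displacement field u as det(I + ∇u). The sketch below assumes the displacement is given as two grids, `ux` and `uy` (hypothetical names, in pixel units), and uses simple finite differences rather than the analytic B-spline derivatives a registration package would normally provide.

```python
def gradient(field, axis):
    """Finite-difference derivative of a 2-D grid (list of rows).

    axis=0 differentiates along rows (y), axis=1 along columns (x).
    Central differences are used inside the grid, one-sided at the borders.
    The grid must have at least two samples along the chosen axis.
    """
    h, w = len(field), len(field[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if axis == 0:
                lo, hi = max(i - 1, 0), min(i + 1, h - 1)
                out[i][j] = (field[hi][j] - field[lo][j]) / (hi - lo)
            else:
                lo, hi = max(j - 1, 0), min(j + 1, w - 1)
                out[i][j] = (field[i][hi] - field[i][lo]) / (hi - lo)
    return out


def jacobian_determinant(ux, uy):
    """JD map of the mapping (x, y) -> (x + ux, y + uy)."""
    dux_dy, dux_dx = gradient(ux, 0), gradient(ux, 1)
    duy_dy, duy_dx = gradient(uy, 0), gradient(uy, 1)
    h, w = len(ux), len(ux[0])
    return [
        [
            (1.0 + dux_dx[i][j]) * (1.0 + duy_dy[i][j])
            - dux_dy[i][j] * duy_dx[i][j]
            for j in range(w)
        ]
        for i in range(h)
    ]


# Toy field: uniform 10% expansion along both axes (illustrative assumption).
size = 5
ux = [[0.1 * j for j in range(size)] for i in range(size)]
uy = [[0.1 * i for j in range(size)] for i in range(size)]
jd_map = jacobian_determinant(ux, uy)
```

A JD value above 1 marks local expansion and a value below 1 marks local contraction, which is why the map can serve as a surrogate for regional volume change.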

MCL Research on Steganalysis

With the pervasiveness of image steganography algorithms in social media, image steganalysis has become increasingly important. One of the most secure families of steganographic schemes is content-adaptive steganography. WOW, S-UNIWARD, HUGO and HILL are all successful steganography algorithms of this kind. Content-adaptive steganography computes embedding costs after evaluating the cover image and tends to place more embeddings in complex regions, which makes it harder for image steganalyzers to detect whether information has been embedded in an image.

Our goal is to provide a data-driven method that uses neither hand-crafted high-pass filtering in a preprocessing step nor any neural-network-based architecture. We use unsupervised feature extraction and a machine-learning-based classifier to accomplish the task. Specifically, we first split the input image into 3×3 blocks and partition the blocks into several groups based on their embedding cost. We use the Saab transform to extract features from the blocks and make block-level decisions. The difference between the soft decision scores on cover images and stego images is effective for making image-wise decisions. In order to find the embedding locations in unseen images, we train an embedding location classifier on the block soft decision scores, as shown in Fig. 1. Based on the embedding location probability scores from each group, we train the final image-wise ensemble classifier, which gives the image-level decision, as shown in Fig. 2.
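The block splitting and grouping step can be sketched as follows. The text does not say whether the 3×3 blocks overlap or how the groups are formed, so the non-overlapping blocks, the mean block cost, and the fixed thresholds used here are all assumptions, as are the function names.

```python
def split_blocks(image, size=3):
    """Yield (row, col, block) for non-overlapping size x size blocks."""
    h, w = len(image), len(image[0])
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            yield r, c, [row[c:c + size] for row in image[r:r + size]]


def group_by_cost(cost_map, thresholds, size=3):
    """Assign each block to a group index based on its mean embedding cost.

    thresholds must be sorted in increasing order; group 0 holds the
    lowest-cost blocks and group len(thresholds) the highest-cost ones.
    """
    groups = {g: [] for g in range(len(thresholds) + 1)}
    for r, c, block in split_blocks(cost_map, size):
        mean_cost = sum(sum(row) for row in block) / (size * size)
        g = sum(mean_cost > t for t in thresholds)
        groups[g].append((r, c))
    return groups


# Toy 6x6 embedding-cost map with three cost levels (illustrative values).
cost_map = [
    [0.1 if (r < 3 and c < 3) else (1.0 if r < 3 else 5.0) for c in range(6)]
    for r in range(6)
]
groups = group_by_cost(cost_map, thresholds=[0.5, 2.0])
```

Each group would then get its own feature extractor and block-level classifier, so that blocks with similar embedding likelihood are handled together.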

Compared to CNN-based steganalysis models, our method does not use end-to-end training or backpropagation. Therefore, it is very lightweight in terms of model size and memory usage. At the same time, our method can outperform all traditional steganalysis methods and some benchmark CNN-based models.


— by Yao Zhu

MCL Research on Advanced Deepfake Video Detection

A robust fake satellite image detection method, called Geo-DefakeHop, is proposed in this work. Geo-DefakeHop is developed based on the parallel subspace learning (PSL) methodology. PSL maps the input image space into several feature subspaces using multiple filter banks. By exploring the differences in channel responses between real and fake images for each filter bank, Geo-DefakeHop learns the most discriminant channels and uses their soft decision scores as features. Then, Geo-DefakeHop selects a few discriminant features from each filter bank using a validation dataset and ensembles them to make a final binary decision. Geo-DefakeHop offers a lightweight, high-performance solution to fake satellite image detection. Its model size, which ranges from 0.8 to 62K parameters, is analyzed. Furthermore, experimental results show that it achieves an F1-score higher than 95% under various common image manipulations such as resizing, compression, and noise corruption.

— By Max Chen

MCL Research on Graph Learning

Graph-based semi-supervised learning has shown prominent performance in node classification tasks by exploiting the underlying manifold structure of data. Recently, an enhancement of classical label propagation (LP) named GraphHop was proposed, which outperforms existing graph convolutional networks (GCNs) on various networks. Although the superior performance of the GraphHop model has been explained in terms of smoothing both node attribute and label signals, its mechanisms are still not fundamentally clear.

In this work, we develop deeper insights into the GraphHop model from the perspective of a regularization framework. We show that the GraphHop model can be cast as an iterative approximate optimization of a particular regularization function on graphs. Then, based on this variational interpretation, we propose two approaches to address the limitations of the GraphHop model that stem from the approximate optimization process: 1) additional aggregations in optimizing the label embeddings; and 2) adaptive selection of reliable unlabeled samples for classifier training. Experiments show that, equipped with these two improvements, our model, called GraphHop++, achieves significantly better performance than the former GraphHop model as well as state-of-the-art methods on various benchmark networks with limited label rates.
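For reference, the classical label propagation that GraphHop builds on can be sketched as the iteration F ← αSF + (1 − α)Y, where S is the symmetrically normalized adjacency matrix and Y holds the known labels. This is the textbook LP update, not GraphHop or GraphHop++; the function name, the toy graph, and the value of α are assumptions.

```python
import math


def label_propagation(adj, labels, num_classes, alpha=0.9, iters=50):
    """Classical LP on a dense adjacency matrix.

    labels[i] is a class index for labeled nodes and None otherwise.
    Returns the predicted class index for every node.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Symmetrically normalized adjacency S = D^{-1/2} W D^{-1/2}.
    s = [
        [adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    # One-hot seed matrix Y; rows of unlabeled nodes stay all-zero.
    y = [[1.0 if labels[i] == k else 0.0 for k in range(num_classes)]
         for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iters):
        f = [
            [alpha * sum(s[i][j] * f[j][k] for j in range(n))
             + (1.0 - alpha) * y[i][k]
             for k in range(num_classes)]
            for i in range(n)
        ]
    return [max(range(num_classes), key=lambda k: f[i][k]) for i in range(n)]


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

# Only one labeled node per class.
predicted = label_propagation(adj, [0, None, None, None, None, 1], num_classes=2)
```

With one labeled node in each triangle, the labels spread along the edges and each triangle inherits the class of its seed node.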

— By Tian Xie

MCL Research on Green Progressive Learning

Image classification has been studied for many years as a fundamental problem in computer vision. With the development of convolutional neural networks (CNNs) and the availability of larger-scale datasets, classification using deep learning has seen rapid success for both low- and high-resolution images. Although deep learning is effective, one major challenge is that its underlying mechanism is not transparent. Inspired by deep learning, the successive subspace learning (SSL) methodology was proposed by Kuo in a sequence of papers. Different from deep learning, SSL-based methods learn feature representations in an unsupervised feedforward manner using multi-stage principal component analysis (PCA). Joint spatial-spectral representations are obtained at different scales through multi-stage transforms.

Existing SSL-based classification models use all the training data at once, which is a single-round approach. Among the samples, easy samples usually make up a high ratio of the dataset, and the remainder are hard samples. Easy samples can achieve quite high conditional accuracy, while hard samples need further attention because their distribution is masked by the easy samples. This motivates the design of Green Progressive Learning, which adds more rounds of training that progressively zoom in on smaller and smaller subspaces of hard samples. The selection of training samples in each round is critical to the performance gain. In each learning round, the hard training samples are re-selected to represent the subspace. Experiments on MNIST and Fashion-MNIST show the potential of progressive learning, which can help boost the performance on difficult cases.
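The multi-round idea can be sketched with a toy example. The text does not specify the classifier or the rule for selecting hard samples, so the nearest-centroid classifier on 1-D features, the margin-based confidence, and the threshold below are all assumptions chosen to keep the sketch short.

```python
def fit_centroids(samples, labels):
    """Toy nearest-centroid model: one mean per class (1-D features)."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def confidence(centroids, x):
    """Margin between the two closest centroids; a small margin means hard."""
    d = sorted(abs(x - c) for c in centroids.values())
    return d[1] - d[0]


def progressive_rounds(samples, labels, threshold, max_rounds=3):
    """Train one model per round, each on the previous round's hard samples.

    Returns a list of (model, number_of_training_samples) per round.
    """
    models = []
    idx = list(range(len(samples)))
    for _ in range(max_rounds):
        xs = [samples[i] for i in idx]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            break  # a margin needs at least two classes
        model = fit_centroids(xs, ys)
        models.append((model, len(idx)))
        # Re-select the hard samples under the current round's model.
        hard = [i for i in idx if confidence(model, samples[i]) < threshold]
        if not hard or len(hard) == len(idx):
            break  # nothing left to zoom in on
        idx = hard
    return models


# Six easy samples near 0 and 5, two hard samples near the class boundary.
samples = [0.0, 0.2, 0.4, 4.6, 4.8, 5.0, 2.3, 2.7]
labels = [0, 0, 0, 1, 1, 1, 0, 1]
rounds = progressive_rounds(samples, labels, threshold=1.0)
```

In this toy run the first round is trained on all eight samples, and the second round is trained only on the two boundary samples, whose class centroids are no longer pulled apart by the easy majority.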

— By Yijing Yang



MCL Research on Subspace Learning Machine

Classification-oriented machine learning models have been well studied in the past decades. The focus has shifted to deep learning (DL) in recent years. Feature learning and classification are handled jointly in DL models. Although the best performance on classification tasks is often achieved by DL through backpropagation (BP), DL models suffer from a lack of interpretability, high computational cost and high model complexity. In classical machine learning, feature extraction and classification are treated as separate modules. We focus on the classical learning paradigm and propose a new high-performance classifier with features as the input. Examples of classical classifiers include the support vector machine (SVM), decision tree (DT), multilayer perceptron (MLP), feedforward multilayer perceptron (FF-MLP) and extreme learning machine (ELM). SVM, DT and FF-MLP share one common idea, i.e., feature space partitioning. Inspired by the MLP, the DT and the ELM, a new classification model, called the subspace learning machine (SLM), is proposed for general classification tasks.

The SLM attempts to efficiently partition the input feature space into multiple discriminant subspaces in a hierarchical manner. It works as follows. First, SLM identifies a discriminant subspace by examining the discriminant power of input features. Then, it applies random projections to input discriminant subspace features to yield p 1D subspaces and finds optimal partitions in each of them. This is equivalent to partitioning input space with p hyper-planes whose orientations and biases are determined by random projections and partitions, respectively. Among p projections, we develop a criterion to choose the best q partitions that yield 2q partitioned subspaces. The subspace partitioning process is repeated at each child node. When the samples are sufficiently pure at a child node, the partitioning process stops and SLM makes final predictions. SLM offers [...]
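The projection-and-partition step at a single node can be sketched as follows. The text does not name the partition criterion, so weighted entropy is used here as an assumption, and the function names, the Gaussian sampling of projection directions, and the toy data are likewise illustrative rather than the SLM implementation.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def best_split(values, labels):
    """Return (weighted_entropy, threshold) of the best 1-D binary partition."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    best = (float("inf"), None)
    for k in range(1, n):
        lo, hi = values[order[k - 1]], values[order[k]]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [labels[i] for i in order[:k]]
        right = [labels[i] for i in order[k:]]
        cost = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if cost < best[0]:
            best = (cost, (lo + hi) / 2.0)
    return best


def select_partitions(features, labels, p=20, q=2, seed=0):
    """Try p random 1-D projections and keep the q lowest-cost partitions."""
    rng = random.Random(seed)
    dim = len(features[0])
    candidates = []
    for _ in range(p):
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]  # hyperplane orientation
        projected = [sum(a * b for a, b in zip(w, x)) for x in features]
        cost, threshold = best_split(projected, labels)  # hyperplane bias
        if threshold is not None:
            candidates.append((cost, w, threshold))
    candidates.sort(key=lambda c: c[0])
    return candidates[:q]


# Two well-separated toy classes in 2-D.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
chosen = select_partitions(features, labels)
```

Each returned tuple holds the partition cost, the projection direction, and the threshold. Repeating the same routine on the samples that fall on each side of the chosen hyperplanes gives the hierarchical structure described above.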

MCL Research on Unsupervised Nuclei Segmentation

Nuclei segmentation is an important task in biological image analysis that aids the reading of histology images. Attributes such as shape, population, cluster formation and density play a significant role in clinical practice for cancer diagnosis and the assessment of its aggressiveness level. The annotation of this data is carried out by expert pathologists, who reportedly [2] need to spend 120-150 hours on average to annotate 50 image patches (about 12M pixels), so annotated data are scarce. This is a big impediment for supervised methods, particularly for DL-based solutions that need massive annotated data to learn generalizable representations. Moreover, the annotations have a high inter-observer variation, which depends on the experience of the annotator [1]. On top of that, nuclei color and texture variations across images from different laboratories and multiple organs further widen the gap between the training and test domains.

Given the aforementioned limitations, a natural way to solve the problem is to pursue an unsupervised line of research. Also, given the limited amount of annotated data, our proposed method departs from the DL paradigm and uses conceptually simpler techniques that make the pipeline more transparent in terms of segmentation decision making. It is mainly based on prior knowledge about the nuclei segmentation problem. The CBM [3] pipeline starts with a data-driven Color (C) transform that highlights the nuclei regions over the background, followed by an adaptive Binarization (B) process built on the bi-modal assumption in each local region. That process is run in a patch-wise manner to leverage the local distribution assumptions between background and foreground. The final part of the pipeline uses Morphological (M) transformations that refine the segmented output based on certain [...]

MCL Research on Learning-based Image Coding

Traditional image coding has achieved great success over the past four decades. Image coding standards such as JPEG and JPEG-2000 have been developed and are widely used today. Furthermore, intra coding schemes of modern video coding standards also provide very effective image coding solutions. Several powerful tools have been used to de-correlate the pixel values:
1. Block transform coding, which is used in the majority of the codecs where images are partitioned into blocks of different sizes and pixel values in blocks are transformed from the spatial domain to the spectrum domain for energy compaction before quantization and entropy coding.
2. Intra prediction, which reduces the pixel correlation using pixel values from neighboring blocks at a low cost. Residuals after intra prediction are still coded by block transform coding.
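The energy compaction achieved by block transform coding can be illustrated with a small sketch. It uses an orthonormal DCT-II, the transform family used in JPEG, on a toy 8×8 block; the function names and the block values are assumptions, and the direct summation is written for clarity rather than speed.

```python
import math


def dct_1d(block):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(block)
        ))
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform the rows first, then the columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # Transpose back so that result[u][v] is (vertical, horizontal) frequency.
    return [list(row) for row in zip(*cols)]


def energy(block):
    """Sum of squared values of a 2-D block."""
    return sum(v * v for row in block for v in row)


# Smooth toy block: a bright patch with a gentle gradient.
smooth = [[100 + 2 * i + 3 * j for j in range(8)] for i in range(8)]
coeffs = dct_2d(smooth)
dc_fraction = coeffs[0][0] ** 2 / energy(coeffs)
```

Because the transform is orthonormal, the total energy is preserved, while for smooth content almost all of it lands in the DC and a few low-frequency coefficients. Those are the coefficients that survive quantization, and the long runs of near-zero values are cheap to entropy code.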

Recently, deep-learning-based compression methods have attracted a lot of attention due to their superior rate-distortion performance. Compared with traditional codecs, learning-based codecs have the following characteristics:
1. Inter-image correlation: Traditional image codecs only explore correlation within the same image, while learning-based image codecs can exploit correlation from other images (i.e., inter-image correlation).
2. Multi-scale representation: Traditional image codecs only capture the representation with variable block sizes, while learning-based image codecs can exploit the multi-scale representation based on pooling. In other words, traditional image codecs primarily explore correlation at the block level, while learning-based image codecs can exploit short-, middle-, and long-range correlations using the multi-scale representation.
3. Advanced loss functions: Different loss functions can be easily designed in learning-based schemes to fit the human visual system (HVS), and attention can be conveniently introduced into learning-based schemes.

To achieve low-complexity learning-based image coding, we propose a multi-grid multi-block-size vector quantization (MGBVQ) method based on [...]

MCL Research on Point Cloud Object Retrieval and Pose Estimation

Object pose estimation is an important problem in 3D scene understanding. Given a 3D point cloud object, the goal is to estimate the 6-DOF pose, comprising rotation and translation, with respect to a chosen coordinate system. The pose information can then be used for downstream tasks such as object grasping, obstacle avoidance, and path planning, which are commonly encountered in robotics. In a complete scene understanding system, pose estimation usually comes after a 3D detection algorithm has localized and classified the object.

The pose estimation problem is similar to the problem of point cloud object registration which has been previously studied at MCL. In particular, the R-PointHop [1] method was proposed which successfully registers a source point cloud with a template. In the most recent work, we present a method termed PCRP that modifies R-PointHop for object pose estimation when a similar template object is unavailable. PCRP assumes a gallery set of pre-aligned point cloud objects and reuses the R-PointHop features to retrieve a similar object from the gallery. To do so, the pointwise features obtained using R-PointHop are aggregated into a global feature vector for nearest neighbor retrieval using the Vector of Locally Aggregated Descriptors (VLAD) [2]. Then, the input object’s pose is estimated by registering it with the retrieved object.
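The aggregation and retrieval step can be sketched as below. This is a generic VLAD implementation followed by a nearest-neighbor search, not the PCRP code: the codebook would normally be learned (for example with k-means) from R-PointHop features, whereas here the codebook, the 2-D descriptors, and the function names are toy assumptions.

```python
import math


def vlad(descriptors, codebook):
    """Aggregate pointwise descriptors into one L2-normalized VLAD vector."""
    dim = len(codebook[0])
    residuals = [[0.0] * dim for _ in codebook]
    for d in descriptors:
        # Assign the descriptor to its nearest codeword.
        k = min(
            range(len(codebook)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(d, codebook[c])),
        )
        # Accumulate the residual to that codeword.
        for j in range(dim):
            residuals[k][j] += d[j] - codebook[k][j]
    flat = [v for r in residuals for v in r]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


def retrieve(query, gallery):
    """Index of the gallery vector closest to the query (Euclidean distance)."""
    return min(
        range(len(gallery)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(query, gallery[i])),
    )


# Toy codebook with two codewords and two gallery objects.
codebook = [[0.0, 0.0], [10.0, 10.0]]
gallery = [
    vlad([[1.0, 0.0], [9.0, 10.0]], codebook),   # object A
    vlad([[0.0, 1.0], [10.0, 11.0]], codebook),  # object B
]
query = vlad([[1.1, 0.1], [8.9, 10.0]], codebook)
match = retrieve(query, gallery)
```

Since the global vector is built only from the pointwise features, retrieval inherits whatever invariances those features have. With rotation-invariant features, the same object in a different pose maps to a nearby VLAD vector.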

Though point cloud retrieval has been extensively studied in contexts like shape retrieval and place recognition, retrieval in the presence of different object poses is less explored. In this work, we show how a similar object can be retrieved even in the presence of different object poses. This is made possible by the rotation-invariant features learned by R-PointHop. Another improvement over R-PointHop is the replacement of conventional eight octant partitioning based point attributes with more [...]

