ZhiruoZhou

MCL Research on Data-Driven Image Compression

Block-based image coding is adopted by the JPEG standard, which has been widely used for the last three decades. The block-based discrete cosine transform (DCT) and the quantization matrices for the three YCbCr color channels play key roles in JPEG. In this work, we propose a new image coding method called DCST. It adopts data-driven color and spatial transforms based on the statistical properties of image pixels and machine learning. To match the data-driven forward transform, we propose a quantization table based on the human visual system (HVS). Furthermore, to compensate efficiently for the quantization error, a machine learning-based inverse transform is used. The performance of the new design is verified on the Kodak image dataset. The optimal inverse transform achieves a gain of 0.11-0.30 dB over standard JPEG across a wide range of quality factors. The whole pipeline outperforms JPEG with a gain of 0.5738 in BD-PSNR (or a decrease of 9.5713 in BD-rate) from 0.2 to 3 bpp.

The color input: previous standards use YCbCr as input, which is a good choice statistically but not optimal for every single image. In our case, we train a PCA for each image to perform the color transform, which gives better decorrelation performance;
(2D)^2 PCA [1] transform and a corresponding quantization matrix designed based on the PCA kernel and the HVS [2];
Machine learning inverse transform: linear regression is used to compute the optimal inverse transform kernel for both the color conversion and the spatial-to-spectral transform. This helps estimate the quantization error and yields a better inverse result. Compared with the previous method, which compensates for this error using a probability model during the post-processing stage, the ML-based inverse transform would take less time during the decoding [...]
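
The color part of the pipeline above can be sketched as follows. This is a minimal illustration, not the DCST implementation: it assumes NumPy, a random test image, a uniform quantizer with made-up step sizes, and hypothetical helper names (`fit_color_pca`, `forward`, `quantize`, `fit_inverse`). It fits a per-image 3x3 PCA color transform, quantizes the coefficients, and compares the plain transpose inverse with an inverse kernel learned by linear regression. How the learned kernel is trained or delivered to the decoder is not covered by the excerpt.

```python
import numpy as np

def fit_color_pca(image):
    """Fit a 3x3 PCA color transform to one H x W x 3 image."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # strongest component first
    return mean, eigvecs[:, order]               # columns are the PCA kernels

def forward(image, mean, kernel):
    """Project mean-removed pixels onto the PCA kernels."""
    return (image.reshape(-1, 3) - mean) @ kernel

def quantize(coeffs, step):
    """Uniform quantization followed by dequantization (assumed quantizer)."""
    return np.round(coeffs / step) * step

def fit_inverse(dequantized, pixels):
    """Least-squares map from dequantized coefficients (plus bias) to pixels."""
    design = np.hstack([dequantized, np.ones((len(dequantized), 1))])
    weights, *_ = np.linalg.lstsq(design, pixels, rcond=None)
    return weights

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 16, 3)).astype(np.float64)
image[..., 1] = 0.8 * image[..., 0] + 0.2 * image[..., 1]   # correlate two channels

mean, kernel = fit_color_pca(image)
pixels = image.reshape(-1, 3)
coeffs = forward(image, mean, kernel)
deq = quantize(coeffs, step=np.array([8.0, 16.0, 32.0]))    # illustrative steps

plain = deq @ kernel.T + mean                                # transpose inverse
learned = np.hstack([deq, np.ones((len(deq), 1))]) @ fit_inverse(deq, pixels)

mse_plain = np.mean((plain - pixels) ** 2)
mse_learned = np.mean((learned - pixels) ** 2)
print(mse_plain, mse_learned)
```

Because the transpose inverse is itself one affine map of the dequantized coefficients, the regression result cannot have a larger training error than it; the size of the gain depends on the image and the quantizer.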

April 18th, 2021 | News

MCL Research on Image Artifact Detection and Localization

Image anomaly detection and localization is a fundamental problem in pattern recognition and computer vision, with applications in many areas, such as industrial manufacturing inspection, medical image diagnosis, and video surveillance analysis. The goal of image anomaly detection is to determine whether an input image contains an anomaly, and the goal of image anomaly localization is to locate the anomaly at the pixel level. As with most other anomaly detection problems, we formulate image anomaly detection as an unsupervised task, which means only normal images are available during model training. This is because anomalous examples are either too expensive to collect or too few to model their distribution during training, which makes the problem extremely challenging yet attractive.

To tackle this problem, we propose two methods, one based on deep learning and one based on successive subspace learning.

We propose a new deep learning framework for unsupervised image anomaly detection and localization. Our model first uses an encoder to generate low-dimensional embeddings for local image patches, which are then fed into a density estimation network inspired by the Gaussian mixture model (GMM). Given a low-dimensional patch embedding as input, the density estimation network models the distribution of embeddings as in GMM clustering and predicts the cluster membership of the embedding as output. The total probability of the given local patch can then be computed and used as a loss term to guide the learning process. Extensive experimental results show that the proposed method achieves very competitive performance compared with state-of-the-art methods.
We are exploring the use of successive subspace learning (SSL) to achieve a more efficient and interpretable method for image anomaly detection and localization. It first employs PixelHop++ [1] as the feature extractor, in which each hop encodes features with a different receptive field. Then, we [...]
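
The density-based scoring idea in the first method can be illustrated with a small sketch. It is not the proposed network: the mixture parameters below are fixed by hand rather than learned, the covariances are assumed diagonal, and the function names are hypothetical. It only shows how a soft cluster membership and the total mixture probability of a patch embedding can be computed, with a low total probability (high energy) suggesting an anomaly.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def component_logs(x, weights, means, variances):
    """Log of (mixture weight * component density) for every component."""
    return [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]

def membership(x, weights, means, variances):
    """Soft cluster membership of x; the entries sum to 1."""
    logs = component_logs(x, weights, means, variances)
    peak = max(logs)
    exps = [math.exp(value - peak) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def energy(x, weights, means, variances):
    """Negative log of the total mixture probability of x."""
    logs = component_logs(x, weights, means, variances)
    peak = max(logs)
    return -(peak + math.log(sum(math.exp(value - peak) for value in logs)))

# Hand-picked two-component mixture over 2-D patch embeddings (illustrative).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

normal_patch = [0.1, -0.2]
anomalous_patch = [6.0, 6.0]

normal_energy = energy(normal_patch, weights, means, variances)
anomalous_energy = energy(anomalous_patch, weights, means, variances)
print(membership(normal_patch, weights, means, variances))
print(normal_energy, anomalous_energy)
```

In a trained system the energy would serve both as a loss term on normal patches and as a per-patch anomaly score at test time; thresholding it per patch gives a coarse localization map.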

April 11th, 2021 | News

MCL Research on Semantic Scene Segmentation Based on Multiple Sensor Inputs

Semantic segmentation can help identify and locate objects, which provides important road information for upper-level navigation tasks. Due to the rapid development of deep convolutional neural networks (CNNs) [1], the performance of image segmentation models has improved greatly, and CNNs are widely used for this task. However, maintaining performance under different conditions is a non-trivial task. In dark, rainy, or foggy environments, the quality of RGB images is greatly reduced, while other sensors may still give fair results. Thus, our model combines information from the RGB image and the depth map. When driving, we often encounter obstacles such as trash cans, barriers, rubble, stones, and cargo. Recognizing and avoiding them is crucial for safety. To address this problem, we apply multi-dataset learning. In this way, our model can learn more classes, including obstacles, from other datasets.

In our experiments, we fully evaluate RFNet [2] with different datasets and different methods of combining them. In the framework of our model, the inputs from different datasets first pass through the Resizing module. Then, the depth map and the RGB image are sent to the network, and the ground truth goes to the Relabeling module. The multi-dataset learning strategy is applied in the Resizing module, the Relabeling module, and the Modified Softmax layer. Finally, by comparing the relabeled ground truth with the prediction, we obtain the intersection over union (IoU) and the cross-entropy loss.
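
The last step of this pipeline, relabeling followed by IoU and cross-entropy, can be sketched as below. This is a generic illustration under stated assumptions: the class names, the unified label map, and the `ignore` convention are hypothetical, images are flattened to short pixel lists, and the Modified Softmax layer itself is not reproduced because the excerpt does not describe it.

```python
import math

def relabel(labels, mapping, ignore=-1):
    """Map dataset-specific class names to unified ids; unknown classes are ignored."""
    return [mapping.get(label, ignore) for label in labels]

def mean_iou(pred, truth, num_classes, ignore=-1):
    """Mean intersection over union across classes that occur in pred or truth."""
    ious = []
    for c in range(num_classes):
        pairs = [(p, t) for p, t in zip(pred, truth) if t != ignore]
        inter = sum(1 for p, t in pairs if p == c and t == c)
        union = sum(1 for p, t in pairs if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

def cross_entropy(probs, truth, ignore=-1):
    """Average negative log-probability of the true class over valid pixels."""
    losses = [-math.log(p[t]) for p, t in zip(probs, truth) if t != ignore]
    return sum(losses) / len(losses)

# Hypothetical unified label set shared by all datasets.
unified = {"road": 0, "car": 1, "trash_can": 2}

raw_truth = ["road", "car", "trash_can", "sky", "road"]   # "sky" is not in the map
truth = relabel(raw_truth, unified)
pred = [0, 1, 1, 2, 0]
probs = [[0.5, 0.25, 0.25]] * 5                           # per-pixel class probabilities

miou = mean_iou(pred, truth, num_classes=3)
loss = cross_entropy(probs, truth)
print(truth, miou, loss)
```

Pixels whose original class has no counterpart in the unified label set are excluded from both metrics here; other policies, such as mapping them to a background class, are equally possible.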

Our results show that our models perform very well in urban and blended environments. However, in the field environment, the depth map helps the model only slightly. We also propose a new thrifty relabeling, which can improve the performance of the model without increasing the complexity of the network. Moreover, more datasets can help the model [...]

April 4th, 2021 | News

MCL Research on Image Calibration from Multiple Sensors

To get an accurate perception of the surrounding environment in tasks such as autonomous driving, robot navigation, and sensor-driven situational awareness, abundant environment information is necessary. This information can be obtained from different types of multimodal sensors, such as LiDAR sensors, electro-optical/infrared (EO/IR) cameras, and GPS/IMU. Before the collected data can be used, information from these sensors must be fused. Specifically, people want to use color and shape information from the camera and distance information from the LiDAR sensor. In this task, finding corresponding points between the two sensors is essential. This procedure is called multimodal sensor calibration, in which we need to find the 6DoF extrinsic parameters between the two sensors.

In this work, we develop a new deep learning-driven technique for accurate calibration of a LiDAR-camera pair. It is completely data-driven, does not require any specific calibration targets or hardware assistance, and the entire process is end-to-end and fully automatic. We use a deep neural network to align the LiDAR point cloud accurately to the image and to regress the 6DoF extrinsic calibration parameters. Geometric supervision and transformation supervision are employed to guide the learning process and maximize the consistency of the input images and point clouds. Given LiDAR-camera pairs as the training dataset, the system automatically learns meaningful features, infers cross-modal correlations, and estimates the 6DoF rigid-body transformation between the 3D LiDAR and the 2D image in real time.
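
To make the role of the 6DoF extrinsic parameters concrete, the sketch below applies a given set of parameters to LiDAR points and projects them into the image to obtain pixel positions with depth. It does not show the network that regresses the parameters. The angle convention (roll, pitch, yaw composed as Rz·Ry·Rx), the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, and the assumption that the camera looks along +z are illustrative choices, not details from the work.

```python
import math

def rotation_matrix(roll, pitch, yaw):
    """R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def project(points, extrinsic, fx, fy, cx, cy):
    """Transform LiDAR points with the 6DoF extrinsic and project them.

    extrinsic = (roll, pitch, yaw, tx, ty, tz). Returns (u, v, depth)
    for every point that ends up in front of the camera.
    """
    roll, pitch, yaw, tx, ty, tz = extrinsic
    r = rotation_matrix(roll, pitch, yaw)
    projected = []
    for x, y, z in points:
        xc = r[0][0] * x + r[0][1] * y + r[0][2] * z + tx
        yc = r[1][0] * x + r[1][1] * y + r[1][2] * z + ty
        zc = r[2][0] * x + r[2][1] * y + r[2][2] * z + tz
        if zc <= 0.0:
            continue                      # behind the camera, not visible
        projected.append((fx * xc / zc + cx, fy * yc / zc + cy, zc))
    return projected

points = [(1.0, 0.5, 2.0), (0.0, 0.0, -1.0)]
identity = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
shifted = (0.0, 0.0, 0.0, 0.0, 0.0, 2.0)   # camera frame moved 2 units along z

print(project(points, identity, fx=100.0, fy=100.0, cx=50.0, cy=40.0))
print(project(points, shifted, fx=100.0, fy=100.0, cx=50.0, cy=40.0))
```

Rendering the returned depths at their pixel positions gives the kind of depth map that can be overlaid on the RGB image; a miscalibrated extrinsic shows up as depth edges that do not line up with image edges.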

Images in the slides show the system overview and experimental results. In the experimental results, the background is the corresponding RGB image. The transparent colormap is the depth map, with colors from blue to red corresponding to small to large distances. The first row shows the input RGB images, the second row shows the input depth maps, and the third row [...]

March 28th, 2021 | News

Congratulations to Ruiyuan Lin for Passing Her Defense

Let us hear what she has to say about her defense and an abstract of her thesis.

Neural networks have been shown to be effective in many applications. To better explain the behaviors of the neural networks, we examine the properties of neural networks experimentally and analytically in this research.

In the first part, we conduct experiments on convolutional neural networks (CNNs) and observe the behaviors of the networks. We also propose some insights and conjectures for CNNs in this part. First, we demonstrate how the accuracy changes with the size of the convolutional layers. Second, we develop a design to determine the size of the convolutional layers based on SSL. Third, as a case study, we analyze SqueezeNet, which is able to achieve the same accuracy as AlexNet with 50x fewer parameters, by studying the evolution of cross-entropy values across layers and by visualization. Fourth, we offer some insights into co-training-based deep semi-supervised learning.

In the second part, we propose new angles from which to understand and interpret neural networks. To understand the behavior of multilayer perceptrons (MLPs) as classifiers, we interpret MLPs as a generalization of a two-class LDA system so that it can handle an input composed of multiple Gaussian modalities belonging to multiple classes. An MLP design with two hidden layers that also specifies the filter weights is proposed. To understand the behavior of MLPs as regressors, we construct MLPs as piecewise low-order polynomial approximators using a signal processing approach. The constructed MLP contains one input layer, one intermediate layer, and one output layer. Its construction includes the specification of neuron numbers and all filter weights. Through the construction, a one-to-one correspondence between the approximation of an MLP and that [...]

March 22nd, 2021 | News

Congratulations to Bin Wang for Passing His Defense

Let us hear what he has to say about his defense and an abstract of his thesis.

The success of machine learning algorithms depends heavily on data representation techniques. Natural language is a kind of unstructured data source that has no natural numerical representation in its original form. In most natural language processing tasks, a distributed representation of text entities is necessary. In this dissertation, we investigate and propose representation learning techniques to learn embeddings for 1) words, 2) sentences, and 3) knowledge graph entities.

We will first give an overview of our embedding research, including word embedding evaluation and enhancement, and sentence embedding from both non-contextualized and contextualized word embeddings. Then, we will focus on two recent works: InductivE and DomainWE. InductivE is proposed to handle the inductive learning problem in the commonsense knowledge graph completion task. The entity embeddings are computed directly from their textual descriptions, and an iterative training framework is proposed to enhance unseen entities with more structural information. DomainWE aims to distill knowledge from large pre-trained language models and synthesize it as static word embeddings. As an efficient and robust alternative to large pre-trained language models, DomainWE has a smaller model size, which is particularly important for deployment, yet demonstrates better performance than generic word embeddings.

I would like to thank Professor Kuo, who guided me step by step along the path from a fresh graduate to an academic researcher. Over the past years, I have learned a lot from Professor Kuo and our MCL members, including not only academic skills but also life principles. One of the most important things in Ph.D. study is to stay self-motivated and try your best to achieve your initial dreams. Finally, [...]

March 15th, 2021 | News

Congratulations to Jiali Duan for Passing His Defense

Let us hear what he has to say about his defense and an abstract of his thesis.

Deep learning has brought impressive improvements in many fields, thanks to end-to-end data-driven optimization. However, people have little control over the system during training and a limited understanding of the structure of the knowledge being learned. In this thesis, we study the theory and applications of adversarial and structured knowledge learning: 1) learning adversarial knowledge with human interaction or by incorporating a human in the loop; 2) learning structured knowledge by modelling contexts and users’ preferences via distance metric learning.

In the first part, we teach a robotic arm to learn robust manipulation grasps that can withstand perturbations, through end-to-end optimization with a human adversary. Specifically, we formulate the problem as a two-player game with incomplete information, played by a human and a robot, where the human’s goal is to minimize the reward the robot can get. We then extend this idea to improve the sample efficiency of deep reinforcement learning by incorporating a human in the training loop. We present a portable, interactive, and parallel platform for human-agent curriculum learning.

In the second part, we present two works that address different aspects of structured representation learning. First, we propose a self-training framework to improve distance metric learning. The challenge is the noise in pseudo labels, which prevents the exploitation of additional unlabeled data. Therefore, we introduce a new feature basis learning component for the teacher-student network, which better measures pairwise similarity and selects high-confidence pairs. Second, we address image-attribute query, which allows a user to customize image-to-image retrieval by designating desired attributes in target images. We achieve this by adopting a composition module, which enforces object-attribute factorization, and an attribute-set synthesis module, which deals with sample insufficiency.

Looking back, Prof. Kuo is [...]

March 7th, 2021 | News

MCL Research on New Interpretation of MLP

Our work on new MLP interpretation includes:

Interpretable MLP design [1]:

A closed-form solution exists in two-class linear discriminant analysis (LDA), which discriminates two Gaussian-distributed classes in a multi-dimensional feature space. In this work, we interpret the multilayer perceptron (MLP) as a generalization of a two-class LDA system so that it can handle an input composed of multiple Gaussian modalities belonging to multiple classes. Besides the input layer l_in and the output layer l_out, the MLP of interest consists of two intermediate layers, l_1 and l_2. We propose a feedforward design that has three stages: 1) from l_in to l_1: half-space partitioning accomplished by multiple parallel LDAs, 2) from l_1 to l_2: subspace isolation, where one Gaussian modality is represented by one neuron, 3) from l_2 to l_out: class-wise subspace mergence, where each Gaussian modality is connected to its target class. Through this process, we present an automatic MLP design that can specify the network architecture (i.e., the number of layers and the number of neurons in each layer) and all filter weights in a feedforward one-pass fashion. This design can be generalized to an arbitrary distribution by leveraging the Gaussian mixture model (GMM). Experiments are conducted to compare the performance of the traditional backpropagation-based MLP (BP-MLP) and the new feedforward MLP (FF-MLP).
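
The closed-form two-class LDA that the design starts from can be sketched as follows. This is a textbook illustration on synthetic data, not the FF-MLP construction itself: it assumes NumPy, a shared (pooled) covariance, and a hypothetical `fit_lda` helper. Representing the two half-spaces of one LDA boundary by a pair of ReLU neurons with weights (w, b) and (-w, -b) is an assumption made here for illustration.

```python
import numpy as np

def fit_lda(x0, x1):
    """Closed-form two-class LDA with a pooled covariance.

    Returns (w, b) such that w @ x + b > 0 favors class 1.
    """
    mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
    centered = np.vstack([x0 - mu0, x1 - mu1])
    cov = centered.T @ centered / (len(centered) - 2)
    w = np.linalg.solve(cov, mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1) + np.log(len(x1) / len(x0))
    return w, b

rng = np.random.default_rng(0)
x0 = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # class 0 samples
x1 = rng.normal(loc=3.0, scale=1.0, size=(200, 2))   # class 1 samples

w, b = fit_lda(x0, x1)

# One LDA boundary splits the space into two half-spaces; a ReLU neuron
# pair (assumed here) responds to one side each.
x = np.vstack([x0, x1])
z = x @ w + b
positive_side = np.maximum(0.0, z)
negative_side = np.maximum(0.0, -z)

labels = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])
accuracy = np.mean((z > 0) == (labels == 1))
print(w, b, accuracy)
```

No iterative training is involved: the weights follow directly from the class means and the pooled covariance, which is the property the feedforward design relies on.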

MLP as a piecewise low-order polynomial approximator [2]:

The construction of a multilayer perceptron (MLP) as a piecewise low-order polynomial approximator using a signal processing approach is presented in this work. The constructed MLP contains one input layer, one intermediate layer, and one output layer. Its construction includes the specification of neuron numbers and all filter weights. Through the construction, a one-to-one correspondence between the approximation of an MLP and that of a piecewise low-order polynomial is established. Comparison [...]

January 17th, 2021 | News

MCL Research on AI for Health Care

Research on the future development of health care systems is always a significant endeavor because it touches many people’s lives. AI advancements in the last decade have given rise to new applications whose key aim is to increase the level of automation of tasks currently carried out by experts. In particular, medical image analysis is a fast-growing area that has been revolutionized by modern AI algorithms for visual content understanding. Magnetic resonance imaging (MRI) is widely used by radiologists to shed more light on a patient’s health condition. It can provide useful cues to experts, helping them decide on an appropriate treatment plan while reducing discomfort for the patient and lowering economic risk in the treatment process.

The question arises of how modern AI could help automate the diagnosis process and provide experts with a second, more objective assessment. Many research ideas from the visual understanding area adopt the deep learning (DL) paradigm, training deep neural networks (DNNs) to learn end-to-end representations for tumor classification, lesion detection, organ segmentation, survival prediction, etc. Yet DNNs have some limitations in medical image analysis. It is well known that it is often hard to collect sufficient real samples for training DL models. Furthermore, decisions made by machines need to be transparent to physicians, who must be aware of the factors that led to those decisions in order to trust them. DNNs are often perceived as “black-box” models, since their feature representations and decision paths are hard to interpret.

In MCL, we consider a new line of research on AI for medical image analysis, by adopting the Green Learning (GL) approach to address [...]

January 10th, 2021 | News

MCL Research on Scalable Weakly-Supervised Graph Learning

The success of deep learning and neural networks often comes at the price of a large amount of labeled data. Weakly-supervised learning (WSL) is an important paradigm that leverages a large amount of unlabeled data to address this limitation. The need for WSL arises in many machine learning problems, and it has found wide application in computer vision, natural language processing, and graph-based modeling, where labeled data is expensive to obtain and a large amount of unlabeled data exists.

Among weakly-supervised graph learning methods, label propagation (LP) has demonstrated good adaptability, scalability, and efficiency for node classification. However, LP-based methods are limited in their capability to integrate multiple data modalities for effective learning. Due to the recent success of neural networks, there has been an effort to apply neural networks to graph-structured data. One pioneering technique, known as graph convolutional networks (GCNs), has achieved impressive node classification performance on citation networks. However, GCNs fail to exploit the label distribution in the graph structure and are difficult to scale to large graphs.

In this work, we propose a scalable weakly-supervised node classification method on graph-structured data, called GraphHop, where the underlying graph contains attributes of all nodes but labels of few nodes. Our method is an iterative algorithm that overcomes the deficiencies in LP and GCNs. With proper initial label vector embeddings, each iteration contains two steps: 1) label aggregation and 2) label update. In Step 1, each node aggregates its neighbors’ label vectors obtained in the previous iteration. In Step 2, a new label vector is predicted for each node based on the label of the node itself and the aggregated label information obtained in Step 1. This iterative procedure exploits the neighborhood information and enables GraphHop to [...]
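
The two-step iteration described above can be sketched on a toy graph. This is only an illustration of the aggregate-then-update loop, not GraphHop: the excerpt does not say how the new label vector is predicted in Step 2, so a simple convex mix of a node's own label vector and the aggregated one stands in for it, labeled nodes are clamped to their known labels, and node attributes are not used. The function names and the `alpha` parameter are hypothetical.

```python
def aggregate(labels, neighbors):
    """Step 1: average the label vectors of each node's neighbors."""
    aggregated = []
    for node, nbrs in enumerate(neighbors):
        num_classes = len(labels[node])
        aggregated.append(
            [sum(labels[n][c] for n in nbrs) / len(nbrs) for c in range(num_classes)]
        )
    return aggregated

def update(labels, aggregated, known, alpha=0.5):
    """Step 2 (stand-in): mix own and aggregated label vectors.

    Nodes with known labels keep them; alpha is an assumed mixing weight.
    """
    new_labels = []
    for node, (own, agg) in enumerate(zip(labels, aggregated)):
        if node in known:
            new_labels.append(list(known[node]))
        else:
            new_labels.append([(1 - alpha) * o + alpha * a for o, a in zip(own, agg)])
    return new_labels

# Path graph 0 - 1 - 2 - 3 - 4 with two labeled end nodes.
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
known = {0: [1.0, 0.0], 4: [0.0, 1.0]}

# Initial label vectors: known labels where available, uniform elsewhere.
labels = [list(known.get(node, [0.5, 0.5])) for node in range(len(neighbors))]

for _ in range(20):
    labels = update(labels, aggregate(labels, neighbors), known)

predictions = [max(range(2), key=lambda c: vec[c]) for vec in labels]
print(labels)
print(predictions)
```

On this symmetric path the middle node stays undecided while its neighbors lean toward the nearer labeled node, which shows how label information spreads by one hop per iteration.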

January 3rd, 2021 | News