News

Professor Kuo Delivered Tencent Keynote Speech at VCIP 2020

MCL Director, Professor C.-C. Jay Kuo, gave an opening keynote at the IEEE International Conference on Visual Communications and Image Processing (VCIP) on December 2, 2020. The conference was originally scheduled for December 1-4, 2020, in Macau, but was held virtually due to the COVID-19 pandemic. The keynote was titled “Interpretable and Effective Learning for 3D Point Cloud Registration, Classification and Segmentation.” Here is the abstract:

“3D point cloud analysis and processing find numerous applications in computer-aided design, 3D printing, autonomous driving, etc. Most state-of-the-art point cloud processing methods are based on convolutional neural networks (CNNs). Although they outperform traditional methods in terms of accuracy, they demand heavy supervision and higher training complexity. Besides, they lack mathematical transparency. In this talk, I will present three interpretable and effective machine learning methods for 3D point cloud registration, classification and segmentation, respectively. First, an unsupervised registration method that extracts salient points for matching is presented. Second, an unambiguous way to order points sequentially in a point cloud set is developed. Then, their spatial coordinates can be treated as geometric attributes of a 1D data array. This idea facilitates the classification task. Third, for the segmentation task, we show how to leverage prior knowledge on point clouds to derive an intuitive and effective segmentation method. Extensive experiments are conducted to demonstrate the performance of the three new methods. I will also provide performance benchmarking between these interpretable methods and deep learning methods.”
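The abstract does not spell out how the point ordering works, but the flavor of the second idea can be sketched in a few lines: once points are placed in some unambiguous order (lexicographic sorting below is purely a hypothetical stand-in for the ordering developed in the talk), their coordinates become an ordinary 1D feature array.

```python
import numpy as np

def order_points(points: np.ndarray) -> np.ndarray:
    """Order an (N, 3) point set lexicographically by (x, y, z).

    A hypothetical stand-in for the unambiguous ordering described
    in the keynote; any consistent rule turns an unordered set into
    an ordered sequence.
    """
    idx = np.lexsort((points[:, 2], points[:, 1], points[:, 0]))
    return points[idx]

cloud = np.random.rand(1024, 3)            # toy point cloud
features_1d = order_points(cloud).ravel()  # 1D array of length 1024 * 3
```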

The keynote was well attended, with many questions during the 10-minute Q&A session. Professor Kuo’s keynote was sponsored by Tencent and designated the Tencent Keynote Speech.

December 7th, 2020 | News

Happy Thanksgiving!

At this time of Thanksgiving celebration, we hope everyone stays safe during the pandemic and has a good time with their beloved families and friends. It is also a good time to pause and reflect on our accomplishments this year, and to be thankful to those who have supported us during this hard time. Thanks to every MCL member for the collaboration and hard work throughout the year that kept our research activities running smoothly!

Happy Thanksgiving!

 

Image credits: WallpaperAccess and Clipart Library.

November 26th, 2020 | News

MCL Research on Knowledge Graph

Knowledge graphs (KGs) model human-readable knowledge as entity-relation triples. One major branch of KG research is representation learning, in which we learn low-dimensional embeddings for entities and relations. Simple arithmetic operations between entity and relation embeddings can represent complex real-world knowledge or even discover new knowledge. KGs evolve rapidly with the enormous amount of new information generated every day. Since it is infeasible to retrain KG embeddings whenever a new entity or relation appears, modeling unseen entities and relations remains a challenging task.

There are two main directions of research on handling unseen entities. One direction is to infer the embedding of a new entity from its neighboring entities and relations observed during training. Researchers have either relied on graph neural networks or designed specialized aggregation functions to collect the unseen node’s neighborhood information. The other path is to leverage feature information in entity node metadata. Specifically, entity names and descriptions are often available in textual format when querying the KG. Recent advances in transformer language models have made it possible to extract high-quality feature representations for such contextual information after a minimal amount of fine-tuning. When a transformer language model such as BERT is applied to extract entity representations, the model can generate an embedding for any entity with a textual name or description, which resolves the unseen-entity problem.
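As a rough sketch of the text-based route, an entity embedding can be pooled from BERT token states via the Hugging Face transformers API (the mean-pooling choice and model name below are our assumptions for illustration, not necessarily those of the works discussed):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def entity_embedding(text: str) -> torch.Tensor:
    """Embed an (unseen) entity from its textual name/description."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)             # mean-pooled (768,)

vec = entity_embedding("Marie Curie: physicist and chemist, pioneer of radioactivity research")
```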

RotatE has been one of the most effective yet simple KG embedding models proposed in recent years. In RotatE, entities and relations are modeled as complex vectors. Each element of the relation vector serves as an element-wise phase shifter that rotates the source entity toward the target entity. We propose a specialized [...]
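A minimal sketch of the RotatE scoring idea (complex entity vectors, unit-modulus relation phases; the dimension is arbitrary):

```python
import numpy as np

def rotate_score(head, rel_phase, tail):
    """RotatE-style score: rotate the head by the relation's phases and
    measure how close the result lands to the tail (higher is better).
    head, tail: complex (d,) vectors; rel_phase: real (d,) phases."""
    rotated = head * np.exp(1j * rel_phase)   # element-wise phase shift
    return -np.linalg.norm(rotated - tail)

d = 4
head = np.random.randn(d) + 1j * np.random.randn(d)
phase = np.random.uniform(-np.pi, np.pi, d)
tail = head * np.exp(1j * phase)              # a perfectly matching triple
print(rotate_score(head, phase, tail))        # ~0.0, the maximum score
```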

November 22nd, 2020 | News

MCL Research on Natural Image Synthesis

Automatic synthesis of new images from a collection of sample images of the same class finds broad applications in computer graphics and computer vision. Examples include automatic synthesis of human faces, hand-written digits, etc. On an abstract level, a generative model learns to approximate the probability distribution of data samples and generates new samples from the learned model. Research on generative models has attracted considerable attention in the machine learning community for decades.

 

Image synthesis is challenging for two main reasons. First, it demands a sufficiently large number of images to define meaningful statistics for a target class. Second, to generate new images with similar characteristics, one must find one or more effective representations of the samples and process them with a proper mechanism. There has been a resurgence of interest in generative models due to the performance breakthroughs achieved by deep learning (DL) technologies in the last 6-7 years. There are, however, concerns with DL-based generative models. Built upon multi-layer end-to-end optimization, DL training is essentially a nonconvex optimization problem. Because of the mathematical complexity associated with nonconvex optimization, DL-based solutions are black boxes. Besides, training DL-based generative models demands a large amount of computational resources. We propose an explainable and effective generative model, named the Successive Subspace Generative (SSG) model, to address these concerns.

 

Subspaces of descending dimension are constructed successively in a feedforward manner; this is called the embedding process. Through embedding, the sample distributions of the source and intermediate subspaces are captured by the embedding parameters together with the sample distribution in the core. For generation, samples are first drawn according to the learned distribution in the core. Then, they travel from the core back to the source by traversing the same [...]
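The embed-then-generate loop can be illustrated with a toy sketch, using PCA stages as a stand-in for the Saab-style subspace construction (all dimensions and the Gaussian core model are assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: N samples in the 64-D source space.
X = np.random.randn(2000, 64)

# Embedding: successively shrink the dimension down to a small core.
stages = [PCA(n_components=32), PCA(n_components=8)]
Z = X
for pca in stages:
    Z = pca.fit_transform(Z)                  # Z ends up in the 8-D core

# Characterize the core distribution (a single Gaussian here).
mu, cov = Z.mean(axis=0), np.cov(Z, rowvar=False)

# Generation: sample in the core, then traverse the stages in reverse.
samples = np.random.multivariate_normal(mu, cov, size=10)
for pca in reversed(stages):
    samples = pca.inverse_transform(samples)  # back toward the source space
print(samples.shape)                          # (10, 64)
```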

November 16th, 2020 | News

MCL Research on Video Object Tracking

The visual tracking problem has a long history and diverse applications in video surveillance, smart traffic systems, autonomous driving, and so on. Deep learning methods have gradually come to dominate online single-object tracking because of their superior tracking accuracy. However, they usually require training on a tremendous number of labeled videos, which are expensive and time-consuming to acquire.

We propose an explainable, self-supervised, salient-point-based approach that tracks general objects in real time by utilizing attention and features from both the spatial and temporal domains. There are two major parts in our tracking system: tracking across adjacent frames by matching salient points, which represent spatial attention, and utilizing temporal information stored in salient points across frames to identify loss of the object or appearance changes. In both parts, salient points play the key role in capturing spatio-temporal information. The feature of a salient point is the concatenation of the two hop-layer features of a two-stage channel-wise Saab transform. The first hop contains PCA information of local patches at high resolution, while the second hop works at a lower resolution with a larger receptive field; together they naturally form a multi-resolution feature extractor that helps capture unusual patterns deserving more attention during tracking.
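A rough sketch of the two-hop, multi-resolution feature idea (plain patch PCA at two scales, concatenated per salient point; patch sizes, dimensions, and the use of vanilla PCA in place of channel-wise Saab are our assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

def crop(img, y, x, size=5):
    """Flattened (size x size) patch centered at (y, x) in a 2-D image."""
    h = size // 2
    return img[y - h:y + h + 1, x - h:x + h + 1].ravel()

rng = np.random.default_rng(0)
img = rng.random((128, 128))          # toy grayscale frame
img_half = img[::2, ::2]              # coarser scale for hop 2

# Fit one PCA basis per hop on randomly sampled training patches.
ys, xs = rng.integers(3, 60, 500), rng.integers(3, 60, 500)
pca1 = PCA(n_components=8).fit([crop(img, 2 * y, 2 * x) for y, x in zip(ys, xs)])
pca2 = PCA(n_components=8).fit([crop(img_half, y, x) for y, x in zip(ys, xs)])

def salient_point_feature(y, x):
    """Concatenate hop-1 (fine) and hop-2 (coarse, larger receptive
    field) PCA responses for the salient point at (y, x)."""
    f1 = pca1.transform([crop(img, y, x)])[0]
    f2 = pca2.transform([crop(img_half, y // 2, x // 2)])[0]
    return np.concatenate([f1, f2])   # 16-D multi-resolution feature

print(salient_point_feature(40, 40).shape)   # (16,)
```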

We have obtained preliminary results with the current framework. We evaluate our method on the long-term tracking benchmark TB-50 [1], using success plots and precision plots in one-pass evaluation (OPE) mode. This dataset includes 50 video sequences and 29,491 frames in total. The mean success rate is the average overlap ratio between the prediction and the ground truth, while the mean precision rate shows how close their centers are. The higher the two values are, the better [...]
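For reference, both OPE metrics reduce to simple per-frame computations; a sketch, assuming boxes in (x, y, w, h) format:

```python
import numpy as np

def overlap(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def center_error(a, b):
    """Pixel distance between box centers (the precision metric)."""
    return np.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                    (a[1] + a[3] / 2) - (b[1] + b[3] / 2))

# Success plot: fraction of frames whose overlap exceeds each threshold.
# Precision plot: fraction of frames whose center error stays below a
# threshold (20 px is the conventional operating point).
```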

November 8th, 2020 | News

MCL Research on Spatial Attention

Object detection and recognition are critical to image understanding, and there has been a long competition between supervised and unsupervised approaches to visual attention extraction. We are interested in an unsupervised approach, and our method contains two main complementary parts: spectral clustering segmentation and contour detection.

Spectral clustering is a mature method for image segmentation, in which an image is viewed as a graph. In a standard spectral clustering pipeline, each pixel is usually a vertex: a pixelwise affinity matrix is computed from the graph, followed by the Laplacian matrix of the affinity matrix; then, given a predefined number of clusters K, K-means clustering is performed on the eigenvectors of the K smallest eigenvalues of the Laplacian to give the final segmentation. In our current method, PointHop features are adopted instead of biological features such as color or texture to construct the graph for the input image, which is the core contribution to this progress. For each input image, PointHop features are extracted with the channel-wise Saab transform, a K-neighbors graph is constructed from the feature map, and the standard spectral clustering process follows. To evaluate the segments produced by spectral clustering on PixelHop features, contour detection is introduced as a complementary mid-level feature. Here, structured edge detection [1] results are used for contour detection: for each segment from spectral clustering, the largest closed contour within the segment is evaluated by heuristic rules to check whether it is a reasonable object.
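The standard pipeline is compact enough to sketch with SciPy/scikit-learn; the random feature matrix below is a placeholder for the PointHop/Saab per-pixel features:

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def spectral_segment(features, k=4, n_neighbors=10):
    """features: (n_pixels, d) array; returns one cluster label per pixel."""
    # 1. K-neighbors affinity graph over per-pixel features.
    A = kneighbors_graph(features, n_neighbors, mode="connectivity")
    A = 0.5 * (A + A.T)                    # symmetrize the affinity
    # 2. (Normalized) graph Laplacian of the affinity matrix.
    L = laplacian(A, normed=True)
    # 3. Eigenvectors for the K smallest eigenvalues.
    _, vecs = eigsh(L, k=k, which="SM")
    # 4. K-means on the spectral embedding gives the segmentation.
    return KMeans(n_clusters=k, n_init=10).fit_predict(vecs)

# Toy stand-in for per-pixel features of a 32x32 image.
labels = spectral_segment(np.random.rand(32 * 32, 8)).reshape(32, 32)
```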

During this process, most proposed objects are parts of a main object, e.g., the eyes, face, hands, or arms of a human. In post-processing, adjacent proposed objects are merged to construct bigger objects, and a full rectangle tree of objects can be constructed for each input image.

 

By Hongyu Fu

November 3rd, 2020 | News

MCL Research on Image Super-resolution

Image super-resolution (SR) is a classic image reconstruction problem in computer vision (CV), which aims at recovering a high-resolution image from a low-resolution one. As a supervised generative problem, image SR attracts wide attention due to its strong connection with other CV topics such as object recognition, object alignment, and texture synthesis. Besides, it has extensive real-world applications, for example, medical diagnosis, remote sensing, and biometric identification.

Among state-of-the-art approaches to SR, there are typically two mainstreams: 1) example-based learning methods and 2) deep learning (CNN-based) methods. Example-based methods either exploit external low- and high-resolution exemplar pairs [1] or learn the internal similarity of the same image across different resolution scales [2]. However, the features used in example-based methods are usually traditional gradient-related or simply handcrafted, which may limit model performance. CNN-based SR methods (e.g., SRCNN [3]), by contrast, do not really distinguish between feature extraction and decision making. Many basic CNN models/blocks, e.g., GANs, residual learning, and attention networks, have been applied to the SR problem and provide superior SR results. Nevertheless, the non-explainable process and exhaustive training cost are serious drawbacks of CNN-based methods.

By taking advantage of reasonable feature extraction [4], we utilize spatial-spectral compatible channel-wise Saab (cw-Saab) features to represent exemplar pairs. In addition, we formulate a successive-subspace-learning-based (SSL-based) method that gradually partitions data into subspaces by feature statistics and applies regression in each subspace for better local approximation. By visualizing the samples in representative subspaces, we observe obvious sample similarity in the pixel domain. This demonstrates the effectiveness of our method in splitting samples into subspaces with semantic meaning. In the future, we aim to provide such an SSL-based explainable method with high efficiency for the SR problem.
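The partition-then-regress idea can be sketched with off-the-shelf components (K-means as the partitioner and ridge regression per subspace; both are illustrative stand-ins, not necessarily the exact components of our method):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

# X: low-resolution patch features; Y: corresponding high-resolution targets.
X, Y = np.random.rand(5000, 32), np.random.rand(5000, 16)

# 1. Partition samples into subspaces by their feature statistics.
km = KMeans(n_clusters=8, n_init=10).fit(X)

# 2. Fit one local regressor per subspace for better local approximation.
regressors = {c: Ridge().fit(X[km.labels_ == c], Y[km.labels_ == c])
              for c in range(8)}

def super_resolve(x):
    """Route a feature vector to its subspace's regressor."""
    c = km.predict(x.reshape(1, -1))[0]
    return regressors[c].predict(x.reshape(1, -1))[0]

print(super_resolve(np.random.rand(32)).shape)   # (16,)
```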

— By Wei Wang

 

References:

[1] Timofte, Radu, [...]

October 25th, 2020 | News

MCL Research on Point Cloud Segmentation

Processing and analysis of 3D point clouds are challenging since the 3D spatial coordinates of points are irregular, so 3D points cannot be properly ordered for input to deep neural networks (DNNs). To deal with the ordering problem, a certain transformation is needed in the deep learning pipeline, yet transforming a point cloud into another form often leads to information loss. Several DNNs have been designed for point cloud classification and segmentation in recent years. They address the point-ordering problem and reach impressive performance in tasks such as classification, segmentation, registration, and object detection. However, DNNs rely on expensive labeled data, and due to end-to-end optimization, deep features are learned iteratively via backpropagation. To save both labeling and computational costs, it is desirable to obtain features in an unsupervised, feedforward, one-pass manner.

Unsupervised and self-supervised feature learning for 3D point clouds has been investigated. Although no labels are needed, the learned features are not as powerful as supervised ones, with degraded performance. Recently, two lightweight point cloud classification methods, PointHop [1] and PointHop++ [2], were proposed. Both have an unsupervised feature learning module, and their performance is comparable with state-of-the-art deep learning methods.

By generalizing PointHop, we propose a new solution for joint point cloud classification and part segmentation. Our main contribution is the development of an unsupervised feedforward feature (UFF) learning system [3] with an encoder-decoder architecture. UFF exploits the statistical correlation between points in a point cloud set to learn shape and point features in a one-pass feedforward manner. It obtains global shape features with the encoder and local point features with the encoder-decoder cascade. The shape/point features are then fed into classifiers [...]
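To give a flavor of unsupervised, feedforward point-feature learning, a single simplified hop can be sketched as follows (k-nearest-neighbor aggregation plus PCA as a crude stand-in for the Saab transform used in PointHop/UFF; all parameters are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def one_hop_features(cloud, k=16, dim=8):
    """cloud: (N, 3) points. Returns (N, dim) unsupervised features.

    Each point is described by its k nearest neighbors expressed in
    local coordinates, then compressed with PCA; no labels and no
    backpropagation are involved.
    """
    nbrs = NearestNeighbors(n_neighbors=k).fit(cloud)
    _, idx = nbrs.kneighbors(cloud)
    local = cloud[idx] - cloud[:, None, :]   # (N, k, 3) relative coordinates
    flat = local.reshape(len(cloud), -1)     # (N, 3k) raw hop features
    return PCA(n_components=dim).fit_transform(flat)

features = one_hop_features(np.random.rand(1024, 3))
print(features.shape)                        # (1024, 8)
```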

October 18th, 2020 | News

MCL Research on Point Cloud Registration

Point cloud registration refers to the process of aligning two point clouds. The two point clouds to be aligned are commonly called source and target. The goal is to find a spatial transformation (3D rotation and translation) that needs to be applied to the source to optimally align it with the target.  Registration has become popular with the proliferation of 3D scanning devices like LiDAR and their applications in autonomous driving, robotics, graphics, mapping, etc.

Point clouds need to be registered in order to merge data from different sensors into a globally consistent view, to map a new observation onto known data, etc. Registration is challenging for several reasons. The source and target point clouds may have different sampling densities and different numbers of points. Point clouds may contain outliers and/or be corrupted by noise. Sometimes, only partial views are available.

The registration (or alignment) problem has been studied for a long time. Prior to point cloud processing, the focus was on aligning lines, parametric curves, and surfaces. The classical Iterative Closest Point (ICP) algorithm alternates between finding corresponding points and estimating the optimal rotation and translation, using only the spatial coordinates of points to establish correspondences. More recently, there has been a trend toward deep-learning, feature-based methods for registration; two popular examples are PointNetLK and Deep Closest Point (DCP). Both treat registration as a supervised learning problem and train end-to-end networks, with supervision in the form of class labels and the ground-truth rotation matrix and translation vector. We propose a method called Salient Points Analysis (SPA) [1] for registration. In contrast with the recent deep learning methods, our SPA method [...]
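For concreteness, one classical ICP loop fits in a few lines: nearest-neighbor correspondences followed by the SVD (Kabsch/Procrustes) solution for the optimal rotation and translation. A bare-bones sketch without outlier rejection or convergence checks:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iters=50):
    """Align source (N, 3) to target (M, 3); returns R (3, 3) and t (3,)."""
    R_total, t_total = np.eye(3), np.zeros(3)
    src = source.copy()
    tree = cKDTree(target)
    for _ in range(iters):
        # 1. Correspondences: nearest target point for each source point.
        _, idx = tree.query(src)
        matched = target[idx]
        # 2. Optimal rotation/translation in closed form via SVD.
        mu_s, mu_t = src.mean(0), matched.mean(0)
        U, _, Vt = np.linalg.svd((src - mu_s).T @ (matched - mu_t))
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:              # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t                   # apply the incremental update
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```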

October 11th, 2020 | News

MCL Research on Texture Synthesis

Automatic synthesis of visually pleasant texture that resembles an exemplary texture finds applications in computer graphics. We have witnessed amazing quality improvements in synthesized texture over the last 5-6 years due to the resurgence of neural networks. Texture synthesis methods based on deep learning (DL), such as convolutional neural networks (CNNs) and generative adversarial networks (GANs), yield visually pleasant results. DL-based methods learn transform kernels from numerous training samples through end-to-end optimization. However, these methods have two main shortcomings: 1) a lack of mathematical transparency and 2) high training and inference complexity.

To address these shortcomings, we investigate a non-parametric and interpretable texture synthesis method, called NITES, in this work. NITES is mathematically transparent and efficient in training and inference. It consists of three steps. First, it analyzes texture patches (as training samples) cropped from the input exemplary texture image to obtain their joint spatial-spectral representations. Second, the probability distributions of the training samples in the joint spatial-spectral space are characterized. The sample distribution in the core subspace is carefully studied, which allows us to build a core subspace generation model; furthermore, a successive subspace generation model is developed to build a higher-dimensional subspace from a lower-dimensional one. Finally, new texture images are generated by mimicking the probabilities and/or conditional probabilities of the source texture patches. In particular, we adopt a data-driven transform, known as the channel-wise (c/w) Saab transform, which provides a powerful representation in the joint spatial-spectral space. The c/w Saab transform is derived from successive subspace learning (SSL) theory.
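The first steps of the pipeline can be sketched with PCA substituting for the c/w Saab transform and a crude per-dimension resampler standing in for the full distribution model (patch size, counts, and the resampling scheme are all illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

def crop_patches(texture, size=16, n=2000, seed=0):
    """Step 1: random training patches cropped from the exemplar."""
    rng = np.random.default_rng(seed)
    H, W = texture.shape
    ys = rng.integers(0, H - size, n)
    xs = rng.integers(0, W - size, n)
    return np.stack([texture[y:y + size, x:x + size].ravel()
                     for y, x in zip(ys, xs)])

texture = np.random.rand(256, 256)     # stand-in exemplar image
patches = crop_patches(texture)

# Step 2: joint spatial-spectral representation (PCA in place of Saab);
# PCA decorrelates the coefficients, so modeling them per dimension is
# a rough first approximation of the sample distribution.
pca = PCA(n_components=32).fit(patches)
coeffs = pca.transform(patches)

# Step 3: mimic the source distribution by resampling coefficients
# independently per dimension, then map back to patch space.
rng = np.random.default_rng(1)
new_coeffs = np.stack([rng.choice(coeffs[:, d], size=10)
                       for d in range(coeffs.shape[1])], axis=1)
new_patches = pca.inverse_transform(new_coeffs).reshape(-1, 16, 16)
```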

Experimental results show the superior quality of the generated texture images and the efficiency of the proposed NITES method in terms of both training and inference time. It can effectively generate visually pleasant texture images, including [...]

October 4th, 2020 | News, Research