Research

MCL Research on Small Neural Network

Deep learning has shown great capabilities in many applications. Many works have proposed different architectures to improve accuracy, but such improvements may come at the cost of increased time and memory complexity, which matters for applications such as mobile and embedded systems. For these applications, small neural network design can be helpful: small neural networks aim to reduce the network size while maintaining good performance. Examples include SqueezeNet [1], MobileNet [2], and ShuffleNet [3].

Despite the success of small neural networks, the reason why such networks can achieve good performance while significantly reducing the model size has not been well studied. In our research, we aim to quantitatively justify the design of small neural networks. In particular, we currently focus on the design of SqueezeNet [1]. SqueezeNet significantly reduces the number of network parameters while maintaining comparable performance by:

Replacing some of the 3×3 filters with 1×1 filters. Since each 3×3 filter has 9 weights while a 1×1 filter has only 1, using 1×1 filters in place of 3×3 filters greatly reduces the number of parameters.
Reducing the number of input channels to the 3×3 filters, which significantly reduces the number of parameters in those filters.
Downsampling activation maps late in the network, motivated by the intuition that larger activation maps may improve accuracy.

A key module of SqueezeNet is the Fire module. A Fire module consists of a squeeze layer and a subsequent expand layer. The squeeze layer reduces the number of input channels to the 3×3 filters in the expand layer. In our work, we use some metrics and visualization techniques to analyze the role of [...]
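To make the squeeze/expand structure concrete, here is a minimal sketch of a Fire-style module in PyTorch; the channel sizes below are illustrative placeholders rather than the exact SqueezeNet configuration.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Illustrative Fire-style module: squeeze with 1x1 convs, then expand
    with a mix of 1x1 and 3x3 convs whose outputs are concatenated."""
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        # Squeeze layer: 1x1 convs reduce the channel count fed to the 3x3 filters.
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        # Expand layer: cheap 1x1 filters plus a reduced number of 3x3 filters.
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# Example: 96 input channels squeezed to 16 before reaching the 3x3 filters.
fire = Fire(in_ch=96, squeeze_ch=16, expand1x1_ch=64, expand3x3_ch=64)
out = fire(torch.randn(1, 96, 55, 55))   # -> shape (1, 128, 55, 55)
```

Because the 3×3 filters only see the squeezed channels, most of the module's parameters sit in the cheap 1×1 convolutions.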

Posted May 3rd, 2020 in Computer Vision and Scene Analysis, News, Research

MCL Research on Source-Distribution-Aimed Generative Model

There are typically two types of statistical models in machine learning: discriminative models and generative models. Different from discriminative models, which aim at drawing decision boundaries, generative models target modeling the data distribution over the whole space. Generative models tackle a more difficult task than discriminative models because they need to model complicated distributions. For example, a generative model should capture correlations such as “things that look like boats are likely to appear near things that look like water,” while a discriminative model only differentiates “boat” from “not boat”.

Image generative models have become popular in recent years since Generative Adversarial Networks (GANs) can generate realistic natural images. They, however, have no clear relationship to probability distributions and suffer from a difficult training process and the mode-dropping problem. Although the training difficulty and mode dropping may be alleviated by using different loss functions [1], the underlying relationship to probability distributions remains vague in GANs. This encourages us to develop a SOurce-Distribution-Aimed (SODA) generative model that aims at providing clear probability distribution functions to describe the data distribution.
There are two main modules in our SODA generative model. One is finding proper source data representations and the other is determining the source data distribution in each representation. One proper representation for the source data is the joint spatial-spectral representation proposed by Kuo et al. [2, 3]. By transforming between the spectral domain and the spatial domain, a rich set of spectral and spatial representations can be obtained. Spectral representations are vectors of Saab coefficients, while spatial representations are pixels in an image or Saab coefficients arranged according to their pixel order in the spatial domain. The spectral representation at the last stage gives a global view of an image, while the spatial representations describe details in [...]
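As a rough illustration of the spectral/spatial duality described above, the sketch below uses patch-wise PCA as a simple stand-in for the Saab transform (the real Saab transform additionally separates a DC/bias component); the input image and dimensions are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA

def patch_spectral_representation(img, patch=4, n_coeffs=8):
    """Toy spectral representation: non-overlapping patches -> PCA coefficients.
    (PCA is a stand-in here for the Saab transform.)"""
    h, w = img.shape
    patches = (img.reshape(h // patch, patch, w // patch, patch)
                  .transpose(0, 2, 1, 3)
                  .reshape(-1, patch * patch))           # (num_patches, patch*patch)
    pca = PCA(n_components=n_coeffs).fit(patches)
    coeffs = pca.transform(patches)                      # spectral vectors, one per patch
    # Re-arrange coefficients on the patch grid -> spatial view of each channel.
    spatial = coeffs.reshape(h // patch, w // patch, n_coeffs)
    return coeffs, spatial

img = np.random.rand(32, 32)             # placeholder for a real source image
coeffs, spatial = patch_spectral_representation(img)
print(coeffs.shape, spatial.shape)        # (64, 8) (8, 8, 8)
```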

Posted April 27th, 2020 in Computer Vision and Scene Analysis, News, Research

MCL Research on Image Super-resolution

Image super-resolution (SR) is a classic problem in computer vision (CV) which aims at recovering a high-resolution image from a low-resolution one. As a type of supervised generative problem, image SR attracts wide attention due to its strong connection with other CV topics, such as object recognition, object alignment, and texture synthesis. Besides, it has extensive real-world applications, for example, medical diagnosis, remote sensing, and biometric identification.

State-of-the-art SR approaches typically fall into two mainstreams: 1) example-based learning methods and 2) deep learning (CNN-based) methods. Example-based methods either exploit external low-high resolution exemplar pairs [1] or learn the internal similarity of the same image across different resolution scales [2]. To tackle model overfitting and improve generalization, dictionary strategies are normally applied for encoding (e.g., sparse coding, SC). However, the features used in example-based methods are usually traditional gradient-related or simply handcrafted, which may limit model efficiency. CNN-based SR methods (e.g., SRCNN [3]), in contrast, do not really distinguish between feature extraction and decision making. Many basic CNN models/blocks have been applied to the SR problem, e.g., GANs, residual learning, and attention networks, and they provide superior SR results. Nevertheless, the unexplainable process and exhaustive training cost are serious drawbacks of CNN-based methods.

By taking advantage of reasonable feature extraction [4], we utilize spatial-spectral compatible features to express the exemplar pairs. In addition, we formulate a Successive-Subspace-Learning-based (SSL-based) method that partitions data into subspaces according to feature statistics and applies regression in each subspace for better local approximation. Moreover, some adaptation is applied for better data fitting. In the future, we aim to provide an explainable and efficient SSL-based method for the SR problem.
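A minimal sketch of the subspace-regression idea, assuming k-means for the partition and ridge regression inside each subspace; the feature and target arrays are synthetic placeholders standing in for the actual spatial-spectral features and high-resolution patches.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

# X_lr: features of low-resolution patches, Y_hr: target high-resolution patches
# (both placeholders for the real spatial-spectral features and HR patches).
rng = np.random.default_rng(0)
X_lr = rng.normal(size=(5000, 32))
Y_hr = rng.normal(size=(5000, 64))

# 1) Partition the feature space into subspaces by feature statistics.
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(X_lr)
labels = kmeans.labels_

# 2) Fit one regressor per subspace for a better local approximation.
regressors = {}
for c in range(16):
    idx = labels == c
    regressors[c] = Ridge(alpha=1.0).fit(X_lr[idx], Y_hr[idx])

# 3) At test time, route each patch to its subspace and apply that regressor.
def predict(x_lr):
    c = kmeans.predict(x_lr.reshape(1, -1))[0]
    return regressors[c].predict(x_lr.reshape(1, -1))[0]

print(predict(X_lr[0]).shape)   # (64,)
```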

— By Wei Wang

 

Reference:

[1] Timofte, Radu, Vincent De Smet, and [...]

Posted April 6th, 2020 in Computer Vision and Scene Analysis, News, Research

MCL Research on Statistics-based Attention

Object detection and recognition have always been among the key challenges in the field of computer vision: finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, despite the fact that the images of objects may vary in viewpoint, size, and scale, or even when they are translated or rotated. Many approaches to the task have been developed over multiple decades, including handcrafted features, machine learning algorithms, and deep learning.

Recently, breakthrough results have been achieved via deep learning with large amounts of labelled data for supervised training, yet deep learning is notorious for its lack of scalability and interpretability. Leveraging the recently proposed scalable and interpretable PixelHop [1] and PixelHop++ [2] models, a new object detection pipeline can be built as Object Proposal -> Feature Extraction using SSL -> Classification, in which a machine-learning-based, statistics-based attention module is the key to generating object proposals.

Apart from deep learning, object proposal via visual saliency on single images, such as DRFI [3], is a good starting point for a machine-learning-based object proposal. To further take advantage of the statistics of the training data, we formulate the weakly supervised object proposal problem as object search with matchable features such as SURF [4]. In the future, we aim to improve these results by exploring a query-retrieval-based saliency proposal method along with adapted bag-of-words features.
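A minimal sketch of the feature-matching step in the object-search formulation, using OpenCV; ORB is used here as a freely available substitute for SURF (which requires the opencv-contrib nonfree build), and the image paths are placeholder names.

```python
import cv2

# Placeholder paths for a query object and a target image to search.
query = cv2.imread("query_object.png", cv2.IMREAD_GRAYSCALE)
target = cv2.imread("target_scene.png", cv2.IMREAD_GRAYSCALE)

# ORB keypoints/descriptors (a stand-in for SURF features capable of matching).
orb = cv2.ORB_create(nfeatures=1000)
kp_q, des_q = orb.detectAndCompute(query, None)
kp_t, des_t = orb.detectAndCompute(target, None)

# Match descriptors and keep the best matches; the matched keypoint locations
# in the target image indicate a candidate object region (a crude proposal).
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_q, des_t), key=lambda m: m.distance)[:50]
pts = [kp_t[m.trainIdx].pt for m in matches]

xs, ys = zip(*pts)
proposal = (min(xs), min(ys), max(xs), max(ys))  # bounding box of matched points
print("candidate proposal box:", proposal)
```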

 

-By Hongyu Fu

[1] Yueru Chen and C.-C. Jay Kuo, “PixelHop: A successive subspace learning (SSL) method for object recognition,” Journal of Visual Communication and Image Representation, p. 102749, 2020.

[2] Yueru Chen, Mozhdeh Rouhsedaghat, Suya You, Raghuveer Rao and C.-C. Jay Kuo, [...]

Posted March 22nd, 2020 in News, Research

MCL Research on SSL-based Graph Learning

In this research, we propose an effective and explainable graph vertex classification method called GraphHop. Unlike the graph convolutional network (GCN), which relies on end-to-end optimization of an objective function through backpropagation, GraphHop generates an effective feature set for each vertex in an unsupervised and feedforward manner. Since no backpropagation is required in the feature learning process, the training complexity of GraphHop is significantly lower. The method is mathematically transparent and can be explained with the recently developed successive subspace learning (SSL) framework [1, 2].

Following the traditional pattern recognition paradigm, GraphHop decouples feature extraction and classification into two separate modules, where the feature extraction module is completely unsupervised. In the feature extraction module, GraphHop determines the local-to-global attributes of each vertex through successive one-hop information exchange, carried out by the GraphHop unit. To control the rapid growth of the vertex attribute dimension, the Saab transform is adopted for dimension reduction inside the GraphHop unit. Multiple GraphHop units are cascaded to obtain higher-order proximity information of a vertex. In the classification module, vertex attributes from multiple GraphHop units are extracted and ensembled for the classification task. Many machine learning tools could be considered; in the experiments, we choose the random forest classifier because of its good performance and low complexity. To demonstrate the effectiveness of the GraphHop method, we apply it to three real-world [...]
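A rough sketch of the pipeline described above: one-hop neighbor aggregation, dimension reduction (PCA here as a simple stand-in for the Saab transform), two cascaded units, and a random forest on the ensembled attributes; the graph, attributes, and labels are synthetic placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n, d = 200, 32
A = (rng.random((n, n)) < 0.05).astype(float)          # placeholder adjacency matrix
A = np.maximum(A, A.T)                                  # make it symmetric
X = rng.normal(size=(n, d))                             # initial vertex attributes
y = rng.integers(0, 3, size=n)                          # placeholder vertex labels

def graphhop_unit(A, X, out_dim=16):
    """One-hop information exchange followed by dimension reduction.
    (PCA is used here as a simple stand-in for the Saab transform.)"""
    deg = A.sum(axis=1, keepdims=True) + 1e-8
    neighbor_mean = (A @ X) / deg                       # aggregate one-hop neighbors
    Z = np.hstack([X, neighbor_mean])                   # keep self + neighborhood info
    return PCA(n_components=out_dim).fit_transform(Z)   # control dimension growth

# Cascade two units to capture higher-order proximity, then ensemble the attributes.
H1 = graphhop_unit(A, X)
H2 = graphhop_unit(A, H1)
features = np.hstack([H1, H2])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(features, y)
print("training accuracy:", clf.score(features, y))
```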

Posted March 17th, 2020 in News, Research

MCL Research on Efficient Text Classification

A novel text data dimension reduction technique, called the tree-structured multi-linear principal component analysis (TMPCA), is proposed in this work. Different from traditional text dimension reduction methods that deal with word-level representations, the TMPCA technique reduces the dimension of input sequences and sentences to simplify the subsequent text classification task. It is shown mathematically and experimentally that the TMPCA tool demands much lower complexity (and, hence, less computing power) than the ordinary principal component analysis (PCA). Furthermore, experimental results demonstrate that the support vector machine (SVM) method applied to the TMPCA-processed data achieves comparable or better performance than the state-of-the-art recurrent neural network (RNN) approach.
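A simplified sketch of the tree-structured reduction idea: at each level, adjacent word vectors are concatenated in pairs and projected back to the original dimension with PCA, halving the sequence length until one sentence vector remains, after which an SVM is trained on the reduced representation. All data and dimensions are placeholders, and the sequence length is assumed to be a power of two.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_sent, seq_len, d = 500, 8, 16                 # seq_len assumed to be a power of two
X = rng.normal(size=(n_sent, seq_len, d))       # placeholder word embeddings
y = rng.integers(0, 2, size=n_sent)             # placeholder class labels

def tmpca_reduce(X, d):
    """Tree-structured reduction: merge adjacent pairs (2d -> d) level by level."""
    while X.shape[1] > 1:
        pairs = X.reshape(X.shape[0], X.shape[1] // 2, 2 * d)     # concatenate pairs
        flat = pairs.reshape(-1, 2 * d)
        pca = PCA(n_components=d).fit(flat)                       # one projection per level
        X = pca.transform(flat).reshape(X.shape[0], X.shape[1] // 2, d)
    return X[:, 0, :]                                             # one vector per sentence

Z = tmpca_reduce(X, d)                          # (n_sent, d)
clf = LinearSVC().fit(Z, y)
print("training accuracy:", clf.score(Z, y))
```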

 

by Yuanhang Su

Posted March 26th, 2018 in News, Research

MCL Research on Image Splicing Localization

With the advent of Web 2.0 and ubiquitous adoption of low-cost and high-resolution digital cameras, users upload and share images on a daily basis. This trend of public image distribution and access to user-friendly editing software such as Photoshop and GIMP has made image forgery a serious issue. Splicing is one of the most common types of image forgery. It manipulates images by copying a region from one image (i.e., the donor image) and pasting it onto another image (i.e., the host image). Forgers often use splicing to give a false impression that there is an additional object present in the image, or to remove an object from the image. Image splicing can be potentially used in generating false propaganda for political purposes. For example, during the 2004 U.S. presidential election campaign, an image that showed John Kerry and Jane Fonda speaking together at an anti-Vietnam war protest was released and circulated. It was discovered later that this was a spliced image, and was created for political purposes. The spliced image and the two original authentic images that were used to create the spliced image can be seen above.

 

Many of the current splicing detection algorithms only deduce whether a given image has been spliced and do not attempt to localize the spliced area. Relatively few algorithms attempt to tackle the splicing localization problem, which refers to the problem of determining which pixels in an image have been manipulated as a result of a splicing operation.

 

Ronald Salloum and Professor Jay Kuo are currently working on an image splicing localization research project. They are exploring the use of deep learning and data-driven techniques to develop an effective solution to the problem of image splicing localization. They [...]

Posted February 19th, 2018 in News, Research

MCL Research on Unsupervised Video Segmentation

We propose a method for unsupervised video object segmentation that transfers the knowledge encapsulated in image-based instance embedding networks. The instance embedding network produces an embedding vector for each pixel that enables identifying all pixels belonging to the same object. Though trained on static images, the instance embeddings are stable over consecutive video frames, which allows us to link objects together over time. Thus, we adapt the instance networks trained on static images to video object segmentation and incorporate the embeddings with objectness and optical flow features, without model retraining or online fine-tuning. The proposed method outperforms state-of-the-art unsupervised segmentation methods on the DAVIS and FBMS datasets.

The main contributions include

– A new strategy for adapting instance segmentation models trained on static images to videos. Notably, this strategy performs well on video datasets without requiring any video object segmentation annotations.

– Proposal of novel criteria for selecting a foreground object without supervision, based on semantic score and motion features over a track.

– Insights into the stability of instance segmentation embeddings over time.
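As a toy illustration of how stable per-pixel embeddings allow linking objects over time, the sketch below matches object masks between two frames by the cosine similarity of their mean embedding vectors; the embedding maps and masks are synthetic placeholders rather than outputs of the actual embedding network.

```python
import numpy as np

def mean_embedding(embedding_map, mask):
    """Average the per-pixel embedding vectors inside an object mask."""
    return embedding_map[mask].mean(axis=0)

def link_objects(emb_prev, masks_prev, emb_next, masks_next):
    """Match each object in the previous frame to the most similar object
    in the next frame by cosine similarity of mean embeddings."""
    links = {}
    for i, mp in enumerate(masks_prev):
        v = mean_embedding(emb_prev, mp)
        sims = []
        for mn in masks_next:
            w = mean_embedding(emb_next, mn)
            sims.append(v @ w / (np.linalg.norm(v) * np.linalg.norm(w) + 1e-8))
        links[i] = int(np.argmax(sims))
    return links

# Placeholder data: two frames, 64x64 embedding maps with 8-dim vectors, 2 masks each.
rng = np.random.default_rng(0)
emb1, emb2 = rng.normal(size=(64, 64, 8)), rng.normal(size=(64, 64, 8))
masks1 = [np.zeros((64, 64), bool) for _ in range(2)]
masks2 = [np.zeros((64, 64), bool) for _ in range(2)]
masks1[0][:32], masks1[1][32:] = True, True
masks2[0][:32], masks2[1][32:] = True, True
print(link_objects(emb1, masks1, emb2, masks2))   # e.g. {0: 0, 1: 1}
```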

 

By Siyang Li

Posted February 12th, 2018 in News, Research

Image characterization and categorization based on learning of visual attention

Author: Jia He, Xiang Fu, Shangwen Li, Chang-Su Kim, and C.-C. Jay Kuo

Visual attention of an image, also known as image saliency, is defined as the regions and contents that attract human attention, such as regions with high contrast, bright luminance, vivid color, or clear scene structure, or the semantic objects that humans expect to see. Our research learns the visual attention of an image database and then develops image characterization and classification algorithms based on the learned visual attention features. These algorithms can be applied to image compression, retargeting, annotation, segmentation, image retrieval, etc.

Recently, image saliency has been widely studied. However, most work focuses on extracting the saliency map of an image using a bottom-up context computation framework [1-5]. The saliency of an image does not always match human visual attention exactly, since humans tend to be attracted by things of particular interest to them. To bridge the gap, the learning of visual attention should combine both bottom-up and top-down frameworks. To achieve this goal, we are building a hierarchical human perception tree and learning image visual attention together with detailed image characteristics, including the salient region's appearance, semantics, attention priority, and intensity. Image classification is then based on the content of the salient area and its saliency intensity. Our system will capture not only the locations of visual attention regions in an image but also estimate their priorities and intensities.

Building a hierarchical human perceptual tree for visual attention learning is challenging because of its complexity, and little work has been done on this kind of modeling. We aim to model the perceptual tree as closely as possible [...]

Posted November 21st, 2013 in Computer Vision and Scene Analysis

Hierarchical Bag-of-Words Model for Joint Multi-View Object Representation and Classification

Author: Xiang Fu, Sanjay Purushotham, Daru Xu, and C.-C. Jay Kuo

The rapid development of video sharing over the Internet creates a large number of videos every day. Automatically organizing and classifying this huge volume of Internet images and videos is an essential task, and it will be of great help for finding useful videos in the future, especially in applications such as video surveillance and image/video retrieval.

One classic way for humans to categorize videos is based on their content, which makes content-based video analysis a hot topic. An object is undoubtedly the most significant component for representing video content, and object recognition and classification plays a significant role in intelligent information processing.

The traditional tasks of object recognition and classification include two parts. One is to identify a particular object in an image from an unknown viewpoint given a few views of that object for training, which is called “multi-view specific object recognition”. Later on, researchers attempted to capture the internal relations of object classes from one specific view, which developed into another task called “single-view object classification”. In this case, the diversity of an object class in appearance, shape, or color should be taken into consideration, and these variations increase the difficulty of classification. Over the last decade, many researchers have addressed these two tasks using the concept of intra-class similarity. To further reduce the semantic gap between machine and human, the problem of “multi-view object classification” needs to be well studied.

As shown in Fig. 1, there are three elements that define a view: angle, scale (distance), and height, which together form the view sphere. Although viewpoint and intra-class variations exist, as illustrated in Fig. 2, some common features can still be found for one object class by [...]
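As background for the bag-of-words representation that the proposed model builds on, here is a minimal single-level sketch: local descriptors are clustered into a visual codebook with k-means, and each image is represented as a normalized histogram over the codewords. The descriptors are synthetic placeholders, and the hierarchical multi-view extension described above is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Placeholder local descriptors (e.g., 128-dim SIFT-like vectors) for each image.
train_descriptors = [rng.normal(size=(rng.integers(50, 100), 128)) for _ in range(20)]

# 1) Build a visual codebook by clustering all training descriptors.
codebook = KMeans(n_clusters=64, n_init=10, random_state=0)
codebook.fit(np.vstack(train_descriptors))

# 2) Represent an image as a normalized histogram of codeword occurrences.
def bow_histogram(descriptors, codebook, n_words=64):
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / hist.sum()

hist = bow_histogram(train_descriptors[0], codebook)
print(hist.shape)   # (64,)
```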

Posted November 21st, 2013 in Computer Vision and Scene Analysis