USC Media Communications Lab

MCL Research on Point-Cloud-based 3D Scene Flow Estimation

3D scene flow estimation aims to find the point-wise 3D displacement between consecutive point cloud scans. It has applications in areas such as dynamic scene segmentation and may also guide inter-prediction in the compression of dynamically acquired point clouds. We propose a green and interpretable 3D scene flow estimation method for the autonomous driving scenario and name it “PointFlowHop” [1]. We decompose our solution into vehicle ego-motion and object motion components.
The vehicle ego-motion is first compensated using the GreenPCO method, which was recently proposed for point cloud odometry estimation. Then, we divide the scene points into two classes: static and moving. The static points have no motion of their own and are assigned only the ego-motion component. The motion of the moving points is analyzed later. For classification, we use a lightweight XGBoost classifier with a 5-dimensional shape and motion feature as the input. Next, moving points are grouped into moving objects using the DBSCAN clustering algorithm, and the moving objects from the two point clouds are associated using the nearest centroids algorithm. An additional refinement step reclassifies previously misclassified moving points. A rigid flow model is established for each object. Finally, the flow in local regions is refined assuming local scene rigidity.
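The object association step can be illustrated with a small sketch. This is not the PointFlowHop implementation: it assumes the moving points have already been clustered into objects, it matches objects greedily by nearest centroid (so matches are not forced to be one-to-one), and it replaces the rigid flow model with a translation-only flow. The function names and the `max_dist` threshold are hypothetical.

```python
import math

def centroid(points):
    """Mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def associate_objects(objects_t, objects_t1, max_dist=2.0):
    """Match each object in scan t to the nearest-centroid object in scan t+1.

    Returns {index_in_t: index_in_t1}. Objects whose nearest centroid is
    farther than max_dist (a hypothetical threshold) stay unmatched.
    """
    cents_t1 = [centroid(o) for o in objects_t1]
    matches = {}
    for i, obj in enumerate(objects_t):
        c = centroid(obj)
        dists = [math.dist(c, c1) for c1 in cents_t1]
        if dists:
            j = min(range(len(dists)), key=dists.__getitem__)
            if dists[j] <= max_dist:
                matches[i] = j
    return matches

def translation_flow(objects_t, objects_t1, matches):
    """Simplification: every point of a matched object gets the centroid shift."""
    flow = {}
    for i, j in matches.items():
        c0, c1 = centroid(objects_t[i]), centroid(objects_t1[j])
        flow[i] = tuple(b - a for a, b in zip(c0, c1))
    return flow

# Toy example: two objects, listed in a different order in the second scan.
objects_t = [[(0, 0, 0), (1, 0, 0)], [(10, 0, 0), (11, 0, 0)]]
objects_t1 = [[(10.5, 0, 0), (11.5, 0, 0)], [(0, 1, 0), (1, 1, 0)]]
matches = associate_objects(objects_t, objects_t1)
flow = translation_flow(objects_t, objects_t1, matches)
print(matches, flow)
```

A full rigid model would also estimate a rotation per object, which this sketch leaves out.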
The PointFlowHop method adopts the green learning (GL) paradigm. The task-agnostic nature of the feature learning process in GL enables scene flow estimation through seamless modification and extension of prior related GL methods like R-PointHop and GreenPCO. Furthermore, a large number of operations in PointFlowHop are not performed during training. The ego-motion and object-level motion are optimized at inference only. Similarly, the moving points are grouped into objects only during inference. This makes the training process much faster [...]

By Wei Wang|March 28th, 2023|News, Research

MCL Research on Mask-Guided Image Synthesis Presented at AAAI-23

Dr. Rouhsedaghat, an MCL alumna who graduated last summer, recently presented work [1] on image synthesis related to her PhD thesis at AAAI-23. Here is the presentation summary from Dr. Rouhsedaghat:

We offer a method for one-shot mask-guided image synthesis that allows controlled manipulation of a single image by inverting a quasi-robust classifier equipped with strong regularizers. Our proposed method, entitled MAGIC, leverages structured gradients from a pre-trained quasi-robust classifier to better preserve the input semantics while preserving its classification accuracy, thereby guaranteeing credibility in the synthesis. Unlike current methods that use complex primitives to supervise the process or use attention maps as a weak supervisory signal, MAGIC aggregates gradients over the input, driven by a binary guide mask that enforces a strong spatial prior. MAGIC implements a series of manipulations within a single framework, achieving shape and location control, intense non-rigid shape deformations, and copy/move operations in the presence of repeating objects, and gives users firm control over the synthesis by requiring them only to specify binary guide masks. Our study and findings are supported by various qualitative comparisons with the state of the art on the same images sampled from ImageNet, and by quantitative analysis using machine perception along with a user survey of 100+ participants that endorses our synthesis quality.

[1] Rouhsedaghat, Mozhdeh, et al. “MAGIC: Mask-Guided Image Synthesis by Inverting a Quasi-Robust Classifier.” arXiv preprint arXiv:2209.11549 (2022).

By Wei Wang|March 5th, 2023|News, Research

MCL Research on Texture Synthesis

Automatic synthesis of visually pleasant texture that resembles an exemplary texture finds applications in computer graphics. We have witnessed an amazing improvement in the quality of synthesized texture over the last 5-6 years due to the resurgence of neural networks. Texture synthesis based on deep learning (DL), such as convolutional neural networks (CNNs) and generative adversarial networks (GANs), yields visually pleasant results. DL-based methods learn transform kernels from large amounts of training data through end-to-end optimization. However, these methods have two main shortcomings: 1) lack of mathematical transparency and 2) higher training and inference complexity.

To address these shortcomings, we investigate a non-parametric and interpretable texture synthesis method, called NITES, in this work. NITES is mathematically transparent and efficient in training and inference. NITES consists of three steps. First, it analyzes the texture patches (as training samples), which are cropped from the input exemplary texture image, to obtain their joint spatial-spectral representations. Second, the probabilistic distributions of training samples in the joint spatial-spectral spaces are characterized. The sample distribution in the core subspace is carefully studied, which allows us to build a core subspace generation model. Furthermore, a successive subspace generation model is developed to build a higher-dimensional subspace based on a lower-dimensional subspace. Finally, new texture images are generated by mimicking probabilities and/or conditional probabilities of the source texture patches. In particular, we adopt a data-driven transform, known as the channel-wise (c/w) Saab transform, which provides a powerful representation in the joint spatial-spectral space. The c/w Saab transform is derived from the successive subspace learning (SSL) theory.

Experimental results show the superior quality of generated texture images and efficiency of the proposed NITES method in terms of both training and inference time. It can generate visually pleasant texture images effectively, including [...]

By Xuejing Lei|October 4th, 2020|News, Research

MCL Research on Small Neural Network

Deep learning has shown great capabilities in many applications. Many works have proposed different architectures to improve accuracy. However, such improvement may come at the cost of increased time and memory complexity, which matters for applications such as mobile and embedded systems. For these applications, small neural network design can be helpful. Small neural networks aim to reduce the network size while maintaining good performance. Examples of small neural networks include SqueezeNet [1], MobileNet [2], and ShuffleNet [3].

Despite the success of small neural networks, the reason why such networks can achieve good performance while significantly reducing the size has not been studied. In our research, we aim to quantitatively justify the design of small neural networks. In particular, we currently focus on the design of SqueezeNet [1]. SqueezeNet significantly reduces the number of network parameters while maintaining comparable performance through three strategies:

Replacing some of the 3×3 filters with 1×1 filters. Since each 3×3 filter has 9 weights per input channel while a 1×1 filter has only 1, using 1×1 filters in place of 3×3 filters greatly reduces the number of parameters.
Reducing the number of input channels to the 3×3 filters. This significantly reduces the number of parameters for the 3×3 filters.
Downsampling activation maps late in the network. This is motivated by the intuition that larger activation maps may improve accuracy.
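The first two strategies can be checked with simple arithmetic. The sketch below counts convolution weights (biases ignored) for a plain 3×3 layer and for a compact alternative that first reduces the input channels with 1×1 filters and then mixes 1×1 and 3×3 filters. The channel counts (96, 16, 64, 128) are illustrative assumptions rather than values taken from the text above.

```python
def conv_weights(in_channels, out_channels, kernel_size):
    """Number of weights in a convolution layer, ignoring biases."""
    return in_channels * out_channels * kernel_size * kernel_size

# Baseline: a plain 3x3 layer mapping 96 input channels to 128 outputs.
baseline = conv_weights(96, 128, 3)

# Strategy 2: a 1x1 layer first cuts the 96 input channels down to 16,
# so the 3x3 filters that follow see far fewer input channels.
squeeze = conv_weights(96, 16, 1)

# Strategy 1: half of the 128 outputs come from 1x1 filters, half from 3x3.
expand_1x1 = conv_weights(16, 64, 1)
expand_3x3 = conv_weights(16, 64, 3)

compact = squeeze + expand_1x1 + expand_3x3
ratio = baseline / compact

print(baseline, compact, round(ratio, 1))  # 110592 11776 9.4
```

Under these assumed sizes, the compact design uses roughly one ninth of the weights for the same number of output channels.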

A key module of SqueezeNet is the Fire module. A Fire module consists of a squeeze layer and a subsequent expand layer. The squeeze layer reduces the number of input channels to the 3×3 filters in the expand layer. In our work, we use some metrics and visualization techniques to analyze the role of [...]

By Xuejing Lei|May 3rd, 2020|Computer Vision and Scene Analysis, News, Research

MCL Research on Source-Distribution-Aimed Generative Model

There are typically two types of statistical models in machine learning: discriminative models and generative models. Unlike discriminative models, which aim at drawing decision boundaries, generative models aim at modeling the data distribution in the whole space. Generative models tackle a more difficult task than discriminative models because they need to model complicated distributions. For example, a generative model should capture correlations such as “things that look like boats are likely to appear near things that look like water”, while a discriminative model only differentiates “boat” from “not boat”.

Image generative models have become popular in recent years since generative adversarial networks (GANs) can generate realistic natural images. GANs, however, have no clear relationship to probability distributions and suffer from a difficult training process and the mode dropping problem. Although these two problems may be alleviated by using different loss functions [1], the underlying relationship to probability distributions remains vague in GANs. This motivates us to develop a SOurce-Distribution-Aimed (SODA) generative model that aims at providing clear probability distribution functions to describe the data distribution.
There are two main modules in our SODA generative model. One finds proper source data representations, and the other determines the source data distribution in each representation. One proper representation for source data is the joint spatial-spectral representation proposed by Kuo et al. [2, 3]. By transforming between the spectral domain and the spatial domain, a rich set of spectral and spatial representations can be obtained. Spectral representations are vectors of Saab coefficients, while spatial representations are pixels in an image or Saab coefficients that are arranged based on their pixel order in the spatial domain. Spectral representations at the last stage give a global view of an image while the spatial representations describe details in [...]

By Xuejing Lei|April 27th, 2020|Computer Vision and Scene Analysis, News, Research

MCL Research on Image Super-resolution

Image super-resolution (SR) is a classic problem in computer vision (CV), which aims at recovering a high-resolution image from a low-resolution image. As a type of supervised generative problem, image SR attracts wide attention due to its strong connection with other CV topics, such as object recognition, object alignment, and texture synthesis. It also has extensive applications in the real world, for example, medical diagnosis, remote sensing, and biometric information identification.

State-of-the-art approaches to SR typically fall into two mainstreams: 1) example-based learning methods, and 2) deep learning (CNN-based) methods. Example-based methods either exploit external low-high resolution exemplar pairs [1], or learn internal similarity of the same image across different resolution scales [2]. In order to tackle model overfitting and generativity, some dictionary strategies are normally applied for encoding (e.g., sparse coding, SC). However, features used in example-based methods are usually traditional gradient-related or simply handcrafted, which may affect model efficiency. CNN-based SR methods (e.g., SRCNN [3]), by contrast, do not really distinguish between feature extraction and decision making. Many basic CNN models and blocks have been applied to the SR problem, e.g., GANs, residual learning, and attention networks, and provide superior SR results. Nevertheless, the non-explainable process and exhaustive training cost are serious drawbacks of CNN-based methods.

By taking advantage of reasonable feature extraction [4], we utilize spatial-spectral compatible features to express exemplar pairs. In addition, we formulate a Successive-Subspace-Learning-based (SSL-based) method to partition data into subspaces by feature statistics, and apply regression in each subspace for better local approximation. Moreover, some adaptation is also applied for better data fitting. In the future, we aim at providing such an SSL-based explainable method with high efficiency for the SR problem.
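The idea of partitioning data and regressing within each partition can be shown with a toy 1-D analogue. This is only a sketch of the general principle, not the SSL-based SR method: it assumes a single scalar feature, a single hand-picked split threshold, and ordinary least-squares lines, and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for 1-D data."""
    n = len(xs)
    if n == 0:
        raise ValueError("cannot fit a line to an empty partition")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def fit_partitioned(xs, ys, threshold):
    """Split samples at `threshold` and fit one line per partition."""
    low = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    high = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    return {
        "low": fit_line([x for x, _ in low], [y for _, y in low]),
        "high": fit_line([x for x, _ in high], [y for _, y in high]),
    }

def predict(models, x, threshold):
    """Route x to its partition and apply that partition's line."""
    a, b = models["low" if x < threshold else "high"]
    return a * x + b

# Toy data: y = 2x for x < 4 and y = 20 - x for x >= 4.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 2, 4, 6, 16, 15, 14, 13]

global_model = fit_line(xs, ys)           # one line for all samples
local_models = fit_partitioned(xs, ys, threshold=4)
print(global_model, local_models)
```

On this toy data a single global line cannot follow both regimes, while the two local lines fit their partitions exactly, which is the sense in which per-subspace regression gives a better local approximation.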

— By Wei Wang

Reference:

[1] Timofte, Radu, Vincent De Smet, and [...]

By Wei Wang|April 6th, 2020|Computer Vision and Scene Analysis, News, Research

MCL Research on Statistics-based Attention

Object detection and recognition, the task of finding and identifying objects in an image or video sequence, has long been one of the key challenges in computer vision. Humans recognize a multitude of objects in images with little effort, even though the appearance of an object may vary with viewpoint, size, and scale, or when the object is translated or rotated. Many approaches to the task have been implemented over multiple decades, including handcrafted features, machine learning algorithms, and deep learning.

Recently, breakthrough results have been achieved via deep learning using large amounts of labeled data for supervised training, but deep learning is notorious for lacking scalability and interpretability. Building on the recently proposed scalable and interpretable PixelHop [1] and PixelHop++ [2] models, a new object detection pipeline can be formed: Object Proposal -> Feature Extraction using SSL -> Classification. A machine-learning-based, statistics-based attention mechanism is therefore the key to generating object proposals.

Apart from deep learning, object proposal via visual saliency on single images, such as DRFI [3], can be a good start for a machine-learning-based object proposal method. To further take advantage of the statistics from training data, we formulate the weakly supervised object proposal problem as an object search with features capable of matching, such as SURF [4]. In the future, we aim to improve these results by further exploring a query-retrieval-based saliency proposal method along with adapted bag-of-words features.

-By Hongyu Fu

[1] Yueru Chen and C.-C. Jay Kuo, “Pixelhop: A successive subspace learning (ssl) method for object recognition,” Journal of Visual Communication and Image Representation, p. 102749, 2020.

[2] Yueru Chen, Mozhdeh Rouhsedaghat, Suya You, Raghuveer Rao and C.-C. Jay Kuo, [...]

By Wei Wang|March 22nd, 2020|News, Research

MCL Research on SSL-based Graph Learning

In this research, we propose an effective and explainable graph vertex classification method called GraphHop. The GraphHop method is mathematically transparent and can be explained using the recently developed “successive subspace learning (SSL)” framework [1, 2]. Unlike the graph convolutional network (GCN), which is based on the end-to-end optimization of an objective function using backpropagation, GraphHop generates an effective feature set for each vertex in an unsupervised and feedforward manner. Since no backpropagation is required in the feature learning process, the training complexity of GraphHop is significantly lower. Following the traditional pattern recognition paradigm, the GraphHop method decouples feature extraction and classification into two separate modules, where the feature extraction module is completely unsupervised. In the feature extraction module, GraphHop determines the local-to-global attributes of each vertex through successive one-hop information exchange, called the GraphHop unit. To control the rapid increase in the dimension of vertex attributes, the Saab transform is adopted for dimension reduction inside the GraphHop unit. Multiple GraphHop units are cascaded to obtain the higher-order proximity information of a vertex. In the classification module, vertex attributes from multiple GraphHop units are extracted and ensembled for the classification task. Many machine learning tools could be considered; in the experiments, we choose the random forest classifier because of its good performance and low complexity.
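A small sketch can show why cascading one-hop exchanges makes the attribute dimension grow quickly. It is not the GraphHop unit itself: it assumes, for illustration only, that an exchange concatenates a vertex's own attributes with the mean of its neighbors' attributes, and it omits the Saab dimension reduction entirely. The function name and the toy graph are hypothetical.

```python
def one_hop_exchange(adjacency, attributes):
    """Concatenate each vertex's attributes with the mean of its neighbors'.

    adjacency:  dict mapping vertex -> list of neighbor vertices
    attributes: dict mapping vertex -> list of floats (same length for all)
    """
    updated = {}
    for v, own in attributes.items():
        nbrs = adjacency.get(v, [])
        if nbrs:
            mean = [sum(attributes[u][k] for u in nbrs) / len(nbrs)
                    for k in range(len(own))]
        else:
            mean = [0.0] * len(own)  # isolated vertex: no neighbor information
        updated[v] = list(own) + mean
    return updated

# Toy path graph a - b - c with one attribute per vertex.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
attributes = {"a": [1.0], "b": [0.0], "c": [3.0]}

hop1 = one_hop_exchange(adjacency, attributes)  # sees 1-hop neighbors
hop2 = one_hop_exchange(adjacency, hop1)        # sees up to 2-hop neighbors
print(hop1["b"], hop2["b"], len(hop2["a"]))
```

In this sketch the attribute length doubles with every cascaded exchange (1, 2, 4, ...), which illustrates why some form of dimension reduction is needed between units.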
To demonstrate the effectiveness of the GraphHop method, we apply it to three real-world [...]

By Wei Wang|March 17th, 2020|News, Research

MCL Research on Efficient Text Classification

A novel text data dimension reduction technique, called the tree-structured multi-linear principal component analysis (TMPCA), is proposed in this work. Unlike traditional text dimension reduction methods that deal with the word-level representation, the TMPCA technique reduces the dimension of input sequences and sentences to simplify the subsequent text classification tasks. It is shown mathematically and experimentally that the TMPCA tool demands much lower complexity (and, hence, less computing power) than ordinary principal component analysis (PCA). Furthermore, experimental results demonstrate that the support vector machine (SVM) method applied to the TMPCA-processed data achieves comparable or better performance than the state-of-the-art recurrent neural network (RNN) approach.

by Yuanhang Su

By Xuejing Lei|March 26th, 2018|News, Research

MCL Research on Image Splicing Localization

With the advent of Web 2.0 and the ubiquitous adoption of low-cost, high-resolution digital cameras, users upload and share images on a daily basis. This trend of public image distribution, together with access to user-friendly editing software such as Photoshop and GIMP, has made image forgery a serious issue. Splicing is one of the most common types of image forgery. It manipulates images by copying a region from one image (i.e., the donor image) and pasting it onto another image (i.e., the host image). Forgers often use splicing to give a false impression that there is an additional object present in the image, or to remove an object from the image. Image splicing can potentially be used to generate false propaganda for political purposes. For example, during the 2004 U.S. presidential election campaign, an image that showed John Kerry and Jane Fonda speaking together at an anti-Vietnam war protest was released and circulated. It was discovered later that this was a spliced image created for political purposes. The spliced image and the two original authentic images that were used to create the spliced image can be seen above.

Many of the current splicing detection algorithms only deduce whether a given image has been spliced and do not attempt to localize the spliced area. Relatively few algorithms attempt to tackle the splicing localization problem, which refers to the problem of determining which pixels in an image have been manipulated as a result of a splicing operation.

Ronald Salloum and Professor Jay Kuo are currently working on an image splicing localization research project. They are exploring the use of deep learning and data-driven techniques to develop an effective solution to the problem of image splicing localization. They [...]

By Xuejing Lei|February 19th, 2018|News, Research
