
MCL Research on Image Steganalysis

In image forensics, steganography and steganalysis are like two sides of the same coin. Image steganography is a technique for concealing secret messages in images by slightly modifying pixel values. Its counterpart, steganalysis, is the process of revealing the presence of hidden messages in images. Recently, steganalysis has focused on detecting content-adaptive steganographic schemes such as WOW, HILL and S-UNIWARD. Fig. 1 [1] illustrates the modifications made to a cover image by different steganographic methods. Content-adaptive steganography tends to modify regions of complex texture, which makes embedding traces less detectable for steganalyzers.
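
To make "slightly modifying pixel values" concrete, here is a minimal sketch of plain least-significant-bit (LSB) replacement. This is a simple non-adaptive scheme chosen only for illustration; it is not WOW, HILL or S-UNIWARD, which select embedding locations by a content-dependent cost. The function names and the choice to write bits into the first pixels are assumptions.

```python
import numpy as np

def embed_lsb(cover, bits):
    """Write message bits into the least significant bits of the first pixels.

    Illustrative, non-adaptive embedding: pixel order is simply raster order.
    """
    stego = np.asarray(cover, dtype=np.uint8).copy()
    flat = stego.reshape(-1)  # view into the copied image
    bits = np.asarray(bits, dtype=np.uint8)
    if bits.size > flat.size:
        raise ValueError("message is longer than the number of pixels")
    # Clear the lowest bit of each carrier pixel, then set it to the message bit.
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return stego

def extract_lsb(stego, num_bits):
    """Read back the first num_bits least significant bits."""
    return np.asarray(stego, dtype=np.uint8).reshape(-1)[:num_bits] & 1
```

Each carrier pixel changes by at most one gray level, which is why such traces are hard to see and why steganalysis relies on statistical features instead.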

Traditionally, hand-crafted features combined with machine learning classifiers, such as the Spatial Rich Model and its variants, have performed well on steganalysis. With the emergence of neural networks, various CNN architectures have been adopted in the steganalysis literature. Because CNNs can extract complex statistical dependencies from high-dimensional input and learn hierarchical representations, CNN-based features usually achieve better performance than traditional hand-crafted features. However, CNN-based models suffer from long training time, large model size and heavy consumption of computational resources.

In the future, we would like to apply the green learning methodology to steganalysis by incorporating the Saab transform as the feature extraction module. The Saab transform has shown that it can extract high-frequency representations in a feedforward manner while keeping the model lightweight.



[1] Tang, Weixuan, et al. “Adaptive steganalysis based on embedding probabilities of pixels.” IEEE Transactions on Information Forensics and Security 11.4 (2015): 734-745.

[2] C.-C. J. Kuo and Y. Chen, “On data-driven saak transform,” Journal of Visual Communication and Image Representation, vol. 50, pp. 237–246, 2018.

[3] C.-C. J. Kuo, M. Zhang, S. Li, J. [...]

MCL Research on Object Tracking

Video object tracking is one of the fundamental computer vision problems and has found rich applications in video surveillance, autonomous navigation, robotics vision, etc. In the setting of online single object tracking (SOT), a tracker is given a bounding box on the target object at the first frame and then predicts its boxes for all remaining frames. Online tracking methods can be divided into two categories: unsupervised and supervised. Traditional trackers are unsupervised, while recent deep-learning-based (DL-based) trackers demand supervision. Unsupervised trackers are attractive since they do not need annotated boxes for training. The performance of trackers can be measured in terms of accuracy (higher success rate), robustness (automatic recovery from tracking loss), and speed (higher FPS).
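
As a rough illustration of how a success-based accuracy measure is commonly computed, the sketch below scores predicted boxes against ground-truth boxes by intersection over union (IoU) and reports the fraction of frames whose overlap exceeds a threshold. The (x, y, w, h) box format, the 0.5 threshold and the function names are assumptions for illustration, not details taken from this post or from a specific benchmark toolkit.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_rate(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of frames whose IoU with the ground truth exceeds threshold."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return float(np.mean(overlaps > threshold))
```

Benchmarks typically sweep the threshold over a range and summarize the resulting curve; a single threshold is used here only to keep the example short.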

We examine the design of an unsupervised high-performance tracker and name it UHP-SOT (Unsupervised High-Performance Single Object Tracker) in this work. UHP-SOT consists of three modules: 1) appearance model update, 2) background motion modeling, and 3) trajectory-based box prediction. Previous unsupervised trackers pay attention to efficient and effective appearance model update. Built upon this foundation, an unsupervised discriminative-correlation-filters-based (DCF-based) tracker STRCF [1] is adopted by UHP-SOT as the baseline in the first module. Yet, the use of the first module alone has shortcomings, such as failure to recover from tracking loss and weak box size adaptation. We propose ideas for background motion modeling and trajectory-based box prediction to address these problems. The baseline tracker is initialized at the first frame. For the following frames, UHP-SOT gets proposals from all three modules and chooses one of them as the final prediction based on a fusion strategy, as shown in Fig. 1. Fig. 2 shows example results on sequences from the OTB-2015 [2] benchmark. Our tracker runs [...]

MCL Research on Object Detection

Object detection is one of the most essential and challenging tasks in computer vision. While most state-of-the-art object detection methods adopt an end-to-end deep neural network, we aim at an interpretable framework that has low complexity, high training efficiency, and high performance. The method is built upon the PixelHop framework, as shown in Fig. 1. The term “hop” denotes the neighborhood of a pixel. PixelHop conducts spectral analysis on neighborhoods of different sizes centered on a pixel through a sequence of cascaded dimension reduction units. The neighborhoods of an object contain representative patterns of the object, such as salient contours. As a result, they have distinctive spectral signatures at a certain scale that matches the object size. Bounding boxes and class labels can thus be predicted by supervised learning, with Saab coefficients at the proper hops as the representations.

Our method takes YOLO’s problem formulation as a reference and combines three major modules to complete the object detection task. As shown in Fig. 1, with proper settings of PixelHop, we divide all objects into three scales, i.e., large (shown in blue), medium (shown in green), and small (shown in red), and make hops with a proper receptive field (RF) responsible for proposing the corresponding anchor boxes at each scale (as illustrated by the “cat” example). With the Saab coefficients at each hop, we propose anchor boxes at each spatial location. For each anchor box, we train module 1 to predict its confidence score, module 2 to predict its class label, and module 3 to predict its box regression. Eventually, for each image, our model will first propose potential boxes and use non-maximum suppression based on confidence score to keep the best proposed [...]
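
The suppression step mentioned above can be sketched as standard greedy non-maximum suppression: keep the highest-scoring box, drop the boxes that overlap it too much, and repeat. The (x1, y1, x2, y2) corner format, the 0.5 IoU threshold and the function name are assumptions for illustration; the post does not state the settings used in our detector, and boxes are assumed to have positive area.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidence scores.
    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Overlap of the current best box with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Keep only the boxes that do not overlap the current one too much.
        order = rest[iou <= iou_threshold]
    return keep
```

In a multi-class detector this is usually run once per class, so that boxes of different classes do not suppress each other.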

MCL Research on SSL-based Image Classification

Image classification has been studied for many years as a fundamental problem in computer vision. With the development of convolutional neural networks (CNNs) and the availability of larger-scale datasets, classification using deep learning has advanced rapidly for both low- and high-resolution images. Although deep learning is effective, one major challenge is that its underlying mechanism is not transparent.

Inspired by deep learning, the successive subspace learning (SSL) methodology was proposed by Kuo in a sequence of papers. Unlike deep learning, SSL-based methods learn feature representations in an unsupervised feedforward manner using multi-stage principal component analysis (PCA). Joint spatial-spectral representations are obtained at different scales through multi-stage transforms. Three variants of the PCA transform were developed: the Saak transform [1], the Saab transform [2], and the channel-wise (c/w) Saab transform [4]. Two SSL-based image classification pipelines, PixelHop [3] and PixelHop++ [4], were designed based on the Saab transform and the c/w Saab transform, respectively. Both follow the traditional pattern recognition paradigm and partition the classification problem into two cascaded modules: 1) feature extraction and 2) classification. Every step in PixelHop/PixelHop++ is explainable, and the whole solution is mathematically transparent.
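
To give a feel for the PCA-based feature learning described above, here is a minimal one-stage sketch in the spirit of the Saab transform: a constant DC kernel, AC kernels obtained by PCA on the DC-removed patches, and a constant bias that keeps AC responses non-negative. It is a simplified illustration under our own assumptions (function names, flattened patch input, and a bias chosen from the training patches), not the reference implementation from [2].

```python
import numpy as np

def fit_saab(patches, num_ac):
    """Fit a one-stage Saab-style transform on flattened patches of shape (n, d)."""
    X = np.asarray(patches, dtype=float)
    d = X.shape[1]
    # DC kernel: unit-norm constant vector, which responds to the patch mean.
    dc_kernel = np.ones(d) / np.sqrt(d)
    # Remove the DC component from every patch.
    residual = X - np.outer(X @ dc_kernel, dc_kernel)
    # PCA on the centered residual gives the AC kernels (unsupervised).
    mean = residual.mean(axis=0)
    centered = residual - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    ac_kernels = vt[:num_ac]
    kernels = np.vstack([dc_kernel, ac_kernels])
    # Assumed bias rule: the largest residual norm, so that AC responses on
    # the training patches are non-negative (unit-norm kernels, Cauchy-Schwarz).
    bias = float(np.max(np.linalg.norm(centered, axis=1)))
    return kernels, mean, bias

def apply_saab(patches, kernels, mean, bias):
    """Project patches onto the DC and AC kernels; returns shape (n, 1 + num_ac)."""
    X = np.asarray(patches, dtype=float)
    dc_kernel, ac_kernels = kernels[0], kernels[1:]
    dc = X @ dc_kernel
    residual = X - np.outer(dc, dc_kernel)
    ac = (residual - mean) @ ac_kernels.T + bias
    return np.column_stack([dc, ac])
```

Stacking such stages, with spatial pooling in between, is what yields representations at different scales; no backpropagation is involved at any point.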

To further improve the performance, we propose an SSL-based two-stage sequential image classification pipeline, named E-PixelHop. The motivation is that, in a multi-class classification problem, it is easier to distinguish between dissimilar classes than between similar ones. For example, it is easier to tell cats from cars than cats from dogs. Along this line, one can build a hierarchical relation among multiple classes based on their semantic meaning to improve classification performance. Instead of manually constructing the hierarchical learning structure before classification, E-PixelHop resolves [...]

MCL Research on Texture Synthesis

Automatic synthesis of visually pleasant texture that resembles an exemplary texture finds applications in computer graphics. Texture synthesis has been studied for several decades since it is also of theoretical interest in texture analysis and modeling. Texture can be synthesized pixel-by-pixel or patch-by-patch based on an exemplary pattern. In pixel-based synthesis, each pixel is synthesized conditioned on its square neighborhood, using a conditional probability estimated by a statistical method. Generally, patch-based texture synthesis yields higher quality than pixel-based texture synthesis. Yet, searching the whole image for patch-based synthesis is extremely slow. To speed up the process, small patches of the exemplary texture can be stitched together to form a larger region. Although these methods can produce texture of higher quality, the diversity of the produced textures is limited. Besides synthesis in the spatial domain, texture images can be transformed from the spatial domain to the spectral domain with certain filters (or kernels), so that the statistical correlation of filter responses can be exploited for texture synthesis. Commonly used kernels include the Gabor filters and the steerable pyramid filter banks.

We have witnessed amazing quality improvement of synthesized texture over the last five to six years due to the resurgence of neural networks. Texture synthesis based on deep learning (DL), such as Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs), yields visually pleasing results. DL-based methods learn transform kernels from large amounts of training data through end-to-end optimization. However, these methods have two main shortcomings: 1) a lack of mathematical transparency and 2) higher training and inference complexity. To address these drawbacks, we investigate a non-parametric and interpretable texture synthesis method, called NITES [1].

NITES consists of three steps. First, it analyzes the exemplary texture to obtain its joint spatial-spectral [...]

MCL Research on Image Denoising

As a fundamental computer vision problem, image denoising aims to reduce noise in images and thereby improve their quality. As a sub-topic of image restoration, image denoising not only has wide practical applications but can also serve as an important pre-processing step for other CV or NLP problems.

Traditionally, patch-wise denoising algorithms such as Non-local Means and BM3D usually assume that the noise is Gaussian and reduce it by exploiting the randomness of noise and the preservation of signal across similar patches. After CNN architectures were introduced to the CV field, CNN-based denoising methods were developed, as in other image restoration problems such as super-resolution, deblurring, and dehazing. They follow two main streams: one focuses on pixel-wise restoration and the other on overall perceptual quality. In addition, building a more general model that works on multiple image restoration problems has gradually attracted more attention.

As denoising performance improves, more and more algorithms suffer from large model size and slow inference speed. In the future, we would like to apply the SSL principle to the denoising problem, aiming for comparable performance with higher efficiency.

Image credit: Dabov, Kostadin, et al. “Image denoising by sparse 3-D transform-domain collaborative filtering.” IEEE Transactions on image processing 16.8 (2007): 2080-2095.

Welcome MCL New Member – Xinyu Wang

In Summer 2021, we have a new MCL member, Xinyu Wang, joining our big family. Here is a short interview with Xinyu, along with our warm welcome.

1. Could you briefly introduce yourself and your research interests?

My name is Xinyu Wang. I am a Master's student in Electrical Engineering, and this is my second year at USC. I am new here as a summer intern. My research interests mainly include machine learning and robotics, and I will work on image forensics related topics this summer at MCL.

2. What is your impression about MCL and USC?

MCL has a group of motivated and intelligent people who are passionate about their research. I am impressed by the open and friendly atmosphere here. People are encouraged to share their ideas and help each other, and everyone is friendly and supportive.

3. What is your future expectation and plan in MCL?

This summer, I am working with Yao on image forensics topics using the green learning method, under the supervision of Professor Kuo. I believe this will be a great opportunity for me to further explore machine learning and work on this interesting topic. I also hope to make new friends here and build lasting connections with MCL members.

Welcome MCL New Member – Peida Han

In Summer 2021, we have a new MCL member, Peida Han, joining our big family. Here is a short interview with Peida, along with our warm welcome.

1. Could you briefly introduce yourself and your research interests?

My name is Peida Han, and I am a first-year Master's student in Computer Science (artificial intelligence) at USC. I received my Bachelor’s degree in Computer Science and Engineering from the Ohio State University in 2016. As an undergraduate, I worked on machine-learning-based projects such as an autonomous aerial system on drones. With a strong interest in image processing for practical real-life applications, I am currently working on the Breast Cancer Image segmentation project in MCL.

2. What is your impression about MCL and USC?

My impression of MCL is that the lab members are friendly and motivating. I feel everyone is approachable, and the whole group likes to help each other out. In addition, everyone is dedicated to their work, and I am inspired to work hard and learn from them. USC provides valuable resources from the perspectives of both academia and industry. My impression of USC is that students have easy access to resources and that professors hold their courses to high standards. I am glad to be a part of the Trojan family.

3. What is your future expectation and plan in MCL?

My expectation in MCL is to explore my potential in pure research, especially in the image processing field. I am glad to be involved in image segmentation research and hope that it can be helpful to society. I have learnt a great deal from Prof. Kuo, and I hope I can contribute my own efforts to MCL. It [...]

MCL Research on SSL-based Image Anomaly Localization

Image anomaly localization is an important problem in image processing and computer vision, with applications in many areas, such as industrial manufacturing inspection, medical image diagnosis and even video surveillance analysis. The goal of image anomaly localization is to locate the anomaly or anomalous region at the pixel level. Like most other anomaly detection problems, we formulate image anomaly localization as an unsupervised task. More specifically, the training set contains only normal images, and no anomalous images or corresponding labeled masks are available during model training. This is because anomalous examples are either too expensive to collect or too few to be modeled, which makes the problem extremely challenging yet attractive.

To tackle this problem, we propose a new image anomaly localization method, called AnomalyHop [1], based on the successive subspace learning (SSL) framework. This is also the first work that applies SSL to the anomaly localization problem. AnomalyHop consists of three modules: 1) feature extraction via successive subspace learning (SSL), 2) modeling of normality feature distributions via various Gaussian models, and 3) anomaly map generation and fusion. Compared with previous deep-learning-based image anomaly localization methods, AnomalyHop is mathematically transparent, easy to train, and fast at inference. In addition, its area under the ROC curve (ROC-AUC) on the MVTec AD dataset is 95.9%, which is state-of-the-art performance.
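
The second and third modules can be pictured with a deliberately simplified sketch: fit one Gaussian per spatial location to features from normal images only, then score a test image by its Mahalanobis distance to that Gaussian at each location. The per-location Gaussian, the regularization constant `eps`, the feature layout and the function names are our assumptions for illustration; the actual Gaussian descriptors and the fusion step of AnomalyHop are described in [1] and are not reproduced here.

```python
import numpy as np

def fit_gaussian(normal_features, eps=1e-3):
    """Fit a Gaussian at every spatial location.

    normal_features: (n_images, h, w, c) features from normal images only.
    Returns the per-location mean (h, w, c) and inverse covariance (h, w, c, c).
    """
    F = np.asarray(normal_features, dtype=float)
    n, _, _, c = F.shape
    mean = F.mean(axis=0)
    centered = F - mean
    # Per-location sample covariance over the training images.
    cov = np.einsum('nhwi,nhwj->hwij', centered, centered) / max(n - 1, 1)
    cov += eps * np.eye(c)  # small ridge term keeps the matrices invertible
    return mean, np.linalg.inv(cov)

def anomaly_map(features, mean, cov_inv):
    """Mahalanobis distance of a test feature map (h, w, c) at every location."""
    diff = np.asarray(features, dtype=float) - mean
    d2 = np.einsum('hwi,hwij,hwj->hw', diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))
```

The resulting map has the resolution of the feature grid; in practice it would be upsampled to the image size, and maps from several feature scales would be fused before thresholding.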

-By Kaitai Zhang and Bin Wang


[1] Zhang, K., Wang, B., Wang, W., Sohrab, F., Gabbouj, M., & Kuo, C. C. J. (2021). AnomalyHop: An SSL-based Image Anomaly Localization Method. arXiv preprint arXiv:2105.03797.

Congratulations to Kaitai Zhang for Passing His Defense

Congratulations to Kaitai Zhang for passing his defense on May 19, 2021. His Ph.D. thesis is entitled “Data-Driven Image Analysis, Modeling, Synthesis and Anomaly Localization Techniques”. Here we invite Kaitai to share a brief introduction of his thesis and some words he would like to say at the end of the Ph.D. study journey.

1) Abstract of Thesis

Emerging deep learning and machine learning techniques have brought impressive improvements to numerous topics in the image processing and computer vision fields. In this thesis, we introduce our research on Data-Driven Image Analysis, Modeling, Synthesis and Anomaly Localization Techniques in two parts: 1) image anomaly detection and localization; 2) texture analysis, modeling and synthesis.

For the first part, we will focus on image anomaly detection and localization tasks. Image anomaly detection is a binary classification problem that determines whether an input contains an anomaly, while image anomaly localization aims at a pixel-precise segmentation of regions that appear anomalous. Detecting and localizing anomalies is a critical and long-standing problem in image processing and computer vision, and has applications in many real-world scenarios such as medical image diagnosis and automated manufacturing inspection. In this talk, I will introduce two of our recent works, PEDENet and AnomalyHop. PEDENet is a neural-network-based framework that jointly learns local image features and a density estimation model. AnomalyHop employs the successive subspace learning (SSL) framework and utilizes various Gaussian descriptors to learn normality feature distributions. Both of them achieve state-of-the-art performance on the MVTec AD dataset and provide either a smaller model size or faster inference speed.

In the second part, our previous works in texture analysis, modeling and synthesis will be reviewed. For dynamic texture synthesis, two techniques will be proposed and shown to be effective. The enhanced model could encode coherence of local features as well as the [...]

