Image classification has been studied for many years as a fundamental problem in computer vision. With the development of convolutional neural networks (CNNs) and the availability of larger-scale datasets, deep learning has achieved rapid success in classifying both low- and high-resolution images. Although effective, deep learning faces one major challenge: its underlying mechanism is not transparent.

Inspired by deep learning, the successive subspace learning (SSL) methodology was proposed by Kuo et al. in a sequence of papers. Unlike deep learning, SSL-based methods learn feature representations in an unsupervised feedforward manner using multi-stage principal component analysis (PCA). Joint spatial-spectral representations are obtained at different scales through multi-stage transforms. Three variants of the PCA transform were developed: the Saak transform [1], the Saab transform [2], and the channel-wise (c/w) Saab transform [4]. Two SSL-based image classification pipelines, PixelHop [3] and PixelHop++ [4], were designed based on the Saab transform and the c/w Saab transform, respectively. Both follow the traditional pattern recognition paradigm and partition the classification problem into two cascaded modules: 1) feature extraction and 2) classification. Every step in PixelHop/PixelHop++ is explainable, and the whole solution is mathematically transparent.
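To make the unsupervised feedforward nature of SSL concrete, the multi-stage PCA idea can be sketched as follows. This is a simplified illustration, not the actual Saab transform (it omits the Saab bias term and the DC/AC decomposition); `pca_stage` and `extract_patches` are hypothetical helpers introduced only for this sketch.

```python
import numpy as np

def pca_stage(patches, num_kept):
    # patches: (n, d) flattened local windows; learn PCA kernels
    # from the data itself (no labels, no backpropagation)
    mean = patches.mean(axis=0)
    centered = patches - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    kernels = vt[:num_kept]          # top principal directions, (num_kept, d)
    return centered @ kernels.T      # spectral responses, (n, num_kept)

def extract_patches(images, win):
    # images: (n, h, w); split into non-overlapping win x win blocks
    n, h, w = images.shape
    blocks = images.reshape(n, h // win, win, w // win, win)
    return blocks.transpose(0, 1, 3, 2, 4).reshape(-1, win * win)

# toy demo: two cascaded PCA stages on random 8x8 "images"
rng = np.random.default_rng(0)
imgs = rng.normal(size=(16, 8, 8))
feat1 = pca_stage(extract_patches(imgs, 2), num_kept=3)   # stage-1 transform
maps1 = feat1.reshape(16, 4, 4, 3)                        # spatial maps per channel
feat2 = pca_stage(maps1.reshape(16, -1), num_kept=8)      # stage-2 transform
```

Each stage shrinks the spatial resolution while growing the spectral dimension, yielding the joint spatial-spectral representations described above; the real Saab pipelines additionally keep a DC (mean) channel and apply energy-based channel pruning.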

To further improve the performance, we propose an SSL-based two-stage sequential image classification pipeline, named E-PixelHop. The motivation is that, in a multi-class classification problem, it is easier to distinguish between dissimilar classes than between similar ones. For example, distinguishing cats from cars is easier than distinguishing cats from dogs. Along this line, one can build a hierarchical relation among multiple classes based on their semantic meaning to improve classification performance. Instead of manually constructing the hierarchical learning structure before classification, E-PixelHop resolves the confusing groups by automatically partitioning the samples into different subspaces based on their baseline predictions. Experiments demonstrate the significance of confusing-group retraining: the overall classification performance can be boosted by successively resolving each difficult subspace separately.
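The confusing-group idea above can be sketched as follows. This is a hypothetical illustration of the general strategy, not the paper's exact algorithm: `confusing_pairs` simply mines the most-confused class pairs from baseline predictions, after which samples predicted into such a pair would be routed to a dedicated two-class classifier trained only on that pair.

```python
import numpy as np
from collections import Counter

def confusing_pairs(y_true, y_pred, top_k=1):
    # count symmetric confusions between class pairs in the
    # baseline (stage-1) predictions
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if t != p:
            counts[tuple(sorted((int(t), int(p))))] += 1
    return [pair for pair, _ in counts.most_common(top_k)]

# toy baseline output: classes 0 and 1 are often confused, class 2 is easy
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 0, 1, 2, 2, 2])
pairs = confusing_pairs(y_true, y_pred)
print(pairs)   # → [(0, 1)]
```

In a second stage, each sample whose baseline prediction falls inside a confusing pair is re-classified by the specialized model for that pair, which mirrors the "resolve each difficult subspace separately" strategy.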


References:

  [1] C.-C. J. Kuo and Y. Chen, “On data-driven Saak transform,” Journal of Visual Communication and Image Representation, vol. 50, pp. 237–246, 2018.
  [2] C.-C. J. Kuo, M. Zhang, S. Li, J. Duan, and Y. Chen, “Interpretable convolutional neural networks via feedforward design,” Journal of Visual Communication and Image Representation, 2019.
  [3] Y. Chen and C.-C. J. Kuo, “PixelHop: A successive subspace learning (SSL) method for object recognition,” Journal of Visual Communication and Image Representation, vol. 70, p. 102749, 2020.
  [4] Y. Chen, M. Rouhsedaghat, S. You, R. Rao, and C.-C. J. Kuo, “PixelHop++: A small successive-subspace-learning-based (SSL-based) model for image classification,” arXiv preprint arXiv:2002.03141, 2020.