CNN technology provides state-of-the-art solutions to many image processing and computer vision problems. Given a CNN architecture, all of its parameters are determined via stochastic gradient descent (SGD) through backpropagation (BP), and this BP training demands a high computational cost. Furthermore, most CNN publications are application-oriented; theoretical progress since the classical approximation result in [1] has been limited. Recent examples include explainable CNNs [2,3,4] and feedforward designs that avoid backpropagation [5,6].

The one-pass feedforward (FF) determination of CNN model parameters was recently proposed by Kuo et al. in [6]. It derives the parameters of a target layer from the statistics of the output data of its preceding layer; no BP is used at all. This feedforward design offers valuable insight into the operational mechanism of CNNs. Moreover, under the same network architecture, its training complexity is significantly lower than that of the BP design. FF-designed and BP-designed CNNs are denoted FF-CNNs and BP-CNNs, respectively.
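To make the statistics-driven idea concrete, the following is a minimal sketch of deriving convolutional filters from the second-order statistics of image patches via plain PCA. This is a deliberately simplified stand-in for the actual Saab-transform design of [6] (which additionally adjusts a bias term); the patch size, filter count, and function name are illustrative choices, not quantities from the paper.

```python
import numpy as np

def pca_filters(images, k=3, n_filters=4):
    """Derive k x k convolutional filters from patch statistics via PCA.
    A simplified sketch of a feedforward (BP-free) filter design."""
    patches = []
    for img in images:
        h, w = img.shape
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                patches.append(img[i:i + k, j:j + k].ravel())
    X = np.asarray(patches)
    X = X - X.mean(axis=0)            # remove the mean (DC) component
    cov = X.T @ X / len(X)            # patch covariance matrix
    vals, vecs = np.linalg.eigh(cov)  # eigendecomposition (ascending order)
    order = np.argsort(vals)[::-1]    # principal directions first
    return vecs[:, order[:n_filters]].T.reshape(n_filters, k, k)

rng = np.random.default_rng(0)
imgs = rng.standard_normal((10, 8, 8))
filters = pca_filters(imgs)           # shape (4, 3, 3)
```

The key point is that each layer's filters are computed in one pass from data statistics, so no label-driven gradient iterations are needed.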

We focus on solving the image classification problem with feedforward-designed convolutional neural networks (FF-CNNs) [6]. We propose an ensemble method that fuses the output decision vectors of multiple FF-CNNs. To enhance the performance of the ensemble system, it is critical to increase the diversity of the FF-CNN models. To this end, we introduce diversity through three strategies: 1) different parameter settings in the convolutional layers, 2) flexible feature subsets fed into the fully-connected (FC) layers, and 3) multiple image embeddings of the same input source. Furthermore, we partition input samples into easy and hard ones based on their decision confidence scores. As a result, we can develop a new ensemble system tailored to hard samples to further boost classification accuracy. Although the ensemble idea can be applied to both BP-CNNs and FF-CNNs, it is more suitable for FF-CNNs since FF-CNNs are weaker classifiers of extremely low complexity.
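The fusion and the confidence-based easy/hard split described above can be sketched as follows. Note the assumptions: averaging is used here as one illustrative fusion rule (other fusion classifiers could be substituted), and the confidence threshold `0.7` as well as the function name are hypothetical, not values from the paper.

```python
import numpy as np

def fuse_and_split(decision_vectors, threshold=0.7):
    """Fuse per-model decision vectors by averaging, then split samples
    into easy/hard by confidence (the maximum fused score).

    decision_vectors: array of shape (n_models, n_samples, n_classes),
    each row a probability-like decision vector from one FF-CNN.
    Returns (predicted labels, boolean easy-sample mask)."""
    fused = np.mean(decision_vectors, axis=0)   # (n_samples, n_classes)
    confidence = fused.max(axis=1)              # per-sample confidence
    easy = confidence >= threshold              # True -> easy sample
    return fused.argmax(axis=1), easy

# Toy example: three models, four samples, three classes.
rng = np.random.default_rng(1)
votes = rng.dirichlet(np.ones(3), size=(3, 4))  # rows sum to 1
labels, easy = fuse_and_split(votes)
```

Hard samples (where `easy` is `False`) would then be routed to a second-stage ensemble trained specifically on low-confidence data.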

By Yueru Chen, Yijing Yang, Wei Wang, and C.-C. Jay Kuo


[1] George Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals and Systems, vol. 2, no. 4, pp. 303–314, 1989.

[2] Quanshi Zhang, Ying Nian Wu, and Song-Chun Zhu, “Interpretable convolutional neural networks,” arXiv preprint arXiv:1710.00935, 2017.

[3] C.-C. Jay Kuo, “Understanding convolutional neural networks with a mathematical model,” Journal of Visual Communication and Image Representation, vol. 41, pp. 406–413, 2016.

[4] C.-C. Jay Kuo, “The CNN as a guided multilayer RECOS transform [lecture notes],” IEEE Signal Processing Magazine, vol. 34, no. 3, pp. 81–89, 2017.

[5] C.-C. Jay Kuo and Yueru Chen, “On data-driven Saak transform,” Journal of Visual Communication and Image Representation, vol. 50, pp. 237–246, 2018.

[6] C.-C. Jay Kuo, Min Zhang, Siyang Li, Jiali Duan, and Yueru Chen, “Interpretable convolutional neural networks via feedforward design,” arXiv preprint arXiv:1810.02786, 2018.