Object detection and recognition, the task of finding and identifying objects in an image or video sequence, has long been one of the key challenges in computer vision. Humans recognize a multitude of objects in images with little effort, even though the objects' appearance may vary with viewpoint, size and scale, or when they are translated or rotated. Many approaches to the task have been developed over several decades, including handcrafted features, machine learning algorithms and deep learning.

Recently, deep learning has achieved breakthrough results by training on large amounts of labelled data, but it is notorious for its lack of scalability and interpretability. Building on the recently proposed scalable and interpretable PixelHop [1] and PixelHop++ [2] models, a new object detection pipeline can be formed: Object Proposal -> Feature Extraction using successive subspace learning (SSL) -> Classification. With feature extraction and classification handled by SSL, a statistics-based attention mechanism built with machine learning becomes the key component for generating object proposals.
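PixelHop [1] builds its SSL features from the Saab (subspace approximation with adjusted bias) transform. As a rough illustration of what the feature extraction stage computes, here is a minimal single-stage sketch in Python/NumPy. It assumes a grayscale image and flattened 3x3 neighbourhoods; the function names are hypothetical, and it omits the multi-hop structure, pooling and channel-wise Saab of PixelHop++ [2], so it should not be read as the authors' implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_patches(image: np.ndarray, size: int = 3) -> np.ndarray:
    # All size x size neighbourhoods (stride 1, no padding), flattened to rows.
    windows = sliding_window_view(image, (size, size))
    return windows.reshape(-1, size * size)


def fit_saab(patches: np.ndarray, num_ac: int):
    """Fit a simplified Saab transform on patches of shape (n, d).

    Hypothetical helper: returns (dc_kernel, ac_kernels, bias).
    num_ac should be at most d - 1.
    """
    n, d = patches.shape
    # DC kernel: unit-norm constant vector, so its response is the scaled patch mean.
    dc_kernel = np.ones(d) / np.sqrt(d)
    # Remove each patch's projection onto the DC kernel.
    dc_resp = patches @ dc_kernel
    ac_part = patches - np.outer(dc_resp, dc_kernel)
    # PCA on the DC-removed patches: centre them, then keep the leading
    # right-singular vectors as AC kernels (orthogonal to the DC kernel).
    centered = ac_part - ac_part.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    ac_kernels = vt[:num_ac]
    # Shared bias = largest patch norm. Since the kernels have unit norm,
    # |x . k| <= ||x|| <= bias, so every AC response on the training patches is >= 0.
    bias = np.linalg.norm(patches, axis=1).max()
    return dc_kernel, ac_kernels, bias


def apply_saab(patches, dc_kernel, ac_kernels, bias):
    # Column 0 is the DC response; the remaining columns are biased AC responses.
    dc = patches @ dc_kernel
    ac = patches @ ac_kernels.T + bias
    return np.column_stack([dc, ac])
```

For example, an 8x8 image gives 36 patches of 9 values; `fit_saab(P, num_ac=4)` followed by `apply_saab` maps them to 36 feature vectors with 5 dimensions each. The DC kernel captures the local mean, the AC kernels capture the dominant variations around it, and the shared bias removes the need for a nonlinearity such as ReLU on the training data.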

Outside of deep learning, object proposals based on visual saliency in single images, such as DRFI [3], are a good starting point for machine-learning-based object proposal. To further exploit the statistics of the training data, we formulate weakly supervised object proposal as an object search problem, using local features that support matching, such as SURF [4]. In the future, we aim to improve these results by further exploring a query-retrieval-based saliency proposal method with adapted bag-of-words features.
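Object search by matching compares the local descriptors of a query object with those of a target image or candidate region. The OpenCV tutorial credited below illustrates this with brute-force matching and Lowe's ratio test. The following is a minimal NumPy sketch of that matching step. It assumes the descriptors (for instance 64-dimensional SURF descriptors) have already been extracted by a feature library, which is outside the sketch; `ratio_test_matches` and the `match_score` region score are hypothetical names, not part of any library or of the method described above.

```python
import numpy as np


def ratio_test_matches(query: np.ndarray, target: np.ndarray, ratio: float = 0.75):
    """Brute-force nearest-neighbour matching with Lowe's ratio test.

    query: (m, d) descriptors; target: (n, d) descriptors with n >= 2.
    Returns a list of (query_index, target_index) pairs.
    """
    # Pairwise Euclidean distances, shape (m, n).
    dists = np.linalg.norm(query[:, None, :] - target[None, :, :], axis=2)
    # Indices of the two closest target descriptors for each query descriptor.
    nearest_two = np.argsort(dists, axis=1)[:, :2]
    matches = []
    for i, (j1, j2) in enumerate(nearest_two):
        # Accept only if the best candidate is clearly closer than the runner-up.
        if dists[i, j1] < ratio * dists[i, j2]:
            matches.append((i, int(j1)))
    return matches


def match_score(query: np.ndarray, candidate: np.ndarray, ratio: float = 0.75) -> float:
    # Hypothetical region score: fraction of query descriptors with an accepted match.
    return len(ratio_test_matches(query, candidate, ratio)) / len(query)
```

In an object-search setting, one could compute `match_score` between the descriptors of a query object and those of each candidate region, then rank the regions by score. The ratio test discards ambiguous matches, where the nearest and second-nearest descriptors are almost equally close, which matters more than the raw distance when scenes contain repeated textures.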


-By Hongyu Fu

[1] Yueru Chen and C.-C. Jay Kuo, “PixelHop: A successive subspace learning (SSL) method for object recognition,” Journal of Visual Communication and Image Representation, p. 102749, 2020.

[2] Yueru Chen, Mozhdeh Rouhsedaghat, Suya You, Raghuveer Rao and C.-C. Jay Kuo, “PixelHop++: A small successive-subspace-learning-based (SSL-based) model for image classification,” arXiv:2002.03141v1, 2020.

[3] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng and S. Li, “Salient object detection: A discriminative regional feature integration approach,” Proc. IEEE Conf. CVPR, pp. 2083-2090, Jun. 2013.

[4] H. Bay, T. Tuytelaars and L. Van Gool, “SURF: Speeded Up Robust Features,” Proc. ECCV 2006, LNCS vol. 3951, pp. 404-417, 2006.


Image credits:

Image 1, showing SURF matching, is from the OpenCV documentation: https://docs.opencv.org/master/dc/dc3/tutorial_py_matcher.html

Image 2, showing the architecture of PixelHop++, is from [2].