Object detection and recognition are critical to image understanding, and supervised and unsupervised approaches to visual attention extraction have long competed with each other. We are interested in an unsupervised approach, and our method consists of two main complementary parts: Spectral Clustering Segmentation and Contour Detection.
Spectral clustering is a mature method for image segmentation in which an image is viewed as a graph. In a standard spectral clustering pipeline, each pixel is usually treated as a vertex and a pixelwise affinity matrix is computed over the graph. The Laplacian matrix is then derived from the affinity matrix, and, with a predefined number of clusters K, K-means clustering is run on the eigenvectors associated with the K smallest eigenvalues of the Laplacian to produce the final segmentation. In our current method, Pointhop features are adopted instead of conventional low-level features such as color or texture to construct the graph for the input image, and this is the core contribution of the work. For each input image, Pointhop features are extracted with channel-wise Saab, a k-nearest-neighbor graph is constructed from the feature map, and the graph is then passed through the standard spectral clustering process.
To evaluate the segments produced by spectral clustering on Pixelhop features, Contour Detection is introduced as a complementary mid-level cue. Here, Structured Edge detection [1] results are used for contour detection: for each segment from spectral clustering, the largest closed contours within the segment are evaluated with heuristic rules to decide whether the segment is a reasonable object.
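The standard spectral clustering pipeline described above (kNN graph, Laplacian, K smallest eigenvectors, K-means) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the author's implementation: the per-pixel feature map is taken as given (the Pointhop / channel-wise Saab extractor is not reproduced), and the Gaussian edge weights with a median-distance bandwidth, the symmetric normalized Laplacian, the row normalization of the eigenvectors, and the farthest-point K-means seeding are choices made here for illustration. The names `knn_affinity`, `kmeans`, and `spectral_segment` are hypothetical.

```python
import numpy as np


def knn_affinity(X, k=8, sigma=None):
    """Symmetric kNN affinity matrix with Gaussian edge weights.

    X: (N, C) array, one feature vector per pixel.
    Assumption: weights exp(-d^2 / (2 sigma^2)), sigma from the median kNN distance.
    """
    sq = np.sum(X ** 2, axis=1)
    # Dense pairwise squared Euclidean distances (only practical for small N).
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a pixel is not its own neighbor
    n = X.shape[0]
    nbrs = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbors of each pixel
    src = np.repeat(np.arange(n), k)
    dst = nbrs.ravel()
    dist = d2[src, dst]
    if sigma is None:
        sigma = np.sqrt(np.median(dist)) + 1e-12
    W = np.zeros((n, n))
    W[src, dst] = np.exp(-dist / (2.0 * sigma ** 2))
    # Symmetrize: keep an edge if either pixel lists the other as a neighbor.
    return np.maximum(W, W.T)


def kmeans(U, K, iters=100, seed=0):
    """Plain Lloyd K-means with farthest-point seeding (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    C = U[[rng.integers(len(U))]]
    for _ in range(1, K):
        d2 = ((U[:, None, :] - C[None]) ** 2).sum(-1).min(axis=1)
        C = np.vstack([C, U[np.argmax(d2)]])
    for _ in range(iters):
        labels = ((U[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
        new_C = np.array([U[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(K)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels


def spectral_segment(feature_map, K, k=8):
    """Segment an (H, W, C) feature map into K clusters via spectral clustering."""
    H, W_img, C = feature_map.shape
    X = feature_map.reshape(-1, C).astype(float)
    A = knn_affinity(X, k)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    # Assumption: symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    U = vecs[:, :K]  # eigenvectors of the K smallest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, K).reshape(H, W_img)


# Toy example: two flat regions; scaled coordinates are appended only to keep
# the kNN graph inside each region connected (a hypothetical feature choice).
yy, xx = np.mgrid[0:12, 0:12]
feat = np.stack([0.1 * yy, 0.1 * xx, 10.0 * (xx >= 6)], axis=-1)
labels = spectral_segment(feat, K=2)
```

Because the sketch stores dense N×N matrices, it is only practical for small images; a real pipeline would typically use a sparse affinity matrix and a sparse eigensolver.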
During this process, most proposed objects are parts of a main object, e.g., the eyes, face, hands, or arms of a human. During post-processing, adjacent proposals are therefore merged into larger objects, so that a full Rectangle Tree of Objects can be constructed for each input image.
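The merging step can be illustrated with a small greedy procedure that repeatedly fuses adjacent rectangles and records each merge as a parent node, which yields a tree of rectangles over the part-level proposals. This is a hypothetical sketch: the source does not specify the adjacency criterion or the merge order, so the box format `(x0, y0, x1, y1)`, the `tol` gap threshold, the smallest-union-first ordering, and the names `Node` and `build_rectangle_tree` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A rectangle in the tree; leaves are part-level proposals."""
    box: tuple  # (x0, y0, x1, y1) with x1, y1 exclusive
    children: list = field(default_factory=list)


def gap(a, b):
    """Pixel gap between two axis-aligned boxes; 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def union(a, b):
    """Smallest box enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def build_rectangle_tree(boxes, tol=0):
    """Greedily merge adjacent boxes (gap <= tol) until none remain adjacent.

    Assumption: at each step the adjacent pair whose union box is smallest is merged.
    Returns the list of root nodes (a forest if some proposals never touch).
    """
    roots = [Node(tuple(b)) for b in boxes]
    while True:
        best = None
        for i in range(len(roots)):
            for j in range(i + 1, len(roots)):
                a, b = roots[i].box, roots[j].box
                if gap(a, b) <= tol:
                    cost = area(union(a, b))
                    if best is None or cost < best[0]:
                        best = (cost, i, j)
        if best is None:
            return roots
        _, i, j = best
        parent = Node(union(roots[i].box, roots[j].box), [roots[i], roots[j]])
        roots = [r for n, r in enumerate(roots) if n not in (i, j)] + [parent]


# Hypothetical part proposals: two touching "eyes" and a distant "arm".
roots = build_rectangle_tree([(10, 10, 14, 13), (14, 10, 18, 13), (40, 30, 45, 60)])
print([r.box for r in roots])  # [(40, 30, 45, 60), (10, 10, 18, 13)]
```

Merging the pair with the smallest union first tends to group small neighboring parts before larger objects are formed. The pairwise search is quadratic in the number of current roots per merge, which is acceptable for a modest number of proposals but would need a spatial index for large sets.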
By Hongyu Fu