3D point cloud segmentation requires an understanding of both the global geometric structure and the fine-grained details of each point. According to segmentation granularity, 3D point cloud segmentation methods can be classified into three categories: semantic segmentation (scene level), instance segmentation (object level), and part segmentation (part level). Our research focuses on the semantic segmentation problem. Given a point cloud, the goal of semantic segmentation is to separate it into subsets according to the semantic meaning of each point. Two scene types are common: urban scenes and indoor scenes. Efficient semantic segmentation of large-scale 3D point clouds is a fundamental capability for real-time intelligent systems such as autonomous driving and augmented reality. We aim to perform large-scale 3D indoor scene semantic segmentation more efficiently; a representative large-scale public benchmark is the indoor S3DIS dataset [1].
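To make the task concrete, the following minimal sketch (not from the paper; the array shapes and class count follow the standard S3DIS setup, where each point carries XYZ coordinates plus RGB color and one of 13 semantic labels) shows what a semantic segmentation output looks like: one label per point, which partitions the cloud into semantic subsets.

```python
import numpy as np

# Hypothetical stand-in for one S3DIS room: N points with XYZ + RGB.
num_points, num_classes = 100_000, 13   # S3DIS defines 13 semantic classes
room = np.random.rand(num_points, 6)
labels = np.random.randint(0, num_classes, num_points)  # one label per point

# Semantic segmentation partitions the cloud into subsets,
# e.g. all points assigned to class 0:
subset_0 = room[labels == 0]
```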

A key challenge is that the raw point clouds acquired by depth sensors are typically irregularly sampled, unstructured, and unordered. Recently, the pioneering work PointNet [2] was proposed to process 3D point clouds directly by learning per-point features with shared multilayer perceptrons (MLPs) and max pooling. Follow-up works attempt to capture wider contextual information for each point. Although these approaches achieve impressive results on object recognition and semantic segmentation, almost all of them are limited to very small 3D point clouds and cannot be directly extended to larger scales without preprocessing such as block partitioning.
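The sketch below illustrates the PointNet-style computation described above with a toy NumPy forward pass (the channel widths here are illustrative, not PointNet's actual ones): the same MLP is applied to every point, order-invariant max pooling produces a global feature, and, for segmentation, the global feature is concatenated back onto each per-point feature.

```python
import numpy as np

def shared_mlp(points, W, b):
    # Apply the same weights to every point: (N, d_in) -> (N, d_out), ReLU.
    return np.maximum(points @ W + b, 0.0)

rng = np.random.default_rng(0)
pts = rng.standard_normal((1024, 3))        # one small point cloud
W1, b1 = rng.standard_normal((3, 64)), np.zeros(64)

per_point = shared_mlp(pts, W1, b1)         # (1024, 64) per-point features
global_feat = per_point.max(axis=0)         # (64,) order-invariant pooling

# For segmentation, each point sees both its own feature and the
# global context by concatenation:
seg_input = np.concatenate(
    [per_point, np.broadcast_to(global_feat, (1024, 64))], axis=1
)                                           # (1024, 128)
```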

We design a different data preprocessing method that learns from large-scale data directly. Each room in the dataset is treated as one input sample and fed into an unsupervised feature extractor to obtain point-wise features. The unsupervised feature extractor is developed upon our previous work PointHop [3]. It is extremely lightweight and fast compared with deep learning methods, and it even outperforms PointNet on the S3DIS dataset while using only 20% of the training data. A rough sketch of this room-level pipeline follows.
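The sketch below shows the overall pipeline shape only, under stated assumptions: `unsupervised_features` is a placeholder for the PointHop-based extractor (which, per [3], builds point-wise features with data-driven transforms and no backpropagation), and the random-forest classifier is an illustrative choice, not necessarily the paper's classifier. The room array and labels are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def unsupervised_features(room_xyzrgb):
    # Placeholder for a PointHop-style extractor: the actual method
    # derives point-wise features without backpropagation. Here we
    # return the raw attributes so the pipeline runs end to end.
    return room_xyzrgb

# Hypothetical room: (N, 6) points (XYZ + RGB) with ground-truth labels.
room = np.random.rand(5000, 6)
gt = np.random.randint(0, 13, 5000)

feats = unsupervised_features(room)          # point-wise features
clf = RandomForestClassifier(n_estimators=50).fit(feats, gt)
pred = clf.predict(feats)                    # one semantic label per point
```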


References:

[1] I. Armeni, O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer, and S. Savarese, “3D semantic parsing of large-scale indoor spaces,” IEEE CVPR, 2016.

[2] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “PointNet: Deep learning on point sets for 3D classification and segmentation,” IEEE CVPR, 2017.

[3] M. Zhang, H. You, P. Kadam, S. Liu, and C.-C. J. Kuo, “PointHop: An explainable machine learning method for point cloud classification,” IEEE Transactions on Multimedia, 2020.