Given a point cloud, the goal of semantic segmentation is to assign every point to one of a set of semantic categories. Semantic segmentation of large-scale point clouds has a wide range of real-world applications, such as autonomous driving in outdoor environments and robotic navigation in indoor or outdoor environments. Compared with point cloud classification, which often targets small-scale objects, high-performance point cloud semantic segmentation demands a good understanding of both the complex global structure and the local neighborhood of each point. Meanwhile, efficiency, measured by computational complexity and memory complexity, is important for practical real-time systems.

State-of-the-art point cloud classification and segmentation methods are based on deep learning. Raw point clouds captured by LiDAR sensors are irregular and unordered, so they cannot be processed directly by deep learning networks designed for 2D images. This problem was addressed by the pioneering work on PointNet. PointNet and its follow-ups achieved impressive performance in small-scale point cloud classification and segmentation tasks, but they cannot be generalized directly to large-scale point clouds due to memory and time constraints.

An efficient solution to semantic segmentation of large-scale indoor scene point clouds is proposed in this work. It is named GSIP (Green Segmentation of Indoor Point clouds) [1], and its performance is evaluated on a representative large-scale benchmark, the Stanford 3D Indoor Segmentation (S3DIS) dataset. GSIP has two novel components: 1) a room-style data pre-processing method that selects a proper subset of points for further processing, and 2) a new feature extractor that extends PointHop. For the former, the sampled points of each room form one input unit. For the latter, the weaknesses of PointHop's feature extraction when it is applied to large-scale point clouds are identified and fixed with a simpler processing pipeline. Compared with PointNet, a pioneering deep-learning-based solution, GSIP is green in the sense that it has significantly lower computational complexity and a much smaller model size. Furthermore, experiments show that GSIP outperforms PointNet in segmentation performance on the S3DIS dataset.
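To make the idea of a room-style input unit concrete, the following is a minimal sketch of subsampling each room to a fixed number of points. It is not the GSIP implementation: this summary does not specify how points are selected, so uniform random sampling is an assumption, and the names `sample_room`, `build_room_units`, and `num_samples` are hypothetical.

```python
import numpy as np


def sample_room(points, labels, num_samples, seed=0):
    """Subsample one room to a fixed number of points (hypothetical helper).

    points: (N, D) array of per-point attributes, e.g. coordinates and color.
    labels: (N,) array of per-point semantic labels.
    Assumption: uniform random sampling; the actual selection rule may differ.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    # Sample without replacement when the room is large enough; otherwise
    # sample with replacement so every unit ends up with the same size.
    replace = n < num_samples
    idx = rng.choice(n, size=num_samples, replace=replace)
    return points[idx], labels[idx]


def build_room_units(rooms, num_samples):
    """Turn a dict {room_name: (points, labels)} into fixed-size input units."""
    return {
        name: sample_room(points, labels, num_samples)
        for name, (points, labels) in rooms.items()
    }
```

With this layout, each room contributes exactly one unit of `num_samples` points, so the downstream feature extractor sees the whole room at once rather than small blocks cut out of it.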

— Min Zhang

[1] Min Zhang, Pranav Kadam, Shan Liu, and C.-C. Jay Kuo, “GSIP: Green Semantic Segmentation of Large-Scale Indoor Point Clouds.” arXiv preprint arXiv:2109.11835 (2021).