Understanding and retrieving information in 3D scenes is a significant challenge in artificial intelligence (AI) and machine learning (ML), particularly when it comes to grasping complex spatial relationships and detailed object properties in 3D space. Several tasks have been proposed to assess 3D understanding, including 3D object retrieval, 3D captioning, 3D question answering, and 3D visual grounding.

Existing methods can be roughly divided into two categories. The first category uses large 2D foundation models for feature extraction and maps 2D pixel-wise features to 3D point-wise features for 3D tasks. For example, 3D-CLR [1] extracts 2D features from multiview images with the CLIP-LSeg model [2] and maps them onto a compact 3D representation reconstructed with a neural radiance field; reasoning is then performed by a set of neural reasoning operators. 3D-LLM [3] uses 2D vision-language models (VLMs) as the backbone: it extracts 2D features with ConceptFusion [4], maps them to 3D points, and injects the resulting 3D information into a large language model to generate text outputs.
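To make the pixel-to-point mapping concrete, below is a minimal sketch of lifting multiview 2D feature maps onto 3D points by camera projection. It assumes known camera intrinsics and extrinsics; the function and variable names are illustrative rather than taken from 3D-CLR or 3D-LLM, and a full pipeline would also handle occlusion (e.g., with rendered depth).

```python
import numpy as np

def lift_2d_features_to_3d(points_world, feat_maps, intrinsics, extrinsics):
    """Aggregate per-pixel 2D features onto 3D points by multiview projection.

    points_world: (N, 3) 3D points in world coordinates.
    feat_maps:    (V, H, W, C) per-view 2D feature maps (e.g., from CLIP-LSeg).
    intrinsics:   (V, 3, 3) camera intrinsic matrices.
    extrinsics:   (V, 4, 4) world-to-camera transforms.
    Returns (N, C) point-wise features averaged over the views that see each point.
    """
    V, H, W, C = feat_maps.shape
    N = points_world.shape[0]
    feat_sum = np.zeros((N, C), dtype=np.float32)
    hit_count = np.zeros((N, 1), dtype=np.float32)

    # Homogeneous world coordinates, shape (N, 4).
    pts_h = np.concatenate([points_world, np.ones((N, 1))], axis=1)

    for view in range(V):
        # World -> camera -> pixel coordinates for this view.
        cam = (extrinsics[view] @ pts_h.T).T            # (N, 4) camera-frame coords
        z = cam[:, 2]
        pix = (intrinsics[view] @ cam[:, :3].T).T       # (N, 3) homogeneous pixel coords
        u = pix[:, 0] / np.clip(z, 1e-6, None)          # column index
        v = pix[:, 1] / np.clip(z, 1e-6, None)          # row index

        # Keep points that project inside the image and lie in front of the camera.
        # (No occlusion test here; a real pipeline would compare against rendered depth.)
        valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        cols, rows = u[valid].astype(int), v[valid].astype(int)

        feat_sum[valid] += feat_maps[view, rows, cols]
        hit_count[valid] += 1.0

    # Average over observing views; unobserved points keep zero features.
    return feat_sum / np.clip(hit_count, 1.0, None)
```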

The second category handles 3D point clouds directly with a 3D encoder and aligns the extracted 3D features with features from other modalities. These methods may require training a 3D encoder, which can demand substantial computational resources. For example, Uni3D [5] uses a vanilla transformer, structurally equivalent to a 2D Vision Transformer (ViT), as the backbone to extract 3D features; downstream tasks are then addressed after aligning features across modalities. It is also possible to build on pre-trained 3D encoders: Point-SAM [6], which aims to segment anything in 3D worlds, uses the point cloud encoder from Uni3D to transform the input point cloud into embeddings. It first samples a fixed number of centers with Farthest Point Sampling (FPS) and then groups the k-nearest neighbors of each center into patches.
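As a rough illustration of this tokenization step, the following sketch implements FPS followed by k-nearest-neighbor grouping in NumPy. The patch count and neighborhood size are arbitrary defaults, and none of the helper names come from the Uni3D or Point-SAM code bases; in a real encoder each patch would then be embedded (e.g., by a small point-wise network) and fed to the transformer as a token.

```python
import numpy as np

def farthest_point_sampling(points, num_centers):
    """Pick `num_centers` indices that are maximally spread over the point cloud."""
    N = points.shape[0]
    centers = np.zeros(num_centers, dtype=np.int64)
    dist = np.full(N, np.inf)
    centers[0] = np.random.randint(N)               # random seed point
    for i in range(1, num_centers):
        # Squared distance from every point to the most recently chosen center,
        # folded into the running distance-to-nearest-center.
        diff = points - points[centers[i - 1]]
        dist = np.minimum(dist, np.einsum("ij,ij->i", diff, diff))
        centers[i] = int(np.argmax(dist))           # farthest remaining point
    return centers

def group_into_patches(points, num_centers=512, k=32):
    """Tokenize a point cloud: FPS centers plus k-nearest-neighbor patches around each."""
    center_idx = farthest_point_sampling(points, num_centers)
    centers = points[center_idx]                    # (num_centers, 3)

    # Pairwise squared distances between centers and all points, then take k nearest.
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    knn_idx = np.argsort(d2, axis=1)[:, :k]         # (num_centers, k)

    # Express each patch relative to its center before feeding it to the encoder.
    patches = points[knn_idx] - centers[:, None, :]
    return centers, patches

# Example: 8192 points -> 512 patches of 32 local points each.
pts = np.random.rand(8192, 3).astype(np.float32)
centers, patches = group_into_patches(pts)
print(centers.shape, patches.shape)                 # (512, 3) (512, 32, 3)
```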

How to achieve comprehensive 3D understanding remains an open question, leaving significant room for improvement in this field. We may need to explore novel algorithms, enhance current techniques, or even develop entirely new frameworks to solve these problems.

References:

[1] Hong, Yining, et al. “3D Concept Learning and Reasoning from Multi-View Images.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.

[2] Li, Boyi, et al. “Language-Driven Semantic Segmentation.” arXiv preprint arXiv:2201.03546, 2022.

[3] Hong, Yining, et al. “3D-LLM: Injecting the 3D World into Large Language Models.” Advances in Neural Information Processing Systems 36 (2023): 20482-20494.

[4] Jatavallabhula, Krishna Murthy, et al. “ConceptFusion: Open-Set Multimodal 3D Mapping.” arXiv preprint arXiv:2302.07241, 2023.

[5] Zhou, Junsheng, et al. “Uni3D: Exploring Unified 3D Representation at Scale.” arXiv preprint arXiv:2310.06773, 2023.

[6] Zhou, Yuchen, et al. “Point-SAM: Promptable 3D Segmentation Model for Point Clouds.” arXiv preprint arXiv:2406.17741, 2024.

Image credits:

The image showing the architecture of 3D-LLM is from [3].

The image showing the architecture of Uni3D is from [5].