Understanding and retrieving information in 3D scenes poses a significant challenge in artificial intelligence (AI) and machine learning (ML), particularly when it comes to grasping complex spatial relationships and detailed object properties in 3D spaces. While large foundation models such as CLIP [1] have made impressive progress in the 2D domain, their direct application to 3D scene understanding is not straightforward. We therefore aim to leverage existing large foundation models to understand 3D scenes and extract 3D information, rather than building an entirely new 3D model from scratch.
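For concreteness, the following is a minimal sketch of the zero-shot image-text matching that CLIP provides in 2D, written against the Hugging Face transformers interface; the checkpoint name, image path, and candidate captions are illustrative placeholders rather than part of any specific pipeline.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a public pretrained CLIP checkpoint and its matching processor.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("room_photo.jpg")  # placeholder path to any 2D image
texts = ["a sofa in a living room", "a kitchen counter", "an office desk"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores;
# softmax over the candidate captions yields zero-shot label probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs.squeeze().tolist())))
```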
The advancement of large foundation models has inspired recent studies to establish connections among images, text, and 3D data. The work in [2] extracts features from 3D objects with a 3D encoder and aligns them with the features that a visual encoder and a text encoder extract from the corresponding rendered 2D images and text, enabling cross-modal retrieval of 3D objects. However, the model focuses mainly on individual 3D objects and is limited in its understanding of more complex 3D spaces such as indoor scenes.
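The alignment idea in [2] can be illustrated with a simplified sketch: a trainable 3D encoder embeds a point cloud into the same space as frozen CLIP image (or text) embeddings of its rendered views, trained with a symmetric contrastive loss. The PointCloudEncoder below is a hypothetical stand-in and the CLIP features are random placeholders; this is not CG3D's actual architecture or training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointCloudEncoder(nn.Module):
    """Hypothetical 3D encoder: a tiny PointNet-style network that maps
    an (N, 3) point cloud to a CLIP-sized embedding."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 512))
        self.head = nn.Linear(512, embed_dim)

    def forward(self, points: torch.Tensor) -> torch.Tensor:  # points: (B, N, 3)
        per_point = self.mlp(points)               # (B, N, 512)
        global_feat = per_point.max(dim=1).values  # permutation-invariant pooling
        return F.normalize(self.head(global_feat), dim=-1)

def contrastive_alignment_loss(feat_3d, feat_2d, temperature: float = 0.07):
    """Symmetric InfoNCE loss: pull each 3D embedding toward the frozen CLIP
    embedding of its own rendered view, push it away from the others."""
    feat_2d = F.normalize(feat_2d, dim=-1)
    logits = feat_3d @ feat_2d.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(feat_3d.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Example: align a batch of point clouds with precomputed (frozen) CLIP image features.
encoder = PointCloudEncoder()
points = torch.randn(8, 1024, 3)        # 8 point clouds, 1024 points each
clip_image_feats = torch.randn(8, 512)  # stand-in for frozen CLIP image embeddings
loss = contrastive_alignment_loss(encoder(points), clip_image_feats)
loss.backward()
```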
The work in [3] reconstructs compact 3D representations of indoor scenes from multi-view images using DVGO and aligns them with semantic segmentation features extracted from the corresponding 2D multi-view images using the CLIP-LSeg model. A visual reasoning pipeline is then applied to perform reasoning tasks in 3D space.
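At a high level, this setup can be viewed as a language-queryable 3D feature grid: each voxel carries a feature aligned with the 2D segmentation feature space, so a text embedding can be compared against every voxel to obtain a 3D relevancy mask for downstream reasoning. The sketch below uses random tensors as stand-ins for a DVGO-style grid and a text-encoder embedding; it is an assumption-laden illustration, not the implementation of [3].

```python
import torch
import torch.nn.functional as F

def query_voxel_grid(voxel_feats: torch.Tensor, text_feat: torch.Tensor,
                     threshold: float = 0.7) -> torch.Tensor:
    """Compare a text embedding against every voxel feature and return a
    boolean 3D mask of voxels relevant to the query.

    voxel_feats: (X, Y, Z, D) language-aligned features, one per voxel.
    text_feat:   (D,) embedding of a text query (e.g., from a frozen text encoder).
    """
    v = F.normalize(voxel_feats, dim=-1)
    t = F.normalize(text_feat, dim=-1)
    similarity = torch.einsum("xyzd,d->xyz", v, t)  # cosine similarity per voxel
    return similarity > threshold

# Stand-in data: a 64^3 feature grid with 512-d features and one text query.
grid = torch.randn(64, 64, 64, 512)
query = torch.randn(512)  # would come from a frozen text encoder in practice
mask = query_voxel_grid(grid, query)
print("relevant voxels:", int(mask.sum()))
```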
Building on these efforts, we aim to develop a machine learning model that leverages existing large foundation models to achieve a better understanding of 3D scenes at lower computational cost.

References:
[1] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al., "Learning transferable visual models from natural language supervision," in International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[2] Deepti Hegde, Jeya Maria Jose Valanarasu, and Vishal Patel, "CLIP goes 3D: Leveraging prompt tuning for language grounded 3D recognition," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
[3] Yining Hong, et al., "3D Concept Learning and Reasoning from Multi-View Images," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
Image credits:
The image showing the architecture of CG3D is from [2].
The image showing the architecture of 3D-CLR is from [3].