Zhiruo Zhou


MCL Research on Point Cloud Object Retrieval and Pose Estimation

Object pose estimation is an important problem in 3D scene understanding. Given a 3D point cloud object, the goal is to estimate the 6-DOF pose, comprising a rotation and a translation with respect to a chosen coordinate system. The pose information can then be used for downstream tasks commonly encountered in robotics, such as object grasping, obstacle avoidance, and path planning. In a complete scene understanding system, pose estimation usually comes after a 3D detection algorithm has localized and classified the object.
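To make the setup concrete, here is a minimal Python sketch (all names are ours, for illustration only) of how such a pose, a rotation matrix R and a translation vector t, acts on a point cloud:

```python
import numpy as np

def apply_pose(points, R, t):
    """Apply a 6-DOF pose (3x3 rotation R, 3-vector translation t)
    to an Nx3 point cloud."""
    return points @ R.T + t

# Toy example: rotate 90 degrees about the z-axis, then translate.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([1.0, 0.0, 0.5])
cloud = np.random.rand(100, 3)
posed = apply_pose(cloud, R, t)
```

Pose estimation is the inverse problem: given the observed object, recover R and t.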

The pose estimation problem is similar to point cloud registration, which has been studied at MCL previously. In particular, the R-PointHop [1] method was proposed, which successfully registers a source point cloud with a template. In our most recent work, we present a method termed PCRP that modifies R-PointHop for object pose estimation when a similar template object is unavailable. PCRP assumes a gallery set of pre-aligned point cloud objects and reuses the R-PointHop features to retrieve a similar object from the gallery. To do so, the pointwise features obtained using R-PointHop are aggregated into a global feature vector for nearest-neighbor retrieval using the Vector of Locally Aggregated Descriptors (VLAD) [2]. Then, the input object's pose is estimated by registering it with the retrieved object.
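As a rough illustration of the retrieval step, the sketch below implements standard VLAD aggregation over a k-means codebook; the codebook size and normalization details used in PCRP may differ.

```python
import numpy as np

def vlad(features, centroids):
    """Aggregate pointwise features (n x d) into one global VLAD vector
    given a k-means codebook of visual words (k x d)."""
    # Assign each pointwise feature to its nearest codebook centroid.
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    k, d = centroids.shape
    v = np.zeros((k, d))
    for i in range(k):
        members = features[assign == i]
        if len(members) > 0:
            v[i] = (members - centroids[i]).sum(axis=0)  # residual sum
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))        # power normalization
    return v / (np.linalg.norm(v) + 1e-12)     # L2 normalization
```

Gallery retrieval then reduces to a nearest-neighbor search between the query's VLAD vector and the pre-computed gallery VLAD vectors.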

Though point cloud retrieval is extensively studied in contexts like shape retrieval and place recognition, retrieval in the presence of different object poses is less explored. In this work, we show how a similar object can be retrieved even when object poses differ, which is possible thanks to the rotation-invariant features learned by R-PointHop. Another improvement over R-PointHop is the replacement of the conventional eight-octant-partitioning-based point attributes with more [...]

By |February 13th, 2022|News|

MCL Research on Point Cloud Compression

Point Cloud Compression (PCC) has received a lot of attention in recent years due to its wide applications such as virtual reality (VR), augmented reality (AR), and mixed reality (MR). Video-based PCC (V-PCC) and geometry-based PCC (G-PCC) are two distinct technologies developed by MPEG 3DG [1][2]. Deep-learning-based (DL-based) PCC is a strong competitor to both. Most DL methods generalize the DL-based image coding pipeline to point cloud data [3][4]. They outperform G-PCC in the current MPEG 3DG standard on dense point cloud compression. Yet, their performance is still inferior to that of V-PCC in the coding of dynamic point clouds.

We propose to design a learning-based PCC solution that outperforms those DL-based methods with lower complexity and less memory consumption. Our method uses geometry projection to generate 2D images and applies a vector-quantization-based 2D image codec to compress the projected maps. For a point cloud sequence, the projection proceeds in three steps. First, split the sequence into blocks via octree partitioning. Second, project each 3D block onto a plane and pack all the planes into a map. Third, encode/decode the 2D map and reconstruct the 3D point cloud sequence. These steps are illustrated in Fig. 1. We sample the projected planes non-uniformly and pack all the planes to generate one depth map and one texture map used in the reconstruction process. The two maps are shown in Fig. 2.
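The sketch below illustrates the partition-and-project idea in simplified form; for clarity it fixes the projection axis to z, omits the non-uniform sampling and map packing, and uses names of our own invention.

```python
import numpy as np

def octree_split(points, origin, size, depth):
    """Recursively partition an Nx3 point cloud into octree blocks;
    returns a list of (origin, size, points) leaves. Boundary handling
    is simplified for this sketch."""
    if len(points) == 0:
        return []
    if depth == 0:
        return [(origin, size, points)]
    half = size / 2.0
    blocks = []
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                o = origin + half * np.array([dx, dy, dz])
                mask = np.all((points >= o) & (points < o + half), axis=1)
                blocks += octree_split(points[mask], o, half, depth - 1)
    return blocks

def project_block(points, res=64):
    """Project one 3D block onto the xy-plane: z becomes a depth value
    in a res x res patch that can later be packed into the 2D map."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    patch = np.zeros((res, res))
    uv = ((points[:, :2] - lo[:2]) / (hi[:2] - lo[:2] + 1e-9) * (res - 1)).astype(int)
    for (u, v), z in zip(uv, points[:, 2]):
        patch[v, u] = max(patch[v, u], z - lo[2])  # keep the top surface
    return patch
```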

Presently, we utilize the x264/x265 codecs to code the maps. In the future, we will adopt a vector-quantization-based image codec to compress the two maps.

— Qingyang Zhou

Reference

[1] S. Schwarz, M. Preda, V. Baroncini, M. Budagavi, P. Cesar, P. A. Chou, R. A. Cohen, M. Krivokuća, S. Lasserre, Z. Li et [...]

By |February 6th, 2022|News|

MCL Research on Unsupervised Object Tracking

Video object tracking is one of the fundamental computer vision problems. It finds rich applications in video surveillance, autonomous navigation, robotic vision, etc. In online single object tracking (SOT), given a bounding box on the target object in the first frame, a tracker has to predict object box locations and sizes for all remaining frames. The performance of a tracker is measured by accuracy (higher success rate), robustness (automatic recovery from tracking loss), computational complexity, and speed (a higher number of frames per second, or FPS).

Online trackers can be categorized into supervised and unsupervised ones. Supervised trackers based on deep learning (DL) have dominated the SOT field in recent years. DL trackers offer state-of-the-art tracking accuracy, but they do have some limitations. First, a large number of annotated tracking video clips are needed for training, and annotation is a laborious and costly task. Second, they demand large memory to store the parameters of deep networks due to their large model sizes. Third, the high computational power requirement hinders their application on resource-limited devices such as drones and mobile phones. Advanced unsupervised SOT methods often use discriminative correlation filters (DCFs), which can run fast on a CPU thanks to the Fast Fourier Transform and have extremely small model sizes. There is, however, a significant performance gap between unsupervised DCF trackers and supervised DL trackers. It is attributed to limitations of DCF trackers such as failure to recover from tracking loss and inflexibility in object box adaptation.
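As background on why DCF trackers are so cheap, here is a minimal MOSSE-style correlation filter sketch: both training and detection reduce to elementwise operations in the Fourier domain. STRCF, the baseline mentioned below, adds spatial-temporal regularization on top of this basic idea.

```python
import numpy as np

def train_filter(patch, target, lam=1e-2):
    """Learn a correlation filter in the Fourier domain (MOSSE-style):
    H* = (G . conj(F)) / (F . conj(F) + lambda), where the target G is
    typically a Gaussian peak centered on the object. Preprocessing
    (log transform, cosine window) is omitted in this sketch."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(H_star, new_patch):
    """Correlate the filter with a new patch; the peak of the response
    map gives the object's translation."""
    response = np.real(np.fft.ifft2(H_star * np.fft.fft2(new_patch)))
    return np.unravel_index(response.argmax(), response.shape)
```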

To address the above issues with a green solution, we previously proposed UHP-SOT (Unsupervised High-Performance Single Object Tracker), which used STRCF as the baseline and incorporated two new modules: background motion modeling and trajectory-based object box prediction. Our new work, UHP-SOT++, is an [...]

By |December 12th, 2021|News|

MCL Research on Point Cloud Odometry

Odometry is the process of using motion sensors to estimate the change in an object's position over time. It has been widely studied in the context of mobile robots, autonomous vehicles, drones, and other moving agents. Traditional odometry based on motion sensors such as Inertial Measurement Units (IMUs) and magnetometers is prone to error accumulation over time, known as odometry drift. Visual odometry instead makes use of camera images and/or point cloud scans collected over time to determine the position and orientation of the moving object. Several visual odometry systems that integrate monocular vision, stereo vision, point clouds, and IMUs have been developed for object localization.
We propose an unsupervised learning method for visual odometry from LiDAR point clouds called Green Point Cloud Odometry (GPCO). GPCO follows the traditional scan-matching approach to the odometry problem, incrementally estimating the motion between two consecutive point cloud scans. The GPCO method can be divided into four steps. First, a geometry-aware sampling method selects a small subset of points from the input point clouds; eigen features of points in a local neighborhood are considered, followed by random point sampling. Next, the 3D view surrounding the moving object is partitioned into four parts representing the front, rear, left-side, and right-side views; this view-partitioning step divides the sampled points into four disjoint sets. Then, the features of the sampled points are derived using the PointHop++ [1] method, and matching points between two consecutive point clouds are found in each view using the nearest-neighbor rule in the feature space. Finally, the motion between the two scans is estimated using Singular Value Decomposition (SVD). The motion estimate is accumulated with the estimates from previous time steps and the process [...]
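The SVD step is the classical Kabsch (orthogonal Procrustes) solution; a minimal sketch, assuming the matched point pairs are already given:

```python
import numpy as np

def estimate_motion(src, dst):
    """Estimate rotation R and translation t from matched Nx3 point sets
    so that dst ~ src @ R.T + t (Kabsch algorithm)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

Composing these per-scan estimates over time yields the full trajectory.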

By |December 5th, 2021|News|

MCL Thanksgiving Luncheon

MCL had its annual Thanksgiving Luncheon at Shiki Seafood Buffet on November 25, 2021. The Thanksgiving Luncheon has been an MCL tradition for more than 20 years. It is a good chance for the whole group to gather and have lunch together as a warm and happy family. All of us enjoyed the delicious food and the wonderful time chatting with each other. It was also a good opportunity to rest after a busy semester and connect with other people during this hard time. Thanks to Professor Kuo for hosting this event and to Min for organizing it.

Happy Thanksgiving to everyone!

By |November 28th, 2021|News|

MCL Research on Large-Scale Indoor Image Segmentation

Given a point cloud set, the goal of semantic segmentation is to label every point as one of the semantic categories. Semantic segmentation of large-scale point clouds finds a wide range of real-world applications, such as autonomous driving in outdoor environments and robotic navigation in indoor or outdoor environments. As compared with the point cloud classification problem, which often targets small-scale objects, a high-performance point cloud semantic segmentation method demands a good understanding of the complex global structure as well as of the local neighborhood of each point. Meanwhile, efficiency, measured by computational and memory complexity, is important for practical real-time systems.

State-of-the-art point cloud classification and segmentation methods are based on deep learning. Raw point clouds captured by LiDAR sensors are irregular and unordered, so they cannot be directly processed by deep learning networks designed for 2D images. This problem was addressed by the pioneering work on PointNet. PointNet and its follow-ups achieved impressive performance in small-scale point cloud classification and segmentation tasks, but they cannot be generalized to handle large-scale point clouds directly due to memory and time constraints.

In this work, we propose an efficient solution to semantic segmentation of large-scale indoor scene point clouds. It is named GSIP (Green Segmentation of Indoor Point clouds) [1], and its performance is evaluated on a representative large-scale benchmark, the Stanford 3D Indoor Segmentation (S3DIS) dataset. GSIP has two novel components: 1) a room-style data pre-processing method that selects a proper subset of points for further processing, and 2) a new feature extractor extended from PointHop. For the former, the sampled points of each room form an input unit. For the latter, the weaknesses of PointHop's feature extraction when [...]
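As a sketch of the room-style input unit, assuming simple random downsampling per room (the exact point selection strategy used in GSIP is described in [1], and the names and sample size here are placeholders):

```python
import numpy as np

def sample_room(room_points, n=80000, seed=0):
    """Select a fixed-size subset of one room's points to form a single
    input unit for the feature extractor."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(room_points), size=min(n, len(room_points)),
                     replace=False)
    return room_points[idx]
```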

By |November 21st, 2021|News|

MCL Research on Depth Estimation from Images

The goal of depth estimation is to estimate a high-quality dense depth map from a single RGB input image. The depth map is an image containing the distance from the camera to the surfaces of scene objects. Depth estimation is crucial for scene understanding, since more information enables more accurate scene analysis: with estimated depth, we have not only the color information from RGB images but also distance information.

Currently, most depth estimation methods use deep learning with an encoder-decoder structure, which consumes significant time and computational resources; AdaBins [1] is one example. We aim to design a successive subspace learning based method that uses fewer computational resources and is mathematically explainable, while keeping high performance.

We use the NYU-Depth-v2 dataset [2] for training. We have proposed the method shown in the second image and obtained some initial results. In the second image, P represents the RGB images after conversion and D represents the corresponding depth images. In the future, we aim to improve the results by refining the model.

— By Ganning Zhao

Reference:

[1] Bhat, S. F., Alhashim, I., & Wonka, P. (2021). Adabins: Depth estimation using adaptive bins. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4009-4018).

[2] Silberman, N., Hoiem, D., Kohli, P., & Fergus, R. (2012, October). Indoor segmentation and support inference from rgbd images. In European conference on computer vision (pp. 746-760). Springer, Berlin, Heidelberg.

 

Image credits:

Image showing the architecture of AdaBins is from [1].

Image showing the architecture of our current method.

By |November 14th, 2021|News|

Welcome New MCL Member Mahtab Movahhedrad

We are so happy to welcome a new graduate member of MCL, Mahtab Movahhedrad. Here is an interview with Mahtab:

Could you briefly introduce yourself and your research interests?

My name is Mahtab Movahhedrad. I'm from Tabriz, Iran. I started my Ph.D. at USC in spring 2021. I received my bachelor's degree from the University of Tabriz and my master's degree from Tehran Polytechnic. I have been engaged in many different fields of study, such as integrated electronics, photonics, metamaterials, and Fourier optics. Now, I plan to gear my Ph.D. studies more towards signal processing and computer vision and do research in that area.

What is your impression about MCL and USC?

I have been exploring my passion for computer vision and multimedia for some time now, and I think MCL is the best research group at USC in this field. My motivation to join MCL as a Ph.D. student is its diverse and outstanding research in multimedia. There are about 30 Ph.D. students and several post-doctoral research fellows and visiting scholars from all over the world working in the lab. Doing my Ph.D. at USC, among erudite professors and engaged students, creates a challenging and at the same time amicable atmosphere for me to broaden my knowledge and collaborate with other scientists.

What is your future expectation and plan in MCL?

My expectation for MCL is to learn about green AI and have the chance to do research in this field. Green AI has the potential to accelerate global efforts to protect the environment and conserve resources. I find the concept quite interesting and can't wait to explore more. I also hope to make lasting connections and effective collaborations with individuals [...]

By |November 7th, 2021|News|

MCL Research on Unsupervised Nuclei Image Segmentation

Nuclei segmentation has been extensively used in biological image analysis for reading histology images from microscopes. The population of nuclei and their shape and density play a significant role in clinical practice for cancer diagnosis and aggressiveness assessment. The reading and annotation of those images is a fairly laborious and time-consuming task, carried out only by expert pathologists. As such, computer-aided automation of this process is of high significance, since it reduces the physicians' workload and provides a more objective segmentation output. Yet, challenges such as the color and intensity variations that result from imaging different organs under different acquisition settings hinder the performance of these algorithms.

In past years, both supervised and unsupervised solutions have been proposed to tackle those challenges. Most of the recent approaches [1] adopt deep learning (DL) networks to learn directly from pathologists' annotated segmentation masks. However, most of the available datasets have very few samples relative to DL requirements, and their annotations reportedly have low inter-annotator agreement. Therefore, it is quite challenging for DL models to generalize well to unseen images from multiple organs. This is the motivation for using an unsupervised method. We propose a data-driven and parameter-free methodology [2], named CBM, for the nuclei segmentation task that requires no training labels. The pipeline begins with a data-driven Color (C) transform to highlight the nuclei regions over the background. Then, a data-driven Binarization (B) process is built on the bi-modal assumption that each local region has two histogram peaks, one corresponding to the background and one to the nuclei areas. We use a locally adaptive threshold to binarize each histology [...]
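To illustrate the binarization idea, the sketch below thresholds each local block with Otsu's method, which matches the bi-modal histogram assumption; the actual threshold selection in CBM may differ, and we assume nuclei appear darker than the background after the color transform.

```python
import numpy as np
from skimage.filters import threshold_otsu

def local_binarize(gray, block=64):
    """Binarize a grayscale histology image with a locally adaptive
    threshold: each block is split at the valley between its two
    histogram peaks via Otsu's method."""
    mask = np.zeros(gray.shape, dtype=bool)
    h, w = gray.shape
    for i in range(0, h, block):
        for j in range(0, w, block):
            tile = gray[i:i + block, j:j + block]
            if tile.max() > tile.min():          # skip flat regions
                mask[i:i + block, j:j + block] = tile < threshold_otsu(tile)
    return mask
```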

By |October 31st, 2021|News|

Congratulations to MCL Alum Professor Kyoung Mu Lee on Being Appointed EiC of IEEE TPAMI

The second Ph.D. alumnus of MCL, Professor Kyoung Mu Lee of Seoul National University (SNU), has been appointed Editor-in-Chief (EiC) of the prestigious IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) for 2022 to 2024. TPAMI publishes articles on computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence. It is one of the premier journals in all of computer science; its excellence, combined with its focus on computer vision and machine learning, positions it as one of IEEE's flagship journals. Its 2020 impact factor is 16.39.

Kyoung Mu Lee received his Ph.D. degree from the University of Southern California in 1993. He is currently a professor in the Department of ECE and the director of the Interdisciplinary Graduate Program in Artificial Intelligence at Seoul National University (SNU). His primary research areas are computer vision and machine learning. He has served on the editorial boards of many journals, including as Associate Editor-in-Chief (AEiC) of the IEEE TPAMI, Area Editor of Computer Vision and Image Understanding (CVIU), and Associate Editor (AE) of the IEEE TPAMI, Machine Vision and Applications (MVA), the IPSJ Transactions on Computer Vision and Applications (CVA), and the IEEE Signal Processing Letters. He is an Advisory Board Member of the Computer Vision Foundation (CVF) and an Editorial Advisory Board Member for Academic Press/Elsevier. He has served as a General Co-Chair of the prestigious ICCV 2019, ACM MM 2018, and ACCV 2018, and as an Area Chair of CVPR, ICCV, and ECCV many times. He was a Distinguished Lecturer of the Asia-Pacific Signal and Information Processing Association (APSIPA) for 2012-2013. He is currently serving as the president of the [...]

By |October 24th, 2021|News|