News

MCL Research on Video Camouflaged Object

Camouflaged object detection (COD) is a challenging task that aims to identify targets “seamlessly” concealed within their surrounding environment, making it harder than traditional object detection [1]. In video camouflaged object detection (VCOD), the intrinsically high variation and increased complexity of the scene pose additional obstacles for detection. We previously proposed a green method, termed GreenCOD, that leverages gradient boosting and deep features extracted from pre-trained Deep Neural Networks (DNNs) to detect camouflaged objects efficiently without back-propagation. This quarter, building on the GreenCOD model, we explore how to handle object detection in camouflaged videos in a lightweight, explainable way.

Inspired by the GreenCOD pipeline, our architecture adopts an EfficientNetB4 backbone in the feature extraction module for each frame. The input frames are first resized to a standard 672x672x3 and then processed through the 8-block EfficientNetB4 backbone to extract features under different receptive field sizes. To account for information across these spatial scales, all the feature maps are resized to a common grid and concatenated to form a rich set of features. A hierarchical architecture is implemented in the decision learning module.
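Below is a minimal sketch of this per-frame step, assuming the timm implementation of EfficientNet-B4 and a hypothetical common grid of 42×42 (i.e., 672/16); the exact blocks and grid size used in our pipeline may differ.

```python
import timm
import torch
import torch.nn.functional as F

# Pre-trained EfficientNet-B4 used as a frozen multi-scale feature extractor.
backbone = timm.create_model("tf_efficientnet_b4", pretrained=True,
                             features_only=True)
backbone.eval()

frame = torch.randn(1, 3, 672, 672)            # one resized video frame
with torch.no_grad():
    feats = backbone(frame)                    # feature maps at several strides

# Resize every map to a common spatial grid and concatenate along channels
# to form the rich feature set described above.
target = 42                                    # hypothetical common grid size
feats = [F.interpolate(f, size=(target, target), mode="bilinear",
                       align_corners=False) for f in feats]
rich = torch.cat(feats, dim=1)                 # shape: (1, C_total, 42, 42)
```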

An XGBoost model is then trained on the initial prediction maps together with temporal information. Short-term temporal information is captured by extracting motion across consecutive frames. The motion flow maps are extracted at a higher resolution and then processed with neighborhood reconstruction, so that each prediction location takes into account the information from a corresponding 4×4 window in the motion map. Initial results on VCOD problems are promising.
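As an illustration of this neighborhood reconstruction, the sketch below folds each 4×4 window of a higher-resolution motion map into the feature vector of the corresponding prediction location; all shapes, labels, and hyperparameters are placeholders rather than the actual GreenCOD settings.

```python
import numpy as np
import xgboost as xgb

def neighborhood_features(motion, win=4):
    """Fold each win x win window of a motion map into one feature vector."""
    H, W = motion.shape
    h, w = H // win, W // win
    return (motion.reshape(h, win, w, win)
                  .transpose(0, 2, 1, 3)
                  .reshape(h, w, win * win))

pred = np.random.rand(42, 42, 1)                  # initial prediction map
motion = np.random.rand(168, 168)                 # motion map at 4x resolution
X = np.concatenate([pred, neighborhood_features(motion)], axis=-1)
X = X.reshape(-1, X.shape[-1])                    # one sample per location
y = (np.random.rand(42 * 42) > 0.5).astype(int)   # placeholder mask labels

model = xgb.XGBClassifier(n_estimators=100, max_depth=6)
model.fit(X, y)                                   # refine the prediction map
```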

[1] Fan, Deng-Ping, et al. “Camouflaged object detection.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. [...]

By |April 21st, 2024|News|Comments Off on MCL Research on Video Camouflaged Object|

MCL Research on Green Learning for Electronic Design Automation (EDA)

Recently, machine learning and AI have been applied to several electronic design automation (EDA) tasks [1], such as performance prediction, decision-making for designs, and automated design. Data-driven optimization provides an alternative approximate solution for NP-complete problems in EDA. However, several challenges remain in applying machine learning algorithms to EDA problems. First, due to the protection of intellectual property (IP), it is difficult to access large public datasets for training. Deep learning frameworks therefore rely on pre-trained models and fine-tuning on small datasets, but this approach demands high computational resources and large model sizes. Second, the end-to-end optimization in deep learning is viewed as a black box that lacks interpretability for making decisions in hardware designs. As a result, we aim to develop a green learning algorithm that mitigates the demand for large amounts of training data while producing explainable results for EDA problems.

 

Currently, we propose a green learning architecture to address the IR-drop prediction problem. We parse netlist files into a 2D format and extract features automatically with our green learning framework. Then, we select discriminative features as the input of XGBoost to regress the IR-drop value. We aim to estimate the IR-drop value accurately while keeping the model size and FLOP count small in an energy-efficient way.
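A minimal sketch of the regression stage is given below, assuming the features have already been extracted from the parsed 2D maps; the file names, the variance ranking (a simple stand-in for our feature-selection step), and the XGBoost hyperparameters are all illustrative.

```python
import numpy as np
import xgboost as xgb

# Hypothetical pre-extracted features and targets (names are placeholders).
X_train = np.load("ir_features_train.npy")   # (n_tiles, n_features)
y_train = np.load("ir_drop_train.npy")       # per-tile IR-drop values

# Keep only the most discriminative dimensions; a variance ranking stands in
# here for the discriminant feature-selection step of the framework.
rank = np.argsort(-np.var(X_train, axis=0))
keep = rank[:256]

reg = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.1)
reg.fit(X_train[:, keep], y_train)           # regress the IR-drop value
```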

 

Reference

[1] Huang, Guyue, et al. “Machine learning for electronic design automation: A survey.” ACM Transactions on Design Automation of Electronic Systems (TODAES) 26.5 (2021): 1-46.

By |April 14th, 2024|News|Comments Off on MCL Research on Green Learning for Electronic Design Automation (EDA)|

MCL Research on 3D Perception with Large Foundational Models

Understanding and retrieving information in 3D scenes poses a significant challenge in artificial intelligence (AI) and machine learning (ML), particularly in grasping complex spatial relationships and detailed properties of objects in 3D spaces. While large foundational models such as CLIP [1] have made impressive progress in the 2D domain, their direct application to 3D scene understanding is not straightforward. Therefore, we aim to leverage existing large foundation models to understand 3D scenes and extract 3D information, avoiding building an entirely new 3D model from scratch.
The advancement of large foundation models has inspired recent studies to establish connections among images, text, and 3D data. The work in [2] extracts features from 3D objects with a 3D encoder and aligns them with features extracted by a visual encoder and a text encoder from corresponding rendered 2D images and text, enabling information retrieval of 3D objects from different modalities. However, the model mainly focuses on 3D objects and is limited in understanding complicated 3D spaces like 3D indoor scenes.
The work in [3] reconstructed compact 3D indoor scenes from multi-view images using DVGO and aligned them with semantic segmentation maps extracted from the corresponding 2D multi-view images using the CLIP-LSeg model. A visual reasoning pipeline is then applied for reasoning tasks in 3D spaces.
Presently, we are developing an ML model that leverages existing large foundation models to achieve a better understanding of 3D scenes with lower computational complexity.
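As a concrete illustration of the 2D building block these pipelines rest on, the sketch below scores a text query against rendered views of a scene with CLIP [1]; the file names and the query are hypothetical, and our actual pipeline differs.

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical rendered views of one 3D scene and a free-form text query.
views = [preprocess(Image.open(f"render_{i}.png")) for i in range(4)]
images = torch.stack(views).to(device)
text = clip.tokenize(["a chair next to a wooden table"]).to(device)

with torch.no_grad():
    img_f = model.encode_image(images)
    txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    scores = (img_f @ txt_f.T).squeeze(1)   # cosine similarity per view

best = scores.argmax().item()               # view most consistent with the query
```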

Reference:
[1] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. “Learning transferable visual models from natural language supervision,” in International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[2] Hegde, Deepti, Jeya Maria Jose Valanarasu, and Vishal [...]

By |April 7th, 2024|News|Comments Off on MCL Research on 3D Perception with Large Foundational Models|

MCL Research on Green Image Coding

Image coding is a fundamental multimedia technology. Popular image coding techniques include hybrid and deep learning-based frameworks. Our green image coding (GIC) aims to explore a new path toward a low-complexity, high-efficiency coding framework.

The main characteristics of the proposed GIC are multi-grid representation and vector quantization (VQ). Natural images have rich frequency components and high energy. Multi-grid representation uses resampling techniques to decompose the image into multiple layers so that the energy and frequency components are distributed across different layers. Then, we use vector quantization to handle each layer.
Based on our previous GIC framework [1][2], we identified two key issues with the multi-grid representation + vector quantization framework. First, unlike the traditional coding framework, vector quantization does not need a transform: a transform decorrelates the input signals, whereas vector quantization requires correlation to exist among the different dimensions of the input signals. Second, when applying the multi-grid representation to compression, the key is bitrate allocation, i.e., how to assign bits to different layers. In traditional frameworks, bitrate allocation or rate control operates over blocks or frames that have a relatively parallel relationship, but in our multi-grid representation the bitrate allocation is a sequential operation, because the bitrate of the current layer has a significant influence on the next. For this issue, we use a slope matching technique: when the rate-distortion (RD) slope of the current layer decreases to the starting slope of the next, we stop coding the current layer and switch to the next.
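The sketch below illustrates the two main ingredients under simplifying assumptions: a residual multi-grid decomposition built by resampling, and per-layer VQ with a k-means codebook. The block size, codebook size, and number of layers are illustrative, and the slope-matching rate control is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from skimage.transform import resize

def multigrid_layers(img, levels=3):
    """Decompose an image into residual layers over successively coarser grids."""
    layers, cur = [], img.astype(np.float64)
    for _ in range(levels):
        coarse = resize(cur, (cur.shape[0] // 2, cur.shape[1] // 2))
        pred = resize(coarse, cur.shape)     # upsampled prediction
        layers.append(cur - pred)            # residual carried by this layer
        cur = coarse
    layers.append(cur)                       # coarsest grid keeps most energy
    return layers

def vq_layer(layer, block=4, codebook_size=256):
    """Vector-quantize one layer with a k-means codebook over small blocks."""
    H, W = layer.shape
    blocks = (layer[:H - H % block, :W - W % block]
              .reshape(H // block, block, W // block, block)
              .transpose(0, 2, 1, 3).reshape(-1, block * block))
    km = KMeans(n_clusters=codebook_size, n_init=4).fit(blocks)
    return km.cluster_centers_[km.labels_]   # quantized blocks
```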

[1] Wang Y, Mei Z, Zhou Q, et al. “Green image codec: a lightweight learning-based image coding method.” Applications of Digital Image Processing XLV. SPIE, 2022, 12226: 70-75.
[2] Wang Y, Mei Z, Katsavounidis [...]

By |March 31st, 2024|News|Comments Off on MCL Research on Green Image Coding|

MCL Research on Green Point Cloud Surface Reconstruction

Surface reconstruction from point cloud scans plays a pivotal role in 3D vision and graphics, finding diverse applications in areas such as AR/VR games, cultural heritage preservation, and building information modeling (BIM). This task is inherently challenging due to the ill-posed nature of reconstructing continuous surfaces from discrete points. Moreover, real-world point cloud scans introduce quite a few obstacles, such as varying densities and sensor noise. These properties make the problem a long-standing one, which keeps driving researchers to look for more effective solutions.
Early research focused on combinatorial methods [1]–[2], which inferred the connectivity between points directly. The mainstream of surface reconstruction adopts an implicit surface approach [3]–[4]: the surface is represented as an unknown continuous function that is solved via the associated partial differential equations (PDEs). Although these methods offer good quality, they often require oriented normals or additional constraints. Recently, deep learning (DL) models [5]–[6] have been developed to solve this problem within a supervised learning framework; DL methods exploit the training data to learn an implicit function representation.
Despite their high reconstruction quality, generalizability and complexity remain challenges for DL models. In scenarios such as point cloud compression, quality assessment, and dynamic point cloud processing, there is a growing need for low-complexity, low-latency surface reconstruction methods. However, existing PDE- and DL-based methods tend to sacrifice simplicity for high reconstruction quality, leaving a gap for low-complexity solutions.

We adopt an unsupervised framework and propose a lightweight, high-performance method named green point-cloud surface reconstruction (GPSR). It belongs to the family of implicit surface reconstruction methods. The main idea lies in building a signed distance field (SDF) through approximated heat diffusion and fine-tuning it iteratively. [...]
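As a toy illustration of the heat-diffusion idea (on a 2D grid for brevity; GPSR's actual construction differs), one can diffuse an indicator of the point samples and invert Varadhan's formula, d ≈ sqrt(-4t log u), to approximate a distance field:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

grid = np.zeros((128, 128))
pts = np.random.randint(0, 128, size=(200, 2))     # placeholder point samples
grid[pts[:, 0], pts[:, 1]] = 1.0

t = 2.0                                            # diffusion time
u = gaussian_filter(grid, sigma=np.sqrt(2 * t))    # Gaussian blur ~ heat kernel
u = np.clip(u / u.max(), 1e-12, 1.0)
dist = np.sqrt(-4.0 * t * np.log(u))               # approximate unsigned distance

# A sign could then be attached (e.g., from estimated normals) and the field
# refined iteratively, yielding the SDF used for surface extraction.
```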

By |March 24th, 2024|News|Comments Off on MCL Research on Green Point Cloud Surface Reconstruction|

MCL Research on POS Tagging Prediction

Part of speech (POS) tagging is one of the basic sequence labeling tasks. It aims to tag every word of a sentence with its part-of-speech attribute. As POS offers a fundamental syntactic attribute of words, POS tagging is useful for many downstream tasks, such as speech recognition, syntactic parsing, and machine translation, and it is a crucial preliminary step in building interpretable NLP models. POS tagging has been successfully solved with complex sequence-to-sequence models based on deep learning (DL) technology, such as LSTMs and Transformers, and recent Large Language Models (LLMs) can also perform POS tagging as versatile general-purpose models. However, DL models demand high computational and storage costs that the POS tagging task itself does not inherently require. There is thus a need for lightweight, high-performance POS taggers that offer efficiency while ensuring efficacy for downstream tasks.

We propose a novel word-embedding-based POS tagger, named GWPT, to meet this demand. Following the green learning (GL) methodology (Kuo & Madni, 2022), GWPT contains three cascaded modules: 1) representation learning, 2) feature learning, and 3) decision learning. The last two modules of GWPT adopt standard procedures, i.e., the discriminant feature test (DFT) (Yang et al., 2022) for feature selection and the XGBoost classifier for making POS predictions. The main novelty of this work lies in the representation learning module. GWPT derives the representation of a word from its embedding; both non-contextual and contextual embeddings can be used. GWPT partitions the embedding dimension indices into three sets of low, medium, and high frequency. It discards dimension indices in the low-frequency set and considers the N-gram representation for dimension indices in the medium- and high-frequency [...]
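The sketch below illustrates this representation module under simplifying assumptions: per-dimension "frequency" is scored with a plain variance statistic (a stand-in for GWPT's actual rule), and the set sizes and context width are placeholders.

```python
import numpy as np

# Illustrative embeddings: (n_words, dim); score each dimension, then split
# the sorted indices into low-, medium-, and high-frequency sets.
emb = np.random.randn(10000, 300)
order = np.argsort(emb.var(axis=0))
low, mid, high = np.split(order, [100, 200])   # set sizes are placeholders
kept = np.concatenate([mid, high])             # low-frequency dims are discarded

def word_repr(sent_emb, i, n=2):
    """Kept dims of word i concatenated with its +/- n neighbors (zero-padded)."""
    parts = []
    for j in range(i - n, i + n + 1):
        if 0 <= j < len(sent_emb):
            parts.append(sent_emb[j][kept])
        else:
            parts.append(np.zeros(len(kept)))
    return np.concatenate(parts)

sent = np.random.randn(7, 300)                 # embeddings of a 7-word sentence
x = word_repr(sent, 3)                         # feature vector fed to DFT/XGBoost
```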

By |March 17th, 2024|News|Comments Off on MCL Research on POS Tagging Prediction|

MCL Research on Saliency Detection Method

Saliency detection research predominantly falls within two categories: human eye fixation prediction, which involves predicting the human gaze locations on images where attention is most concentrated [1], and salient object detection (SOD) [2], which aims to identify salient object regions within an image. Our study focuses on the former, saliency prediction, which predicts human gaze from visual stimuli.

 

Saliency detection constitutes a crucial task in predicting human gaze patterns from visual stimuli. The escalating demand for research in saliency detection is driven by the growing need to incorporate such techniques into various computer vision tasks and to understand the human visual system. Many existing saliency detection methodologies rely on deep neural networks (DNNs) to achieve good performance. However, the extensive model sizes associated with these approaches impede their integration with other modules or deployment on mobile devices.

 

To address this need, our study introduces a novel saliency detection method, named “GreenSaliency”, which eschews the use of DNNs while emphasizing a small model footprint and low computational complexity. GreenSaliency comprises two primary steps: 1) multi-layer hybrid feature extraction, and 2) multi-path saliency prediction. Empirical findings demonstrate that GreenSaliency achieves performance levels comparable to certain deep-learning-based (DL-based) methods, while necessitating a considerably smaller model size and significantly reduced computational complexity.

[1] Zhaohui Che, Ali Borji, Guangtao Zhai, Xiongkuo Min, Guodong Guo, and Patrick Le Callet, “How is gaze influenced by image transformations? Dataset and model,” IEEE Transactions on Image Processing, 29, 2287–2300.

 

[2] Dingwen Zhang, Junwei Han, Yu Zhang, and Dong Xu, “Synthesizing supervision for learning deep saliency network without human annotation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(7), 1755–1769.

By |March 10th, 2024|News|Comments Off on MCL Research on Saliency Detection Method|

Welcome New MCL Member Xuechun Hua

We are so happy to welcome a new MCL member, Xuechun Hua, who is joining MCL this semester. Here is a quick interview with Xuechun:

1. Could you briefly introduce yourself and your research interests?

My name is Xuechun Hua, and I am a second-year graduate student at Viterbi pursuing a master's degree in Computer Science. I finished my undergraduate study at Nanjing University (NJU), where I focused on game theory. I have since developed research interests in interpretable machine learning and computer vision.

2. What is your impression of MCL and USC?

USC has offered me an exceptional environment that fosters both my personal and academic development. The atmosphere at MCL is one of serious academic pursuit combined with a strong sense of collaboration. I am deeply thankful for the extensive support I’ve received from senior members of MCL. Professor Kuo stands out as an endlessly energetic researcher, who greatly inspires us with his guidance and enthusiasm for exploration.

3. What is your future expectation and plan in MCL?

I will be collaborating with Qingyang on a project focused on mesh reconstruction from point clouds, under the mentorship of Professor Kuo. Moving forward, I aim to enhance my research skills within MCL and contribute to advancements in 3D representation and green learning.

By |March 3rd, 2024|News|Comments Off on Welcome New MCL Member Xuechun Hua|

MCL Research on Transfer Learning

Transfer learning is an approach that extracts features from a source domain and transfers the knowledge of the source domain to a target domain, so as to improve the performance of target learners while relieving the pressure to collect a large amount of target-domain data [1]. It has been put into wide application, such as image classification and text classification.

Domain adaptation on handwritten-digit datasets, namely MNIST, USPS, and SVHN, is the task of transferring learned models across different datasets, aiming to conduct cross-domain feature alignment and build a generalized model that can predict labels across these datasets.
Presently, we have trained a green-learning-based transfer learning model between MNIST and USPS. The first step is preprocessing and feature extraction: the datasets are processed so that they become visually similar, raw Saab features are extracted [2], and an LNT feature transformation [3] is applied, followed by a cosine similarity check to find discriminant features. The second step is joint subspace generation: for each label in the source domain, k-means with 1, 2, 4, and 8 clusters is performed separately to generate 10, 20, 40, and 80 subspaces, and target features are then assigned to the generated subspaces. The third step is to use the assigned data to conduct weakly supervised learning that predicts labels for the remaining target samples. Our goal is to compare and analyze the performance of our green-learning-based transfer learning models against other models. In the future, we aim to conduct transfer learning among these three digit datasets mutually and improve accuracy by improving cross-domain feature alignment.
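A minimal sketch of the joint subspace generation step is given below, with random arrays standing in for the processed Saab/LNT features: per source label, k-means with k clusters yields 10k subspaces, and each target sample is assigned to the nearest centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_subspaces(X_src, y_src, k):
    """Run k-means per source label; return centroids and their digit labels."""
    centroids, labels = [], []
    for digit in range(10):
        km = KMeans(n_clusters=k, n_init=4).fit(X_src[y_src == digit])
        centroids.append(km.cluster_centers_)
        labels += [digit] * k
    return np.vstack(centroids), np.array(labels)

def assign_targets(X_tgt, centroids, labels):
    """Assign each target sample the label of its nearest subspace centroid."""
    d = ((X_tgt[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return labels[d.argmin(axis=1)]

X_src = np.random.randn(1000, 64)              # stand-in source features
y_src = np.random.randint(0, 10, 1000)
X_tgt = np.random.randn(500, 64)               # stand-in target features
for k in (1, 2, 4, 8):                         # 10, 20, 40, 80 subspaces
    C, L = build_subspaces(X_src, y_src, k)
    pseudo = assign_targets(X_tgt, C, L)       # seeds weakly supervised learning
```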

[1] Zhuang, Fuzhen, et al. “A comprehensive survey on transfer learning.” Proceedings of the IEEE 109.1 (2020): 43-76.
[2] Y. Chen, M. Rouhsedaghat, [...]

By |February 25th, 2024|News|Comments Off on MCL Research on Transfer Learning|

MCL Research on Image Demosaicing

In the world of digital images, turning raw sensor data into colorful pictures that we can see is a complex process. Demosaicing, a crucial step in this process, helps convert data from sensors into full-color images. It all started with Bayer color filter arrays, named after Bryce Bayer, which are grids of tiny sensors, each covered by a color filter—usually red, green, or blue.

But making this conversion isn’t easy. Real-world challenges like sensor noise and blurry motion can mess up the final image. And getting accurate data for training computers to do this job is time-consuming.

Because processing images can be slow, especially on devices like phones, we are looking into simpler methods that still give good results. Recently, we have been experimenting with a new regressor named “LQBoost”.

LQBoost operates by conducting least-square regressions in successive iterations, gradually narrowing down the gap between its predictions and the actual targets. By focusing on minimizing the residuals—differences between predicted and actual values—LQBoost iteratively enhances its accuracy. Additionally, it employs a thresholding process to prune samples with residuals approaching zero, streamlining the regression process.
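A minimal sketch of such an iterative scheme is given below; the linear per-stage fit, the pruning threshold, and the iteration count are illustrative choices rather than the exact LQBoost design.

```python
import numpy as np

def lqboost_fit(X, y, n_iters=10, prune_tol=1e-3):
    """Successive least-squares fits on residuals, pruning converged samples."""
    X1 = np.hstack([X, np.ones((len(X), 1))])   # add a bias column
    active = np.ones(len(X), dtype=bool)
    residual = y.astype(np.float64).copy()
    stages = []
    for _ in range(n_iters):
        w, *_ = np.linalg.lstsq(X1[active], residual[active], rcond=None)
        stages.append(w)
        residual -= X1 @ w                       # update all residuals
        active &= np.abs(residual) > prune_tol   # prune near-zero residuals
        if not active.any():
            break
    return stages

def lqboost_predict(X, stages):
    X1 = np.hstack([X, np.ones((len(X), 1))])
    return sum(X1 @ w for w in stages)           # sum of stage predictions

X = np.random.randn(500, 8)                      # stand-in features and targets
y = np.random.randn(500)
stages = lqboost_fit(X, y)
y_hat = lqboost_predict(X, stages)
```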

Taking LQBoost a step further, we integrate Local Neighborhood Transformation (LNT) to enrich the feature set and capture intricate data structures more effectively. This integration allows for a more nuanced understanding of the data, leading to improved predictions.

Before applying LQBoost to our demosaicing task, we perform a crucial preprocessing step. By clustering the dataset and utilizing cluster purity, we initialize the regression process effectively. This step ensures that each cluster receives an accurate initial prediction, setting the stage for LQBoost to refine these predictions through iterative regression.

Our goal is to create a demosaicing model that’s both accurate and fast. We’ve tested it thoroughly using standard image datasets, making [...]

By |February 18th, 2024|News|Comments Off on MCL Research on Image Demosaicing|