News

Congratulations on MCL Members Attending Ph.D. Hooding Ceremony

Three MCL members attended the Viterbi PhD hooding ceremony on Wednesday, May 8, 2024, in the Bovard Auditorium. They were Zhanxuan Mei, Chengwei Wei and Wei Wang. Congratulations to them for their accomplishments in completing their PhD program at USC!

Zhanxuan Mei received his bachelor’s degree in Electrical Engineering from Beijing Institute of Technology, China, in June 2018. He joined the Media Communications Lab in the summer of 2020. His research interests include image processing and video processing.

Chengwei Wei received his bachelor’s degree from Central South University, China, in June 2018. He joined the Media Communications Lab in the summer of 2019. His research interests include signal processing, natural language processing, and machine learning. His thesis is titled “Syntax-aware Natural Language Processing Techniques and Their Applications”.

Wei Wang received her bachelor’s degree in Applied Physics from Northeastern University (CN) and her MS degree in Materials Engineering from the University of Southern California in 2014 and 2016, respectively. Her research interests include deep learning and image processing.

Congratulations to them all! Let us wish them all the best in the future!

May 12th, 2024 | News

Welcome New MCL Member Dingyi Nie

We are so happy to welcome a new MCL member, Dingyi Nie, who is joining MCL this semester. Here is a quick interview with Dingyi:

1. Could you briefly introduce yourself and your research interests?

My name is Dingyi Nie. I am currently a Master of Science student in Computer Science at USC. I am joining MCL as a research intern starting in April 2024. My research interests mainly include digital signal processing and machine learning, particularly real-world AI. In my spare time, I enjoy music and sports. I’m a keyboard and drum player, and I play soccer and volleyball.

2. What is your impression of MCL and USC?

I got to know Professor Jay Kuo and his MCL from my professor who teaches multimedia systems. I personally have a long-standing interest in multimedia, DSP and machine learning. I find MCL’s focus on Green Learning particularly fascinating because it seamlessly integrates these areas. I am very excited to explore its potential as an emerging tool. I love LA as a city of culture and diversity, and I feel that USC is a good reflection of it. I am excited to interact with all the creative people here.

3. What are your future expectations and plans in MCL?

Starting from the fall semester, I will be working with Yixing Wu on a project exploring Green Learning solutions for irregularly sampled time series modeling. I look forward to building connections with all the members in the lab.

May 5th, 2024 | News

MCL Research on Parsing Tree Construction

Syntactic parsing is a natural language processing technique used to analyze the grammatical structure of a sentence. There are two main types of syntactic parsing: dependency parsing and constituency parsing. Fig. 1 shows the parse trees corresponding to dependency parsing and constituency parsing, respectively. Dependency parsing identifies the dependency relationships between the words in a sentence and creates a directed graph representing them. Each word in the sentence is a node in the graph, and each dependency relationship is an edge labeled with its type, such as subject, object, or modifier. The resulting graph is called a dependency tree or a dependency graph. Constituency parsing analyzes a sentence to identify its syntactic structure and hierarchical organization based on the grammatical rules of a language. The sentence is divided into a hierarchy of phrases, each of which has a specific grammatical structure and serves a particular function within the sentence. These phrases are called constituents; examples include noun phrases, verb phrases, and prepositional phrases.
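To make the two representations concrete, here is a minimal sketch of how each parse could be stored in code. The sentence, the relation labels, and the helper names (`dependents`, `leaves`) are illustrative assumptions and are not taken from Fig. 1.

```python
# Dependency parse: one labeled, directed edge per dependent word,
# stored as (head, relation, dependent). Labels are illustrative.
sentence = ["She", "reads", "books"]
dependency_edges = [
    ("reads", "nsubj", "She"),    # "She" is the subject of "reads"
    ("reads", "obj", "books"),    # "books" is the object of "reads"
]


def dependents(head, edges):
    """Return (relation, word) pairs for every word governed by `head`."""
    return [(label, dep) for h, label, dep in edges if h == head]


# Constituency parse: nested (label, child, ...) tuples; leaves are words.
constituency_tree = (
    "S",
    ("NP", "She"),
    ("VP", ("V", "reads"), ("NP", "books")),
)


def leaves(tree):
    """Collect the words of a constituency tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:        # tree[0] is the phrase label
        words.extend(leaves(child))
    return words
```

The dependency form links words directly to each other, while the constituency form groups words into nested phrases; reading the leaves of the constituency tree recovers the original sentence.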

In this project, we aim to propose a simple but effective method for constructing constituency parse trees. The constituency parse tree is first converted to a binary tree; an example is shown in Fig. 2. The core idea behind the method is that once we know the interval height between adjacent words, the binarized constituency parse tree can be constructed [1]. Directly predicting the height necessitates a complex model for precise prediction, so we instead train a binary classifier to compare the heights of the intervals pairwise. Then the height of an [...]
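As a rough illustration of how interval heights can determine a binarized tree, the sketch below recursively splits a span at its tallest interval. The function name `build_tree`, the example sentence, and the height values are hypothetical, and the top-down splitting rule is an assumption in the style of distance-based parsers, not necessarily the exact procedure used in this project.

```python
def build_tree(words, heights):
    """Build a binary tree from words and the heights between neighbors.

    heights[i] is the height of the interval between words[i] and
    words[i + 1], so len(heights) == len(words) - 1.
    """
    if len(words) == 1:
        return words[0]
    # Split the span at its tallest interval (first one wins on ties).
    split = max(range(len(heights)), key=heights.__getitem__)
    left = build_tree(words[:split + 1], heights[:split])
    right = build_tree(words[split + 1:], heights[split + 1:])
    return (left, right)


# Hypothetical example: the tallest interval lies between "cat" and "sat".
words = ["the", "cat", "sat", "down"]
tree = build_tree(words, [1, 3, 2])
```

In this sketch only the relative order of the heights affects the result, which is consistent with the idea that pairwise comparisons can stand in for exact height values.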

April 28th, 2024 | News

MCL Research on Video Camouflaged Object

Camouflaged object detection (COD) aims to identify targets “seamlessly” concealed within their surrounding environment, which makes it more challenging than traditional object detection [1]. In video camouflaged object detection (VCOD), the intrinsically high variation and increased complexity of the scenes pose new obstacles. We previously proposed a green method, termed GreenCOD, that leverages gradient boosting and deep features extracted from pre-trained deep neural networks (DNNs) to detect camouflaged objects efficiently without back-propagation. This quarter, building on the GreenCOD model, we explore how to detect objects in camouflage videos in a lightweight, explainable way.

Inspired by the GreenCOD pipeline, our architecture uses the EfficientNetB4 backbone in the feature extraction module for each frame. The input frames are first resized to a standard size of 672x672x3 and then processed through the 8-block EfficientNetB4 backbone to extract features under receptive fields of different sizes. To account for information across these spatial scales, all the features are resized and concatenated to form a rich feature set. A hierarchical architecture is implemented in the decision learning module.
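The resize-and-concatenate step can be pictured with a small pure-Python sketch. It assumes nearest-neighbor resizing and single-channel maps represented as lists of rows; the interpolation method, the map sizes, and the function names are assumptions for illustration, since the post does not specify them.

```python
def resize_nearest(fmap, out_h, out_w):
    """Resize a 2D feature map (list of rows) by nearest-neighbor sampling."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def concat_features(fmaps, out_h, out_w):
    """Resize every map to out_h x out_w and stack the values per pixel."""
    resized = [resize_nearest(f, out_h, out_w) for f in fmaps]
    return [
        [[f[r][c] for f in resized] for c in range(out_w)]
        for r in range(out_h)
    ]


# Hypothetical single-channel maps from a shallow (4x4) and a deep (2x2) block.
shallow = [[r * 4 + c for c in range(4)] for r in range(4)]
deep = [[100, 200], [300, 400]]
stacked = concat_features([shallow, deep], 4, 4)
```

After this step every pixel location carries one value per source map, so coarse and fine information sit side by side in a single feature vector.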

XGBoost is then trained on the initial prediction maps and temporal information. Short-term temporal information is captured by extracting the motion among consecutive frames. The motion flow maps are extracted at a higher resolution, followed by neighborhood reconstruction. This approach ensures that each prediction location takes into account the information from a corresponding 4×4 window in the motion map. Initial results on VCOD problems are satisfactory.
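One way to read the 4×4 neighborhood step is sketched below: each location of the prediction map collects the 16 motion values that cover it in a motion map with four times the resolution. The non-overlapping, grid-aligned windows, the exact 4x resolution ratio, and the function name `window_features` are assumptions made for illustration.

```python
def window_features(motion_map, scale=4):
    """Map each low-resolution location to its scale x scale motion window.

    motion_map is a list of rows whose height and width are `scale` times
    those of the prediction map. The result is a dict keyed by the
    (row, col) of the prediction map; each value is a flat list of
    scale * scale motion values in row-major order.
    """
    height = len(motion_map) // scale
    width = len(motion_map[0]) // scale
    features = {}
    for r in range(height):
        for c in range(width):
            features[(r, c)] = [
                motion_map[r * scale + dr][c * scale + dc]
                for dr in range(scale)
                for dc in range(scale)
            ]
    return features


# Hypothetical 8x8 motion map paired with a 2x2 prediction map.
motion = [[y * 8 + x for x in range(8)] for y in range(8)]
per_location = window_features(motion)
```

Each 16-value list could then be appended to the per-location feature vector that the boosted trees consume.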

[1] Fan, Deng-Ping, et al. “Camouflaged object detection.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. [...]

April 21st, 2024 | News

MCL Research on Green Learning for Electronic Design Automation (EDA)

Recently, machine learning and AI have been applied to several electronic design automation (EDA) tasks [1], such as performance prediction, decision-making for designs, and automated design. Data-driven optimization provides an alternative, approximate solution to NP-complete problems in EDA. However, applying machine learning algorithms to EDA problems still faces several challenges. First, because intellectual property (IP) is protected, large public datasets are difficult to obtain as training data. Deep learning frameworks therefore rely on pre-trained models and fine-tuning on small datasets, an approach that demands high computational resources and a large model size. Second, end-to-end optimization in deep learning is viewed as a black box that lacks interpretability for hardware design decisions. We therefore aim to develop a green learning algorithm that reduces the need for large amounts of training data and yields explainable results for EDA problems.

Currently, we propose a green learning architecture to address the IR-drop prediction problem. We parse netlist files into a 2D format and automatically extract features with our green learning framework. Then, we select discriminative features as the input to XGBoost to regress the IR-drop value. We aim to estimate the IR-drop value accurately and energy-efficiently while keeping the model size and the number of FLOPs small.
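The post does not say how discriminative features are chosen, so the following is only one plausible criterion, sketched in pure Python: score each feature by the lowest weighted squared error achievable when the samples are split on a single threshold of that feature, then keep the best-scoring ones. The function names and the toy values are hypothetical, and the XGBoost regressor itself is not shown.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_loss(feature, target):
    """Lowest per-sample squared error after one threshold split."""
    pairs = sorted(zip(feature, target))
    n = len(pairs)
    best = float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        best = min(best, (sse(left) + sse(right)) / n)
    return best


def select_features(columns, target, keep):
    """Return indices of the `keep` features with the lowest split loss."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: split_loss(columns[j], target))
    return ranked[:keep]


# Hypothetical toy data: four samples with made-up IR-drop targets.
ir_drop = [0.1, 0.1, 0.9, 0.9]
noise = [5, 1, 4, 2]            # unrelated to the target
informative = [1, 2, 10, 11]    # separates low from high targets
chosen = select_features([noise, informative], ir_drop, keep=1)
```

Only the selected columns would then be passed to the regressor, which keeps the downstream model small.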

Reference

[1] Huang, Guyue, et al. “Machine learning for electronic design automation: A survey.” ACM Transactions on Design Automation of Electronic Systems (TODAES) 26.5 (2021): 1-46.

April 14th, 2024 | News

MCL Research on 3D Perception with Large Foundational Models

Understanding and retrieving information in 3D scenes poses a significant challenge in artificial intelligence (AI) and machine learning (ML), particularly in grasping complex spatial relationships and detailed properties of objects in 3D spaces. While large foundation models such as CLIP [1] have made impressive progress in the 2D domain, their direct application to 3D scene understanding is not straightforward. Therefore, we aim to leverage existing large foundation models to understand 3D scenes and extract 3D information, avoiding building an entirely new 3D model from scratch.

The advancement of large foundation models has inspired recent studies to establish connections among images, text, and 3D data. The work in [2] extracts features from 3D objects with a 3D encoder and aligns them with features extracted by a visual encoder and a text encoder from corresponding rendered 2D images and text, enabling information retrieval of 3D objects from different modalities. However, the model mainly focuses on 3D objects and is limited in understanding complicated 3D spaces such as 3D indoor scenes.

The work in [3] reconstructs compact 3D indoor scenes from multi-view images using DVGO and aligns them with semantic segmentation extracted from the corresponding 2D multi-view images using the CLIP-LSeg model. A visual reasoning pipeline is then applied for reasoning tasks in 3D spaces.

We plan to develop an ML model that leverages existing large foundation models to better understand 3D scenes with lower computational complexity.
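Once 3D, image, and text features are aligned in one embedding space, cross-modal retrieval reduces to a nearest-neighbor search. The sketch below shows that idea with cosine similarity; the three-dimensional vectors, the object names, and the function names are made up for illustration, and real embeddings would come from the respective encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieve(query, gallery):
    """Return the gallery key whose embedding is most similar to `query`."""
    return max(gallery, key=lambda name: cosine(query, gallery[name]))


# Hypothetical 3D-object embeddings and a text-query embedding.
objects_3d = {
    "chair": [0.9, 0.1, 0.0],
    "lamp": [0.0, 0.2, 0.9],
}
text_query = [1.0, 0.0, 0.1]   # stands in for an encoded phrase
best_match = retrieve(text_query, objects_3d)
```

The same function works in either direction, for example retrieving text labels for a 3D embedding, because all modalities share one space.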

Reference:
[1] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. “Learning transferable visual models from natural language supervision,” in International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[2] Hegde, Deepti, Jeya Maria Jose Valanarasu, and Vishal [...]

April 7th, 2024 | News

MCL Research on Green Image Coding

Image coding is a fundamental multimedia technology. Popular image coding techniques include hybrid and deep learning-based frameworks. Our green image coding (GIC) aims to explore a new path to a low-complexity and high-efficiency coding framework.

The main characteristics of the proposed GIC are multi-grid representation and vector quantization (VQ). Natural images have rich frequency components and high energy. Multi-grid representation uses resampling techniques to decompose the image into multiple layers so that the energy and frequency components are distributed to different layers. Then, we use vector quantization to handle each layer.

Based on our previous GIC framework [1][2], we identified two key issues of the multi-grid representation + vector quantization framework. First, unlike the traditional coding framework, vector quantization does not need a transform, because a transform decorrelates the input signals, whereas vector quantization needs correlation to exist among the different dimensions of the input signals. Second, when applying multi-grid representation in compression, the key is bitrate allocation, i.e., how to assign bits to different layers. In traditional frameworks, bitrate allocation or rate control happens across different blocks or frames that have a relatively parallel relationship. In our multi-grid representation, however, bitrate allocation is a sequential operation because the bitrate of the current layer has a significant influence on the next. To address this issue, we use a slope matching technique for bitrate allocation. Specifically, when the RD slope of the current layer decreases to the start slope of the next, we stop coding the current layer and switch to the next.
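The stopping rule of slope matching can be sketched as follows. Each layer is described here by a hypothetical list of (rate, distortion) operating points, and the function names are assumptions. The sketch also treats the next layer's curve as fixed, which is a simplification: as noted above, the bitrate spent on the current layer influences the next one.

```python
def slopes(points):
    """Magnitude of the RD slope between consecutive (rate, distortion) points."""
    return [
        (d0 - d1) / (r1 - r0)
        for (r0, d0), (r1, d1) in zip(points, points[1:])
    ]


def stop_index(current, nxt):
    """Index of the operating point at which to leave the current layer.

    Coding continues on the current layer while each extra step reduces
    distortion faster than the first step of the next layer would.
    """
    start_slope = slopes(nxt)[0]
    for i, slope in enumerate(slopes(current)):
        if slope <= start_slope:
            return i              # step i -> i + 1 is no longer worth it
    return len(current) - 1       # current layer stays steeper throughout


# Hypothetical RD points; rate grows and distortion shrinks along each list.
layer_a = [(0, 100), (1, 60), (2, 40), (3, 35)]   # slopes: 40, 20, 5
layer_b = [(0, 50), (1, 40), (2, 35)]             # slopes: 10, 5
switch_at = stop_index(layer_a, layer_b)
```

With these numbers the current layer is coded up to its third operating point, because its next step would only gain 5 units of distortion per unit of rate while the next layer starts at 10.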

[1] Wang Y, Mei Z, Zhou Q, et al. Green image codec: a lightweight learning-based image coding method[C]//Applications of Digital Image Processing XLV. SPIE, 2022, 12226: 70-75.
[2] Wang Y, Mei Z, Katsavounidis [...]

March 31st, 2024 | News

MCL Research on Green Point Cloud Surface Reconstruction

Surface reconstruction from point cloud scans plays a pivotal role in 3D vision and graphics, finding diverse applications in areas such as AR/VR games, cultural heritage preservation, and building information modeling (BIM). This task is inherently challenging due to the ill-posed nature of reconstructing continuous surfaces from discrete points. Moreover, real-world point cloud scans introduce further obstacles such as varying densities and sensor noise. These properties make the problem a long-standing one, which keeps driving researchers to look for more effective solutions.

Early research focused on combinatorial methods [1]–[2], which inferred the connectivity between points directly. Mainstream surface reconstruction adopts an implicit surface approach [3]–[4]. That is, the surface is represented as an unknown continuous function obtained by solving the associated partial differential equations (PDEs). Although these methods offer good quality, they often require oriented normals or additional constraints. More recently, deep learning (DL) models [5]–[6] have been developed to solve this problem in a supervised learning framework. DL methods exploit the training data to learn an implicit function representation.

Despite their high reconstruction quality, generalizability and complexity remain challenges for DL models. In scenarios such as point cloud compression, quality assessment, and dynamic point cloud processing, there is a growing need for low-complexity, low-latency surface reconstruction methods. However, existing PDE-based and DL-based methods tend to sacrifice simplicity for high reconstruction quality, leaving a gap for low-complexity solutions.

We adopt an unsupervised framework and propose a lightweight, high-performance method named green point-cloud surface reconstruction (GPSR). It belongs to the family of implicit surface reconstruction methods. The main idea is to build a signed distance field (SDF) through approximated heat diffusion and fine-tune it iteratively. [...]

March 24th, 2024 | News

MCL Research on POS Tagging Prediction

Part-of-speech (POS) tagging is one of the basic sequence labeling tasks. It aims to tag every word of a sentence with its part-of-speech attribute. As POS offers a fundamental syntactic attribute of words, POS tagging is useful for many downstream tasks, such as speech recognition, syntactic parsing, and machine translation. It is also a crucial preliminary step in building interpretable NLP models. POS tagging has been successfully solved with complex sequence-to-sequence models based on deep learning (DL) technology, such as LSTMs and Transformers. In addition, recent large language models (LLMs) can perform POS tagging as general-purpose models. However, DL models demand high computational and storage costs, which the POS tagging task itself does not inherently require. There is a need for lightweight, high-performance POS taggers that offer efficiency while ensuring efficacy for downstream tasks.

We propose a novel word-embedding-based POS tagger, named GWPT, to meet this demand. Following the green learning (GL) methodology (Kuo & Madni, 2022), GWPT contains three cascaded modules: 1) representation learning, 2) feature learning, and 3) decision learning. The last two modules of GWPT adopt standard procedures, i.e., the discriminant feature test (DFT) (Yang et al., 2022) for feature selection and the XGBoost classifier for POS prediction. The main novelty of this work lies in the representation learning module of GWPT. GWPT derives the representation of a word based on its embedding. Both non-contextual embeddings and contextual embeddings can be used. GWPT partitions dimension indices into three sets: low-, medium-, and high-frequency. It discards dimension indices in the low-frequency set and considers the N-gram representation for dimension indices in the medium- and high-frequency [...]

March 17th, 2024 | News

MCL Research on Saliency Detection Method

Saliency detection research predominantly falls within two categories: human eye fixation prediction, which predicts the locations in an image where human gaze and attention are most concentrated [1], and salient object detection (SOD) [2], which aims to identify salient object regions within an image. Our study focuses on the former, which predicts human gaze from visual stimuli.

Saliency detection is a crucial task in predicting human gaze patterns from visual stimuli. The growing demand for saliency detection research is driven by the need to incorporate such techniques into various computer vision tasks and to understand the human visual system. Many existing saliency detection methods rely on deep neural networks (DNNs) to achieve good performance. However, the large model sizes of these approaches impede their integration with other modules or their deployment on mobile devices.

To address this need, our study introduces a novel saliency detection method, named “GreenSaliency”, which avoids DNNs while emphasizing a small model footprint and low computational complexity. GreenSaliency comprises two primary steps: 1) multi-layer hybrid feature extraction, and 2) multi-path saliency prediction. Empirical findings demonstrate that GreenSaliency achieves performance comparable to certain deep-learning-based (DL-based) methods while requiring a considerably smaller model size and significantly lower computational complexity.

[1] Zhaohui Che, Ali Borji, Guangtao Zhai, Xiongkuo Min, Guodong Guo, and Patrick Le Callet, “How is gaze influenced by image transformations? Dataset and model”, IEEE Transactions on Image Processing, 29, 2287–2300.

[2] Dingwen Zhang, Junwei Han, Yu Zhang, and Dong Xu, “Synthesizing supervision for learning deep saliency network without human annotation”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(7), 1755–1769.

March 10th, 2024 | News