News

MCL Research on Green Saliency-guided Blind Image Quality Assessment (GSBIQA)

Objective image quality assessment (IQA) is pivotal in various multimedia applications. It can be categorized into three distinct types: Full-Reference IQA (FR-IQA), Reduced-Reference IQA (RR-IQA), and No-Reference IQA (NR-IQA). FR-IQA directly compares a distorted image against a reference or original image to assess quality. RR-IQA, on the other hand, uses partial information from the reference image to evaluate the quality of the target image. NR-IQA, also known as blind image quality assessment (BIQA), estimates the perceptual quality of images without any reference, which is essential in scenarios where reference images are unavailable, such as at the receiver’s end or for user-generated content on social media. The demand for BIQA has surged with the rise of such platforms and of mobile applications where reference images are typically unavailable.

The challenge in BIQA lies in the diversity of content and the presence of mixed distortion types. While many BIQA methods employ deep neural networks (DNNs) and incorporate saliency detectors to enhance performance, their large model sizes limit deployment on resource-constrained devices.

To address this challenge, we introduce a novel, non-deep-learning BIQA method with a lightweight saliency detection module, called Green Saliency-guided Blind Image Quality Assessment (GSBIQA). It is characterized by a minimal model size, reduced computational demands, and robust performance. The lightweight saliency detector in GSBIQA guides data cropping and decision ensembling and generates features that emulate the attention mechanism in BIQA. The GSBIQA method is structured around five key processes: 1) green saliency detection, 2) saliency-guided data cropping, 3) Green BIQA feature extraction, 4) local patch prediction, and 5) saliency-guided decision ensemble. Experimental results show that the performance of [...]
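To make the five-stage flow concrete, below is a minimal structural sketch of how these stages might compose. All callables (detect_saliency, crop_patches, extract_features, patch_regressor) are hypothetical stand-ins rather than the published modules; only the saliency-weighted fusion in the last step is spelled out.

```python
import numpy as np

def gsbiqa_predict(image, detect_saliency, crop_patches, extract_features, patch_regressor):
    """Sketch of the five GSBIQA stages; all callables are illustrative stand-ins."""
    # 1) Green saliency detection: a lightweight, non-deep-learning saliency map.
    saliency = detect_saliency(image)                  # H x W map in [0, 1]
    # 2) Saliency-guided data cropping: sample patches around salient regions.
    patches, weights = crop_patches(image, saliency)   # N patches, N saliency weights
    # 3) Green BIQA feature extraction on each cropped patch.
    features = np.stack([extract_features(p) for p in patches])
    # 4) Local patch prediction with a classical regressor.
    patch_scores = patch_regressor.predict(features)   # shape (N,)
    # 5) Saliency-guided decision ensemble: saliency-weighted fusion of patch scores.
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * patch_scores) / np.sum(weights))
```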

By |August 11th, 2024|News|Comments Off on MCL Research on Green Saliency-guided Blind Image Quality Assessment (GSBIQA)|

MCL Research on Prostate Lesion Detection from MRI Images

Research in healthcare systems has mainly focused on automating several tasks in the clinical pipeline, aiming to enhance or expedite the physician’s diagnosis. Prostate Cancer (PCa) is widely known as one of the most frequently diagnosed cancers in men. If diagnosed early, the mortality rate is almost zero. Yet, should it fly under the radar until the metastasis stage, the survival rate plummets to 31% [1]. For diagnosis, after an elevated level of Prostate-Specific Antigen (PSA) is found, patients are recommended to undergo MRI screening. Patients with suspicious-looking lesions on the prostate gland will eventually undergo a biopsy. The histology gives the definitive answer as to whether a patient suffers from cancer. Nevertheless, it is observed in practice that urologists tend to over-diagnose patients with clinically significant PCa (csPCa), thus increasing the number of unnecessary biopsies, the diagnostic costs, and patient discomfort.

Computer vision empowered by AI has shown promising results in the last decade. Computer-Aided Diagnosis (CAD) tools benefit from AI’s rapid evolution, and many works have been proposed to automatically perform lesion detection and segmentation. Even though the Deep Learning (DL) paradigm is ubiquitous in modern AI, medical applications require more transparency behind feature extraction, and hence DL is often deemed a “black box” by physicians. Our proposed pipeline, PCa-RadHop [2], employs a novel and linear module for data-driven feature extraction, so the decision making becomes more interpretable. PCa-RadHop receives three different modalities pertinent to PCa diagnosis as input from the MRI scanner (i.e., T2w, ADC, DWI). It consists of two stages. The first stage calculates a probability map of csPCa presence in a voxel-wise manner, while the second stage is meant to reduce the false positive rate on that heatmap [...]
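As an illustration of this two-stage idea (not the exact PCa-RadHop implementation), the sketch below stacks the three co-registered modalities, produces a voxel-wise probability map in stage one, and re-scores candidate voxels in stage two; the model objects, the rescore method, and the threshold are all assumptions.

```python
import numpy as np

def two_stage_detection(t2w, adc, dwi, stage1_model, stage2_model, threshold=0.5):
    """Illustrative two-stage flow; models, methods, and threshold are assumptions."""
    # Stack the three co-registered MRI modalities as channels.
    volume = np.stack([t2w, adc, dwi], axis=-1)         # D x H x W x 3
    # Stage 1: voxel-wise csPCa probability map from data-driven features.
    heatmap = stage1_model.predict(volume)              # D x H x W, values in [0, 1]
    # Stage 2: revisit candidate voxels to suppress false positives.
    for z, y, x in np.argwhere(heatmap > threshold):
        # Re-score each candidate using richer local-context features.
        heatmap[z, y, x] = stage2_model.rescore(volume, (z, y, x))
    return heatmap
```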

By |August 4th, 2024|News|Comments Off on MCL Research on Prostate Lesion Detection from MRI Images|

MCL Research on Green Image Super-resolution

Single image super-resolution (SISR) is an intensively studied topic in image processing. It aims at recovering a high-resolution (HR) image from its low-resolution (LR) counterpart. SISR finds wide real-world applications such as remote sensing, medical imaging, and biometric identification. Besides, it attracts attention due to its connection with other tasks (e.g., image registration, compression, and synthesis). To deal with this ill-posed problem, we recently proposed two methods, LSR [1] and LSR++ [2], which provide reasonable performance with effectively reduced complexity.

LSR consists of three cascaded modules:

Unsupervised Representation Learning by creating a pool of rich and diversified representations in the neighborhood of a target pixel,

Supervised Feature Learning by the Relevant Feature Test (RFT [3]), which automatically selects the subset of the representation pool most relevant to the underlying super-resolution task (a simplified sketch follows this list), and

Supervised Decision Learning by predicting the residual of the target pixel from the selected features through regression with classical machine learning, and effectively fusing the predictions for more stable results.
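Here is the simplified sketch of the RFT idea in the second module, under our own assumptions (16 candidate split points, a mean-squared-error criterion): each 1D feature is scored by the best binary-split weighted MSE of the regression target, and the lowest-scoring (most relevant) dimensions are kept.

```python
import numpy as np

def rft_score(feature, target, n_splits=16):
    """Score one 1D feature: the lower the best binary-split weighted MSE
    of the target, the more relevant the feature (simplified RFT)."""
    order = np.argsort(feature)
    t = target[order]
    cuts = np.linspace(1, len(t) - 1, n_splits, dtype=int)
    best = np.inf
    for c in cuts:
        left, right = t[:c], t[c:]
        mse = (len(left) * left.var() + len(right) * right.var()) / len(t)
        best = min(best, mse)
    return best

def select_features(X, y, k):
    """Keep the indices of the k most task-relevant representation dimensions."""
    scores = np.array([rft_score(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[:k]
```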

LSR++ builds upon LSR with an emphasis on patch alignment, a more effective sample preparation process that is suitable for all patch-based computer vision problems. As illustrated in Fig. 1, based on gradient histograms of patches along the eight reference directions (Fig. 1a), patch alignment utilizes patch rotations and flipping to meet standard templates of gradient histograms, where D_max is the direction with the largest cumulative gradient magnitude, and D_max_orth_b and D_max_orth_s refer to the directions orthogonal to D_max with the big and small cumulative gradient magnitudes, respectively. By standardizing the triple (D_max, D_max_orth_b, D_max_orth_s) of a patch, patch alignment can regularize the edge pattern within the patch via the direction perpendicular to the edge (D_max) and the directions along the edge (D_max_orth_b, D_max_orth_s). The process of patch [...]
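The sketch below illustrates this alignment idea under our own simplifying assumptions (8 direction bins at 45° spacing, 90°-step rotations, one left-right flip); the exact templates and rules in LSR++ may differ.

```python
import numpy as np

def align_patch(patch):
    """Rotate/flip a patch so its dominant gradient direction D_max (and the
    big/small orthogonal responses) match a canonical template. Bin layout,
    rotation rule, and flip rule are simplifying assumptions."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                              # in (-pi, pi]
    # Accumulate gradient magnitudes into 8 reference-direction bins (45 deg apart).
    bins = ((ang + np.pi) / (np.pi / 4)).astype(int) % 8
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=8)
    d_max = int(np.argmax(hist))                          # dominant direction
    # Rotate in 90-degree steps to bring D_max into a canonical position.
    aligned = np.rot90(patch, k=d_max // 2)
    # Flip so that the larger orthogonal response (D_max_orth_b) always lies
    # on the same side of D_max.
    if hist[(d_max + 2) % 8] < hist[(d_max - 2) % 8]:
        aligned = np.fliplr(aligned)
    return aligned
```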

By |July 28th, 2024|News|Comments Off on MCL Research on Green Image Super-resolution|

Professor Kuo Attended ICME in Niagara Falls, Canada

Professor C.-C. Jay Kuo, Director of MCL, attended the IEEE International Conference on Multimedia and Expo (ICME) held in Niagara Falls, Canada, from July 15-19, 2024. Professor Kuo had dual roles at this conference, serving as a Panel Co-Chair and a keynote speaker. He gave his keynote, “Toward Interpretable and Sustainable AI via Green Learning,” on July 18 (Thursday). In addition, Dr. Kuo and Dr. Zicheng Liu of AMD organized a panel, as summarized below.

Panel Title: Generative AI – Opportunities, Challenges, and Open Questions

Panel Background: Generative AI has received a lot of attention due to the tremendous success of ChatGPT. Large foundation models have been trained, leading to various demos and potential applications such as text-to-image and text-to-video cross-domain generation. Resources have been invested in building massive computational and storage infrastructures. Furthermore, data collection and cleaning are essential to high system performance. In the face of these rapid developments, this panel will discuss opportunities, challenges, and open questions associated with generative AI.

Four panelists were invited:

Rogerio Feris, IBM Research

Lijuan Wang, Microsoft Research

Jiebo Luo, University of Rochester

Junsong Yuan, State University of New York at Buffalo

Q&A topics:

Today’s generative AI is tilted more toward “engineering” than “science.” Will this be a concern in the long run?

What are the major shortcomings of the current large foundation models?

How vital are “data collection and cleaning” tasks in generative AI? How do large companies carry out such tasks? Will we run out of data? If so, how soon?

Will “copyright,” “plagiarism,” and “hallucination” be issues? How can we address them? How can we trust the answers?

What roles can small AI companies and academia with limited resources play?

What is the future R&D direction of generative AI? What will be the next big breakthroughs?

Dr. Kuo also had a [...]

By |July 21st, 2024|News|Comments Off on Professor Kuo Attended ICME in Niagara Falls, Canada|
MCL Research on 3D Perception with Large Foundational Models

Understanding and retrieving information in 3D scenes poses a significant challenge in artificial intelligence (AI) and machine learning (ML), particularly in grasping complex spatial relationships and detailed properties of objects in 3D spaces. Multiple tasks have been proposed to assess 3D understanding, such as 3D object retrieval, 3D captioning, 3D question answering, and 3D visual grounding.

Existing methods can be roughly divided into two categories. The first category utilizes large 2D foundational models for feature extraction and maps 2D pixel-wise features to 3D point-wise features for 3D tasks. For example, the 3D-CLR model [1] extracts 2D features from multi-view images with the CLIP-LSeg model [2] and maps them to 3D points in a compact representation reconstructed from a neural radiance field. The reasoning process is performed via a set of neural reasoning operators. The 3D-LLM model [3] utilizes 2D vision-language models (VLMs) as the backbone. It extracts 2D features with the ConceptFusion model [4] and maps them to 3D points. The 3D information is then injected into a large language model to generate text outputs.
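The core 2D-to-3D mapping step shared by this category can be sketched generically as a pinhole-camera projection followed by a feature gather. This is our own simplified version, not the exact 3D-CLR or 3D-LLM code, and it ignores occlusion handling and multi-view fusion.

```python
import numpy as np

def lift_2d_features_to_3d(points, feat_map, K, R, t):
    """Project N world-space 3D points into one view and gather the 2D
    per-pixel features at the projected locations (occlusions ignored)."""
    cam = R @ points.T + t[:, None]          # 3 x N, world -> camera frame
    uvw = K @ cam                            # 3 x N, camera -> homogeneous pixels
    z = np.maximum(uvw[2], 1e-9)             # avoid division by zero
    u = np.round(uvw[0] / z).astype(int)
    v = np.round(uvw[1] / z).astype(int)
    H, W, C = feat_map.shape
    feats = np.zeros((points.shape[0], C))
    # Keep only points in front of the camera that project inside the image.
    valid = (uvw[2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    feats[valid] = feat_map[v[valid], u[valid]]
    return feats                             # N x C point-wise features
```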

The second category directly handles 3D point clouds with a 3D encoder and tries to align the extracted 3D features with features from other modalities. These methods may require training a 3D encoder, which can demand substantial computational resources. For example, Uni3D [5] leverages a unified vanilla transformer, structurally equivalent to a 2D Vision Transformer (ViT), as the backbone to extract 3D features. Downstream tasks can be carried out after feature alignment among different modalities. It is also possible to leverage pre-trained 3D encoders. Point-SAM [6] utilizes the point cloud encoder from Uni3D to transform the input point cloud into embeddings. It starts by sampling [...]
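Feature alignment in this category is typically contrastive. The sketch below shows a CLIP-style symmetric InfoNCE loss between paired 3D and text embeddings as one plausible form; the loss form and temperature are assumptions, since the cited papers differ in the exact objective.

```python
import numpy as np

def alignment_loss(feat_3d, feat_text, temperature=0.07):
    """Symmetric contrastive loss over N paired (3D, text) embeddings;
    the CLIP-style form and temperature are assumptions for illustration."""
    a = feat_3d / np.linalg.norm(feat_3d, axis=1, keepdims=True)
    b = feat_text / np.linalg.norm(feat_text, axis=1, keepdims=True)
    logits = a @ b.T / temperature            # N x N cosine-similarity logits
    n = len(a)

    def xent(m):
        # Each embedding should rank its own pair first (diagonal of m).
        m = m - m.max(axis=1, keepdims=True)  # numerical stability
        logp = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    return (xent(logits) + xent(logits.T)) / 2
```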

By |July 14th, 2024|News|Comments Off on MCL Research on 3D Perception with Large Foundational Models|

MCL Research on Prostate MRI Image Segmentation

Magnetic resonance imaging (MRI) is an effective way to detect clinically significant prostate cancer and guide biopsies, owing to its superior imaging resolution and contrast and the fact that it does not harm the human body. Based on prostate MRI, prostate segmentation is the process of localizing prostate boundaries for radiotherapy and automating the calculation of the prostate volume. Automatic prostate segmentation is an important step in computer-aided diagnosis of prostate cancer and in treatment planning [1].

It is very hard to collect large annotated datasets for AI in healthcare. We worked with the USC Keck Medical School on this project, and they provided us with a large medical dataset, which was invaluable. In addition to this dataset, we also used public datasets such as ISBI-2013 [2] and PROMISE-12 [3] to analyze and evaluate our Green U-Shaped Learning (GUSL) methodology.

Our Green U-Shaped Learning (GUSL) framework is a feed-forward encoder-decoder system based on successive subspace learning (SSL). It consists of two modules: 1) an encoder that performs fine-to-coarse unsupervised representation learning with cascaded VoxelHop units, and 2) a decoder that performs coarse-to-fine segmentation prediction with voxel-wise regression and local error correction. Our model is lightweight and fully transparent while maintaining comparable performance.

We performed 5-fold cross-validation on the dataset from the USC Keck Medical School. For T2-cube MRIs, the Dice Similarity Coefficient (DSC) of the prostate segmentation was over 93%. The USC Keck Medical School doctors were very satisfied with these results. In the next step, we will apply our GUSL model to public datasets and compare the performance of our method against state-of-the-art Deep Learning methods. In the future, we aim to develop methods for segmenting other organs, such as the heart. We hope our methods [...]
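For reference, the DSC reported above compares a predicted binary mask A against the ground-truth mask B via DSC = 2|A ∩ B| / (|A| + |B|); a minimal implementation follows.

```python
import numpy as np

def dice_similarity(pred, gt, eps=1e-8):
    """Dice Similarity Coefficient between two binary masks:
    DSC = 2|A ∩ B| / (|A| + |B|); 1.0 means perfect overlap."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return 2.0 * intersection / (pred.sum() + gt.sum() + eps)
```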

By |July 7th, 2024|News|Comments Off on MCL Research on Prostate MRI Image Segmentation|

Professor Kuo Met MCL Alumni in Thailand

Professor C.-C. Jay Kuo, Director of MCL, visited Bangkok, Thailand, from June 22-26 to reunite with MCL alumni on his Asian trip. There are three MCL alumni in Thailand:

• Junavit Chalidabhongse (Lecturer, Faculty of Law, Thammasat University)

• Wuttipong Kumwilaisak (Professor, King Mongkut’s University of Technology)

• Tanaphol Thaipanich (CEO, Push Media Co., Ltd.)

It has been 15 years since Professor Kuo’s last visit to Bangkok, and he received warm hospitality. Professor Kuo was proud of the outstanding performance of MCL alumni in both academia and industry. The ECTI Association in Thailand and the IEEE Thailand Section invited Professor Kuo to deliver a “Recent Developments and Outlook in Green AI/ML” seminar at the Faculty of Engineering, Chulalongkorn University, on June 24. The seminar was well attended. Professor Kuo also had several sightseeing tours in Bangkok and enjoyed his spare time in Thailand.

By |June 30th, 2024|News|Comments Off on Professor Kuo Met MCL Alumni in Thailand|

Professor Kuo visited Singapore 

Professor C.-C. Jay Kuo, Director of MCL, visited Singapore from June 17-23 and attended two events on his Asia trip.

For the first event, Professor Kuo gave a seminar titled “Demystify Artificial Intelligence and Technology Outlook” at Nanyang Technological University on June 20 (Thursday). The abstract of his talk is given below.

“The term “Artificial Intelligence (AI)” was coined in 1956. Although the field evolved slowly in the first 55 years, we have witnessed rapid advances in AI in the last decade (e.g., the ChatGPT service). Questions are raised about AI’s role in human society and civilization, e.g., whether AI will replace human intelligence (HI), how HI and AI complement each other, etc. I will shed light on them. Then, I will comment on the limitations of today’s deep-learning-based (DL) AI models. DL-based models are neither interpretable nor sustainable. An alternative methodology is desired. To this end, I have investigated a new statistically based AI/ML framework called “Green Learning” (GL). GL significantly reduces the model size and complexity of DL models while yielding competitive performance and allowing mathematical transparency. GL adopts a feedforward one-pass training pipeline, so all intermediate results are explainable. Finally, I will address the AI role in human society and issues with today’s chatbots, such as bias and fairness.”

Professor Kuo’s talk was hosted by Professor Woon Seng Gan at NTU. He also had a reunion with Professor Jiaying Liu of Peking University. Professor Liu was a visiting PhD student at MCL from August 2007 to August 2008.

For the second event, Professor Kuo delivered a keynote speech at the workshop titled “Emerging Trends and Innovations in Machine Learning and AI” on June 21 (Friday), organized by the APSIPA Singapore Chapter and quite a few [...]

By |June 23rd, 2024|News|Comments Off on Professor Kuo visited Singapore |

Professor Kuo met MCL Alumni in Taiwan

Professor C.-C. Jay Kuo, Director of MCL, attended the Picture Coding Symposium (PCS) in Taichung, Taiwan, from June 12-14, 2024. Professor Kuo was invited to be a panelist for a panel program. The discussion topic was “Learned Image and Video Coding: Hype or Hope?” (see https://2024.picturecodingsymposium.org/panel/). All six panelists were optimistic about researching and developing learned image and video coding technologies for various reasons. Professor Kuo emphasized the differences between the classical and the modern learned coding methodologies. The former considers intra-content redundancy removal, while the latter examines inter-content redundancy removal. The former has been researched for four decades and has reached maturity; it is not easy to push it further. The latter does have more opportunities. On the other hand, Professor Kuo was concerned about the high complexity and black-box nature of neural network codecs. He suggested an alternative, non-neural-network-based approach to implement learned image and video codecs.

After PCS, Professor Kuo went to Taipei for a reunion luncheon with MCL alums on June 15 (Saturday). Professor Kuo said, “It was a relaxing time during a busy week. Seeing our alums doing well in their careers and families was great.”

By |June 16th, 2024|News|Comments Off on Professor Kuo met MCL Alumni in Taiwan|

MCL Research on Nuclei Segmentation for Histological Images

Nuclei segmentation is a fundamental task required to analyze the underlying nuclei structure of an organ of interest. Cancer starts from the cells, and understanding nuclei shapes, sizes, and distributions can provide cues on whether or not a patient has cancer. Further analysis can also help in cancer grading and prognosis. However, studying whole slide images (WSIs) of biopsied tissues requires a large amount of time and effort.

Such monotonous and laborious tasks can be simplified with AI and ML, which can perhaps improve detection accuracy as well. Challenges in the nuclei segmentation task include inherent staining variations in WSIs, a wide variety of nuclei shapes and sizes, and irregular boundaries that make it difficult to trace the actual contours. Most current research in this area involves deep-learning-based architectures such as U-Net, R-CNN, and even Vision Transformers. These methods require a large number of training samples and high complexity to generalize over the variations inherent in nuclei from different organs.

We propose a lightweight, interpretable, and simple Green Learning based approach to nuclei segmentation. Our prior work on Highly effective Unsupervised Nuclei Instance Segmentation (HUNIS) [1] forms the first stage of the current approach. To further improve the HUNIS results, we now focus on the regions where HUNIS requires the help of labels. We divide the task into two stages: (i) identifying the areas where help is needed, and (ii) correcting those areas toward their actual class. With the help of the Saab transform, our main task is feature engineering: identifying the ideal features with which to implement the above two stages.
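A rough sketch of this two-stage refinement follows, with our own illustrative choices (uncertainty defined by probability thresholds, a generic supervised corrector over per-pixel features); the actual stage definitions and features are the subject of the ongoing work.

```python
import numpy as np

def refine_segmentation(prob_map, pixel_features, corrector, low=0.35, high=0.65):
    """Stage (i): flag pixels whose probability is ambiguous; stage (ii):
    re-classify them with a supervised model on local (e.g., Saab) features.
    Thresholds and the corrector interface are illustrative assumptions."""
    mask = prob_map > 0.5                          # initial unsupervised decision
    uncertain = (prob_map > low) & (prob_map < high)
    coords = np.argwhere(uncertain)                # stage (i): where help is needed
    if len(coords) > 0:
        feats = pixel_features[coords[:, 0], coords[:, 1]]  # gather K x C features
        # Stage (ii): correct the uncertain pixels toward their actual class.
        mask[coords[:, 0], coords[:, 1]] = corrector.predict(feats).astype(bool)
    return mask
```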

References:

[1] V. Magoulianitis, Y. Yang, and C.-C. J. [...]

By |June 9th, 2024|News|Comments Off on MCL Research on Nuclei Segmentation for Histological Images|