News

MCL Research on Green Image Coding

One of the key components of Green Image Coding (GIC) is multi-grid control, which enables efficient and scalable bit allocation across the framework’s hierarchical layers. Unlike traditional hybrid codecs designed for single-layer encoding, GIC decomposes images into multiple hierarchical layers via resampling, referred to as a multi-grid representation. This decomposition effectively redistributes energy and reduces intra-layer content diversity, but it also makes optimal bit allocation across the layers a complex, high-dimensional optimization problem.

To make this problem tractable, we establish a theoretical foundation by defining the relationship between local and global rate-distortion (RD) models. We demonstrate that the global RD model can be derived from the local RD model of an individual layer by applying specific offsets to both rate and distortion. Notably, the distortion offset is a constant determined by the up-sampling process and is unrelated to the compression process itself. This theoretical result reduces an intractable high-dimensional problem to a set of manageable sequential decisions.
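To sketch the relationship in our own notation (the symbols below are illustrative and not taken directly from the paper), let $(R_k(q), D_k(q))$ denote the local RD model of layer $k$ as a function of its coding parameter $q$. The global model observed at the full-resolution output can then be written as

\[ R_k^{\mathrm{global}}(q) = R_k(q) + \Delta R_k, \qquad D_k^{\mathrm{global}}(q) = D_k(q) + \Delta D_k, \]

where $\Delta D_k$ is a constant fixed by the up-sampling path from layer $k$ and does not depend on $q$, so the global objective can be optimized layer by layer using only the local curves.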

Based on these findings, GIC implements a practical slope-matching-based rate control strategy. This strategy allocates bits across multiple grids by matching the slopes of consecutive RD curves. A primary advantage of this design is its modularity; the rate control module only requires information from two consecutive layers to function. This allows the module to be easily duplicated for any number of layers in the encoder, effectively decomposing the global rate-distortion optimization into a sequence of local optimizations to ensure a scalable balance between bit rate and image distortion.
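The snippet below is a minimal, self-contained Python sketch of the slope-matching idea; the RD curves, units, and the brute-force search are illustrative assumptions rather than the actual GIC rate control module.

def rd_slopes(points):
    # points: (rate, distortion) pairs sorted by increasing rate;
    # returns the slope dD/dR of each segment between consecutive points.
    return [(d1 - d0) / (r1 - r0)
            for (r0, d0), (r1, d1) in zip(points, points[1:])]

def match_slopes(curve_a, curve_b):
    # Choose one RD segment per layer so that their slopes are as close as
    # possible, i.e. an extra bit buys a similar distortion drop in either layer.
    slopes_a, slopes_b = rd_slopes(curve_a), rd_slopes(curve_b)
    return min(((i, j) for i in range(len(slopes_a)) for j in range(len(slopes_b))),
               key=lambda ij: abs(slopes_a[ij[0]] - slopes_b[ij[1]]))

# Two hypothetical RD curves (rate in bpp, distortion in MSE) of consecutive grids.
layer_coarse = [(0.1, 80.0), (0.2, 50.0), (0.4, 30.0), (0.8, 20.0)]
layer_fine = [(0.2, 60.0), (0.4, 35.0), (0.8, 22.0), (1.6, 16.0)]
print(match_slopes(layer_coarse, layer_fine))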

By |February 1st, 2026|News|Comments Off on MCL Research on Green Image Coding|

MCL Research on Feature Learning for Image Classification

Image classification is a central problem in computer vision and is most often solved using deep learning models. While these models achieve strong performance, they are typically large, complicated, and difficult to interpret. To address these limitations, we aim to explore an alternative paradigm: Green Learning, which focuses on building efficient and interpretable models.

One important direction of our work is introducing supervision into the feature extraction process. A key component of our approach is the LDA filter block, a feedforward mechanism that uses Linear Discriminant Analysis (LDA) to construct convolution filters without relying on backpropagation. These LDA filters align image patches with class-discriminative directions.
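As a rough illustration of how such a block can be built (the patch size, random data, and the use of scikit-learn's LDA are assumptions for this sketch, not the MCL implementation):

import numpy as np
from scipy.signal import correlate2d
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustrative sketch only: random patches and labels stand in for real training data.
patch = 5                                        # 5x5 receptive field
X = np.random.rand(1000, patch * patch)          # flattened labeled image patches
y = np.random.randint(0, 10, size=1000)          # patch-level class labels

lda = LinearDiscriminantAnalysis().fit(X, y)     # feedforward: no backpropagation
filters = lda.scalings_.T.reshape(-1, patch, patch)   # discriminant directions as kernels

image = np.random.rand(32, 32)
responses = np.stack([correlate2d(image, f, mode="valid") for f in filters])
print(responses.shape)                           # (num_filters, 28, 28) feature maps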

In addition, we propose spectral LNT, a new variant of the least-squares normal transform (LNT) that leverages the spatial–spectral structure of feature maps by applying localized linear combinations of features. The resulting LNT kernels can be naturally interpreted as LNT projections over local receptive fields. Moreover, we adopt a pyramid sparse-coding structure to extract sparse-coding features from a Gaussian pyramid of the input image. This further enriches the feature representation and leads to improved classification accuracy.
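A toy sketch of the pyramid portion of this pipeline (the scikit-image pyramid call and the per-level statistic are placeholders; the actual system applies sparse coding and LNT projections at each level):

import numpy as np
from skimage.transform import pyramid_gaussian

# Placeholder per-level statistic stands in for sparse-coding / LNT feature extraction.
image = np.random.rand(64, 64)
features = []
for level in pyramid_gaussian(image, max_layer=2, downscale=2):
    features.append(level.std())
print(len(features), "pyramid levels,", features)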

By |January 25th, 2026|News|Comments Off on MCL Research on Feature Learning for Image Classification|

Congratulations to Jiaxin Yang for passing his Qualifying Exam

Congratulations to Jiaxin Yang for passing his Qualifying Exam! His thesis proposal is titled “Green U-Shaped Learning for Medical Image Analysis: Methodologies and Applications.” His Qualifying Exam Committee members include Jay Kuo (Chair), Justin Haldar, Peter Beerel, Vasileios Magoulianitis, and Michael Khoo (Outside Member).

Artificial intelligence (AI) has rapidly transformed medical imaging by enabling more accurate, efficient, and personalized diagnosis and treatment. In particular, AI-driven medical image segmentation plays a critical role in clinical decision-making for diseases such as prostate cancer and renal cell carcinoma. However, most existing deep learning–based segmentation models rely heavily on backpropagation and large-scale computation, making them energy-intensive, difficult to interpret, and challenging to deploy in resource-constrained clinical settings.

To address these limitations, this work introduces Green U-shaped Learning (GUSL), a novel feedforward machine learning framework for 3D medical image segmentation without backpropagation. GUSL is designed to be efficient, interpretable, and environmentally sustainable, while maintaining competitive segmentation performance.

The proposed framework adopts cascaded multi-stage segmentation strategies tailored to different anatomical tasks. For prostate segmentation, a two-stage coarse-to-fine approach first localizes the prostate gland and then refines its boundaries, effectively mitigating severe class imbalance and anatomical variability, as shown in Figure 1. For kidney and kidney tumor segmentation, a progressive multi-stage cascade dynamically resizes and crops task-specific regions of interest, enabling the model to focus on anatomically relevant structures and improve segmentation accuracy, as shown in Figure 2.
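The following toy Python sketch conveys only the coarse-to-fine cascade structure; the localization and refinement logic shown is a simple placeholder, not the GUSL models:

import numpy as np

# Toy cascade: thresholding below is a placeholder for the learned stages.
def stage1_localize(volume):
    # Coarse stage: return a bounding box (as slices) around the candidate organ.
    mask = volume > volume.mean()
    idx = np.argwhere(mask)
    lo, hi = idx.min(axis=0), idx.max(axis=0) + 1
    return tuple(slice(a, b) for a, b in zip(lo, hi))

def stage2_refine(crop):
    # Fine stage: refine boundaries inside the cropped region of interest.
    return crop > np.percentile(crop, 75)

volume = np.random.rand(64, 64, 64)             # stand-in for a 3D scan
roi = stage1_localize(volume)
segmentation = np.zeros(volume.shape, dtype=bool)
segmentation[roi] = stage2_refine(volume[roi])  # refinement runs only inside the ROI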

Extensive experiments across multiple prostate and kidney datasets demonstrate that GUSL achieves state-of-the-art performance in prostate and kidney organ segmentation, and competitive results in kidney tumor and mass segmentation. Beyond accuracy, GUSL consistently shows substantial reductions in model size, computational cost (FLOPs), energy consumption, and carbon footprint, highlighting its advantages over conventional deep learning approaches.

These results position [...]

By |January 20th, 2026|News|Comments Off on Congratulations to Jiaxin Yang for passing his Qualifying Exam|

Congratulations to Jintang Xue for passing his Qualifying Exam

We congratulate Jintang Xue for passing his Qualifying Exam! His thesis proposal is titled “Towards Efficient and Interpretable Language Representations for Multimodal Reasoning Systems.” His Qualifying Exam Committee members include Jay Kuo (Chair), Antonio Ortega, Ashutosh Nayyar, Vasileios Magoulianitis, and Robin Jia (Outside Member). Here is a brief summary of his thesis proposal:

Modern multimodal reasoning systems increasingly rely on large-scale neural models to jointly process perception and language. While such models achieve strong performance, they often suffer from high computational cost and limited interpretability. We explore a representation-centric approach that explicitly designs language representations to improve both reasoning capability and efficiency.

The first part of the work focuses on multimodal 3D scene understanding. We introduce an object-centric framework that augments each object with a natural language description capturing both intrinsic attributes and inter-object relationships. These descriptions are integrated into large language model–based pipelines through a dual-level strategy, including embedding-level fusion and prompt-level injection. By explicitly encoding relational semantics in language, the proposed approach significantly enhances grounding, captioning, and question-answering performance in complex 3D scenes, particularly for relational reasoning tasks.
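As a purely hypothetical example of prompt-level injection (the prompt format and object descriptions below are invented for illustration):

# Hypothetical prompt format, invented for illustration.
objects = [
    {"id": 0, "desc": "a wooden chair next to the round table"},
    {"id": 1, "desc": "a round table in the center of the room"},
]
question = "Which object is the chair closest to?"

scene_context = "\n".join(f"Object {o['id']}: {o['desc']}" for o in objects)
prompt = f"Scene objects:\n{scene_context}\n\nQuestion: {question}\nAnswer:"
print(prompt)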

The second part of the work investigates efficient and interpretable language representations. We propose a weakly supervised feature selection framework for word embedding dimensionality reduction, which preserves semantic similarity while substantially reducing computational and storage costs. Unlike black-box compression methods, the proposed approach directly identifies the most informative embedding dimensions, improving both efficiency and interpretability.
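A minimal sketch of the dimension-selection idea (the relevance score below, based on agreement with weak similarity labels, is a simple stand-in for the proposal's actual criterion, and the data are synthetic):

import numpy as np

# Illustrative stand-in: random embeddings and noisy pair similarities replace
# real word vectors and weak supervision.
rng = np.random.default_rng(0)
E = rng.normal(size=(5000, 300))                  # toy word embeddings (vocab x dim)
pairs = rng.integers(0, 5000, size=(200, 2))      # weakly labeled word pairs
sim = rng.random(200)                             # noisy similarity scores

scores = np.empty(E.shape[1])
for d in range(E.shape[1]):
    per_dim = E[pairs[:, 0], d] * E[pairs[:, 1], d]    # dimension-wise agreement
    scores[d] = abs(np.corrcoef(per_dim, sim)[0, 1])   # relevance to weak labels

keep = np.argsort(scores)[-50:]                   # retain the 50 most informative dims
E_reduced = E[:, keep]                            # 300-d -> 50-d embeddings
print(E_reduced.shape)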

Together, this work demonstrates that explicitly structured language representations can serve as a powerful and practical alternative to purely scale-driven modeling, enabling multimodal reasoning systems that are more efficient, interpretable, and controllable.

By |January 15th, 2026|News|Comments Off on Congratulations to Jintang Xue for passing his Qualifying Exam|

Congratulations to Kevin Yang for Passing His Defense

Congratulations to Kevin Yang for passing his Defense! Kevin’s thesis is titled “Interpretable and Efficient Multi-Modal Data Interplay: Algorithms and Applications.” His Dissertation Committee includes Jay Kuo (Chair), Antonio Ortega, and Jesse Thomason (Outside Member). Here is a brief summary of his thesis:

This research addresses the fundamental trade-offs in multimodal learning between interpretability, data efficiency, and computational overhead. As large-scale vision-language models grow increasingly complex, their “black-box” nature and intensive resource requirements present significant barriers to practical deployment. This dissertation introduces four modular frameworks to transition these systems toward explainable, resource-efficient architectures.

First, the Efficient Human-Object Interaction (EHOI) detector decomposes complex interaction tasks into manageable sub-problems, providing transparent intermediate results to support interpretable decision-making. Second, Green Multimodal Alignment (GMA) enhances image-text retrieval by leveraging object detection and semantic clustering, allowing for precise regional interest mapping. Building on these principles, the third work introduces an optimized Video-Text Alignment (VTA) architecture that leverages contrastive learning and specialized data preprocessing to drastically reduce computational costs during inference. Finally, the Semantic and Visual Defect Detector (SVD-Det) bridges the gap between academic research and industrial application. By aligning features across modalities, SVD-Det achieves state-of-the-art performance in AI-generated video detection while maintaining a lightweight structure suitable for real-world use.

Ultimately, these contributions offer a sustainable roadmap for high-performing AI. By prioritizing modularity and transparency, this research establishes an efficient pipeline capable of processing complex, real-world data for both academic inquiry and industrial-scale deployment.

By |January 14th, 2026|News|Comments Off on Congratulations to Kevin Yang for Passing His Defense|

Congratulations to Haiyi Li for passing her Qualifying Exam

Congratulations to Haiyi Li for passing her Qualifying Exam! Her thesis proposal is titled “Interpretable and Lightweight Transfer Learning: Methodologies and Applications.” Her Qualifying Exam Committee members include Jay Kuo (Chair), Antonio Ortega, Anand Joshi, Justin Haldar, and Jernej Barbic (Outside Member). Here is a summary of her thesis proposal:

Transfer learning leverages knowledge from a labeled source domain to improve performance in an unlabeled or sparsely labeled target domain. Unsupervised domain adaptation (UDA) addresses this challenge by mitigating domain shift without target labels. Although deep learning–based UDA methods achieve strong performance, they typically require heavy computation, lack interpretability, and are prone to overfitting, limiting their practicality in resource-constrained settings.

This research proposes a sequence of green learning–oriented transfer learning frameworks that emphasize efficiency, generalizability, and interpretability without relying on deep neural networks. We first introduce Green Image Label Transfer (GILT), a lightweight and transparent statistical alignment framework that decomposes UDA into three interpretable phases: joint discriminant subspace learning, source-to-target label transfer, and supervised learning in the target domain. GILT demonstrates effective cross-domain label transfer with low computational cost and compact models.
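A simplified Python stand-in for the three phases (here the subspace is learned from source data only and standard scikit-learn components are used, so this is a sketch of the idea rather than the published GILT pipeline):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import NearestCentroid
from sklearn.linear_model import LogisticRegression

# Simplified stand-in with synthetic source/target domains.
rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(500, 64)), rng.integers(0, 5, 500)   # labeled source domain
Xt = rng.normal(loc=0.3, size=(400, 64))                       # unlabeled target domain

lda = LinearDiscriminantAnalysis().fit(Xs, ys)                 # (1) discriminant subspace
Zs, Zt = lda.transform(Xs), lda.transform(Xt)
pseudo = NearestCentroid().fit(Zs, ys).predict(Zt)             # (2) label transfer
clf = LogisticRegression(max_iter=1000).fit(Zt, pseudo)        # (3) target-domain learning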

Building on GILT, we propose Interpretable and Lightweight Transfer Learning (ILTL), which employs a two-stage cascaded alignment strategy combining image-level and feature-level alignment in a shared discriminative subspace. Through multi-round label transfer and class-wise refinement, ILTL achieves competitive accuracy while further improving interpretability and efficiency.

To capture relational structure among samples, we extend this paradigm to a Graph-based Label Transfer (GLT) framework. GLT integrates statistical alignment with adaptive label-wise graph learning and entropy-aware iterative label propagation using a non-parametric GraphHop mechanism. A multi-fold validation–driven entropy filtering strategy enables reliable pseudo-label selection, resulting in robust and transparent transfer under domain shift.
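To illustrate the pseudo-label filtering step alone (the probabilities and the acceptance threshold below are placeholders, not the entropy criterion used in GLT):

import numpy as np

# Placeholder probabilities and threshold; only the filtering logic is illustrated.
probs = np.random.dirichlet(np.ones(5), size=400)          # soft label estimates per target sample
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
keep = entropy < np.quantile(entropy, 0.3)                  # accept the most confident 30%
pseudo_labels = probs.argmax(axis=1)[keep]
print(keep.sum(), "samples accepted as reliable pseudo-labels")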

Finally, to address the computational bottleneck of decision learning [...]

By |January 13th, 2026|News|Comments Off on Congratulations to Haiyi Li for passing her Qualifying Exam|

MCL Research on Image Super-Resolution

Single Image Super-Resolution (SISR) is a classic problem in image processing that entails reconstructing a high-resolution (HR) image from a low-resolution (LR) input. Since the introduction of deep learning (DL) approaches, end-to-end methods have become the dominant paradigm in SISR. However, their performance gains typically come at the expense of increased model complexity and a substantial growth in parameter count.

To address these computational and interpretability challenges, we propose Green U-Shaped Learning (GUSL). This method departs from the black-box nature of deep neural networks by establishing a transparent, multi-stage pipeline. The framework utilizes residual correction across multiple resolution levels, mimicking a U-shaped structure where global structural information is captured at lower resolutions, and high-frequency local details are refined at higher resolutions. This progressive approach not only better manages the ill-posed nature of super-resolution through stage-wise regularization but also significantly reduces the parameter count and training footprint compared to conventional deep learning models.
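A toy sketch of stage-wise residual correction across resolutions (the spline upscaler and the zero-residual placeholder are illustrative, not the learned GUSL regressors):

import numpy as np
from scipy.ndimage import zoom

# The zero-residual function is a placeholder for the learned regressor at each stage.
def predict_residual(img):
    return np.zeros_like(img)

lr = np.random.rand(32, 32)                    # toy low-resolution input
estimate = lr
for _ in range(2):                             # e.g. 32 -> 64 -> 128
    estimate = zoom(estimate, 2, order=3)      # move to the next resolution level
    estimate = estimate + predict_residual(estimate)   # stage-wise residual correction
print(estimate.shape)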

Furthermore, to tackle the inherent spatial heterogeneity of the SISR task, where content varies significantly between smooth, homogeneous backgrounds (“Easy” samples) and complex, high-frequency textures (“Hard” samples), we adopt a divide-and-conquer strategy. By explicitly distinguishing between these two categories, we isolate the reconstruction tasks. This ensures that computational resources and modeling capacity are focused on regions with high residual errors, while simultaneously preventing the over-processing of simple areas.
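A small sketch of the routing step (the patch size and variance threshold are arbitrary illustrative choices):

import numpy as np

# Patch size and variance threshold are arbitrary illustrative choices.
def route_patches(image, patch=8, thresh=0.01):
    easy, hard = [], []
    for i in range(0, image.shape[0], patch):
        for j in range(0, image.shape[1], patch):
            block = image[i:i + patch, j:j + patch]
            (hard if block.var() > thresh else easy).append((i, j))
    return easy, hard

easy, hard = route_patches(np.random.rand(64, 64))
print(len(easy), "easy patches,", len(hard), "hard patches")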

By |January 4th, 2026|News|Comments Off on MCL Research on Image Super-Resolution|

Happy New Year!

As we eagerly await the arrival of 2026, we look forward to a year filled with joy, achievement, and the courage to pursue your aspirations. May the months ahead bring growth, discovery, and opportunities that inspire you to reach new heights. Let’s make this year unforgettable—graceful, bold, and distinctly ours. Cheers to an exciting journey ahead!

By |December 28th, 2025|News|Comments Off on Happy New Year!|

Merry Christmas!

As 2025 comes to a close, we look back with gratitude on a year marked by growth, achievement, and abundant blessings at MCL. This year, we bid a heartfelt farewell to our esteemed graduates, whose impactful research has left a lasting legacy as they embark on exciting new journeys. At the same time, we were delighted to welcome new members into the MCL family—bringing fresh perspectives, energy, and enthusiasm for innovative research.

Through dedication, collaboration, and resilience, our community reached significant milestones, including the publication of outstanding work in leading journals and conferences. These accomplishments reflect the collective passion, perseverance, and excellence of every member of MCL.

From all of us at MCL, we wish you a Merry Christmas filled with love, laughter, and cheer. May this festive season bring peace, happiness, and many moments to cherish.

By |December 21st, 2025|News|Comments Off on Merry Christmas!|

MCL Research on Video Quality Assessment

Blind video quality assessment (BVQA) has become a key part of video streaming pipelines, especially with the rise of short-form user-generated content (UGC). Without a reference video, BVQA estimates the quality of the underlying video content using human-guided metrics such as the mean opinion score (MOS). Many state-of-the-art methods employ large CLIP modules, leading to increasingly large deep learning-based pipelines. Given the rapid growth of UGC, lightweight models may save streaming platforms massive amounts of compute and power.

We are focused on developing a green learning alternative by incorporating raw features from specific sub-domains. We have found that fusing raw features that capture global and local detail with semantic information provides sufficient information to predict MOS, even with trivial temporal schemes such as mean pooling. Currently, we generate raw features from natural scene statistics models, including BRISQUE and V-BLIINDS, while local and semantic information is captured by a pre-trained Swin-T model. We plan to further reduce our model size using alternative feature extractors such as EfficientNet, MobileNet, or the discrete wavelet transform.
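A minimal sketch of this fusion-and-regression pipeline (the feature dimensions, synthetic data, and gradient-boosting regressor are illustrative stand-ins, not the actual feature extractors or predictor):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-ins: feature dimensions and scores are illustrative only.
n_videos, n_frames = 200, 30
nss = np.random.rand(n_videos, n_frames, 36)         # BRISQUE/V-BLIINDS-style per-frame features
semantic = np.random.rand(n_videos, n_frames, 768)   # Swin-T-style per-frame features
mos = np.random.rand(n_videos) * 4 + 1               # mean opinion scores in [1, 5]

X = np.concatenate([nss, semantic], axis=2).mean(axis=1)   # mean pooling over time
model = GradientBoostingRegressor().fit(X, mos)            # lightweight MOS regressor
print(model.predict(X[:3]))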

By |December 14th, 2025|News|Comments Off on MCL Research on Video Quality Assessment|