News

MCL Research on Variable-Length Word Embeddings

We propose Variable-Length Word Embeddings, a POS-aware and compute-efficient Word2Vec training framework. Traditional embeddings assign the same dimensionality to every token, even though different parts of speech contribute very differently to sentence meaning. In real text, nouns usually carry the main semantic content, verbs encode actions and relations, while many other categories (e.g., articles, prepositions, conjunctions) are comparatively low-information. This motivates a representation strategy that spends more capacity on important words and less capacity on the rest.

Our core idea is to use POS tags to organize training data and allocate embedding dimensions accordingly. We first POS-tag the entire corpus and split it into three views: a noun-only corpus, a noun+verb corpus, and a full corpus containing all tokens. Instead of training one uniform embedding space, we build embeddings in stages so that nouns become the backbone, verbs are learned relative to that backbone, and the remaining words are learned with minimal capacity.

We train nouns progressively with increasing dimensionality. Specifically, we learn noun embeddings at 50D, 100D, and 200D on the noun-only corpus. To make training across dimensions stable and efficient, each higher-dimensional model is initialized from the previous lower-dimensional embeddings using Lanczos interpolation (50D → 100D, and 100D → 200D), and then refined on the noun-only corpus. This produces high-capacity noun representations while preserving continuity across stages.
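A minimal numpy sketch of the warm-start step, assuming a standard Lanczos kernel with window a = 3; the function names and the toy 50-D vector are illustrative, not the actual training code:

```python
import numpy as np

def lanczos_kernel(t, a=3):
    """Lanczos window: sinc(t) * sinc(t/a) for |t| < a, else 0."""
    t = np.asarray(t, dtype=float)
    out = np.sinc(t) * np.sinc(t / a)
    out[np.abs(t) >= a] = 0.0
    return out

def lanczos_resize(vec, n_out, a=3):
    """Resample a 1-D vector to n_out entries with Lanczos interpolation."""
    n_in = len(vec)
    # Map each output index to a fractional position in the input.
    x = np.arange(n_out) * (n_in - 1) / (n_out - 1)
    out = np.empty(n_out)
    for j, xj in enumerate(x):
        lo = int(np.floor(xj)) - a + 1
        hi = int(np.floor(xj)) + a
        offs = np.arange(lo, hi + 1)
        idx = np.clip(offs, 0, n_in - 1)
        w = lanczos_kernel(xj - offs, a)
        out[j] = np.dot(w, vec[idx]) / w.sum()   # normalize the kernel weights
    return out

v50 = np.random.default_rng(0).normal(size=50)   # toy 50-D "noun embedding"
v100 = lanczos_resize(v50, 100)                  # warm start for the 100-D stage
```

At output positions that land exactly on input samples, the kernel reduces to a unit impulse, so the lower-dimensional values are preserved and only the in-between entries are interpolated.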

After obtaining the noun backbone at each dimension, we introduce verbs through a controlled adaptation step. Using the noun+verb corpus, we train verbs on top of the noun space, where noun vectors are soft-frozen (implemented with a reduced update factor) so they remain stable but can still adjust slightly. Verbs, in contrast, are fully trainable and learn to align with noun semantics. We apply this procedure at 50D [...]
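The soft-freezing of noun vectors can be sketched as a per-row learning-rate scale during the update step; the specific factor `SOFT_FREEZE`, the plain-SGD form, and all sizes below are assumptions for illustration:

```python
import numpy as np

SOFT_FREEZE = 0.05   # assumed reduced update factor for noun rows

def sgd_step(emb, grads, is_noun, lr=0.025, soft_freeze=SOFT_FREEZE):
    """One SGD step in which rows flagged as nouns move by a fraction of lr,
    while verb rows remain fully trainable."""
    scale = np.where(is_noun, lr * soft_freeze, lr)   # per-row step size
    return emb - scale[:, None] * grads

rng = np.random.default_rng(1)
emb = rng.normal(size=(4, 8))          # toy embedding table: 4 tokens, 8-D
grads = np.ones_like(emb)              # dummy gradients for illustration
is_noun = np.array([True, True, False, False])
new = sgd_step(emb, grads, is_noun)
```

The noun rows thus drift only slightly from the backbone, while verb rows take full-sized steps toward alignment with noun semantics.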

By |March 8th, 2026|News|Comments Off on MCL Research on Variable-Length Word Embeddings|

MCL Research on Renal Image Segmentation

AI-driven medical imaging has emerged as a transformative force in modern healthcare, empowering clinicians to deliver more accurate, efficient, and personalized diagnostic and therapeutic strategies. By automatically analyzing CT, MRI, ultrasound, and other imaging modalities, AI systems can identify subtle patterns, precisely segment anatomical structures, and support clinical decision-making with enhanced accuracy and consistency. These advancements not only improve diagnostic performance but also streamline clinical workflows and ultimately elevate the overall quality of patient care.

Among the many tasks in medical image analysis, kidney and kidney tumor segmentation are pivotal for the management of renal diseases, particularly renal cell carcinoma. Precise delineation of kidneys and tumors is essential for quantitative tumor assessment, treatment planning, surgical navigation, postoperative monitoring, and radiomics research. Accurate segmentation enables clinicians to reliably estimate tumor burden, evaluate tumor–organ spatial relationships, and facilitate nephron-sparing surgical strategies, all of which directly influence patient outcomes. Given that manual segmentation is labor-intensive and prone to inter- and intra-observer variability, the development of automated, robust, and reliable segmentation methods has become increasingly critical for both routine clinical practice and large-scale research.

To address these limitations, we propose 3D-Cube Multi-Stage Green U-shaped Learning (GUSL), a novel multi-stage feed-forward machine learning framework for 3D medical image segmentation without backpropagation. GUSL is designed to be computationally efficient, interpretable, and environmentally sustainable, while maintaining competitive segmentation performance.

The proposed framework adopts a cascaded multi-stage segmentation strategy tailored to different anatomical tasks. As illustrated in Figure 1, distinct stages are designed for coarse-to-fine segmentation. First, the original CT volume is downsampled to a lower resolution, enabling efficient coarse localization of the kidney with reduced computational complexity. This low-resolution stage provides an approximate spatial position of the kidney, as highlighted in the red box. [...]
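The coarse stage can be sketched as block-average downsampling followed by mapping the low-resolution detection box back to full resolution; the factor of 4, the thresholding rule, and the toy volume below are illustrative assumptions, not the actual pipeline:

```python
import numpy as np

def downsample(vol, f=4):
    """Block-average a 3-D volume by factor f along each axis."""
    d, h, w = (s // f for s in vol.shape)
    return vol[:d*f, :h*f, :w*f].reshape(d, f, h, f, w, f).mean(axis=(1, 3, 5))

def coarse_box(mask, f=4):
    """Bounding box of a low-resolution mask, mapped back to full resolution."""
    idx = np.argwhere(mask)
    lo, hi = idx.min(axis=0), idx.max(axis=0) + 1
    return lo * f, hi * f          # (start, stop) per axis at full resolution

vol = np.zeros((64, 64, 64))
vol[20:40, 24:48, 8:32] = 1.0       # toy "kidney" region in a toy CT volume
small = downsample(vol, 4)          # low-resolution input for coarse localization
lo, hi = coarse_box(small > 0.5, 4) # approximate ROI passed to the fine stage
```

The returned box plays the role of the red box in Figure 1: the later, fine-resolution stages operate only inside this cropped region.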

By |March 1st, 2026|News|Comments Off on MCL Research on Renal Image Segmentation|

MCL Research on Renal Imaging Analysis

Our paper proposes a multi-stage Green U-shaped Learning (GUSL) framework for efficient and reliable IHC image quantification. As shown in the overall pipeline (Stage I–III), the system starts with the preprocessing of the input IHC image using normalization and PCA. In Stage I, marker-specific GUSL modules learn mpIF-informed intermediate representations, such as LAP2 and KI67-related cues, from co-registered training data. In Stage II, these representations are integrated by a dedicated GUSL module to generate a cell/background segmentation map in a coarse-to-fine and residual refinement manner. In Stage III, connected cell regions are extracted, and cell-level classification is performed to determine whether each cell is biomarker-positive or biomarker-negative. The entire framework follows a feedforward, modular design without end-to-end backpropagation, reducing computational cost while keeping the system transparent and interpretable.

Qualitative examples are shown in the second figure. Column (a) presents the input brightfield IHC images. Column (b) shows the ground-truth cell segmentation and biomarker labels. Columns (c) and (d) compare results from a representative deep learning baseline and our GUSL method. We observe that the proposed method produces clearer cell boundaries and more consistent positive/negative classification, especially in crowded regions and low-contrast areas. These visual results are consistent with our quantitative evaluation, which demonstrates competitive segmentation accuracy and improved quantification agreement, while using much lower model complexity and energy consumption.

By |February 22nd, 2026|News|Comments Off on MCL Research on Renal Imaging Analysis|

MCL Research on Whole Slide Image Analysis

Histopathologic analysis is a key confirmatory step in the cancer diagnosis pipeline, where pathologists examine a tissue section of interest for abnormalities and the extent of disease progression. The digitization of these tissue slides has enabled the use of AI for Whole Slide Image (WSI) analysis, primarily to ease pathologists’ laborious tasks and improve diagnostic accuracy. Because these images are extremely large, they are split into smaller patches; the patches are analyzed individually, and the results are aggregated to produce a prediction for the whole slide. This paradigm is called Multiple Instance Learning (MIL).

Architectural patterns surrounding tumor regions are key indicators of angiogenesis and aid prognosis prediction. These patterns are classified into nine types, grouped into three categories based on the underlying vasculature. While some patterns occur frequently, others are rare and appear mainly in higher-grade tumors, causing a class imbalance that can hinder training. To overcome this challenge, we propose an ensemble classifier for architectural pattern classification.

To capture local details and global context, we employ a multi-resolution feature encoder. At each resolution, the Saab transform is applied to obtain joint spatial-spectral representations. Representation learning is followed by a pooling operation to obtain compact representations. The pooled features from each resolution are concatenated to obtain a single feature vector, which is used to select the most discriminant features for the target classification task. An XGBoost binary classifier trained on these selected features predicts a confidence score for each architectural pattern. The confidence scores from multiple one-vs-one classifiers are aggregated to predict the architectural pattern in each patch.
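The final aggregation step can be sketched as follows; the XGBoost classifiers themselves are omitted, and the pairwise scores and the sum-of-confidences voting rule are illustrative assumptions:

```python
import numpy as np

def aggregate_ovo(scores, n_classes):
    """Aggregate one-vs-one confidences into per-class votes.

    scores[(i, j)] is the confidence that class i beats class j
    for one patch (e.g. a binary classifier's probability in [0, 1]).
    """
    votes = np.zeros(n_classes)
    for (i, j), p in scores.items():
        votes[i] += p          # evidence for class i
        votes[j] += 1.0 - p    # complementary evidence for class j
    return int(np.argmax(votes)), votes

# Toy example with 3 architectural patterns and hypothetical pairwise scores.
scores = {(0, 1): 0.9, (0, 2): 0.7, (1, 2): 0.4}
pred, votes = aggregate_ovo(scores, 3)
```

Summing soft confidences rather than hard 0/1 votes lets a class that narrowly wins several pairings still beat a class that wins one pairing decisively, which tends to be more robust under class imbalance.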

By |February 15th, 2026|News|Comments Off on MCL Research on Whole Slide Image Analysis|

MCL Research on Video Quality Assessment

We propose a video quality assessment method based on green learning principles, with the objective of identifying visual distortions while minimizing computational and energy costs. Instead of relying on global frame analysis or large models, the approach emphasizes efficient, local feature extraction that captures distortion-related characteristics. By analyzing color variations, edges, textures, and structural changes at the patch level, the system is designed to detect degradations caused by compression, processing, or tampering in a scalable and sustainable manner.

To further improve efficiency and discriminative power, our group proposes a DFT-based method to identify the most informative features. After spatial filtering, features are transformed into the frequency domain, where the DFT is used to analyze their spectral behavior and assign importance scores. This allows the model to focus on the frequency components most sensitive to distortions while discarding redundant information. The selected features are then used to train a lightweight machine learning model, which is evaluated on unseen videos, balancing accuracy, interpretability, and green learning objectives.
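One way to realize such frequency-domain importance scoring is sketched below with numpy's FFT; the separation criterion (mean magnitude-spectrum difference between pristine and distorted samples) and the synthetic data are our assumptions, not necessarily the exact score used in the work:

```python
import numpy as np

def dft_importance(feats, labels):
    """Score each frequency bin by how well it separates two quality classes.

    feats: (n_samples, n_dims) spatially filtered features.
    Returns per-bin importance = |mean magnitude-spectrum difference|, normalized.
    """
    spec = np.abs(np.fft.rfft(feats, axis=1))        # magnitude spectrum per sample
    d = np.abs(spec[labels == 1].mean(0) - spec[labels == 0].mean(0))
    return d / d.max()

rng = np.random.default_rng(2)
clean = rng.normal(size=(100, 64))
noisy = clean + 0.8 * rng.normal(size=(100, 64))     # distortion adds broadband energy
feats = np.vstack([clean, noisy])
labels = np.repeat([0, 1], 100)
imp = dft_importance(feats, labels)
keep = np.argsort(imp)[-8:]                          # keep the 8 most sensitive bins
```

Discarding low-importance bins shrinks the feature vector handed to the downstream lightweight classifier, which is the main source of the efficiency gain.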

By |February 8th, 2026|News|Comments Off on MCL Research on Video Quality Assessment|

MCL Research on Green Image Coding

One of the key components of Green Image Coding (GIC) is multi-grid control, which enables efficient and scalable bit allocation across the framework’s hierarchical layers. Unlike traditional hybrid codecs designed for single-layer encoding, GIC decomposes images into multiple hierarchical layers via resampling, referred to as a multi-grid representation. This decomposition effectively redistributes energy and reduces intra-layer content diversity, but it also creates a complex high-dimensional optimization challenge when attempting to allocate bits optimally across these various layers.

To make this problem tractable, we establish a theoretical foundation by defining the relationship between local and global rate-distortion (RD) models. We demonstrate that the global RD model can be derived from the local RD model of an individual layer by applying specific offsets to both rate and distortion. Notably, the distortion offset is a constant value determined by up-sampling processes and is unrelated to the compression process itself. This theoretical breakthrough reduces an intractable high-dimensional problem into a set of manageable sequential decisions.

Based on these findings, GIC implements a practical slope-matching-based rate control strategy. This strategy allocates bits across multiple grids by matching the slopes of consecutive RD curves. A primary advantage of this design is its modularity; the rate control module only requires information from two consecutive layers to function. This allows the module to be easily duplicated for any number of layers in the encoder, effectively decomposing the global rate-distortion optimization into a sequence of local optimizations to ensure a scalable balance between bit rate and image distortion.
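A toy sketch of slope matching on sampled RD curves; the RD points, the convexity assumption, and the shared target slope are illustrative, not values from the codec:

```python
import numpy as np

def rd_slopes(rates, dists):
    """Slopes -dD/dR between consecutive sampled RD points (convex curve assumed)."""
    return -np.diff(dists) / np.diff(rates)

def match_slopes(r0, d0, r1, d1, target):
    """On each layer's RD curve, pick the segment whose slope is closest to a
    shared target slope; return the chosen operating rate per layer."""
    picks = []
    for r, d in ((r0, d0), (r1, d1)):
        s = rd_slopes(r, d)
        k = int(np.argmin(np.abs(s - target)))
        picks.append(r[k + 1])        # rate at the end of the chosen segment
    return picks

# Toy RD curves for two consecutive grid layers (rate up, distortion down).
r0, d0 = np.array([1, 2, 3, 4.]), np.array([10, 6, 4, 3.])
r1, d1 = np.array([1, 2, 3, 4.]), np.array([12, 7, 5, 4.5])
alloc = match_slopes(r0, d0, r1, d1, target=2.0)
```

Because each decision needs only the two adjacent curves, the same routine can be chained layer by layer, mirroring the modularity described above.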

By |February 1st, 2026|News|Comments Off on MCL Research on Green Image Coding|

MCL Research on Feature Learning for Image Classification

Image classification is a central problem in computer vision and is most often solved with deep learning models. While these models achieve strong performance, they are typically large, complex, and difficult to interpret. To address these limitations, we explore an alternative paradigm: Green Learning, which focuses on building efficient and interpretable models.

One important direction of our work is introducing supervision into the feature extraction process. A key component of our approach is the LDA filter block, a feedforward mechanism that uses Linear Discriminant Analysis (LDA) to construct convolution filters without relying on backpropagation. These LDA filters align image patches with class-discriminative directions.
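A minimal two-class sketch of building such a filter with Fisher's LDA on flattened 3x3 patches; the regularization term, patch statistics, and synthetic data are assumptions for illustration:

```python
import numpy as np

def lda_filter(patches0, patches1, size=(3, 3), eps=1e-3):
    """Fisher LDA direction between two classes of flattened image patches,
    reshaped into a convolution filter (computed feedforward, no backprop)."""
    m0, m1 = patches0.mean(0), patches1.mean(0)
    sw = np.cov(patches0.T) + np.cov(patches1.T)          # within-class scatter
    w = np.linalg.solve(sw + eps * np.eye(sw.shape[0]), m1 - m0)
    w /= np.linalg.norm(w)                                # unit-norm filter
    return w.reshape(size)

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, size=(200, 9))    # class-0 3x3 patches (flattened)
b = rng.normal(0.5, 1.0, size=(200, 9))    # class-1 patches with shifted mean
filt = lda_filter(a, b)
```

Convolving an image with this filter projects every patch onto the class-discriminative direction, which is how supervision enters the feature extraction stage without backpropagation.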

In addition, we propose spectral LNT, a new variant of the least-squares normal transform (LNT) that leverages the spatial–spectral structure of feature maps by applying localized linear combinations of features. The resulting LNT kernels can be naturally interpreted as LNT projections over local receptive fields. Moreover, we adopt a pyramid sparse-coding structure to extract sparse-coding features from a Gaussian pyramid of the input image. This further enriches the feature representation and leads to improved classification accuracy.

By |January 25th, 2026|News|Comments Off on MCL Research on Feature Learning for Image Classification|

Congratulations to Jiaxin Yang for passing his Qualifying Exam

Congratulations to Jiaxin Yang for passing his Qualifying Exam! His thesis proposal is titled “Green U-Shaped Learning for Medical Image Analysis: Methodologies and Applications.” His Qualifying Exam Committee members include Jay Kuo (Chair), Justin Haldar, Peter Beerel, Vasileios Magoulianitis, and Michael Khoo (Outside Member).

Artificial intelligence (AI) has rapidly transformed medical imaging by enabling more accurate, efficient, and personalized diagnosis and treatment. In particular, AI-driven medical image segmentation plays a critical role in clinical decision-making for diseases such as prostate cancer and renal cell carcinoma. However, most existing deep learning–based segmentation models rely heavily on backpropagation and large-scale computation, making them energy-intensive, difficult to interpret, and challenging to deploy in resource-constrained clinical settings.

To address these limitations, this work introduces Green U-shaped Learning (GUSL), a novel feedforward machine learning framework for 3D medical image segmentation without backpropagation. GUSL is designed to be efficient, interpretable, and environmentally sustainable, while maintaining competitive segmentation performance.

The proposed framework adopts cascaded multi-stage segmentation strategies tailored to different anatomical tasks. For prostate segmentation, a two-stage coarse-to-fine approach first localizes the prostate gland and then refines its boundaries, effectively mitigating severe class imbalance and anatomical variability, as shown in Figure 1. For kidney and kidney tumor segmentation, a progressive multi-stage cascade dynamically resizes and crops task-specific regions of interest, enabling the model to focus on anatomically relevant structures and improve segmentation accuracy, as shown in Figure 2.

Extensive experiments across multiple prostate and kidney datasets demonstrate that GUSL achieves state-of-the-art performance in prostate and kidney organ segmentation, and competitive results in kidney tumor and mass segmentation. Beyond accuracy, GUSL consistently shows substantial reductions in model size, computational cost (FLOPs), energy consumption, and carbon footprint, highlighting its advantages over conventional deep learning approaches.

These results position [...]

By |January 20th, 2026|News|Comments Off on Congratulations to Jiaxin Yang for passing his Qualifying Exam|

Congratulations to Jintang Xue for passing his Qualifying Exam

We congratulate Jintang Xue for passing his Qualifying Exam! His thesis proposal is titled “Towards Efficient and Interpretable Language Representations for Multimodal Reasoning Systems.” His Qualifying Exam Committee members include Jay Kuo (Chair), Antonio Ortega, Ashutosh Nayyar, Vasileios Magoulianitis, and Robin Jia (Outside Member). Here is a brief summary of his thesis proposal:

Modern multimodal reasoning systems increasingly rely on large-scale neural models to jointly process perception and language. While such models achieve strong performance, they often suffer from high computational cost and limited interpretability. We explore a representation-centric approach that explicitly designs language representations to improve both reasoning capability and efficiency.

The first part of the work focuses on multimodal 3D scene understanding. We introduce an object-centric framework that augments each object with a natural language description capturing both intrinsic attributes and inter-object relationships. These descriptions are integrated into large language model–based pipelines through a dual-level strategy, including embedding-level fusion and prompt-level injection. By explicitly encoding relational semantics in language, the proposed approach significantly enhances grounding, captioning, and question-answering performance in complex 3D scenes, particularly for relational reasoning tasks.

The second part of the work investigates efficient and interpretable language representations. We propose a weakly supervised feature selection framework for word embedding dimensionality reduction, which preserves semantic similarity while substantially reducing computational and storage costs. Unlike black-box compression methods, the proposed approach directly identifies the most informative embedding dimensions, improving both efficiency and interpretability.
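As a rough illustration of dimension-wise selection (not the exact criterion of the proposed framework), one can score each embedding dimension by how well its per-pair contribution correlates with weak similarity labels and keep only the top scorers; all data below are synthetic:

```python
import numpy as np

def select_dims(emb, pairs, sims, k):
    """Rank embedding dimensions by how well each one alone reproduces
    weak pairwise similarity labels, then keep the top k.

    emb: (vocab, dim); pairs: (n, 2) word-index pairs; sims: (n,) weak labels.
    """
    prod = emb[pairs[:, 0]] * emb[pairs[:, 1]]       # per-dim similarity contribution
    # Pearson correlation of each dimension's contribution with the labels.
    pc = prod - prod.mean(0)
    sc = sims - sims.mean()
    corr = (pc * sc[:, None]).sum(0) / (
        np.linalg.norm(pc, axis=0) * np.linalg.norm(sc) + 1e-12)
    return np.argsort(-np.abs(corr))[:k]

rng = np.random.default_rng(4)
emb = rng.normal(size=(1000, 50))
pairs = rng.integers(0, 1000, size=(500, 2))
# Hypothetical weak labels driven entirely by the first 5 dimensions.
sims = (emb[pairs[:, 0], :5] * emb[pairs[:, 1], :5]).sum(1)
top = select_dims(emb, pairs, sims, k=5)
```

Because each selected dimension is an original coordinate of the embedding rather than a learned mixture, the reduced representation stays directly inspectable, which is the interpretability advantage over black-box compression.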

Together, this work demonstrates that explicitly structured language representations can serve as a powerful and practical alternative to purely scale-driven modeling, enabling multimodal reasoning systems that are more efficient, interpretable, and controllable.

By |January 15th, 2026|News|Comments Off on Congratulations to Jintang Xue for passing his Qualifying Exam|

Congratulations to Kevin Yang for Passing His Defense

Congratulations to Kevin Yang for passing his Defense! Kevin’s thesis is titled “Interpretable and Efficient Multi-Modal Data Interplay: Algorithms and Applications.” His Dissertation Committee includes Jay Kuo (Chair), Antonio Ortega, and Jesse Thomason (Outside Member). Here is a brief summary of his thesis:

This research addresses the fundamental trade-offs in multimodal learning between interpretability, data efficiency, and computational overhead. As large-scale vision-language models grow increasingly complex, their “black-box” nature and intensive resource requirements present significant barriers to practical deployment. This dissertation introduces four modular frameworks to transition these systems toward explainable, resource-efficient architectures.

First, the Efficient Human-Object Interaction (EHOI) detector decomposes complex interaction tasks into manageable sub-problems, providing transparent intermediate results to support interpretable decision-making. Second, Green Multimodal Alignment (GMA) enhances image-text retrieval by leveraging object detection and semantic clustering, allowing for precise regional interest mapping. Building on these principles, the third work introduces an optimized Video-Text Alignment (VTA) architecture that leverages contrastive learning and specialized data preprocessing to drastically reduce computational costs during inference. Finally, the Semantic and Visual Defect Detector (SVD-Det) bridges the gap between academic research and industrial application. By aligning features across modalities, SVD-Det achieves state-of-the-art performance in AI-generated video detection while maintaining a lightweight structure suitable for real-world use.

Ultimately, these contributions offer a sustainable roadmap for high-performing AI. By prioritizing modularity and transparency, this research establishes an efficient pipeline capable of processing complex, real-world data for both academic inquiry and industrial-scale deployment.

By |January 14th, 2026|News|Comments Off on Congratulations to Kevin Yang for Passing His Defense|