Congratulations to Kevin Yang for passing his defense! Kevin’s thesis is titled “Interpretable and Efficient Multi-Modal Data Interplay: Algorithms and Applications.” His Dissertation Committee includes Jay Kuo (Chair), Antonio Ortega, and Jesse Thomason (Outside Member). Here is a brief summary of his thesis:

This research addresses the fundamental trade-offs in multimodal learning among interpretability, data efficiency, and computational cost. As large-scale vision-language models grow increasingly complex, their “black-box” nature and heavy resource requirements pose significant barriers to practical deployment. This dissertation introduces four modular frameworks that move these systems toward explainable, resource-efficient architectures.

First, the Efficient Human-Object Interaction (EHOI) detector decomposes the complex interaction-detection task into manageable sub-problems, exposing transparent intermediate results that support interpretable decision-making. Second, Green Multimodal Alignment (GMA) improves image-text retrieval by combining object detection with semantic clustering, enabling precise mapping of regions of interest across modalities. Building on these principles, the third framework, an optimized Video-Text Alignment (VTA) architecture, uses contrastive learning and specialized data preprocessing to drastically reduce computational costs at inference time (see the sketch after this paragraph). Finally, the Semantic and Visual Defect Detector (SVD-Det) bridges the gap between academic research and industrial application: by aligning features across modalities, it achieves state-of-the-art performance in AI-generated video detection while remaining lightweight enough for real-world use.
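To make the contrastive-alignment idea concrete, here is a minimal sketch of the symmetric InfoNCE objective commonly used for video-text alignment. It assumes PyTorch; the function name, tensor shapes, and temperature value are illustrative assumptions, not code from the thesis.

```python
# Minimal sketch of symmetric contrastive (InfoNCE) video-text alignment.
# Assumes PyTorch; names and hyperparameters are illustrative, not the thesis code.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(video_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired video/text embeddings.

    video_emb, text_emb: (batch, dim) tensors; row i of each is a matched pair.
    """
    # Normalize so dot products become cosine similarities.
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are the true pairs.
    logits = v @ t.T / temperature
    targets = torch.arange(v.size(0), device=v.device)

    # Contrast each video against all texts, and each text against all videos.
    loss_v2t = F.cross_entropy(logits, targets)
    loss_t2v = F.cross_entropy(logits.T, targets)
    return (loss_v2t + loss_t2v) / 2
```

One reason this style of objective helps at inference time: once trained, similarity is a single dot product between normalized embeddings, so retrieval over a large corpus reduces to one matrix multiply against precomputed embeddings rather than a joint forward pass per candidate pair.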

Ultimately, these contributions offer a sustainable roadmap for high-performing AI. By prioritizing modularity and transparency, this research establishes an efficient pipeline capable of processing complex, real-world data for both academic inquiry and industrial-scale deployment.