USC Media Communications Lab

Development of the objective metric for 3D Image Quality Assessment

Author: Hyunsuk Ko and C.-C. Jay Kuo

The goal of this research is a new framework for objective quality evaluation of stereoscopic image pairs. Quality assessment of stereoscopic image pairs is more complicated than that of 2D images because it is a multi-dimensional problem: the quality is affected by distortion types as well as by the relation between the left and right views (e.g., different levels of distortion in the two views). In this work, a new formula-based metric was introduced that provides better results than state-of-the-art methods. However, the formula-based metric still has its limitations. For further improvement, we propose a 2-stage fusion model. That is, we classify distortion types into groups and design a set of scorers to handle them separately. In Stage 1, each scorer generates its own score. In Stage 2, all intermediate scores are fused to predict the final quality index with nonlinear regression. Experimental results demonstrate that the proposed quality index outperforms several existing quality assessment methods by a significant margin over different databases.

Although research on advanced image/video quality indices that are more consistent with the human visual experience has made substantial progress in the last decade, the study of 3D image/video quality is still at an early stage. 3D quality assessment (QA) is a difficult problem since it is affected by 2D image quality, depth perception, and visual discomfort such as eye strain or dizziness. It is particularly challenging when the stereoscopic image pair consists of two views with different quality levels (called asymmetric distortion). To address this problem, we need a deeper understanding of the human visual system (HVS), e.g., the binocular combination in stereovision, to build a robust index that [...]

By Hyunsuk Ko|November 21st, 2013|Visual Quality and Perceptual Coding|Comments Off|

Image/video quality assessment

Author: Yuchieh Lin, Tsung-Jung Liu, Weisi Lin, and C.-C. Jay Kuo
This is joint work done during my internship at MediaTek in summer 2013.

Peak signal-to-noise ratio (PSNR) has long been widely used to assess the quality of distorted images or videos with respect to their originals. Human visual experience is affected by several psycho-visual factors, but PSNR does not take these factors into account. Thus, image and video quality metrics (IQM and VQM) [1] have been proposed to emulate perceptual visual quality.
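For reference, PSNR is a purely pixel-wise measure, which is why it cannot reflect psycho-visual factors. The following is a minimal sketch, assuming grayscale images stored as nested lists of pixel values with an 8-bit peak of 255; the function name `psnr` and its parameters are ours for illustration.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Return the PSNR in dB between two equally sized grayscale images.

    Both images are lists of rows; each row is a list of pixel values.
    max_value is the peak pixel value (255 is an assumption for 8-bit data).
    """
    if len(reference) != len(distorted):
        raise ValueError("images must have the same number of rows")
    squared_error = 0.0
    pixel_count = 0
    for ref_row, dist_row in zip(reference, distorted):
        if len(ref_row) != len(dist_row):
            raise ValueError("rows must have the same length")
        for ref_pixel, dist_pixel in zip(ref_row, dist_row):
            squared_error += (ref_pixel - dist_pixel) ** 2
            pixel_count += 1
    if pixel_count == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / pixel_count  # mean squared error over all pixels
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the score depends only on the mean squared error, two distortions with the same error energy receive the same PSNR regardless of where in the image they occur.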

For image quality assessment, we developed a framework [2] to integrate a visual saliency model into existing IQMs, such as SSIM and FSIM. For video quality assessment, we are building a new video database. With this database, we plan to develop an algorithm for video quality assessment.
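The excerpt does not say how the saliency model is combined with an IQM. One common option, assumed here purely for illustration, is saliency-weighted pooling: a local quality map (such as the per-pixel map that SSIM produces) is averaged with saliency values as weights. The function name and the map layout are hypothetical.

```python
def saliency_weighted_score(quality_map, saliency_map):
    """Pool a local quality map into one score, weighting by saliency.

    quality_map and saliency_map are equally sized lists of rows.
    Saliency values are assumed to be non-negative.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    plain_sum = 0.0
    count = 0
    for quality_row, saliency_row in zip(quality_map, saliency_map):
        for quality, saliency in zip(quality_row, saliency_row):
            weighted_sum += saliency * quality
            total_weight += saliency
            plain_sum += quality
            count += 1
    if count == 0:
        raise ValueError("maps must not be empty")
    if total_weight == 0:
        # No salient region at all: fall back to the unweighted mean.
        return plain_sum / count
    return weighted_sum / total_weight
```

With a uniform saliency map this reduces to the ordinary mean, so the unweighted metric is recovered as a special case.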

So far, image and video quality metrics have not been widely applied in practice. One important reason is that the performance and complexity of no-reference algorithms are still not satisfactory. We want to develop and employ no-reference IQM and VQM to enhance existing applications, which means facing a trade-off between performance and complexity.

[1] Tsung-Jung Liu, Yu-Chieh Lin, Weisi Lin and C.-C. Jay Kuo, “Visual quality assessment: recent developments, coding applications and future trends,” APSIPA Transactions on Signal and Information Processing, 2013.
[2] Joe Yuchieh Lin, Tsung-Jung Liu, Weisi Lin and C.-C. Jay Kuo, “Visual-saliency-enhanced image quality assessment indices,” APSIPA Annual Summit and Conference, Kao-Hsiung, Taiwan, Oct. 29-Nov. 1, 2013.

By Yuchieh (Joe) Lin|November 21st, 2013|Visual Quality and Perceptual Coding|Comments Off|

Image Quality Assessment Using Multi-Method Fusion

Author: Tsung-Jung Liu, Weisi Lin, and C.-C. Jay Kuo

A new methodology for objective image quality assessment (IQA) with multi-method fusion (MMF) is presented. The research is motivated by the observation that no single method gives the best performance in all situations. To achieve MMF, we adopt a regression approach. The new MMF score is a nonlinear combination of scores from multiple methods, with suitable weights obtained by a training process. To improve the regression results further, we divide distorted images into three to five groups based on the distortion types and perform regression within each group, which is called “context-dependent MMF” (CD-MMF). One task in CD-MMF is to determine the context automatically, which is achieved by a machine learning approach. To further reduce the complexity of MMF, we apply selection algorithms to choose a small subset from the candidate method set. The result is very good even if only three quality assessment methods are included in the fusion process. The proposed MMF method using support vector regression is shown to outperform a large number of existing IQA methods by a significant margin when tested on six representative databases.
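The fusion idea can be sketched in a few lines. This is a sketch under stated assumptions, not the authors' implementation: kernel ridge regression with an RBF kernel stands in for the support vector regression named above so the example needs no external library, the context label of each image is taken as given rather than predicted by a classifier, and all function names and parameter values are hypothetical.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) similarity between two score vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_fusion(method_scores, subjective_scores, gamma=1.0, ridge=1e-3):
    """Fit a nonlinear map from per-method scores to a subjective score.

    Stand-in regressor: kernel ridge regression (not SVR).
    Returns a function that predicts the fused score for a new score vector.
    """
    n = len(method_scores)
    gram = [
        [
            rbf_kernel(method_scores[i], method_scores[j], gamma)
            + (ridge if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    alphas = solve_linear_system(gram, list(subjective_scores))

    def predict(scores):
        return sum(
            alpha * rbf_kernel(scores, train, gamma)
            for alpha, train in zip(alphas, method_scores)
        )

    return predict


def fit_context_dependent_fusion(samples, gamma=1.0, ridge=1e-3):
    """Fit one fusion regressor per context (distortion group).

    samples is a list of (context, method_scores, subjective_score) tuples.
    Returns a dict mapping each context to its prediction function.
    """
    grouped = {}
    for context, scores, subjective in samples:
        xs, ys = grouped.setdefault(context, ([], []))
        xs.append(scores)
        ys.append(subjective)
    return {
        context: fit_fusion(xs, ys, gamma, ridge)
        for context, (xs, ys) in grouped.items()
    }
```

Each training sample pairs the scores of the selected methods (three per image in the spirit of the small subset mentioned above) with a subjective score; at prediction time the regressor of the image's context is applied to its score vector.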

Although the proposed MMF has excellent performance, one issue concerning context classification for CD-MMF needs to be resolved in the future. Since one image may contain multiple distortion types, strictly classifying an image into one specific context may assign it to the wrong context category and thereby affect the subsequent quality prediction. One possible solution to overcome this shortcoming is to use unsupervised classification for context determination. Another alternative is to attach beliefs to the classification of the context and weight the corresponding regressed predicted [...]

By Tsung-Jung Liu|November 21st, 2013|Visual Quality and Perceptual Coding|Comments Off|

Energy-Efficient Video-Sharing Servers

Author: Hang Yuan and C.-C. Jay Kuo

Large-scale video-sharing services (VSS), such as YouTube, have come to dominate Internet traffic. To meet the rapid growth in both data and user demand in VSS, multi-layer infrastructures with parallel architectures have been deployed. In parallel video servers, each video file is divided into a number of blocks and spread across different disks. By breaking a relatively long and continuous video workload into smaller tasks, the system can serve more concurrent requests. The use of massive data centers for large-scale VSS has led to ever-increasing energy cost. In particular, video servers rely on the storage and bandwidth of the parallel disk system, which is among the heaviest energy consumers. It has been reported that such a storage system typically accounts for 27% of the total energy consumption in data centers.
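To illustrate how a video's blocks can be spread across disks, here is a minimal sketch. The excerpt does not specify the placement policy, so plain round-robin striping is assumed; the function names and the `start_disk` parameter are hypothetical.

```python
def stripe_blocks(num_blocks, num_disks, start_disk=0):
    """Map each block index of one video to a disk, round-robin.

    Returns a list where entry i is the disk that stores block i.
    """
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    return [(start_disk + block) % num_disks for block in range(num_blocks)]


def blocks_per_disk(placement, num_disks):
    """Count how many blocks of the video each disk holds."""
    counts = [0] * num_disks
    for disk in placement:
        counts[disk] += 1
    return counts


# A 5-block video on 3 disks: consecutive blocks land on different disks.
placement = stripe_blocks(5, 3)
```

Because consecutive blocks sit on different disks, a long playback request turns into short per-disk tasks, which is what lets the server interleave many concurrent streams.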

My research focuses on energy management in this kind of parallel storage system. In particular, we study how to make the best use of the low-power modes in disks and how to optimize the usage of memory cache to improve energy efficiency. The goal is not only to minimize energy consumption but also to control the impact of energy-saving techniques on service delays. We developed a model that efficiently facilitates the selection of power modes for disks, and extended it to optimize cache utilization. Using the model, we can effectively minimize the energy consumption under different service delay constraints.
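The trade-off between energy saving and service delay can be made concrete with a toy selection rule. This is not the model developed in this work; it is a sketch assuming that the length of an idle period is known in advance, that each power mode has a fixed power draw, a fixed energy cost for entering and leaving it, and a fixed wake-up delay. The class, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerMode:
    name: str
    power_watts: float        # power drawn while resting in this mode
    transition_joules: float  # energy to enter and later leave the mode
    wake_delay_s: float       # extra service delay when a request arrives


def energy_joules(mode, idle_s):
    """Energy spent if the disk stays in `mode` for an idle period."""
    return mode.transition_joules + mode.power_watts * idle_s


def select_mode(modes, idle_s, max_delay_s):
    """Pick the lowest-energy mode whose wake-up delay meets the constraint."""
    feasible = [m for m in modes if m.wake_delay_s <= max_delay_s]
    if not feasible:
        raise ValueError("no power mode satisfies the delay constraint")
    return min(feasible, key=lambda m: energy_joules(m, idle_s))


# Illustrative numbers only, not measurements from any real disk.
MODES = [
    PowerMode("idle", power_watts=8.0, transition_joules=0.0, wake_delay_s=0.0),
    PowerMode("standby", power_watts=1.0, transition_joules=70.0, wake_delay_s=6.0),
]
```

With these numbers the two modes break even at an idle period of 10 seconds: shorter gaps favor staying idle, longer gaps favor standby, and a tight delay constraint rules standby out entirely.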

Our work can be extended in several areas. First, we are in the process of designing better data placement policies that can improve the performance of the algorithm. Second, we only optimized energy consumption for the idle periods of the disk, which prevented us from achieving more energy saving especially [...]

By Hang Yuan|November 21st, 2013|Biomedical and Information Processing|Comments Off|

Recommendation Systems for Social Media Networks

Author: Sanjay Purushotham, Yan Liu, and C.-C. Jay Kuo

Social networking sites such as Facebook, YouTube, and Last.fm have become popular platforms for users to connect with friends and share content (e.g., music, images, and news). The availability of social networks between people has significantly enriched the semantics of links and content on the web. A fundamental question is whether and how social networks can help to improve recommendation systems, such as product recommendations, advertisement targeting, and scientific paper suggestions. In particular, given the rich content information available, is there any additional gain from considering social networks? The answer to this question is of great interest to both academia and industry. Our goal in this research is to provide insights in this direction.

Most existing work has focused on utilizing either content or social network information, but few studies have considered them jointly. In our work, we propose a hierarchical Bayesian model to integrate social network structure (using matrix factorization) and item content information (using a topic model) for item recommendation. We connect these two data sources through the shared user latent feature space. Matrix factorization of the social network learns the low-rank user latent feature space, while topic modeling provides a content representation of the items in the item latent feature space, in order to make social recommendations.
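The role of the shared user latent space can be illustrated with a small sketch. The exact form of the model is not given in this excerpt, so everything here is an assumption: an item's latent vector is taken to be its topic proportions plus a learned offset, a rating is predicted as an inner product with the user's latent vector, and a social link is predicted from the same user vector through a logistic function. All names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def item_latent(topic_proportions, offset):
    """Assumed item vector: topic-model content plus a learned offset."""
    return [t + o for t, o in zip(topic_proportions, offset)]


def predict_rating(user_latent, topic_proportions, offset):
    """Assumed rating prediction: user vector times item vector."""
    return dot(user_latent, item_latent(topic_proportions, offset))


def predict_link(user_latent, social_factor):
    """Assumed social-link probability from the same user vector."""
    return 1.0 / (1.0 + math.exp(-dot(user_latent, social_factor)))
```

The point of the sketch is that `user_latent` appears in both predictions, so evidence from the social network and evidence from item content constrain the same user representation.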

For future work, we will work on incorporating the attention of users into our recommendation system model. We will also work on improving the scalability and interpretability of our model.

[1] Sanjay Purushotham, Yan Liu, C.-C. Jay Kuo, “Collaborative [...]

By Sanjay Purushotham|November 21st, 2013|Research|Comments Off|

Environmental Sound Recognition

Author: Sachin Chachada and C.-C. Jay Kuo

Research on Environmental Sound Recognition (ESR) has significantly increased in the past decade. With a growing demand for example-based search such as content-based image and video search, ESR can be instrumental in efficient audio search applications. ESR can also be useful for automatic tagging of audio files with descriptors for keyword-based audio retrieval, robot navigation, home-monitoring systems, surveillance, recognition of animal and bird species, etc. Among various types of audio signals, speech and music are two categories that have been extensively studied. In their infancy, ESR algorithms were a mere reflection of speech and music recognition paradigms. However, on account of the considerably non-stationary characteristics of environmental sounds, these algorithms proved to be ineffective for large-scale databases. Recent publications have seen an influx of substantial new features and algorithms catering to ESR. However, the problem largely remains unsolved.

Owing to the non-stationary characteristics of environmental sounds, recent works have focused on time-frequency features [1-3]. Efforts have also been made to incorporate non-linear classifiers for ESR [4]. A comprehensive coverage of recent developments can be found in [5]. These recently developed features perform well on sounds that exhibit non-stationarity but have to compete with conventional features like Mel-Frequency Cepstral Coefficients (MFCC) for other sounds. A set of features with the simplicity of stationary methods and the accuracy of non-stationary methods is still a missing puzzle piece. Moreover, considering the numerous types of environmental sounds, it is hard to fathom a single set of features suitable for all sounds. Another problem with using a single set of features is that different features need different processing schemes, and hence several meaningful combinations of features that would otherwise be functionally complementary to each other are incompatible in practice. [...]

By Sachin Chachada|November 21st, 2013|Biomedical and Information Processing|Comments Off|

Fully Automated Segmentation of Mitochondria Based on Morphological Feature Learning

Author: Xue Wang, Jing Zhang, and C.-C. Jay Kuo

The importance of this research lies in evaluating the morphological characteristics of mitochondria after accurate object extraction. Studies have shown that the fusion-fission dynamics of mitochondria are involved in many cellular processes, including maintenance of adenosine triphosphate (ATP) levels, redox signaling, oxidative stress generation, and cell death [1-4]. Mitochondrial morphology can therefore reveal the physiological or pathological status of mitochondria, and in a typical analysis researchers manually label the mitochondrial morphological structures into several subtypes, such as fragmented, networked, and swollen structures [5]. Although a number of algorithms exist for mitochondria segmentation [6-8], they require careful manual tuning and optimization, and the resulting segmented mitochondria objects are still not correctly classified into standardized morphological subtypes. The challenge is that the gray-level fluorescent intensity is the only clue for separating foreground mitochondrial objects from the background.

To overcome this challenge, our work applies computer vision techniques to achieve accurate segmentation based on texture feature extraction for morphological characteristics. A 2-stage segmentation system (as shown in Fig. 1) has been built to realize automated mitochondria segmentation.

In Stage I, machine learning classifiers are trained for the initial segmentation. The key to the success of this stage is that the image signal can be transformed and represented by a linear combination of a subset of extracted texture features, and data grouping methods are applied to enhance the accuracy of the classifiers. Our work shows that learning-based approaches fit our problem, as they can overcome the existing challenges.

In the Stage II of mitochondria centerline extraction, the cost function is designed based on the human learning/labeling experience to judge the occurrence of connection for each pair of [...]

By Xue Wang|November 21st, 2013|Biomedical and Information Processing|Comments Off|

Gas Path Analysis Transient Fault Detection for Turbine Jet Engines

Author: Martin Gawecki and C.-C. Jay Kuo
This work is partly funded by the Pratt & Whitney Institute for Collaborative Engineering, in collaboration with Pratt & Whitney, Korean Airlines, and INHA University.

As a supplement to previous work on vibration and acoustic approaches [1], Gas Path Analysis (GPA) is the study of a gas-turbine jet engine’s temperature, pressure, and rotational parameters with the aim of detecting and diagnosing problems that may occur during flight (see Figure 1), commonly known as the field of Prognostics and Health Management (PHM). General applications of GPA for steady-state engine operation have been used successfully for over two decades in a limited capacity, due to the technological restrictions of on-board computing, costly transmission capabilities, and a lack of comprehensive analysis tools. While this foundational problem is slowly being addressed, even fewer answers exist to the question of whether GPA can be used for the immediate recognition and identification of faults that may occur during engine transients (non-steady-state regions of operation, shown in gray in Figure 2). Our research aims to explore this problem in order to build real-time tools that improve the maintenance workflow and to help develop the next generation of “intelligent engines.”

Using a variety of data created with the state-of-the-art Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) [2, 3, 5], we propose a contextualized combination of state variable and empirical models for the purposes of feature extraction [4], coupled with a machine learning framework for the detection and identification of faults. In essence, we aim to use existing normal (non-fault) signatures of GPA parameters at steady state as a reference for transient points of operation, with the goal of finding and classifying abnormal behavior. Figure 3 outlines the feature extraction process [...]

By Martin Gawecki|November 21st, 2013|Biomedical and Information Processing|Comments Off|
