USC Media Communications Lab

Permalink Gallery
MCL Research on Geographic Fake Images Detection

MCL Research on Geographic Fake Images Detection

Misinformation on the Internet and social media, ranging from fake news to fake multimedia such as images and videos, is a significant threat to our society. Effective misinformation detection has become a research focus, driven by commercial and government funding. With the fast-growing deep learning techniques, real-looking fake images can be easily generated using generative adversarial networks (GANs). The problem of fake satellite images detection was recently introduced. Fake satellite images could be generated with the intention of hiding important infrastructure and/or creating fake buildings to deceive others. Although it may be feasible to check whether these images are real or fake using another satellite, the cost is high. Furthermore, the general public and media do not have the proper resource to verify the authenticity of fake satellite images. Consequently, fake satellite images pose serious challenges to our society, as recognized by government organizations concerned about the political and military implications of such technology. Handcrafted features were used for fake satellite image detection, and its best detection performance measured by the F-1 score is 87%.

A new method, called PSL-DefakeHop, is proposed to detect fake satellite images based on the parallel subspace learning (PSL) framework in this work. The DefakeHop method was developed previously for the detection of Deepfake generated faces under the successive subspace learning (SSL) framework. PSL is proposed to extract features from responses of multiple single-stage filter banks (or called PixelHops), which operate in parallel, and it improves SSL that extracts features from multi-stage cascaded filter banks. PSL has two advantages. First, PSL preserves discriminant features often lie in high-frequency channels, which are however ignored by SSL. Second, decisions from multiple filter banks can be ensembled to further improve detection accuracy. To [...]

By Wei Wang|September 20th, 2021|News|Comments Off|

Permalink Gallery
Congratulations to Yeji Shen for Passing His Defense

Congratulations to Yeji Shen for Passing His Defense

Congratulations to Yeji Shen for passing his defense on Sep 7, 2021. His Ph.D. thesis is entitled “Labeling Cost Reduction Techniques for Deep Learning: Methodologies and Applications”. Here we invite Yeji to share a brief introduction of his thesis and some words he would like to say at the end of the Ph.D. study journey.

1) Abstract of Thesis

Deep learning has contributed to a significant performance boost of many computer vision tasks. Still, the success of most existing deep learning techniques relies on a large number of labeled data. While data labeling is costly, a natural question arises: is it possible to achieve better performance with the same budget of data labeling? We provide two directions to address the problem: more efficient utilization of the budget or supplementing unlabeled data with no labeling cost. Specifically, in this dissertation, we study three problems related to the topic of reducing the labeling cost: 1) active learning that aims at identifying most informative unlabeled samples for labeling; 2) weakly supervised 3D human pose estimation that utilizes a special type of unlabeled data, action-frozen people videos, to help improve the performance with few manual annotations; and 3) self-supervised representation learning on a large-scale dataset of images with text and user-input tags at no additional labeling cost.

In the first part of this talk, we will introduce our representation work which mainly focuses on the utilization of textual information in images. Text information inside images could provide valuable cues for image understanding. We propose a simple but effective representation learning framework, called the Self-Supervised Representation learning of Images with Texts (SSRIT). SSRIT exploits optical character recognition (OCR) signals in a self-supervision manner. SSRIT constructs a representation that is trained to predict whether [...]

By Wei Wang|September 12th, 2021|News|Comments Off|

Permalink Gallery
MCL Research on Domain Specific Word Embedding

MCL Research on Domain Specific Word Embedding

Word embeddings, also known as distributed word representations, learn real-valued vectors that encode words’ meaning. They have been widely used in many Natural Language Processing (NLP) tasks, such as text classification, part-of-speech tagging, parsing, and machine translation. Text classification is a task where the input texts have to be classified into different categories based on their content. Word embedding methods have been tailored to text classification for performance improvement.

In this research, two task-specific dependency-based word embedding methods are proposed for Text classification. In contrast with universal word embedding methods that work for generic tasks, we design task-specific word embedding methods to offer better performance in a specific task. Our methods follow the PPMI matrix factorization framework and derive word contexts from the dependency parse tree. As compared linear contexts, dependency-based contexts can find long-range contexts and exclude less informative contexts. One example is shown in Fig. 2, where the target word is ‘found’. Guided by the dependency par-sing tree, its closely related words (e.g. ‘he’, ‘dog’) can be easily identified. In contrast, less related words (e.g. ‘skinny’, ‘fragile’) are gathered by linear contexts.

Firstly, to construct robust and informative contexts, we use dependency relation which represents the word’s syntactic function to locate the keywords in the sentence and treat the keywords and the neighbor words in the dependency parse tree as contexts.

To further increase the text classification performance, we make our word embedding learns from word-context as well as word-class co-occurrence statistics. We combine the word-context and word-class mutual information into a single matrix for factorization.

It is shown by experimental results they outperform several state-of-the-art word embedding methods.

Image credits:

Image showing a simple example algorithm framework for text classification is from https://laptrinhx.com/nlp-multiclass-text-classification-machine-learning-model-using-count-vector-bow-tf-idf-2622024659/

By Wei Wang|September 7th, 2021|News|Comments Off|

Permalink Gallery
Welcome MCL New Member – Armin Bazarjani

Welcome MCL New Member – Armin Bazarjani

In Fall 2021, we have a new MCL member, Armin Bazarjani, joining our big family. Here is a short interview with Armin with our great welcome.

1. Could you briefly introduce yourself and your research interests?

I received both my BS and MS degrees in electrical engineering from USC. I am now working as a researcher in MCL as I also prepare to apply for PhD programs for the Fall 2022 cycle. My overarching interests are in machine learning and statistical pattern recognition, and I hope to one day be able to meaningfully apply these interests to the fields of computational sustainability and computational neuroscience. Other than my academic interests, I really enjoy hiking/backpacking, cooking, and generally being outdoors and away from the computer whenever I can manage the time.

2. Could you briefly introduce yourself and your research interests?

It should come as no surprise that I am a big fan of the culture at USC as I have received two degrees here now! From my experience, everybody at USC has been kind and supportive in all areas of scholastic pursuit. Additionally, many of the students here are more well-rounded as they don’t purely focus their identity on academic achievement only. I have only been with MCL for a few weeks now so it is difficult to say anything with relative assurity, especially because I started here during the COVID pandemic and haven’t been able to enjoy my lab members’ company in person. I will say, with the few interactions I have had, the lab seems very well driven and organized. I also really like the advising style of Professor Kuo as he seems to strike a good balance between how hands on he is and how [...]

By Wei Wang|August 29th, 2021|News|Comments Off|

Permalink Gallery
Welcome MCL New Member – Qingyang Zhou

Welcome MCL New Member – Qingyang Zhou

In Fall 2021, we have a new MCL member, Qingyang Zhou, joining our big family. Here is a short interview with Qingyang with our great welcome.

1. Could you briefly introduce yourself and your research interests?

Hello, I am Qingyang Zhou, a new member of MCL in 21fall. I received my Bachelor’s and Master’s degrees both in Shanghai, China. My research interest is multimedia signal processing, especially video coding and point cloud compression. I am passionate about these research areas and I have been working on different video coding standards for over 3 years before coming to USC. I am very happy to share with you my experience in this field.

2. Could you briefly introduce yourself and your research interests?

MCL is filled with productive and professional researchers. The lab offers us many different research directions from multimedia signal processing to computer vision. Professor Kuo is creative, hardworking, and full of insightful thoughts. I believe I would receive excellent academic training and become a good researcher at MCL. USC is also an excellent place with both a beautiful campus and strong academic background. I am very happy that I could continue my research at USC.

3. What is your future expectation and plan in MCL?

At MCL, I will continue to do research on multimedia-related areas. I have been doing this for years and there are still many questions waiting for me to solve. For the next few months, my attention will be mainly focused on Point Cloud Compression. For the next several years, I would like to get my academic skills improved and make good preparations before entering the industry or academia.

By Wei Wang|August 22nd, 2021|News|Comments Off|

Permalink Gallery
MCL Research on Point Cloud Registration

MCL Research on Point Cloud Registration

3D registration is an important step in point cloud processing. Given a set of point cloud scans, registration tries to align the point clouds in one reference frame so as to get the complete 3D scene of the environment. In a simple case, given two point clouds, usually referred to as source and target, the registration algorithm finds an optimal 3D transformation that aligns the source with the target. The 3D transformation consists of rotation and translation.

The classical Iterative Closest Point (ICP) algorithm and its variants have been a popular choice for registration since many years. More recently learning-based methods have been developed for point cloud registration. These methods have resolved some issues related to traditional methods such as noise resilience, outliers, difference in sampling densities, partial views, etc. But in turn, most of these methods rely on supervision in terms of ground truth rotation matrix and translation vector.

Inspired by the Successive Subspace Learning methodology and the PointHop classification method in particular, we propose an unsupervised point cloud registration method called R-PointHop [1]. R-PointHop first finds a local reference frame (LRF) for every point using its nearest neighbors and determines its local attributes. Next, it learns local-to-global hierarchical features by point downsampling, neighborhood expansion, attribute construction and dimensionality reduction steps. Then, point correspondence are found using nearest neighbor rule in the hierarchical feature space. Later, a subset of good correspondence is selected to estimate the 3D transformation. The use of LRF allows for the point features to be invariant with respect to rotation and translation, thus making R-PointHop more robust even in presence of large rotation angles. Experiments on the ModelNet40 and the Stanford Bunny dataset demonstrate the effectiveness of R-PointHop on the 3D [...]

By Zhiruo Zhou|August 15th, 2021|News|Comments Off|

Permalink Gallery
MCL Research on Image Generation

MCL Research on Image Generation

An image generative model learns the distribution of image samples from a certain domain and then generates new images that follow the learned distribution. The design of image generative models involves analysis and generation two pipelines. The former analyzes properties of training image samples while the latter generates new images after the training is completed. Only the generation unit is used for image generation in inference. There is a resurge of interests in generative models due to the amazing performance achieved by deep-learning-based (DL-based) methods in general and generative adversarial networks (GANs) in particular. Yet, DL-based methods attempt to solve a nonconvex optimization problem which is difficult to explain. GAN’s training may suffer from gradient vanishing, convergence difficulty and mode collapse. Furthermore, its implementation demands higher computational resources due to large model sizes.

The design of GANs demands that distributions of training and generated images be indistinguishable, which is implicitly achieved by training a generator/discriminator pair through end-to-end optimization of a cost function. In contrast with the GAN approach, we propose a novel and explainable image generation method with explicit sample distribution modeling in this work. For image analysis, we construct fine-to-coarse spatial-spectral subspaces using the PixelHop++ architecture and obtain sample distributions in each subspace. For image generation, we reverse the process by generating samples in the coarsest subspace and add more details to these samples gradually. Our solution, called GenHop (an acronym of Generative Pixelhop), offers an unconditional generative model. Based on MNIST and Fashion-MNIST two datasets, GenHop can generate visually pleasant images whose FID scores are comparable with those of DL-based generative models.

By Zhiruo Zhou|August 8th, 2021|News|Comments Off|

Permalink Gallery
MCL Research on Large-scale 3D Indoor Scene Semantic Segmentation

MCL Research on Large-scale 3D Indoor Scene Semantic Segmentation

3D point cloud segmentation requires the understanding of both the global geometric structure and the fine-grained details of each point. According to the segmentation granularity, 3D point cloud segmentation methods can be classified into three categories: semantic segmentation (scene level), instance segmentation (object level) and part segmentation (part level). Our research focuses on solving the semantic segmentation problem. Given a point cloud, the goal of semantic segmentation is to separate it into several subsets according to the semantic meanings of points. There are two common scenes, urban scene and indoor scene. Efficient semantic segmentation of large-scale 3D point clouds is a fundamental and essential capability for real-time intelligent systems, such as autonomous driving and augmented reality. We mainly try to do the large-scale 3D indoor scene semantic segmentation more efficiently. The representative large-scale public is the indoor S3DIS dataset [1].

A key challenge is that the raw point clouds acquired by depth sensors are typically irregularly sampled, unstructured and unordered. Recently, the pioneering work PointNet [2] was proposed for directly processing 3D point clouds by learning per-point features using shared multilayer perceptrons (MLPs) and max pooling. Its following works try to capture wider context information for each point. Although these approaches achieve impressive results for object recognition and semantic segmentation, almost all of them are limited to extremely small 3D point clouds and cannot be directly extended to larger scale without preprocessing such as block partition.

We design a different data preprocessing method to learn large scale data directly. Each room in the dataset is treated as an input sample and feed into an unsupervised feature extractor to obtain point-wise features. The unsupervised feature extractor is developed upon our previous work PointHop [3]. It is extremely [...]

By Zhiruo Zhou|August 1st, 2021|News|Comments Off|

Permalink Gallery
MCL Research on Image Steganalysis

MCL Research on Image Steganalysis

In image forensics, steganography and steganalysis are like the two ends of the same coin. image steganography is a technique to conceal secret messages in the images by slightly modifying the pixel values. Corresponding to image steganography, steganalysis is the process to reveal the presence of the hidden message in images. Recently, steganalysis are focusing on defending content-adaptive steganographic schemes, for example WOW, HILL and S-UNIWARD, etc. Fig.1 [1] illustrates the modifications of cover image from different steganographic method. Content-adaptive steganography is lean to do modifications on complex texture regions, which makes embedding traces less detectable for steganalyzers.

Traditionally, hand crafted features together with machine learning classifiers have good performance on steganalysis, such as Spatial Rich Model and its variants. After the emerging of neural networks, different CNN architectures are utilized in the steganalysis literature. Because of the important property that CNNs are able to extract complex statistical dependencies from high dimensional input and learn hierarchical representations, CNN-based features usually achieve better performance than traditional hand-crafted features. However, CNN based models are suffering from long training time, large model size and enormous consumption of computation resources.

We would like to utilize green learning methodology in steganalysis field, by incorporating Saab transform as feature extraction module in the future. Saab transform has shown its capability of extracting the high frequency representations in a feedforward way and preserving the light-weighted model size at the same time.

References:

[1] Tang, Weixuan, et al. “Adaptive steganalysis based on embedding probabilities of pixels.” IEEE Transactions on Information Forensics and Security 11.4 (2015): 734-745.

[2] C.-C. J. Kuo and Y. Chen, “On data-driven saak transform,” Journal of Visual Communication and Image Representation, vol. 50, pp. 237–246, 2018.

[3] C.-C. J. Kuo, M. Zhang, S. Li, J. [...]

By Zhiruo Zhou|July 25th, 2021|News|Comments Off|

Permalink Gallery
MCL Research on Object Tracking

MCL Research on Object Tracking

Video object tracking is one of the fundamental computer vision problems and has found rich applications in video surveillance, autonomous navigation, robotics vision, etc. In the setting of online single object tracking (SOT), a tracker is given a bounding box on the target object at the first frame and then predicts its boxes for all remaining frames. Online tracking methods can be categorized into two categories, unsupervised and supervised. Traditional trackers are unsupervised. Recent deep-learning-based (DL-based) trackers demand supervision. Unsupervised trackers are attractive since they do not need annotated boxes to train supervised trackers. The performance of trackers can be measured in terms of accuracy (higher success rate), robustness (automatic recovery from tracking loss), and speed (higher FPS).

We examine the design of an unsupervised high-performance tracker and name it UHP-SOT (Unsupervised High-Performance Single Object Tracker) in this work. UHP-SOT consists of three modules: 1) appearance model update, 2) background motion modeling, and 3) trajectory-based box prediction. Previous unsupervised trackers pay attention to efficient and effective appearance model update. Built upon this foundation, an unsupervised discriminative-correlation-filters-based (DCF-based) tracker STRCF [1] is adopted by UHP-SOT as the baseline in the first module. Yet, the use of the first module alone has shortcomings such as failure in tracking loss recovery and being weak in box size adaptation. We propose ideas for background motion modeling and trajectory-based box prediction to address the mentioned problems. The baseline tracker gets initialized at the first frame. For the following frames, UHP-SOT gets proposals from all three modules and chooses one of them as the final prediction based on a fusion strategy, as shown in Fig. 1. Fig. 2 shows example results on sequences from the OTB-2015 [2] benchmark. Our tracker runs [...]

By Zhiruo Zhou|July 18th, 2021|News|Comments Off|

Previous 18 192021 22 Next

News