MCL Research on Sentence Similarity Modeling

By Wei Wang | September 4, 2022 | News

Sentence similarity evaluation has a wide range of applications in natural language processing, such as semantic similarity computation, text generation evaluation, and information retrieval. Word Mover’s Distance (WMD) [1], a word-alignment-based method, formulates text similarity evaluation as a minimum-cost flow problem: it finds the cheapest way to align the information in two text sequences through a flow network defined by word-level similarities. Each word is assigned a flow, and WMD measures text dissimilarity as the minimum cost of moving those flows from one sentence to the other, with word-to-word costs derived from pre-trained word embeddings.
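To make the minimum-cost flow view concrete, here is a minimal Python sketch. It is not the reference implementation from [1]. It assumes that each word's flow is its relative frequency in the sentence and that the cost between two words is the Euclidean distance between their embeddings. The 2-D vectors in TOY_EMBEDDINGS and the function name wmd are made up for illustration. The flow problem is solved with a small successive-shortest-path routine so the example needs only the standard library; a real system would use an optimal transport or linear programming solver.

```python
import math
from collections import Counter

# Hypothetical 2-D "embeddings", invented for this example only.
TOY_EMBEDDINGS = {
    "obama": (0.0, 0.0), "president": (0.0, 1.0),
    "speaks": (3.0, 0.0), "talks": (3.0, 1.0),
}

def wmd(tokens_a, tokens_b, emb):
    """Minimum cost of moving frequency-based word flows from A to B."""
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    words_a, words_b = list(count_a), list(count_b)
    n_a, n_b = len(tokens_a), len(tokens_b)
    total = n_a * n_b  # scale all flows by n_a * n_b so capacities are integers
    src, snk = 0, 1 + len(words_a) + len(words_b)
    graph = [[] for _ in range(snk + 1)]  # edge = [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i, w in enumerate(words_a):
        add_edge(src, 1 + i, count_a[w] * n_b, 0.0)  # flow leaving word w
        for j, x in enumerate(words_b):
            add_edge(1 + i, 1 + len(words_a) + j, total, math.dist(emb[w], emb[x]))
    for j, x in enumerate(words_b):
        add_edge(1 + len(words_a) + j, snk, count_b[x] * n_a, 0.0)  # flow into word x

    sent, cost = 0, 0.0
    while sent < total:
        # Bellman-Ford shortest path in the residual graph.
        dist = [math.inf] * len(graph)
        prev = [None] * len(graph)
        dist[src] = 0.0
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, c, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + c < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + c, (u, k), True
            if not changed:
                break
        # Find the bottleneck capacity along the path, then push flow.
        push, v = total - sent, snk
        while v != src:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = snk
        while v != src:
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        sent += push
        cost += push * dist[snk]
    return cost / total  # undo the integer scaling

example = wmd(["obama", "speaks"], ["president", "talks"], TOY_EMBEDDINGS)
print(example)  # 1.0 for these toy vectors
```

With these toy vectors, each word is matched to its nearest counterpart at distance 1, and each carries half of the flow, so the total cost is 1.0.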

However, a naive WMD method does not perform well on sentence similarity evaluation, for two reasons.
– First, WMD assigns word flow based on each word’s frequency within a sentence. This frequency-based weighting scheme captures word importance poorly when judged against the statistics of the whole corpus.
– Second, the distance between words depends solely on the embeddings of isolated words and ignores the contextual and structural information of the input sentences. Since the meaning of a sentence depends on individual words as well as their interactions, aligning individual words alone is insufficient for evaluating sentence similarity.
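Both limitations can be seen in a few lines of Python. This is an illustrative sketch only: the helper name frequency_flow and the example sentences are invented here, and the flow is assumed to be the plain within-sentence relative frequency described above.

```python
from collections import Counter

def frequency_flow(tokens):
    """Flow of each word = its count divided by the sentence length."""
    n = len(tokens)
    return {word: count / n for word, count in Counter(tokens).items()}

# Limitation 1: a function word that repeats receives more flow than
# any content word, regardless of how informative it is in the corpus.
flow = frequency_flow("the cat sat on the mat".split())

# Limitation 2: word order and structure are invisible. These two
# sentences have identical word flows.
flow_1 = frequency_flow("dog bites man".split())
flow_2 = frequency_flow("man bites dog".split())
same_flows = flow_1 == flow_2
```

Here "the" receives twice the flow of "cat". In the second case, the two sentences present the same words with the same flows, so with context-free embeddings the minimum transport cost between them is zero even though their meanings differ.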

MCL proposed a new syntax-aware method for sentence similarity evaluation, Syntax-aware Word Mover’s Distance (SynWMD) [2].
– Words are first represented as a weighted graph based on co-occurrence statistics obtained from dependency parse trees. Then, a PageRank-based algorithm is used to infer word importance.
– The word distance model in WMD is enhanced with context extracted from dependency parse trees, as illustrated in Figure 1. The contextual information of words and the structural information of sentences are explicitly modeled as additional subtree embeddings.
– Extensive experiments on semantic textual similarity tasks and k-nearest neighbor sentence classification tasks, summarized in Table 1, evaluate the effectiveness of the proposed SynWMD. The code for SynWMD is available at https://github.com/amao0o0/SynWMD.
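The first step above, scoring words on a weighted co-occurrence graph, can be sketched as follows. This is a generic weighted PageRank by power iteration, not the SynWMD code. The edge weights in TOY_COOCCURRENCE are invented rather than taken from a parser, the damping factor of 0.85 is an assumed default, and the function name weighted_pagerank is hypothetical. How SynWMD converts these scores into word flows is not described in this post, so the sketch stops at the scores.

```python
# Hypothetical co-occurrence counts between words linked in dependency
# parse trees; the numbers are made up for illustration.
TOY_COOCCURRENCE = {
    ("the", "cat"): 5, ("the", "dog"): 5, ("the", "mat"): 4,
    ("cat", "sat"): 1,
}

def weighted_pagerank(weights, damping=0.85, iters=100):
    """Power-iteration PageRank on an undirected weighted graph."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = adj.get(u, {}).get(v, 0.0) + w
        adj.setdefault(v, {})[u] = adj.get(v, {}).get(u, 0.0) + w
    nodes = sorted(adj)
    strength = {n: sum(adj[n].values()) for n in nodes}  # weighted degree
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for u in nodes:
            for v, w in adj[u].items():
                # u passes its rank to neighbors in proportion to edge weight
                new_rank[v] += damping * rank[u] * w / strength[u]
        rank = new_rank
    return rank

scores = weighted_pagerank(TOY_COOCCURRENCE)
top_word = max(scores, key=scores.get)
```

In this toy graph the word "the" co-occurs with almost everything, so it ends up with the highest score. A score of this kind reflects corpus-level co-occurrence statistics, which the within-sentence frequency weighting cannot see.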

Ref:
[1] Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. From word embeddings to document distances. In International Conference on Machine Learning, pages 957–966. PMLR, 2015.
[2] Chengwei Wei, Bin Wang, and C.-C. Jay Kuo. SynWMD: Syntax-aware Word Mover’s Distance for sentence similarity evaluation. arXiv preprint arXiv:2206.10029, 2022.

— by Chengwei Wei
