Word embeddings have gained popularity across a variety of NLP tasks, including sentiment analysis [1], information retrieval [2] and machine translation [3]. The goal of word embedding is to map each word to a vector representation that captures both syntactic and semantic information. Relationships between words can then be measured directly on the corresponding word vectors.
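As a toy illustration of such measurements, the snippet below compares words by cosine similarity and solves a simple analogy by vector arithmetic. The four-dimensional vectors are made up purely for this example; real embeddings such as word2vec or GloVe typically have a few hundred dimensions.

import numpy as np

# Toy 4-dimensional vectors, made up for illustration only.
vectors = {
    "king":  np.array([0.8, 0.6, 0.1, 0.9]),
    "queen": np.array([0.7, 0.9, 0.1, 0.8]),
    "man":   np.array([0.9, 0.2, 0.2, 0.7]),
    "woman": np.array([0.8, 0.5, 0.2, 0.6]),
}

def cosine(u, v):
    # Cosine similarity: closer to 1 means more similar directions.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Semantic relatedness is read off as the similarity of the two vectors.
print(cosine(vectors["king"], vectors["queen"]))

# The classic analogy test: king - man + woman should land near queen.
# In practice the search runs over the whole vocabulary minus the query words.
target = vectors["king"] - vectors["man"] + vectors["woman"]
candidates = [w for w in vectors if w not in {"king", "man", "woman"}]
print(max(candidates, key=lambda w: cosine(vectors[w], target)))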

Although embedded vector representations of words offer impressive performance on many natural language processing (NLP) applications, some information about ordered input sequences is lost when only context-based samples are used in training. To improve performance further, we propose two new post-processing techniques in this work: post-processing via variance normalization (PVN) and post-processing via dynamic embedding (PDE). PVN normalizes the variance of the principal components of word vectors, while PDE learns orthogonal latent variables from ordered input sequences [4]. Our post-processing techniques improve performance on both intrinsic evaluation tasks, including word similarity, word analogy and outlier detection, and extrinsic evaluation tasks, including sentiment analysis and machine translation.
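To make the PVN idea more concrete, below is a minimal NumPy sketch of variance normalization over the leading principal components, assuming a matrix of pretrained word vectors. The function name, the default choice of d, and the use of the (d+1)-th component's variance as the reference level are illustrative assumptions; the exact procedure and hyper-parameters are described in [4].

import numpy as np

def pvn(embeddings, d=7):
    # embeddings: (vocab_size, dim) matrix of pretrained word vectors.
    # d: number of leading principal components to normalize (illustrative default).
    mu = embeddings.mean(axis=0)
    centered = embeddings - mu                      # remove the mean vector
    # Principal directions and per-component variances via SVD.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2 / centered.shape[0]
    # Scale the top-d components down to the variance of the (d+1)-th component
    # instead of removing them outright.
    reference = variances[d]
    processed = centered.copy()
    for i in range(d):
        scale = np.sqrt(reference / variances[i])
        proj = centered @ vt[i]                     # projection onto the i-th principal direction
        processed += np.outer(proj, vt[i]) * (scale - 1.0)
    return processed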

We are also interested in word embedding evaluation, which can be divided into two categories: intrinsic evaluation and extrinsic evaluation. We are working to better understand the properties of word embeddings as well as their evaluation methods; this is still an ongoing project.
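As a concrete example of intrinsic evaluation, the word-similarity task scores an embedding by how well cosine similarities between word vectors agree with human ratings. The sketch below assumes a dict of word vectors and a list of (word1, word2, rating) triples, e.g. taken from a benchmark such as WordSim-353.

import numpy as np
from scipy.stats import spearmanr

def word_similarity_score(vectors, rated_pairs):
    # vectors: dict mapping word -> NumPy array.
    # rated_pairs: list of (word1, word2, human_rating) triples.
    model_scores, human_scores = [], []
    for w1, w2, rating in rated_pairs:
        if w1 in vectors and w2 in vectors:         # skip out-of-vocabulary pairs
            u, v = vectors[w1], vectors[w2]
            model_scores.append(float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v))))
            human_scores.append(rating)
    # Spearman rank correlation: higher means the embedding space better
    # matches human similarity judgments.
    return spearmanr(model_scores, human_scores).correlation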

More recent developments, such as contextualized word embeddings [5] and pre-trained language models [6], have become quite popular over the past year. A lot of exciting work can be done along this direction, and these models substantially outperform earlier ones. Bilingual and multilingual word embeddings could also be an interesting research area.

 

–By Bin Wang, working with Fenxiao (Jessica) Chen, Angela Wang and Yunchen (Joe) Wang

 

References:

[1] Shin, B.; Lee, T.; and Choi, J. D. 2016. Lexicon integrated CNN models with attention for sentiment analysis. arXiv preprint arXiv:1610.06272.

[2] Schütze, H.; Manning, C. D.; and Raghavan, P. 2008. Introduction to information retrieval, volume 39. Cambridge University Press.

[3] Ding, Y.; Liu, Y.; Luan, H.; and Sun, M. 2017. Visualizing and understanding neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, 1150–1159.

[4] Wang, B.; Chen, F.; Wang, A.; and Kuo, C.-C. J. 2018. Post-processing of word representations via variance normalization and dynamic embedding. arXiv preprint arXiv:1808.06305.

[5] Peters, M. E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; and Zettlemoyer, L. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365.

[6] Devlin, J.; Chang, M. W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.