Word embeddings have gained popularity across a variety of NLP tasks, including sentiment analysis [1], information retrieval [2] and machine translation [3]. The goal of word embedding is to map each word to a vector representation that captures both syntactic and semantic information. Relationships between words can then be measured directly on the corresponding word vectors.
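As a toy illustration of such measurements, the snippet below compares words by cosine similarity and solves a simple analogy by vector arithmetic. The four-dimensional vectors are made up purely for this example; real embeddings such as word2vec or GloVe typically have a few hundred dimensions.

import numpy as np

# Toy 4-dimensional vectors, made up for illustration only.
vectors = {
    "king":  np.array([0.8, 0.6, 0.1, 0.9]),
    "queen": np.array([0.7, 0.9, 0.1, 0.8]),
    "man":   np.array([0.9, 0.2, 0.2, 0.7]),
    "woman": np.array([0.8, 0.5, 0.2, 0.6]),
}

def cosine(u, v):
    # Cosine similarity: closer to 1 means more similar directions.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Semantic relatedness is read off as the similarity of the two vectors.
print(cosine(vectors["king"], vectors["queen"]))

# The classic analogy test: king - man + woman should land near queen.
# In practice the search runs over the whole vocabulary minus the query words.
target = vectors["king"] - vectors["man"] + vectors["woman"]
candidates = [w for w in vectors if w not in {"king", "man", "woman"}]
print(max(candidates, key=lambda w: cosine(vectors[w], target)))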

Although embedded vector representations of words offer impressive performance on many natural language processing (NLP) applications, some information about ordered input sequences is lost when only context-based samples are used in training. To improve performance further, we propose two new post-processing techniques in this work: post-processing via variance normalization (PVN) and post-processing via dynamic embedding (PDE). PVN normalizes the variance of the principal components of word vectors, while PDE learns orthogonal latent variables from ordered input sequences [4]. Our post-processing techniques improve performance on both intrinsic evaluation tasks, including word similarity, word analogy and outlier detection, and extrinsic evaluation tasks, including sentiment analysis and machine translation.
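To make the PVN idea more concrete, below is a minimal NumPy sketch of variance normalization over the leading principal components, assuming a matrix of pretrained word vectors. The function name, the default choice of d, and the use of the (d+1)-th component's variance as the reference level are illustrative assumptions; the exact procedure and hyper-parameters are described in [4].

import numpy as np

def pvn(embeddings, d=7):
    # embeddings: (vocab_size, dim) matrix of pretrained word vectors.
    # d: number of leading principal components to normalize (illustrative default).
    mu = embeddings.mean(axis=0)
    centered = embeddings - mu                      # remove the mean vector
    # Principal directions and per-component variances via SVD.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2 / centered.shape[0]
    # Scale the top-d components down to the variance of the (d+1)-th component
    # instead of removing them outright.
    reference = variances[d]
    processed = centered.copy()
    for i in range(d):
        scale = np.sqrt(reference / variances[i])
        proj = centered @ vt[i]                     # projection onto the i-th principal direction
        processed += np.outer(proj, vt[i]) * (scale - 1.0)
    return processed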

We are also interested in word embedding evaluation, which can be divided into two categories: intrinsic evaluation and extrinsic evaluation. We are working to better understand the properties of word embeddings as well as their evaluation methods; this is still an ongoing project.
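As a concrete example of intrinsic evaluation, the word-similarity task scores an embedding by how well cosine similarities between word vectors agree with human ratings. The sketch below assumes a dict of word vectors and a list of (word1, word2, rating) triples, e.g. taken from a benchmark such as WordSim-353.

import numpy as np
from scipy.stats import spearmanr

def word_similarity_score(vectors, rated_pairs):
    # vectors: dict mapping word -> NumPy array.
    # rated_pairs: list of (word1, word2, human_rating) triples.
    model_scores, human_scores = [], []
    for w1, w2, rating in rated_pairs:
        if w1 in vectors and w2 in vectors:         # skip out-of-vocabulary pairs
            u, v = vectors[w1], vectors[w2]
            model_scores.append(float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v))))
            human_scores.append(rating)
    # Spearman rank correlation: higher means the embedding space better
    # matches human similarity judgments.
    return spearmanr(model_scores, human_scores).correlation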

More recent developments, such as contextualized word embeddings [5] and pre-trained language models [6], have become quite popular over the past year. A lot of exciting work can be done along this direction, and these models substantially outperform earlier ones. Bilingual and multilingual word embeddings could also be an interesting research area.

 

–By Bin Wang, working with Fenxiao (Jessica) Chen, Angela Wang and Yunchen (Joe) Wang

 

References:

[1] Shin, B.; Lee, T.; and Choi, J. D. 2016. Lexicon integrated CNN models with attention for sentiment analysis. arXiv preprint arXiv:1610.06272.

[2] Schütze, H.; Manning, C. D.; and Raghavan, P. 2008. Introduction to information retrieval, volume 39. Cambridge University Press.

[3] Ding, Y.; Liu, Y.; Luan, H.; and Sun, M. 2017. Visualizing and understanding neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, 1150–1159.

[4] Wang, B.; Chen, F.; Wang, A.; and Kuo, C.-C. J. 2018. Post-processing of word representations via variance normalization and dynamic embedding. arXiv preprint arXiv:1808.06305.

[5] Peters, M. E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; and Zettlemoyer, L. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365.

[6] Devlin, J.; Chang, M. W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.