Word embedding is a fundamental task in natural language processing: it converts each word into a representation in a vector space. A challenge with word embedding is that, as the vocabulary grows, the dimension of the vector space increases, leading to a large model size. Storing and processing word vectors is resource-intensive, especially for applications on mobile edge devices.
Jintang Xue, a PhD student at MCL, has proposed a dimension reduction method called WordFS [1] for pre-trained word embeddings. WordFS combines a post-processing algorithm (PPA) with weakly-supervised feature selection that requires only a limited number of labeled word similarity pairs. It is simpler, more efficient, and more effective than existing approaches: experimental results show that it excels in word similarity tasks and generalizes well to downstream tasks, while reducing embedding dimensions at lower computational cost.
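To make the two-stage pipeline concrete, here is a minimal Python sketch. The PPA step follows the standard post-processing recipe (subtract the mean, then project away the top principal components). The feature-selection step shown here is an illustrative stand-in, not WordFS's exact criterion: it scores each dimension by how well the per-dimension products of labeled word pairs correlate with the human similarity ratings, then keeps the top-k dimensions. All variable names and the scoring rule are assumptions for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

def ppa(embeddings, d=7):
    """Post-processing algorithm (PPA): remove the mean vector and
    the projections onto the top-d principal components, which tend
    to dominate pre-trained embeddings without carrying much signal."""
    centered = embeddings - embeddings.mean(axis=0)
    # Principal directions via SVD of the centered embedding matrix.
    _, _, components = np.linalg.svd(centered, full_matrices=False)
    top = components[:d]                         # (d, dim)
    return centered - centered @ top.T @ top     # strip top-d projections

def select_dims(embeddings, pairs, scores, k):
    """Illustrative weakly-supervised selection (an assumption, not the
    paper's exact rule): rank each dimension by the correlation between
    its per-pair contribution to the dot product and the human scores."""
    i, j = pairs[:, 0], pairs[:, 1]
    contrib = embeddings[i] * embeddings[j]      # (n_pairs, dim)
    corr = np.array([abs(spearmanr(contrib[:, d], scores)[0])
                     for d in range(embeddings.shape[1])])
    return np.argsort(-corr)[:k]                 # indices of kept dimensions

# Toy usage: random data standing in for real embeddings and labels.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 300))               # 1000 words, 300-dim
pairs = rng.integers(0, 1000, size=(200, 2))     # 200 labeled word pairs
scores = rng.uniform(0, 10, size=200)            # human similarity ratings

emb = ppa(emb, d=7)
keep = select_dims(emb, pairs, scores, k=100)
reduced = emb[:, keep]                           # 1000 x 100 reduced embeddings
```

Because the selection step only reorders and keeps existing dimensions, it avoids the matrix factorization that projection-based reducers require, which is consistent with the lower computational cost reported for WordFS.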
[1] Xue, Jintang, et al. “Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection.” arXiv preprint arXiv:2407.12342 (2024).