We propose Variable-Length Word Embeddings, a POS-aware and compute-efficient Word2Vec training framework. Traditional embeddings assign the same dimensionality to every token, even though different parts of speech contribute very differently to sentence meaning. In real text, nouns usually carry the main semantic content and verbs encode actions and relations, while many other categories (e.g., articles, prepositions, conjunctions) are comparatively low-information. This motivates a representation strategy that spends more capacity on important words and less on the rest.
Our core idea is to use POS tags to organize training data and allocate embedding dimensions accordingly. We first POS-tag the entire corpus and split it into three views: a noun-only corpus, a noun+verb corpus, and a full corpus containing all tokens. Instead of training one uniform embedding space, we build embeddings in stages so that nouns become the backbone, verbs are learned relative to that backbone, and the remaining words are learned with minimal capacity.
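The corpus-splitting step can be sketched as follows. The tag sets follow the Penn Treebank convention and the toy tagged sentence is purely illustrative; any POS tagger's output (e.g., NLTK's `pos_tag`) could be substituted:

```python
# Build the three corpus views (noun-only, noun+verb, full) from a
# POS-tagged corpus. Tag names follow the Penn Treebank convention.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

def build_views(tagged_sentences):
    """Split a tagged corpus into noun-only, noun+verb, and full views."""
    noun_view, noun_verb_view, full_view = [], [], []
    for sent in tagged_sentences:
        nouns = [w for w, t in sent if t in NOUN_TAGS]
        nouns_verbs = [w for w, t in sent if t in NOUN_TAGS | VERB_TAGS]
        if nouns:
            noun_view.append(nouns)
        if nouns_verbs:
            noun_verb_view.append(nouns_verbs)
        full_view.append([w for w, _ in sent])
    return noun_view, noun_verb_view, full_view

# Toy tagged sentence standing in for real tagger output.
corpus = [[("the", "DT"), ("cat", "NN"), ("chased", "VBD"),
           ("a", "DT"), ("mouse", "NN")]]
nv, nvv, fv = build_views(corpus)
print(nv)   # [['cat', 'mouse']]
print(nvv)  # [['cat', 'chased', 'mouse']]
```

Each view then serves as the training corpus for one stage of the pipeline described below.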
We train nouns progressively with increasing dimensionality. Specifically, we learn noun embeddings at 50D, 100D, and 200D on the noun-only corpus. To make training across dimensions stable and efficient, each higher-dimensional model is initialized from the previous lower-dimensional embeddings using Lanczos interpolation (50D → 100D, and 100D → 200D), and then refined on the noun-only corpus. This produces high-capacity noun representations while preserving continuity across stages.
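A minimal sketch of the Lanczos upsampling used between stages is shown below, treating each embedding vector as a 1-D signal to be resampled to the higher dimension. The kernel width `a=3` and the edge-clamping boundary handling are assumptions, since the text does not fix these details:

```python
import numpy as np

def lanczos_resize(vec, new_dim, a=3):
    """Resample a 1-D vector to new_dim entries with a Lanczos kernel."""
    n = len(vec)
    out = np.empty(new_dim)
    for j in range(new_dim):
        # Map output index j to a fractional position in the input.
        x = (j + 0.5) * n / new_dim - 0.5
        idx = np.arange(int(np.floor(x)) - a + 1, int(np.floor(x)) + a + 1)
        t = x - idx
        # Lanczos kernel: sinc(t) * sinc(t / a) inside the window.
        w = np.sinc(t) * np.sinc(t / a)
        idx = np.clip(idx, 0, n - 1)            # clamp indices at the edges
        out[j] = np.dot(w, vec[idx]) / w.sum()  # normalize kernel weights
    return out

v50 = np.random.default_rng(0).normal(size=50)
v100 = lanczos_resize(v50, 100)   # 50D -> 100D initialization
print(v100.shape)                 # (100,)
```

Applying this row-by-row to the 50D noun matrix yields the 100D initialization, which is then refined on the noun-only corpus, and likewise for 100D → 200D.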
After obtaining the noun backbone at each dimension, we introduce verbs through a controlled adaptation step. Using the noun+verb corpus, we train verbs on top of the noun space, where noun vectors are soft-frozen (implemented with a reduced update factor) so they remain stable but can still adjust slightly. Verbs, in contrast, are fully trainable and learn to align with noun semantics. We apply this procedure at 50D and 100D, resulting in verbs that are naturally grounded in the noun-centered semantic space rather than drifting independently.
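The soft-freezing mechanism can be sketched as a per-row scaling of the gradient update. The factor 0.05 below is an assumed value; the text only specifies a "reduced update factor":

```python
import numpy as np

def sgd_update(emb, grads, lr, freeze_mask, soft_factor=0.05):
    """Apply an SGD step where soft-frozen rows receive a reduced update.

    freeze_mask[i] is True for soft-frozen rows (here: nouns), which
    move by soft_factor times the normal step; other rows (verbs)
    are fully trainable.
    """
    scale = np.where(freeze_mask[:, None], soft_factor, 1.0)
    emb -= lr * scale * grads
    return emb

emb = np.zeros((2, 4))
grads = np.ones((2, 4))
mask = np.array([True, False])          # row 0: noun, row 1: verb
sgd_update(emb, grads, lr=0.1, freeze_mask=mask)
print(emb[0])  # noun row: small, stabilizing update
print(emb[1])  # verb row: full update
```

In this way nouns stay close to the backbone learned in the previous stage while verbs are free to align themselves with it.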
Finally, we expand to the full vocabulary in a lightweight way. At 50D only, we start from the trained noun+verb model, soft-freeze nouns and verbs, and then train on the full corpus to learn embeddings for all remaining tokens. This stage equips function words and other less-informative categories with usable representations while keeping the overall compute and memory cost low. The final output is a variable-length embedding set: nouns are represented in 200D, verbs in 100D, and other tokens in 50D.
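The final variable-length embedding set can be represented as a simple per-token store keyed by POS-determined dimensionality, as in the following sketch. The `pad_to` helper is an assumption for downstream consumers that require a uniform width; the paper does not prescribe how mixed dimensions are consumed:

```python
import numpy as np

# Per-POS output dimensions from the three training stages.
DIMS = {"NOUN": 200, "VERB": 100, "OTHER": 50}

class VariableLengthEmbeddings:
    """Toy store for the final variable-length embedding set."""
    def __init__(self):
        self.vectors = {}

    def add(self, token, pos, vec):
        vec = np.asarray(vec)
        assert len(vec) == DIMS.get(pos, DIMS["OTHER"])
        self.vectors[token] = vec

    def lookup(self, token, pad_to=None):
        """Return the stored vector, optionally zero-padded to a fixed width."""
        v = self.vectors[token]
        if pad_to is not None and len(v) < pad_to:
            v = np.pad(v, (0, pad_to - len(v)))
        return v

store = VariableLengthEmbeddings()
store.add("cat", "NOUN", np.ones(200))
store.add("the", "OTHER", np.ones(50))
print(store.lookup("the").shape)          # (50,)
print(store.lookup("the", pad_to=200).shape)  # (200,)
```

Storing 50D vectors for the long tail of function words is where most of the memory savings come from, since those categories dominate token counts.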
In addition to variable dimensionality, we also incorporate a polysemy-aware labeling strategy for nouns. We cluster noun usages to assign each noun a discrete “meaning” identifier, enabling explicit separation of different senses. As a result, our final token format becomes word_POS_meaning, which encodes syntactic role (POS) and a coarse semantic sense label. This design reduces ambiguity for polysemous nouns and produces embeddings that are both more efficient and more semantically structured.
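The sense-labeling step can be sketched as clustering occurrence-level context vectors of a noun and emitting `word_POS_meaning` tokens. Using averaged context-word vectors as clustering features and k-means as the clusterer are both assumptions here; the text only specifies that noun usages are clustered into discrete senses:

```python
import numpy as np
from sklearn.cluster import KMeans

def sense_labels(context_vecs, n_senses=2, seed=0):
    """Cluster a noun's occurrence-level context vectors into senses.

    context_vecs: (n_occurrences, dim) array, e.g. the average of the
    50D embeddings of each occurrence's surrounding words (an assumed
    featurization). Returns one discrete sense id per occurrence.
    """
    km = KMeans(n_clusters=n_senses, n_init=10, random_state=seed)
    return km.fit_predict(context_vecs)

# Two artificial, well-separated context clusters for the noun "bank".
rng = np.random.default_rng(0)
ctx = np.vstack([rng.normal(0.0, 0.1, (5, 50)),   # e.g. river contexts
                 rng.normal(3.0, 0.1, (5, 50))])  # e.g. finance contexts
ids = sense_labels(ctx)
tokens = [f"bank_NN_{i}" for i in ids]  # final word_POS_meaning format
```

After relabeling, each `word_POS_meaning` token is trained as a distinct vocabulary entry, so the two senses of "bank" no longer share a single vector.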

