Part-of-speech (POS) tagging is a basic sequence labeling task that assigns a part-of-speech tag to every word in a sentence. Because POS is a fundamental syntactic attribute of words, POS tagging benefits many downstream tasks, such as speech recognition, syntactic parsing, and machine translation, and it is a crucial preliminary step in building interpretable NLP models. POS tagging has been addressed successfully with complex sequence-to-sequence models based on deep learning (DL), such as LSTMs and Transformers. Large language models (LLMs), as versatile general-purpose models, can also perform POS tagging. However, DL models incur high computational and storage costs, which the POS tagging task does not inherently require. There is thus a need for lightweight, high-performance POS taggers that are efficient while remaining effective for downstream tasks.

To meet this demand, we propose a novel word-embedding-based POS tagger named GWPT. Following the green learning (GL) methodology (Kuo & Madni, 2022), GWPT contains three cascaded modules: 1) representation learning, 2) feature learning, and 3) decision learning. The last two modules adopt standard procedures: the discriminant feature test (DFT) (Yang et al., 2022) for feature selection and the XGBoost classifier for POS prediction. The main novelty of this work lies in the representation learning module. GWPT derives the representation of a word from its embedding, which can be either non-contextual or contextual. It partitions the embedding dimension indices into three sets: low-, medium-, and high-frequency. It discards the dimension indices in the low-frequency set and builds N-gram representations for those in the medium- and high-frequency sets. The final word features are then selected from a subset of these word representations using supervised learning, which mitigates the adverse effects of noisy or irrelevant features on POS tagging while reducing computational cost. Experimental results show that, compared with DL-based POS taggers, GWPT offers highly competitive tagging accuracy with fewer model parameters and significantly lower training and inference complexity.
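
As a rough illustration of the representation and feature learning steps described above, the following Python sketch builds N-gram features from a set of retained embedding dimensions and ranks them with a simplified, DFT-style split loss. It is a minimal sketch under stated assumptions, not the GWPT implementation: the function names (`ngram_features`, `split_loss`, `select_features`), the zero padding at sentence boundaries, the evenly spaced threshold grid, and the weighted-entropy loss are all choices made here for illustration. The frequency-based partition of dimension indices is not reproduced, so the retained dimensions are simply passed in as `kept_dims`, and the XGBoost classification step is omitted.

```python
import math
from collections import Counter


def ngram_features(sentence, kept_dims, radius=1):
    """Concatenate the kept dimensions of each word and its neighbors.

    sentence: list of embedding vectors (one list of floats per word).
    kept_dims: dimension indices retained for the representation (assumed given).
    radius: neighbors on each side; radius=1 gives a 3-gram window.
    Positions beyond the sentence boundary are zero-padded (an assumption).
    """
    rows = []
    for i in range(len(sentence)):
        row = []
        for j in range(i - radius, i + radius + 1):
            if 0 <= j < len(sentence):
                row.extend(sentence[j][d] for d in kept_dims)
            else:
                row.extend(0.0 for _ in kept_dims)
        rows.append(row)
    return rows


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_loss(values, labels, num_candidates=8):
    """Lowest weighted entropy over evenly spaced binary splits of one feature."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant feature cannot separate the classes at all.
        return entropy(labels)
    best = float("inf")
    for k in range(1, num_candidates + 1):
        threshold = lo + (hi - lo) * k / (num_candidates + 1)
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        loss = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        best = min(best, loss)
    return best


def select_features(rows, labels, top_k):
    """Return indices of the top_k features with the lowest split loss."""
    num_features = len(rows[0])
    losses = [split_loss([r[f] for r in rows], labels) for f in range(num_features)]
    ranked = sorted(range(num_features), key=lambda f: losses[f])
    return ranked[:top_k]
```

In this sketch, a feature with a lower split loss separates the POS classes more cleanly along a single threshold, so keeping only the lowest-loss features shrinks the input to the downstream classifier.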

References:

Kuo, C.-C. J., & Madni, A. M. (2022). Green learning: Introduction, examples and outlook. Journal of Visual Communication and Image Representation, 103685.

Yang, Y., Wang, W., Fu, H., Kuo, C.-C. J. et al. (2022). On supervised feature selection from high dimensional feature spaces. APSIPA Transactions on Signal and Information Processing, 11.