Knowledge graphs (KGs) store human knowledge in a graph-structured format, where nodes and edges denote entities and relations, respectively. Most KGs, such as Wikidata [1], suffer from incompleteness: a large number of factual triples are missing, which degrades performance in downstream applications. Thus, there is growing interest in developing KG completion (KGC) methods that address this problem by inferring undiscovered factual triples from existing ones.

Prior KGC work focuses on learning embeddings for entities and relations and scoring triples with a simple score function. Yet such models usually require a high-dimensional embedding space for good reasoning capability. This leads to large model sizes and hinders their applicability to real-world problems (e.g., large-scale KGs or mobile/edge computing). A lightweight, modularized KGC solution, called GreenKGC, is proposed to address this issue. GreenKGC consists of three modules (representation learning, feature pruning, and decision learning). Together they extract discriminant KG features and predict missing relationships using classifiers and negative sampling. An overview of the model is shown in Fig. 1.
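The three-module flow can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration, not GreenKGC's actual implementation. The embeddings are synthetic TransE-style vectors rather than learned ones. A per-dimension Fisher-style score is swapped in for GreenKGC's own pruning criterion. A hand-rolled logistic regression stands in for its classifiers. Negatives come from simple tail corruption. Names such as `triple_features`, `fisher_score`, and `predict` are hypothetical.

```python
import math
import random

random.seed(0)
NUM_PAIRS, DIM, KEEP = 100, 32, 8  # toy sizes, chosen only for illustration


def gauss_vec(n, scale=1.0):
    return [random.gauss(0.0, scale) for _ in range(n)]


# (1) Representation learning (stand-in): synthetic TransE-style embeddings in
# which tail ~= head + relation holds only on the first KEEP dimensions; the
# remaining dimensions are pure noise that pruning should discard.
relation = gauss_vec(DIM)
entities = []
for _ in range(NUM_PAIRS):
    head = gauss_vec(DIM)
    tail = [head[d] + relation[d] + random.gauss(0.0, 0.1) if d < KEEP
            else random.gauss(0.0, 1.0) for d in range(DIM)]
    entities += [head, tail]  # true triple: (2i, r, 2i + 1)


def triple_features(h, t):
    """Per-dimension residual |h + r - t| for the single toy relation."""
    return [abs(entities[h][d] + relation[d] - entities[t][d]) for d in range(DIM)]


# Negative sampling: pair every true triple with one corrupted-tail triple.
X, y = [], []
for i in range(NUM_PAIRS):
    h, t = 2 * i, 2 * i + 1
    X.append(triple_features(h, t))
    y.append(1)
    t_neg = random.choice([e for e in range(len(entities)) if e != t])
    X.append(triple_features(h, t_neg))
    y.append(0)


# (2) Feature pruning: rank dimensions by a Fisher-style separability score
# (a swapped-in criterion, not necessarily GreenKGC's) and keep the top KEEP.
def fisher_score(col, labels):
    pos = [v for v, lab in zip(col, labels) if lab == 1]
    neg = [v for v, lab in zip(col, labels) if lab == 0]
    mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
    vp = sum((v - mp) ** 2 for v in pos) / len(pos)
    vn = sum((v - mn) ** 2 for v in neg) / len(neg)
    return (mp - mn) ** 2 / (vp + vn + 1e-12)


scores = [fisher_score([row[d] for row in X], y) for d in range(DIM)]
kept = sorted(range(DIM), key=lambda d: scores[d], reverse=True)[:KEEP]
X_pruned = [[row[d] for d in kept] for row in X]


# (3) Decision learning: logistic regression on the pruned features only.
def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


w, b, lr = [0.0] * KEEP, 0.0, 0.3


def prob(feats):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, feats)) + b)


for _ in range(800):  # full-batch gradient descent on the log loss
    gw, gb = [0.0] * KEEP, 0.0
    for xi, yi in zip(X_pruned, y):
        err = prob(xi) - yi
        gw = [g + err * xj for g, xj in zip(gw, xi)]
        gb += err
    w = [wj - lr * g / len(y) for wj, g in zip(w, gw)]
    b -= lr * gb / len(y)


def predict(h, t):
    """Probability that (h, r, t) is a true triple, using only kept dimensions."""
    f = triple_features(h, t)
    return prob([f[d] for d in kept])


accuracy = sum((prob(xi) > 0.5) == (yi == 1) for xi, yi in zip(X_pruned, y)) / len(y)
print(f"kept dims: {sorted(kept)}, train accuracy: {accuracy:.2f}")
```

In this toy setup, pruning should recover the informative dimensions. The classifier then works in a much smaller feature space than the raw embeddings. That mirrors, in spirit only, the idea of keeping a compact set of discriminant features before decision learning.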

Experimental results demonstrate several advantages of GreenKGC. First, it needs only a low-dimensional space (e.g., d = 8) to achieve performance that is competitive with, or even better than, high-dimensional models, at a much smaller model size. Second, compared with other classification-based methods, it achieves better performance with shorter inference time. Third, the feature pruning module is 20x faster than knowledge distillation methods at obtaining powerful low-dimensional features. A comparison of GreenKGC and other KGC models under different dimensions is given in Fig. 2.
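The model-size advantage of a low-dimensional space can be put in rough numbers. The following back-of-the-envelope arithmetic counts only the entity and relation embedding tables of a TransE-style model, with one real-valued vector per entity and per relation stored as float32. The KG size (15,000 entities, 240 relations) and the high-dimensional baseline d = 500 are hypothetical choices. GreenKGC's classifier parameters are ignored, so this is not a measured comparison.

```python
# Hypothetical KG size and baseline dimension, chosen only for illustration.
NUM_ENTITIES, NUM_RELATIONS = 15_000, 240
BYTES_PER_FLOAT32 = 4


def embedding_params(dim):
    # One dim-length vector per entity and per relation (TransE-style tables);
    # classifier parameters and complex-valued embeddings are not counted.
    return (NUM_ENTITIES + NUM_RELATIONS) * dim


for dim in (8, 500):
    params = embedding_params(dim)
    print(f"d={dim}: {params:,} parameters, {params * BYTES_PER_FLOAT32 / 1e6:.2f} MB")
```

Under these assumptions, going from d = 500 to d = 8 shrinks the embedding tables by a factor of 62.5, from about 30.5 MB to about 0.5 MB.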

-By Yun Cheng Wang

 

Reference:

[1] Vrandečić, Denny, and Markus Krötzsch. “Wikidata: a free collaborative knowledgebase.” Communications of the ACM 57.10 (2014): 78-85.