Knowledge graphs (KGs) model human-readable knowledge as entity–relation triples. One major branch of KG research is representation learning, which aims to learn low-dimensional embeddings for entities and relations. Simple arithmetic operations between entity and relation embeddings can represent complex real-world knowledge or even uncover new facts. KGs evolve rapidly with the enormous amount of new information generated every day. Since it is infeasible to retrain KG embeddings whenever a new entity or relation appears, modeling unseen entities and relations remains a challenging task.
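As a concrete illustration of such embedding arithmetic, the following sketch uses the well-known translational (TransE-style) scoring rule, where a plausible triple (h, r, t) satisfies h + r ≈ t. The entities, relation, and vector values are toy examples invented for illustration, not learned embeddings.

```python
import numpy as np

# Toy 3-dimensional embeddings; values are illustrative, not learned.
emb = {
    "Paris":      np.array([1.0, 0.0, 2.0]),
    "France":     np.array([1.0, 1.0, 1.0]),
    "capital_of": np.array([0.0, 1.0, -1.0]),
}

def transe_score(h, r, t):
    """TransE-style score: distance between h + r and t (lower = more plausible)."""
    return np.linalg.norm(emb[h] + emb[r] - emb[t])

# A true triple yields a small distance; here the toy vectors match exactly.
score = transe_score("Paris", "capital_of", "France")
```

Under this scheme, completing a triple such as (Paris, capital_of, ?) reduces to a nearest-neighbor search around the vector emb["Paris"] + emb["capital_of"].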
There are two main directions of research on handling unseen entities. One direction infers the embedding of a new entity from its neighboring entities and relations observed during training; researchers have either relied on Graph Neural Networks or designed specialized aggregation functions to collect the unseen node's neighborhood information. The other direction leverages feature information in entity node metadata. Specifically, entity names and descriptions are often available in textual form upon querying the KG. Recent advances in transformer language models have made it possible to extract high-quality feature representations from such contextual information after minimal fine-tuning. When a transformer language model such as BERT is used to extract entity representations, the model can generate an embedding for any entity with a textual name or description, thereby resolving the unseen-entity problem.
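The text-based direction can be sketched as follows. In practice a transformer encoder such as BERT produces the contextual token vectors; here a toy token-vector table stands in for the encoder, and the token vectors are simply mean-pooled into one entity embedding. All names and values below are hypothetical placeholders for illustration.

```python
import numpy as np

# Stand-in for a transformer encoder's contextual token vectors.
# In a real system these would come from e.g. BERT's last hidden layer.
token_vectors = {
    "french": np.array([1.0, 3.0]),
    "city":   np.array([3.0, 1.0]),
}

def embed_entity(description):
    """Mean-pool token vectors of an entity description into one embedding.

    This mirrors a common pooling strategy for text-based entity encoders:
    any entity with a textual description gets an embedding, seen or unseen.
    """
    tokens = description.split()
    return np.mean([token_vectors[t] for t in tokens], axis=0)

entity_vec = embed_entity("french city")
```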
RotatE has been one of the most effective yet simple KG embedding models proposed in recent years. In RotatE, entities and relations are modeled as complex vectors. Each element of the relation vector serves as an element-wise phase shifter that transforms the source entity into the target entity. We propose a specialized aggregation function based on the multiplication and fractional exponentiation of transformed representations of neighboring entities. The intuition behind the proposed aggregation function is that, because RotatE operates on phases rather than on the real and imaginary parts of the phasor, we can aggregate by averaging the phase angles. Following the Out-of-Sample KG embedding training scheme, we make the model aware of the aggregation function during training and thereby produce embeddings tailored to the zero-shot setting.
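A minimal sketch of this idea follows. It shows the RotatE element-wise rotation and a phase-averaging aggregator built from multiplication and a fractional exponent: multiplying unit phasors adds their phases, and taking the 1/n-th power (via the principal root) divides the summed phase by n, yielding the average angle. The function names and toy values are our illustrative assumptions; wrap-around at ±π is ignored in this simplified version.

```python
import numpy as np

def rotate_transform(head, relation_phase):
    """RotatE: rotate each complex element of the head by the relation's phase."""
    return head * np.exp(1j * relation_phase)

def aggregate_by_phase(transformed):
    """Average the phase angles of n transformed neighbor representations.

    Implemented as a geometric mean: the element-wise product of the phasors
    sums their phases, and the fractional exponent 1/n (principal root)
    averages them. Assumes summed phases stay within (-pi, pi].
    """
    n = len(transformed)
    prod = np.prod(transformed, axis=0)
    return np.abs(prod) ** (1.0 / n) * np.exp(1j * np.angle(prod) / n)

# Two unit-modulus neighbor representations with phases 0.2 and 0.4;
# their aggregate should have the average phase 0.3.
neighbors = [np.exp(1j * np.array([0.2])), np.exp(1j * np.array([0.4]))]
agg = aggregate_by_phase(neighbors)
```

Averaging in phase space rather than averaging real and imaginary parts keeps the aggregate on (or near) the unit circle, which matches RotatE's assumption that relations act purely as rotations.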