Graph representation learning has attracted growing attention because many kinds of speech and text data, such as social networks, linguistic (word co-occurrence) networks, biological networks, and other domain-specific multimedia data, are well represented by graphs. A graph representation allows relational knowledge about interacting entities to be stored and accessed efficiently. Analyzing graph data can provide significant insights for community detection, behavior analysis, and other applications built on node classification, link prediction, and clustering. The first step in analyzing graph data is to find an accurate and efficient graph representation. The steps of graph embedding are shown in Figure 1. The input is a graph represented by an adjacency matrix. Graph representation learning aims to embed this matrix into a latent space that captures the intrinsic characteristics of the original graph. Each node u in the network is mapped to a vector in a d-dimensional space that represents the features of that node, as shown in Figure 2.
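As a minimal sketch of what "embedding the adjacency matrix" can mean, the example below factorizes a small adjacency matrix A into two n-by-d factors U and V by plain gradient descent on the squared reconstruction error, and treats row U[u] as the d-dimensional vector for node u. The toy graph, the function names (`adjacency_matrix`, `reconstruction_loss`, `embed`), and the hyperparameters are all assumptions made for illustration; this is one simple way to obtain an embedding, not the specific method studied in this work.

```python
import random

def adjacency_matrix(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1.0
        A[v][u] = 1.0
    return A

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def reconstruction_loss(A, U, V):
    """Sum over all (i, j) of (A[i][j] - U[i] . V[j])^2."""
    n = len(A)
    return sum((A[i][j] - dot(U[i], V[j])) ** 2
               for i in range(n) for j in range(n))

def embed(A, d, steps=500, lr=0.02, seed=0):
    """Hypothetical helper: fit A ~ U V^T with full-batch gradient descent."""
    rng = random.Random(seed)
    n = len(A)
    # Small random initialization of both factor matrices.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n)]
    for _ in range(steps):
        gU = [[0.0] * d for _ in range(n)]
        gV = [[0.0] * d for _ in range(n)]
        for i in range(n):
            for j in range(n):
                err = dot(U[i], V[j]) - A[i][j]
                for k in range(d):
                    gU[i][k] += 2.0 * err * V[j][k]
                    gV[j][k] += 2.0 * err * U[i][k]
        for i in range(n):
            for k in range(d):
                U[i][k] -= lr * gU[i][k]
                V[i][k] -= lr * gV[i][k]
    return U, V

# Toy graph (assumed): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = adjacency_matrix(6, edges)

U0, V0 = embed(A, d=2, steps=0)   # untrained factors, for comparison
U, V = embed(A, d=2)              # trained factors
initial_loss = reconstruction_loss(A, U0, V0)
final_loss = reconstruction_loss(A, U, V)
print("loss before:", initial_loss, "after:", final_loss)
print("embedding of node 0:", U[0])
```

Here d is fixed at 2 only to keep the output readable. A larger d lets U V^T approximate A more closely, at the cost of storing and updating more numbers per node.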

Obtaining an accurate graph representation is challenging for several reasons. Finding the optimal dimension of the representation is not easy: a representation with more dimensions may preserve more information about the original graph, but at the cost of more space and time. The choice of dimension can also be domain-specific and depend on the type of input graph. Choosing which property of the graph to embed is also difficult, given the plethora of properties that graphs have.

In our research, we first focus on the node prediction task in deep learning models. Specifically, we explore node classification using tree-structured recursive neural networks. We then turn to improving the accuracy and efficiency of the DeepWalk-based matrix factorization method.
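For readers unfamiliar with DeepWalk, it is usually described as sampling truncated random walks from the graph and treating each walk as a "sentence" for a skip-gram model. The sketch below covers only that walk-sampling step, under assumed names and parameters (`random_walks`, `num_walks`, `walk_length`, and the toy adjacency list are all hypothetical); it does not implement the matrix factorization method discussed here.

```python
import random

def random_walks(adj, num_walks=2, walk_length=5, seed=0):
    """Sample uniform random walks, starting num_walks times from every node."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy undirected graph as an adjacency list (assumed for illustration).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adj)
for walk in walks:
    print(walk)
```

Nodes that frequently appear close together in these walks are the ones a skip-gram model would push toward nearby embedding vectors.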


— By Fenxiao(Jessica) Chen