The success of deep learning and neural networks often comes at the price of a large amount of labeled data. Weakly-supervised learning (WSL) is an important paradigm that leverages a large amount of unlabeled data to address this limitation. WSL is needed in many machine learning problems and has found wide application in computer vision, natural language processing, and graph-based modeling, where labeled data are expensive to obtain and unlabeled data are abundant.

Among weakly-supervised graph learning methods, label propagation (LP) has demonstrated good adaptability, scalability, and efficiency for node classification. However, LP-based methods are limited in their ability to integrate multiple data modalities for effective learning. Following the recent success of neural networks, there have been efforts to apply them to graph-structured data. One pioneering technique, known as graph convolutional networks (GCNs), has achieved impressive node classification performance on citation networks. However, GCNs fail to exploit the label distribution in the graph structure and are difficult to scale to large graphs.

In this work, we propose a scalable weakly-supervised node classification method for graph-structured data, called GraphHop, where the underlying graph contains attributes of all nodes but labels of only a few nodes. Our method is an iterative algorithm that overcomes the deficiencies of LP and GCNs. Given proper initial label vector embeddings, each iteration contains two steps: 1) label aggregation and 2) label update. In Step 1, each node aggregates its neighbors’ label vectors obtained in the previous iteration. In Step 2, a new label vector is predicted for each node based on the label of the node itself and the aggregated label information obtained in Step 1. This iterative procedure exploits the neighborhood information and enables GraphHop to perform well in an extremely weakly-supervised learning setting and to scale well to very large graphs. Experimental results show that GraphHop outperforms state-of-the-art graph learning methods on a wide range of tasks (e.g., multi-label and multi-class classification on citation networks, social graphs, and commodity consumption graphs) in graphs of various sizes.
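
To make the two-step iteration concrete, the following is a minimal sketch of the control flow only, written under stated assumptions. The text above does not specify how neighbors are aggregated or how the new label vector is predicted, so the mean aggregation in `aggregate`, the fixed convex combination in `mix_update` (with its hypothetical weight `alpha`), and the choice to hold the given labels fixed after each iteration are all illustrative placeholders. In particular, `mix_update` is a stand-in and not GraphHop's predictor; with this placeholder the loop behaves much like plain label propagation. All function names and the toy graph are assumptions introduced for the example.

```python
import numpy as np


def aggregate(adj, label_vecs):
    """Step 1 (assumed form): average the label vectors of each node's neighbors."""
    deg = adj.sum(axis=1, keepdims=True)
    return (adj @ label_vecs) / np.maximum(deg, 1.0)


def mix_update(own, aggregated, alpha=0.5):
    """Step 2 placeholder: convex mix of a node's own and aggregated label vectors.

    Hypothetical stand-in for the label predictor, which is not specified here.
    """
    mixed = alpha * own + (1.0 - alpha) * aggregated
    return mixed / mixed.sum(axis=1, keepdims=True)


def run_iterations(adj, init_vecs, labeled_idx, update_fn=mix_update, n_iter=20):
    """Alternate label aggregation and label update for n_iter iterations."""
    vecs = init_vecs.astype(float).copy()
    known = vecs[labeled_idx].copy()
    for _ in range(n_iter):
        agg = aggregate(adj, vecs)    # Step 1: label aggregation
        vecs = update_fn(vecs, agg)   # Step 2: label update
        vecs[labeled_idx] = known     # assumption: given labels stay fixed
    return vecs


# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge (2, 3).
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0

# Only nodes 0 and 5 are labeled; all other nodes start from a uniform label vector.
init = np.full((6, 2), 0.5)
init[0] = [1.0, 0.0]
init[5] = [0.0, 1.0]
labeled_idx = np.array([0, 5])

vecs = run_iterations(adj, init, labeled_idx)
pred = vecs.argmax(axis=1)
```

On this toy graph, the nodes in the same triangle as node 0 end up assigned to class 0 and those in the triangle with node 5 to class 1. Passing a different callable as `update_fn`, for example a trained classifier over the node's own and aggregated label vectors, changes Step 2 without touching the loop.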