Statistics and Its Interface

Volume 17 (2024)

Number 3

Semi-supervised learning in unbalanced networks with heterogeneous degree

Pages: 501 – 516

DOI: https://dx.doi.org/10.4310/23-SII809

Authors

Li Ting (The Hong Kong Polytechnic University)

Ningchen Ying (The Hong Kong University of Science and Technology)

Xianshi Yu (University of Michigan)

Bing-Yi Jing (Southern University of Science and Technology)

Abstract

Community detection is a well-established area of research in network analysis. However, there has been limited discussion of how to improve prediction accuracy when some community labels are already known. In this paper, we introduce a novel algorithm, the weighted inverse Laplacian (WIL), for predicting labels in partially labeled undirected networks. The algorithm is founded on the concept of the first hitting time of a random walk and is supported by both information-propagation and regularization frameworks. By combining two different normalization techniques, WIL adapts to both community imbalance and degree heterogeneity. Additionally, we propose a partially labeled degree-corrected block model (pDCBM) to describe how partially labeled networks are generated. Under this model, we prove that the misclassification rate of WIL converges to zero as the number of nodes goes to infinity, and that WIL can handle greater imbalance than traditional Laplacian methods. Our simulations and empirical studies show that WIL outperforms other state-of-the-art methods, particularly in unbalanced and heterogeneous networks.
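The abstract describes WIL only at a high level; its exact weighting and its two normalizations are given in the full text and are not reproduced here. As a point of reference, the sketch below shows the kind of traditional Laplacian baseline the abstract compares against: a random-walk (harmonic-function) classifier on a simulated degree-corrected block model with partially revealed labels. Each score is the probability that a random walk started at an unlabeled node first reaches the labeled set at a node of class k, which connects to the first-hitting idea in the abstract. All function names, parameter values, and the per-class normalization are hypothetical assumptions for illustration; this is neither the authors' WIL algorithm nor their pDCBM.

```python
import numpy as np


def simulate_partially_labeled_dcbm(sizes, B, theta_range, label_frac, rng):
    """Hypothetical helper: generic DCBM graph with a random subset of labels revealed.

    P(A_ij = 1) = theta_i * theta_j * B[z_i, z_j], clipped to [0, 1].
    This is an illustrative DCBM, not the paper's pDCBM specification.
    """
    z = np.repeat(np.arange(len(sizes)), sizes)            # true community labels
    n = z.size
    theta = rng.uniform(*theta_range, size=n)               # degree-heterogeneity parameters
    P = np.clip(np.outer(theta, theta) * B[np.ix_(z, z)], 0.0, 1.0)
    upper = np.triu(rng.random((n, n)) < P, k=1)            # sample each node pair once
    A = (upper | upper.T).astype(float)                     # symmetric, no self-loops
    labeled = rng.random(n) < label_frac                    # mask of revealed labels
    return A, z, labeled


def harmonic_label_scores(A, y, labeled, n_classes, eps=1e-6):
    """Absorption probabilities of a random walk into labeled nodes of each class.

    Solves (D_UU - A_UU + eps*I) F_U = A_UL Y_L with one-hot Y_L. The small eps
    only guards against unlabeled components that never reach a labeled node.
    """
    U = ~labeled
    deg = A.sum(axis=1)
    L_UU = np.diag(deg[U]) - A[np.ix_(U, U)]                # Laplacian restricted to unlabeled nodes
    Y_L = np.eye(n_classes)[y[labeled]]                     # one-hot labels of labeled nodes
    rhs = A[np.ix_(U, labeled)] @ Y_L
    return np.linalg.solve(L_UU + eps * np.eye(U.sum()), rhs)


def predict(A, y, labeled, n_classes, normalize_by_class=True):
    """Predict labels; only y[labeled] is read."""
    scores = harmonic_label_scores(A, y, labeled, n_classes)
    if normalize_by_class:
        # Hypothetical imbalance correction: divide each class's score by its
        # number of labeled nodes (an assumption, not WIL's normalization).
        counts = np.bincount(y[labeled], minlength=n_classes)
        scores = scores / np.maximum(counts, 1)
    pred = np.empty_like(y)
    pred[labeled] = y[labeled]                              # labeled nodes keep their labels
    pred[~labeled] = scores.argmax(axis=1)
    return pred, scores


rng = np.random.default_rng(0)
B = np.array([[0.10, 0.01], [0.01, 0.10]])                  # illustrative connectivity matrix
A, z, labeled = simulate_partially_labeled_dcbm([300, 100], B, (0.6, 1.4), 0.3, rng)
pred, _ = predict(A, z, labeled, n_classes=2)
raw_scores = harmonic_label_scores(A, z, labeled, n_classes=2)
acc = (pred[~labeled] == z[~labeled]).mean()
print(f"accuracy on unlabeled nodes: {acc:.3f}")
```

With an unbalanced split (300 vs 100 nodes), raw absorption probabilities tend to favor the larger class because more labeled nodes belong to it; the per-class division is one simple, assumed way to offset that and illustrates why imbalance-aware normalization matters.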

Keywords

semi-supervised learning, network data, unbalanced label, heterogeneous node


Received 23 November 2022

Accepted 27 July 2023

Published 19 July 2024