Statistics and Its Interface
Volume 17 (2024)
Number 3
Semi-supervised learning in unbalanced networks with heterogeneous degree
Pages: 501 – 516
DOI: https://dx.doi.org/10.4310/23-SII809
Authors
Abstract
Community detection is a well-established area of research in network analysis. However, there has been limited discussion on how to improve prediction accuracy when some community labels are already known. In this paper, we introduce a novel algorithm called the weighted inverse Laplacian (WIL) for predicting labels in partially labeled undirected networks. Our algorithm is founded on the concept of the first hitting time of a random walk and is supported by information propagation and regularization frameworks. By combining two different normalization techniques, WIL is highly adaptable and can handle community imbalance and degree heterogeneity. Additionally, we propose a partially labeled degree-corrected block model (pDCBM) to describe the generation of partially labeled networks. Under this model, we prove that WIL guarantees a misclassification rate going to zero as the number of nodes goes to infinity, and it can handle greater imbalances than traditional Laplacian methods. Our simulations and empirical studies demonstrate that WIL outperforms other state-of-the-art methods, particularly in unbalanced and heterogeneous networks.
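For orientation, the setting can be illustrated with a minimal sketch of a traditional Laplacian-type baseline. This is not WIL, whose weighting and normalizations are not specified in the abstract. The sketch uses the classical harmonic-function approach for two communities: each unlabeled node's score is repeatedly replaced by the average score of its neighbors, while labeled nodes stay fixed. At convergence, the score equals the probability that a random walk started at that node reaches a labeled node of community 1 before a labeled node of community 0. The function names, the neighbor-list input format, the 0.5 threshold, and the toy graph are all assumptions made for illustration.

```python
def harmonic_scores(neighbors, known, n_iter=500):
    """Score nodes by iterating f_i = mean of f over the neighbors of i.

    neighbors: list of neighbor lists for an undirected graph
               (hypothetical input format chosen for this sketch).
    known:     dict mapping each labeled node to its community, 0 or 1.
    Returns a list of scores in [0, 1]; labeled nodes keep their labels.
    """
    # Unlabeled nodes start at the uninformative value 0.5.
    scores = [float(known.get(i, 0.5)) for i in range(len(neighbors))]
    for _ in range(n_iter):
        for i, nbrs in enumerate(neighbors):
            if i in known or not nbrs:
                continue  # labeled nodes are fixed; isolated nodes stay at 0.5
            scores[i] = sum(scores[j] for j in nbrs) / len(nbrs)
    return scores


def predict(neighbors, known, threshold=0.5):
    """Assign community 1 to unlabeled nodes whose score exceeds the threshold."""
    scores = harmonic_scores(neighbors, known)
    return [known[i] if i in known else int(s > threshold)
            for i, s in enumerate(scores)]


# Toy graph (an assumption): two triangles joined by the single edge 2-3,
# with one labeled node in each community.
neighbors = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
known = {0: 0, 5: 1}
print(predict(neighbors, known))  # [0, 0, 0, 1, 1, 1]
```

This baseline applies no correction for community imbalance or degree heterogeneity, which are the two issues the abstract says WIL addresses by combining two normalization techniques.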
Keywords
semi-supervised learning, network data, unbalanced label, heterogeneous node
Received 23 November 2022
Accepted 27 July 2023
Published 19 July 2024