Statistics and Its Interface

Volume 17 (2024)

Number 3

Robust subgroup analysis for network-linked data

Pages: 357 – 370

DOI: https://dx.doi.org/10.4310/23-SII774

Authors

Yu Xing (Northeast Normal University)

Wensheng Zhu (Northeast Normal University)

Kyongson Jon (Northeast Normal University)

Abstract

Modern applications often collect data on individuals who are connected by a network, which records relationship information between them. In this paper, we use both covariates and the network to identify subgroup structures in a heterogeneous population, where the heterogeneity arises from unknown or unobserved latent factors. We propose a penalization-based method for subgroup analysis built on the median regression model, which automatically divides the samples into subgroups by penalizing the pairwise differences between the intercepts of individuals connected by an edge in the network. The proposed method can also predict the response for new subjects that have only covariates, by taking advantage of the network reconstructed after these new subjects are added. We suggest an implementation algorithm based on the local linear approximation to the nondifferentiable and nonconvex penalty function, and we establish the oracle properties of the proposed estimator under some regularity conditions. Our simulation studies show that the proposed method can effectively identify heterogeneous subgroups even when the network contains errors or misspecified edges. Finally, the advantages of the proposed method are further illustrated by an analysis of a housing price data set from real estate transactions.
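
The kind of objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the penalty, so the minimax concave penalty (MCP) is assumed here as one common nonconvex, nondifferentiable choice. The function names, the edge-list layout, and the default `gamma` are all hypothetical. The sketch evaluates a median-regression (absolute-error) loss plus a penalty on intercept differences across network edges, and computes the edge weights that one local linear approximation step would use.

```python
import numpy as np


def mcp(t, lam, gamma=3.0):
    """MCP penalty value at |t| (assumed penalty; not stated in the abstract)."""
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t ** 2 / (2.0 * gamma),
                    gamma * lam ** 2 / 2.0)


def mcp_derivative(t, lam, gamma=3.0):
    """Derivative of the MCP penalty with respect to |t|."""
    return np.maximum(lam - np.abs(t) / gamma, 0.0)


def objective(y, X, alpha, beta, edges, lam, gamma=3.0):
    """Absolute-error loss plus edge-wise penalty on intercept differences.

    alpha holds one intercept per individual; edges is an (m, 2) integer
    array of connected pairs (a hypothetical layout chosen for this sketch).
    """
    residuals = y - alpha - X @ beta
    loss = np.sum(np.abs(residuals))
    i, j = edges[:, 0], edges[:, 1]
    penalty = np.sum(mcp(alpha[i] - alpha[j], lam, gamma))
    return float(loss + penalty)


def lla_weights(alpha_prev, edges, lam, gamma=3.0):
    """Edge weights for one local linear approximation step.

    The nonconvex penalty on each |alpha_i - alpha_j| is replaced by a
    weighted L1 term whose weight is the penalty derivative evaluated at
    the previous iterate.
    """
    i, j = edges[:, 0], edges[:, 1]
    return mcp_derivative(alpha_prev[i] - alpha_prev[j], lam, gamma)


# Toy data: individuals 0 and 1 share an intercept, individual 2 differs.
y = np.array([1.0, 1.0, 5.0])
X = np.zeros((3, 1))
alpha = np.array([1.0, 1.0, 5.0])
beta = np.array([0.0])
edges = np.array([[0, 1], [1, 2]])

value = objective(y, X, alpha, beta, edges, lam=1.0)
weights = lla_weights(alpha, edges, lam=1.0)
```

In this toy case the weight on edge (1, 2) is zero because the intercept gap exceeds `gamma * lam`, so a subsequent weighted L1 step would not shrink that pair together. The weight on edge (0, 1) stays at `lam`, which keeps the two equal intercepts fused. With weights fixed, each such step reduces to a weighted L1 problem that could be posed as a linear program.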

Keywords

subgroup analysis, network-linked data, heterogeneity, prediction

Received 3 November 2022

Accepted 4 January 2023

Published 19 July 2024