Statistics and Its Interface
Volume 17 (2024)
Number 3
Robust subgroup analysis for network-linked data
Pages: 357 – 370
DOI: https://dx.doi.org/10.4310/23-SII774
Authors
Abstract
Modern applications often collect data on individuals who are connected by a network, which records the relationships among them. In this paper, we use both covariates and the network to identify subgroup structures in a heterogeneous population, where heterogeneity arises from unknown or unobserved latent factors. We propose a penalization-based method for subgroup analysis built on the median regression model, which automatically divides the samples into subgroups by penalizing pairwise differences of intercepts for individuals connected by an edge in the network. The proposed method can also be used to predict response variables for new subjects with only covariates by taking advantage of the network reconstructed after adding these new subjects. We suggest an implementation algorithm based on the local linear approximation to the nondifferentiable and nonconvex penalty function and establish the oracle properties of the proposed estimator under some regularity conditions. Our simulation studies show that the proposed method can effectively identify heterogeneous subgroups even when the network has errors or misspecified edges. Finally, the advantages of the proposed method are further illustrated by an analysis of a housing price data set from real estate transactions.
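As a rough illustration of the kind of objective the abstract describes, the sketch below evaluates an absolute-error (median regression) loss with subject-specific intercepts plus a nonconvex penalty on intercept differences across network edges, and computes the weights that one local linear approximation step would use. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the penalty, so SCAD with a = 3.7 is assumed here, and the function names (scad_penalty, scad_derivative, objective, lla_weights) and the argument layout are hypothetical.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty at |t|. ASSUMPTION: the abstract does not name the penalty."""
    t = np.abs(t)
    middle = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    flat = lam**2 * (a + 1) / 2
    return np.where(t <= lam, lam * t, np.where(t <= a * lam, middle, flat))

def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |t|."""
    t = np.abs(t)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1))

def objective(y, X, alpha, beta, edges, lam):
    """Absolute-error loss plus penalty on intercept differences over edges.

    alpha holds one intercept per subject; edges is an (m, 2) integer array
    of subject index pairs (hypothetical layout).
    """
    resid = y - alpha - X @ beta
    i, j = edges[:, 0], edges[:, 1]
    return np.abs(resid).sum() + scad_penalty(alpha[i] - alpha[j], lam).sum()

def lla_weights(alpha, edges, lam):
    """Weights for one local linear approximation step: the nonconvex penalty
    on each edge is replaced by weight * |alpha_i - alpha_j|."""
    i, j = edges[:, 0], edges[:, 1]
    return scad_derivative(alpha[i] - alpha[j], lam)

# Toy data: two subgroups with intercepts 1 and 5, one cross-group edge (1, 2).
y = np.array([1.0, 1.0, 5.0, 5.0])
X = np.zeros((4, 1))
alpha = np.array([1.0, 1.0, 5.0, 5.0])
beta = np.array([0.0])
edges = np.array([[0, 1], [2, 3], [1, 2]])

value = objective(y, X, alpha, beta, edges, lam=1.0)
weights = lla_weights(alpha, edges, lam=1.0)
```

In this toy example the cross-group edge has a large intercept gap, so its weight is zero and the next weighted step would not shrink that gap, while the within-group edges keep the full weight and stay fused. That behavior is one way such a penalty can tolerate misspecified edges, in line with what the abstract reports for the simulations.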
Keywords
subgroup analysis, network-linked data, heterogeneity, prediction
Received 3 November 2022
Accepted 4 January 2023
Published 19 July 2024