Statistics and Its Interface

Volume 12 (2019)

Number 1

Network-incorporated integrative sparse linear discriminant analysis

Pages: 149 – 166

DOI: https://dx.doi.org/10.4310/SII.2019.v12.n1.a13

Authors

Xiaoyan Wang (College of Finance and Statistics, Hunan University, Changsha, Hunan, China; and Department of Biostatistics, Yale University, New Haven, Connecticut, U.S.A.)

Kuangnan Fang (School of Economics, Xiamen University, Xiamen, Fujian, China; and Fujian Key Lab of Science, Xiamen University, Xiamen, Fujian, China)

Qingzhao Zhang (Wang Yanan Institute for Studies in Economics, MOE Key Lab of Economics, and Fujian Key Lab of Statistics, Xiamen University, Xiamen, Fujian, China)

Shuangge Ma (Department of Biostatistics, Yale University, New Haven, Connecticut, U.S.A.)

Abstract

Linear discriminant analysis (LDA) has been extensively applied in classification. For high-dimensional data, results generated from a single dataset may be unsatisfactory because of the small sample size. Under the regression framework, integrative analysis, which pools and analyses raw data from multiple datasets, has shown performance superior to that of single-dataset analysis and meta-analysis. In this study, we conduct integrative analysis for LDA (iLDA). A network structure for variables is constructed to accommodate their interconnections, which have not been considered in many of the existing classification studies. We adopt the $1$-norm group MCP method for simultaneous estimation and discriminative variable selection, and a Laplacian penalty to incorporate the network. The proposed method has intuitive formulations and can be computed using an effective coordinate descent algorithm. A simulation study shows that iLDA outperforms benchmarks with more accurate variable identification and classification. Analysis of three breast cancer datasets demonstrates that iLDA can improve prediction performance.
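
The two penalties named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the coefficient layout (a p × M matrix `B` with one row per variable and one column per dataset), the tuning parameters `lam` and `gamma`, and the use of the unnormalized Laplacian L = D − A built from an adjacency matrix `A` are all assumptions made for illustration. The paper's exact formulation may differ, for example in how the network is normalized.

```python
import numpy as np

def mcp(t, lam, gamma):
    """Standard MCP evaluated at non-negative t (scalar or array).

    Equals lam*t - t^2/(2*gamma) for t <= gamma*lam, and the constant
    gamma*lam^2/2 beyond that point.
    """
    t = np.asarray(t, dtype=float)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

def group_mcp(B, lam, gamma, norm_ord=1):
    """Sum of MCP applied to the norm of each row of B.

    B is an assumed p x M layout: row j holds variable j's coefficients
    across the M datasets. norm_ord selects the within-group norm
    (1 follows the wording of the abstract; this choice is an assumption).
    """
    norms = np.linalg.norm(B, ord=norm_ord, axis=1)
    return float(mcp(norms, lam, gamma).sum())

def laplacian_penalty(B, A):
    """Quadratic network penalty trace(B^T L B), with L = D - A.

    A is an assumed symmetric, non-negative adjacency matrix among the
    p variables. For one dataset this equals the sum over variable pairs
    of a_jk * (b_j - b_k)^2, so connected variables are encouraged to
    have similar coefficients.
    """
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(B.T @ L @ B))
```

In this sketch the group penalty drives all coefficients of an unimportant variable toward zero across datasets, while the Laplacian term smooths coefficients over the variable network. How the two terms are weighted and combined with the discriminant loss is left to the paper.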

Keywords

integrative analysis, discriminant analysis, network

We would like to thank the editor and reviewers for their useful comments and suggestions, which have led to a significant improvement of this study. Wang’s work was supported by the National Natural Science Foundation of China (71601076), Humanity and Social Science Youth Foundation of Ministry of Education of China (16YJCZH104), and Social Science Foundation of Hunan Province (15YBA085). Zhang’s work was supported by the Fundamental Research Funds for the Central Universities (20720171064, 20720181003). Ma’s work was supported by the National Bureau of Statistics of China (2016LD01).

Received 3 February 2018

Published 26 October 2018