Statistics and Its Interface

Volume 10 (2017)

Number 3

Adaptive model-free sure independence screening

Pages: 399 – 406

DOI: https://dx.doi.org/10.4310/SII.2017.v10.n3.a4

Authors

Canhong Wen (Southern China Research Center of Statistical Science, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, Guangdong Province, China)

Shan Zhu (Southern China Research Center of Statistical Science, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, Guangdong Province, China)

Xin Chen (Department of Statistics and Applied Probability, National University of Singapore)

Xueqin Wang (Southern China Research Center of Statistical Science, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, Guangdong Province, China)

Abstract

Variable screening procedures are widely used in ultrahigh-dimensional data analysis. They rank the importance of the predictor variables by their marginal correlations with the response and then screen out the variables that are weakly correlated or uncorrelated with it. Despite their demonstrated effectiveness, most variable screening approaches depend on a pre-determined threshold for the number of selected predictors, typically an integer multiple of $\lceil n / \log(n) \rceil$, where $n$ is the sample size. To circumvent this issue, we propose a novel data-driven variable screening procedure that determines the threshold automatically. In our proposal, we rank the importance of the predictor variables by the $p$-values of modified independence tests, with smaller $p$-values indicating a stronger association with the response. Extensive simulation studies and an analysis of real genetic data indicate that our procedure is preferable to its existing counterpart.
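The abstract contrasts two ways of deciding how many predictors to keep. The Python sketch below illustrates that contrast under stated assumptions. It is not the authors' procedure, because their modified independence tests are not specified here. As the marginal statistic it uses the sample distance correlation (dCor), one of the paper's keywords. The "fixed size" rule keeps the top $k\lceil n/\log(n)\rceil$ predictors. The adaptive rule instead computes a permutation $p$-value for each predictor from the squared distance covariance and then applies the Benjamini-Hochberg step-up rule at level `q`, so the number retained depends on the data. The permutation test, the Benjamini-Hochberg rule, the function names (`fixed_size_screen`, `perm_pvalue`, `bh_screen`), and the defaults (`n_perm=199`, `q=0.1`) are illustrative assumptions.

```python
import math
import random


def centered_distances(x):
    """Double-centred matrix of pairwise distances |x_i - x_j| for a 1-D sample."""
    n = len(x)
    d = [[abs(a - b) for b in x] for a in x]
    m = [sum(row) / n for row in d]  # row means; equal to column means since d is symmetric
    g = sum(m) / n                   # grand mean
    return [[d[i][j] - m[i] - m[j] + g for j in range(n)] for i in range(n)]


def dcov2(a, b, perm=None):
    """Squared sample distance covariance from centred matrices a and b.
    If perm is given, b is read as b[perm[i]][perm[j]], i.e. y is shuffled."""
    n = len(a)
    p = list(range(n)) if perm is None else perm
    return sum(a[i][j] * b[p[i]][p[j]] for i in range(n) for j in range(n)) / n ** 2


def dcor(x, y):
    """Sample distance correlation between two 1-D samples."""
    a, b = centered_distances(x), centered_distances(y)
    vx, vy = dcov2(a, a), dcov2(b, b)
    if vx <= 0 or vy <= 0:
        return 0.0
    return math.sqrt(max(dcov2(a, b), 0.0) / math.sqrt(vx * vy))


def perm_pvalue(x, y, n_perm=199, seed=0):
    """Permutation p-value for H0: x independent of y, with dCov^2 as the statistic.
    (Hypothetical stand-in for the paper's modified independence tests.)"""
    rng = random.Random(seed)
    a, b = centered_distances(x), centered_distances(y)
    observed = dcov2(a, b)
    idx = list(range(len(y)))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # permuting y permutes rows and columns of b
        if dcov2(a, b, idx) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


def fixed_size_screen(X, y, k=1):
    """Conventional rule: keep the d = k * ceil(n / log n) columns with largest dCor.
    X is a list of predictor columns, each of length n."""
    n = len(y)
    d = k * math.ceil(n / math.log(n))
    scores = [dcor(col, y) for col in X]
    ranked = sorted(range(len(X)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


def bh_screen(pvalues, q=0.1):
    """Benjamini-Hochberg step-up: the number of retained columns is data-driven."""
    m = len(pvalues)
    ranked = sorted(range(m), key=lambda j: pvalues[j])
    cutoff = 0
    for rank, j in enumerate(ranked, start=1):
        if pvalues[j] <= rank * q / m:
            cutoff = rank  # largest rank whose p-value passes its threshold
    return sorted(ranked[:cutoff])


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 30, 10
    X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]    # p columns of length n
    y = [X[0][i] ** 3 + 0.1 * rng.gauss(0, 1) for i in range(n)]  # only column 0 matters
    print("fixed size:", fixed_size_screen(X, y))  # always 9 columns when n = 30
    pvals = [perm_pvalue(col, y) for col in X]
    print("BH-selected:", bh_screen(pvals, q=0.1))
```

With `n = 30` the fixed rule keeps 9 columns however many of them carry signal, whereas the number returned by `bh_screen` is driven by the $p$-values. The permutation loop costs on the order of $n^2 \cdot$ `n_perm` operations per predictor, so this pure-Python version only suits toy sizes.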

Keywords

adaptive threshold, distance correlation, false discovery rate, sure independence screening, ultra-high dimensional data

Published 31 January 2017