Statistics and Its Interface

Volume 4 (2011)

Number 3

A clustered optimal ROC curve method for family-based genetic risk prediction

Pages: 373 – 380

DOI: https://dx.doi.org/10.4310/SII.2011.v4.n3.a11

Authors

Qing Lu (Department of Epidemiology, Michigan State University, East Lansing, Mich., U.S.A.)

Chengyin Ye (Department of Bioinformatics, College of Life Science, Zhejiang University, Hangzhou, Zhejiang, China)

Jun Zhu (Department of Bioinformatics, College of Life Science, Zhejiang University, Hangzhou, Zhejiang, China)

Abstract

Risk prediction that capitalizes on emerging genetic findings holds great promise for improving public health and clinical care. Statistical methods for genetic risk prediction research, however, are still lacking, particularly for correlated data. To address this, we have developed a clustered optimal ROC curve (CORC) method for building predictive genetic tests from family-based genetic data. The proposed method extends the conventional optimal ROC curve method to handle multiple genetic markers while taking sample correlation into account, and implements a forward selection algorithm to accommodate high-dimensional data and capture possible epistasis. We evaluated the CORC method using both simulations and a real-data application. In the simulations, the method performed better than other existing methods under various pedigree structures and underlying disease models. In the real-data application, we applied the method to the large-scale International Multi-Center ADHD Genetics Project dataset and formed a predictive genetic test for conduct disorder. The test reached a low to medium classification accuracy, with an AUC value of 0.6908.
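
The abstract names the ingredients of the method only at a high level. As a rough illustration of two of them (scoring multi-locus genotypes to form an ROC curve, and forward selection of markers by AUC), a minimal Python sketch might look like the following. It is not the authors' implementation. It assumes the usual likelihood-ratio ranking as the way to build an optimal ROC curve, it ignores within-family correlation entirely (the "clustered" part of CORC), and every function name, the 0.5 pseudo-count, and the stopping rule are hypothetical choices made for illustration.

```python
from collections import Counter


def empirical_auc(scores, labels):
    """Mann-Whitney estimate of the AUC; ties count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def lr_scores(genotypes, labels, markers):
    """Score each individual by the estimated likelihood ratio of its
    multi-locus genotype: P(genotype | case) / P(genotype | control).

    The 0.5 pseudo-count is an assumption to avoid division by zero.
    """
    cells = [tuple(g[m] for m in markers) for g in genotypes]
    case_counts = Counter(c for c, y in zip(cells, labels) if y == 1)
    ctrl_counts = Counter(c for c, y in zip(cells, labels) if y == 0)
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    scores = []
    for c in cells:
        p_case = (case_counts[c] + 0.5) / (n_case + 1.0)
        p_ctrl = (ctrl_counts[c] + 0.5) / (n_ctrl + 1.0)
        scores.append(p_case / p_ctrl)
    return scores


def forward_select(genotypes, labels, max_markers, min_gain=0.0):
    """Greedy forward selection: at each step add the marker that gives
    the largest AUC, and stop when the gain is not above min_gain.

    The AUC here is computed on the same data used for selection,
    so it is an optimistic (apparent) estimate.
    """
    n_markers = len(genotypes[0])
    selected = []
    best_auc = 0.5  # AUC of a chance-level test
    while len(selected) < max_markers:
        candidates = [m for m in range(n_markers) if m not in selected]
        if not candidates:
            break
        trial = []
        for m in candidates:
            scores = lr_scores(genotypes, labels, selected + [m])
            trial.append((empirical_auc(scores, labels), m))
        step_auc, step_marker = max(trial, key=lambda t: t[0])
        if step_auc - best_auc <= min_gain:
            break
        selected.append(step_marker)
        best_auc = step_auc
    return selected, best_auc


if __name__ == "__main__":
    # Toy data (invented): each row holds genotypes (0/1/2) at two markers.
    toy_genotypes = [(0, 0), (0, 1), (0, 2), (1, 0),
                     (1, 1), (2, 0), (2, 2), (0, 1)]
    toy_labels = [0, 0, 0, 1, 1, 1, 1, 0]
    print(forward_select(toy_genotypes, toy_labels, max_markers=2))
```

Because genotypes are scored jointly as multi-locus cells rather than marker by marker, a sketch of this kind can in principle pick up interaction between markers, which is one plausible reading of how forward selection could capture epistasis. A faithful version would also need to account for relatedness within pedigrees and to assess the AUC on data not used for selection.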

Keywords

clustered ROC curve, predictive genetic test, high-dimensional data, genome-wide association study

Published 29 August 2011