Statistics and Its Interface

Volume 4 (2011)

Number 3

False-negative-rate based approach selecting top single-nucleotide polymorphisms in the first stage of a two-stage genome-wide association study

Pages: 359 – 371

DOI: https://dx.doi.org/10.4310/SII.2011.v4.n3.a10

Authors

Melissa L. Bondy (Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A.)

Richard S. Houlston (Section of Cancer Genetics, Institute of Cancer Research, Sutton, United Kingdom)

Zhuying Huang (Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A.)

Sanjay Shete (Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A.)

Jian Wang (Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A.)

Chih-Chieh Wu (Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A.)

Abstract

Genome-wide association (GWA) studies, in which hundreds of thousands of single-nucleotide polymorphisms (SNPs) are tested simultaneously, are becoming popular for identifying disease loci for common diseases. Most commonly, a GWA study involves two stages: the first stage tests the association between every SNP and the disease, and the second stage genotypes the SNPs selected from the first stage in an independent sample to validate the associations. The first stage is considered more fundamental because the second stage is contingent on its results. Selection of SNPs from stage one for genotyping in stage two is typically based on an arbitrary threshold or on controlling type I errors. These strategies can be inefficient and may exclude disease-associated SNPs from genotyping in stage two. We propose an approach for selecting top SNPs that is based on the false-negative rate (FNR). Using the FNR approach, we determine the number of SNPs that should be selected from the observed p-values and a pre-specified multi-testing power in the first stage. We applied our method to simulated data and to data from a GWA study of glioma (a rare form of brain tumor). Results from the simulations and the glioma GWA study indicate that the proposed approach provides an FNR-based way to select SNPs using pre-specified power.
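The abstract does not spell out the selection rule, so the following Python sketch is only an illustration of the general idea and should not be read as the authors' procedure. It assumes that the FNR is taken as the estimated fraction of truly associated SNPs left out of the selection (one minus the multi-testing power), and that the number of null SNPs is estimated with a Storey-type estimator at a tuning value `lam`. The function names `estimate_null_count` and `select_top_k` and the parameter `lam` are hypothetical.

```python
def estimate_null_count(pvalues, lam=0.5):
    """Storey-type estimate of the number of null SNPs (an assumption here)."""
    m = len(pvalues)
    # Null p-values are roughly uniform, so those above lam are mostly nulls.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / ((1.0 - lam) * m))
    return m * pi0


def select_top_k(pvalues, power=0.8, lam=0.5):
    """Smallest k whose top-k selection has estimated FNR <= 1 - power.

    Hypothetical rule: with k SNPs selected at threshold t (the k-th smallest
    p-value), about m0 * t of them are expected to be nulls, so the estimated
    number of true associations captured is k - m0 * t.
    """
    m = len(pvalues)
    m0 = estimate_null_count(pvalues, lam)
    m1 = m - m0  # estimated number of truly associated SNPs
    if m1 <= 0:
        return 0  # no evidence of any association; nothing to carry forward
    for k, t in enumerate(sorted(pvalues), start=1):
        est_true_pos = max(0.0, k - m0 * t)
        est_fnr = 1.0 - est_true_pos / m1
        if est_fnr <= 1.0 - power:
            return k
    return m  # target power not reachable; fall back to selecting all SNPs


# Toy stage-one data: 10 strongly associated SNPs and 90 evenly spread nulls.
toy_pvalues = [1e-8] * 10 + [(i + 0.5) / 90 for i in range(90)]
k_80 = select_top_k(toy_pvalues, power=0.8)
k_95 = select_top_k(toy_pvalues, power=0.95)
print(k_80, k_95)
```

In this toy setting a higher target power forces more SNPs into stage two, which is the trade-off the abstract describes: the number carried forward follows from the observed p-values and the chosen power rather than from an arbitrary cutoff.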

Keywords

false negative rate, single nucleotide polymorphism, two-stage genome-wide association study

2010 Mathematics Subject Classification

60K35

Published 29 August 2011