Statistics and Its Interface

Volume 2 (2009)

Number 2

Robust genome-wide scans with genetic model selection using case-control design

Pages: 145–151

DOI: https://dx.doi.org/10.4310/SII.2009.v2.n2.a4

Authors

Nancy L. Geller (Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, Maryland, U.S.A.)

Jungnam Joo (Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, Maryland, U.S.A.)

Jing-Ping Lin (Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, Maryland, U.S.A.)

Mario Stylianou (Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, Maryland, U.S.A.)

Xin Tian (Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, Maryland, U.S.A.)

Myron A. Waclawiw (Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, Maryland, U.S.A.)

Colin O. Wu (Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, Maryland, U.S.A.)

Gang Zheng (Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, Maryland, U.S.A.)

Abstract

In a genome-wide association study with more than 100,000 (100K) to 1 million single nucleotide polymorphisms (SNPs), the first step is usually a genome-wide scan to identify candidate chromosome regions for further analyses. The goal of the genome-wide scan is to rank all the SNPs by their association test statistics or p-values and select the top SNPs. A good ranking procedure places the SNPs with true associations as near to the top as possible, which increases the probability of selecting at least one SNP with a true association. However, if the disease-associated SNPs have moderate genetic effects, the probability that a large number of null SNPs will have extremely small p-values (or large test statistics) is high when more than 300K SNPs are screened. Therefore, when a small fraction of top SNPs (usually less than 5%) is selected, the probability of selecting at least one SNP with a true association is usually less than 80% unless the sample size is large. Robust statistics, such as MAX3 and MIN2, have been proposed to rank all the SNPs. In this article we consider genome-wide scans with genetic model selection and compare the proposed method with the existing approaches. Results from simulation studies are presented.
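The abstract describes ranking SNPs by a robust statistic such as MAX3 and keeping the top fraction. The Python below is a minimal sketch, assuming the standard definition of MAX3 as the largest absolute Cochran-Armitage trend test (CATT) statistic over the recessive, additive and dominant genotype scores (0, 0, 1), (0, 1/2, 1) and (0, 1, 1), computed from a 2×3 table of case and control genotype counts. The names `catt`, `max3`, `top_snps` and `MODEL_SCORES` and the input layout are hypothetical. This is not the article's genetic-model-selection procedure, and MIN2 is not implemented. SNPs are ranked by the statistic itself rather than by its p-value, because the MAX3 p-value would also require the correlations among the three CATTs.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: (g0, g1, g2) = numbers of subjects carrying 0, 1, 2
# copies of the assumed risk allele.
Counts = Tuple[int, int, int]

# Genotype scores under each genetic model (standard CATT scores).
MODEL_SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def catt(cases: Counts, controls: Counts, scores: Sequence[float]) -> float:
    """CATT statistic Z, asymptotically N(0, 1) under no association."""
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]
    # Numerator: sum_i x_i * (S * r_i - R * s_i)
    t = sum(x * (S * r - R * s) for x, r, s in zip(scores, cases, controls))
    sx = sum(x * ni for x, ni in zip(scores, n))
    sxx = sum(x * x * ni for x, ni in zip(scores, n))
    # R * S * [N * sum x_i^2 n_i - (sum x_i n_i)^2], proportional to Var(t)
    var_term = R * S * (N * sxx - sx * sx)
    if var_term <= 0:
        # The scored genotypes do not vary (e.g. no homozygotes under the
        # recessive scores), so the test is undefined; report 0 instead.
        return 0.0
    return t * math.sqrt(N) / math.sqrt(var_term)


def max3(cases: Counts, controls: Counts) -> float:
    """MAX3: the largest |Z| over the recessive, additive and dominant CATTs."""
    return max(abs(catt(cases, controls, x)) for x in MODEL_SCORES.values())


def top_snps(tables: Sequence[Tuple[Counts, Counts]], fraction: float) -> List[int]:
    """Indices of the top `fraction` of SNPs ranked by MAX3, largest first."""
    stats = [max3(cases, controls) for cases, controls in tables]
    order = sorted(range(len(stats)), key=lambda i: stats[i], reverse=True)
    k = max(1, math.ceil(fraction * len(stats)))
    return order[:k]


# Usage with hypothetical counts: (cases, controls) for each SNP.
snp_null = ((25, 50, 25), (25, 50, 25))
snp_assoc = ((20, 50, 30), (30, 50, 20))
print(max3(*snp_assoc))
print(top_snps([snp_null, snp_assoc], fraction=0.5))
```

For the hypothetical associated SNP, the additive CATT equals 2 and the recessive and dominant CATTs both equal 2/√1.5 ≈ 1.63, so MAX3 is 2 (up to floating-point rounding). Keeping the top half of the two SNPs therefore selects index 1. In a real scan the same ranking would run over hundreds of thousands of tables, and `fraction` would be small (for example, under 5%).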

Keywords

case-control design, efficiency robustness, genetic model selection, genome-wide studies, MAX

2010 Mathematics Subject Classification

Primary 62G10, 62G35. Secondary 62G30, 62P10.

Published 1 January 2009