Statistics and Its Interface

Volume 17 (2024)

Number 3

Data integration of multiple genome-wide association studies under group homogeneous structure

Pages: 517–532

DOI: https://dx.doi.org/10.4310/23-SII810

Authors

Kai Li (Karyopharm Therapeutics Inc.)

Chi Song (The Ohio State University)

Yuan Jiang (Oregon State University)

Abstract

It is now common to have a large collection of datasets from similar scientific studies; a well-known example is multiple genome-wide association studies investigating the same human disease. To take advantage of these datasets, statisticians have developed data integration methods that combine datasets from multiple studies to increase statistical power. Most data integration methods to date can only combine compatible studies with the same explanatory variables; they also tend to ignore the grouping structure of the explanatory variables. However, incompatible studies with grouped explanatory variables arise frequently when multiple genome-wide association studies employ different genotyping platforms. Therefore, we propose a new method called “gMeta” that can integrate incompatible datasets under a new group homogeneous structure by utilizing group regularization principles. gMeta not only improves statistical power by assuming homogeneity among group-level signals but also allows heterogeneous individual-level signals from different studies. Simulation studies illustrate the advantage of gMeta over separate analysis in terms of its homogeneity and enhanced statistical power for detecting weak signals. Finally, an integrative analysis of multiple genetic datasets on schizophrenia shows the applicability and efficacy of gMeta when applied to genome-wide association studies.

Keywords

group regularization, heterogeneity, homogeneity, incompatible studies, meta-analysis

Received 28 November 2022

Accepted 1 August 2023

Published 19 July 2024