Statistics and Its Interface

Volume 10 (2017)

Number 4

Estimation of directed subnetworks in ultra-high dimensional data for gene network problems

Pages: 657 – 676

DOI: https://dx.doi.org/10.4310/SII.2017.v10.n4.a10

Authors

Sung Won Han (School of Industrial Management Engineering, Korea University, Seongbuk-gu, Seoul, Korea)

Sunghwan Kim (Department of Statistics, Keimyung University, Daegu, South Korea)

Junhee Seok (School of Electrical Engineering, Korea University, Seongbuk-gu, Seoul, Korea)

Jeewhan Yoon (Graduate School of Management of Technology, Korea University, Seongbuk-gu, Seoul, Korea)

Hua Zhong (Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, N.Y., U.S.A.)

Abstract

Next-generation sequencing technology generates ultra-high dimensional data. However, it is computationally impractical to estimate an entire Directed Acyclic Graph (DAG) under such high dimensionality. In this paper, we discuss two types of problems in estimating subnetworks in ultra-high dimensional data. The first problem is to estimate DAGs of a subnetwork adjacent to a target gene, and the second problem is to estimate DAGs of multiple subnetworks without information about a target gene. To address each problem, we propose efficient methods to estimate subnetworks by using layer-dependent weights with BIC criteria or by using community detection approaches to identify clusters as subnetworks. We apply these approaches to breast cancer gene expression data from TCGA as a practical example.

Keywords

Bayesian network, directed acyclic graph, penalized likelihood, high dimension, subnetworks

Published 30 May 2017