Statistics and Its Interface

Volume 7 (2014)

Number 2

Canonical ensembles for potentially incompatible dependency networks with applications to medical data

Pages: 251 – 261

DOI: https://dx.doi.org/10.4310/SII.2014.v7.n2.a10

Authors

Shyh-Huei Chen (Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, U.S.A.)

Edward H. Ip (Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, U.S.A.)

Yuchung J. Wang (Department of Mathematical Sciences, Rutgers University, Camden, New Jersey, U.S.A.)

Abstract

A directed graph is either acyclic or cyclic. This paper focuses on the cyclic model, or dependency network, which represents a collection of univariate conditional distributions. The conditional approach allows a high level of flexibility in modeling because the dependency network is based on the notion that it is computationally convenient to estimate the local distribution of a variable given the remaining variables in a data set. However, the collection of conditional distributions individually estimated within a dependency network is generally not coherent with any joint distribution. The pseudo-Gibbs sampler (PGS) has often been used to estimate joint distributions for incompatible conditional models. We propose a new method for deriving a joint distribution from a given set of potentially incompatible univariate conditional distributions such that the discrepancies between the given conditional distributions and those computed from the estimated joint distribution are minimized. The method is based on an ensemble of distributions, each of which can be derived from the canonical parameters of a set of given conditional distributions. Through simulation experiments and real data sets, we compare the performance of the ensemble method, the PGS, and a linear programming (LP)-based method. Our comparisons suggest that the ensemble method outperforms both the PGS and the LP-based method. The ensemble method is computationally efficient and scalable, and it therefore has the potential to open a new avenue for finding a nearly optimal solution for dependency networks of high dimensions.
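To make the notion of incompatibility concrete, the following is a minimal sketch, not the authors' ensemble method, of a fixed-scan pseudo-Gibbs chain on two binary variables. It computes the chain's stationary joint exactly from its transition matrix and measures how far the conditionals of that joint are from the given ones. The function names (`pgs_stationary`, `conditionals`, `discrepancy`), the scan order, and the absolute-difference discrepancy are assumptions chosen for illustration only.

```python
import numpy as np

def pgs_stationary(a, b):
    """Exact stationary joint of a fixed-scan pseudo-Gibbs chain (illustrative).

    a[i, j] = given P(X1=i | X2=j), each column sums to 1
    b[i, j] = given P(X2=j | X1=i), each row sums to 1
    Assumed scan order: redraw X1 from a, then X2 from b.
    """
    states = [(i, j) for i in range(2) for j in range(2)]
    T = np.zeros((4, 4))
    for s, (i, j) in enumerate(states):
        for t, (i2, j2) in enumerate(states):
            # X1 is redrawn given the old X2, then X2 given the new X1.
            T[s, t] = a[i2, j] * b[i2, j2]
    # Stationary distribution: left eigenvector of T for eigenvalue 1.
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    pi = pi / pi.sum()
    return pi.reshape(2, 2)  # entry [i, j] is P(X1=i, X2=j)

def conditionals(joint):
    """Return P(X1 | X2) and P(X2 | X1) implied by a 2x2 joint."""
    a = joint / joint.sum(axis=0, keepdims=True)
    b = joint / joint.sum(axis=1, keepdims=True)
    return a, b

def discrepancy(joint, a, b):
    """Sum of absolute differences between given and implied conditionals."""
    a_hat, b_hat = conditionals(joint)
    return float(np.abs(a_hat - a).sum() + np.abs(b_hat - b).sum())

# Compatible case: both conditionals come from one joint distribution.
joint = np.array([[0.1, 0.2], [0.3, 0.4]])
a_ok, b_ok = conditionals(joint)
d_ok = discrepancy(pgs_stationary(a_ok, b_ok), a_ok, b_ok)

# Incompatible case: the two conditionals have different odds ratios.
a_bad = np.array([[0.9, 0.2], [0.1, 0.8]])
b_bad = np.array([[0.5, 0.5], [0.5, 0.5]])
d_bad = discrepancy(pgs_stationary(a_bad, b_bad), a_bad, b_bad)
```

In the compatible case the chain recovers the original joint and the discrepancy is zero up to rounding. In the incompatible case no joint reproduces both conditionals, so the discrepancy stays strictly positive; this is the quantity that, in spirit, the proposed method seeks to make small.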

Keywords

characterizing set of interactions, conditionally specified model, dependency network, ensemble method

Published 17 April 2014