Statistics and Its Interface

Volume 12 (2019)

Number 2

Two-sample test for compositional data with ball divergence

Pages: 275 – 282

DOI: https://dx.doi.org/10.4310/SII.2019.v12.n2.a8

Authors

Jin Zhu (Southern China Center for Statistical Science, School of Mathematics, Sun Yat-sen University, Guangzhou, China)

Kunsheng Lv (Southern China Center for Statistical Science, School of Mathematics, Sun Yat-sen University, Guangzhou, China)

Aijun Zhang (Department of Statistics and Actuarial Science, University of Hong Kong)

Wenliang Pan (Southern China Center for Statistical Science, School of Mathematics, Sun Yat-sen University, Guangzhou, China)

Xueqin Wang (Southern China Center for Statistical Science, School of Mathematics, Sun Yat-sen University, Guangzhou, China)

Abstract

In this paper, we examine whether the intestinal microbiota structures of gout patients and healthy individuals differ. Intestinal microbiota structures are usually measured by so-called compositional data, composed of multiple components whose values are typically non-negative and sum to a constant. Such data are frequently collected and studied in many areas, such as petrology, biology, and medicine. The difficulty of statistical inference with compositional data arises not only from the constant-sum constraint on the components, but also from the high dimensionality of the components, possibly with many zero measurements, which frequently appear in 16S rRNA gene sequences. To overcome these difficulties, we first define the Bhattacharyya distance between two compositions such that the set of compositions is isometrically embedded in a spherical surface. We then propose a two-sample test statistic for compositional data based on Ball Divergence, a novel and powerful measure of the discrepancy between two probability measures in separable Banach spaces. Our test procedure performs excellently in Monte Carlo simulation studies, even when the simulated data consist of a thousand components with a high proportion of zero measurements. We also find that our method can distinguish the intestinal microbiota structures of gout patients from those of healthy individuals, whereas the existing method does not.
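
The abstract does not state the formula for the distance, so the following is only a minimal sketch under an assumption: that the distance between two compositions is the angle between their element-wise square roots, i.e. the arc length on the unit sphere after a square-root transformation. The function names `closure` and `bhattacharyya_distance` are hypothetical and are not taken from the paper or any accompanying software. The sketch illustrates why zero components cause no difficulty here: no logarithm or ratio of components is required.

```python
import numpy as np


def closure(x):
    """Rescale a non-negative vector so that its components sum to 1."""
    x = np.asarray(x, dtype=float)
    if np.any(x < 0):
        raise ValueError("compositions must be non-negative")
    total = x.sum()
    if total <= 0:
        raise ValueError("composition must have a positive sum")
    return x / total


def bhattacharyya_distance(x, y):
    """Angle between sqrt(x) and sqrt(y) on the unit sphere.

    ASSUMPTION: this arccos form is one common way to obtain a
    spherical embedding; the paper's exact definition may differ.
    """
    p, q = closure(x), closure(y)
    # sqrt(p) and sqrt(q) have unit Euclidean norm, so their inner
    # product is the cosine of the angle between them.
    cosine = np.sum(np.sqrt(p * q))
    # Clip to guard against rounding slightly outside [0, 1].
    return float(np.arccos(np.clip(cosine, 0.0, 1.0)))


if __name__ == "__main__":
    a = [0.5, 0.5, 0.0]  # zero components are allowed
    b = [0.5, 0.0, 0.5]
    print(bhattacharyya_distance(a, b))
```

Under this assumption the distance is bounded by pi/2, which is attained when two compositions share no non-zero component, and it is unchanged if either input is rescaled by a positive constant.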

Keywords

ball divergence, two-sample test, compositional data

2010 Mathematics Subject Classification

Primary 62P10. Secondary 62G10.

Received 3 May 2018

Published 11 March 2019