Communications in Information and Systems

Volume 20 (2020)

Number 1

Identifying zero-inflated distributions with a new R package iZID

Pages: 23 – 44

DOI: https://dx.doi.org/10.4310/CIS.2020.v20.n1.a2

Authors

Lei Wang (Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China)

Hani Aldirawi (Department of Mathematics, Statistics, and Computer Science, University of Illinois, Chicago, Il., U.S.A.)

Jie Yang (Department of Mathematics, Statistics, and Computer Science, University of Illinois, Chicago, Il., U.S.A.)

Abstract

Count data with a large proportion of zeros arise naturally in many scientific disciplines. When a one-sample Kolmogorov–Smirnov (KS) test is conducted on count data, the estimated p-value is biased because sample estimates are plugged in for the unknown parameters. As a consequence, the result of the KS test can be too conservative. In the newly developed R package “iZID” for zero-inflated count data, we use bootstrapped Monte Carlo estimates to overcome the bias in estimated p-values, and bootstrapped likelihood ratio tests for zero-inflated model selection. The package also provides functions to simulate zero-inflated count data and to calculate maximum likelihood estimates of unknown parameters. Compared with other R packages currently available, our package covers more types of zero-inflated distributions and provides adjusted p-value estimates that account for the influence of unknown model parameters. To assist potential users, in this paper we provide detailed descriptions of the functions in “iZID” and illustrate their use with executable R code.
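The bias correction described above can be illustrated with a generic parametric bootstrap. The following is a minimal sketch in Python, not the iZID implementation and not its API: it assumes a zero-inflated Poisson model, fits it with a simple EM iteration, and re-estimates the parameters on every simulated sample so that the reference distribution of the KS statistic reflects the plug-in step. All function names (`rzip`, `fit_zip`, `ks_statistic`, `boot_ks_pvalue`) and default values are hypothetical.

```python
import math
import random


def rzip(n, phi, lam, rng):
    """Draw n zero-inflated Poisson values (structural zero with probability phi)."""
    out = []
    for _ in range(n):
        if rng.random() < phi:
            out.append(0)
            continue
        # Knuth's multiplication method for Poisson(lam); adequate for small lam.
        k, prod, limit = 0, rng.random(), math.exp(-lam)
        while prod > limit:
            k += 1
            prod *= rng.random()
        out.append(k)
    return out


def fit_zip(x, n_iter=200):
    """Approximate ML estimates (phi, lam) of a zero-inflated Poisson via EM."""
    n, n0, total = len(x), x.count(0), sum(x)
    if total == 0:
        return 0.0, 0.0  # all zeros: degenerate case, convention chosen here
    phi, lam = 0.5 * n0 / n, total / (n - n0)
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        z = phi / (phi + (1.0 - phi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean.
        phi = n0 * z / n
        lam = total / (n - n0 * z)
    return phi, lam


def ks_statistic(x, phi, lam):
    """Max |empirical CDF - fitted ZIP CDF| over the integers 0..max(x)."""
    n = len(x)
    counts = [0] * (max(x) + 1)
    for v in x:
        counts[v] += 1
    d, ecdf, pois_cdf, term = 0.0, 0.0, 0.0, math.exp(-lam)
    for k, c in enumerate(counts):
        if k > 0:
            term *= lam / k  # Poisson pmf recursion
        pois_cdf += term
        ecdf += c / n
        d = max(d, abs(ecdf - (phi + (1.0 - phi) * pois_cdf)))
    return d


def boot_ks_pvalue(x, n_boot=200, seed=1):
    """Parametric-bootstrap p-value for the KS test with estimated parameters."""
    rng = random.Random(seed)
    phi, lam = fit_zip(x)
    d_obs = ks_statistic(x, phi, lam)
    exceed = 0
    for _ in range(n_boot):
        xb = rzip(len(x), phi, lam, rng)
        phi_b, lam_b = fit_zip(xb)  # refit on each simulated sample
        if ks_statistic(xb, phi_b, lam_b) >= d_obs:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    data = rzip(200, 0.3, 2.0, random.Random(7))
    print(fit_zip(data), boot_ks_pvalue(data))
```

Refitting inside the loop is the design choice that matters here: comparing the observed statistic against statistics computed with the true simulation parameters would reproduce the plug-in bias that the bootstrap is meant to remove.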

Keywords

count data, hurdle model, Kolmogorov–Smirnov test, model selection, zero-inflated distribution

2010 Mathematics Subject Classification

Primary 62F10, 62G10. Secondary 62F40.

The third-listed author was supported in part by NSF grant DMS-1924859.

Received 9 September 2019

Published 17 April 2020