Statistics and Its Interface

Volume 6 (2013)

Number 2

Testing the statistical significance of an ultra-high-dimensional naïve Bayes classifier

Pages: 223 – 229

DOI: https://dx.doi.org/10.4310/SII.2013.v6.n2.a6

Authors

Baiguo An (Key Laboratory for Applied Statistics of the Ministry of Education, Northeast Normal University, Changchun, Jilin Province, China)

Jianhua Guo (Key Laboratory for Applied Statistics of the Ministry of Education, Northeast Normal University, Changchun, Jilin Province, China)

Hansheng Wang (Guanghua School of Management, Peking University, Beijing, China)

Abstract

The naïve Bayes approach is one of the most popular methods used for classification. Nevertheless, how to test its statistical significance under an ultra-high-dimensional (UHD) setup is not well understood. To fill this important theoretical gap, we propose a novel test statistic with a standard normal asymptotic null distribution, even if the predictor dimension is considerably larger than the sample size. This makes the proposed method useful for UHD data analysis. Simulation studies are presented to demonstrate its finite-sample performance, and a text classification example is described for illustration.

Keywords

binary predictor, hypothesis testing, naïve Bayes, supervised learning, text classification, ultra-high-dimensional data

2010 Mathematics Subject Classification

62H30

Published 10 May 2013