Statistics and Its Interface

Volume 9 (2016)

Number 2

Energy bagging tree

Pages: 171 – 181

DOI: https://dx.doi.org/10.4310/SII.2016.v9.n2.a5

Authors

Taoyun Cao (Southern China Research Center of Statistical Science, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, China)

Xueqin Wang (Southern China Research Center of Statistical Science, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, China; and Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China)

Heping Zhang (Southern China Research Center of Statistical Science, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, China; and Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut, U.S.A.)

Abstract

This paper introduces Energy Bagging Tree (EBT) for multivariate nonparametric regression problems. The EBT makes use of a measure of dispersion constructed from a generalized Gini’s mean difference as node impurity, and the tree split function therefore corresponds to the product of energy distance and descendants’ proportions. As a nonparametric extension of the between-sample variation in the analysis of variance, this measure of dispersion serves well for EBT in understanding certain complex data. Extensive simulation studies indicate that EBT is highly competitive with existing regression tree methods. We also assess the performance of the EBT through a real data analysis on forest fires.
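
The link the abstract draws between a Gini-type node impurity and energy distance can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes the node impurity is the mean pairwise Euclidean distance among the responses in a node (a generalized Gini's mean difference with exponent 1, normalized as a V-statistic); the abstract does not state the paper's exact exponent or normalization. All function names are hypothetical, and the bagging step is not shown. Under these assumptions, the impurity decrease of a split equals the product of the two descendants' proportions and the energy distance between them.

```python
import math
from itertools import product


def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (x, y) with x in a and y in b."""
    return sum(math.dist(x, y) for x, y in product(a, b)) / (len(a) * len(b))


def gini_mean_difference(node):
    """Assumed node impurity: mean pairwise distance within the node
    (V-statistic form, so zero self-distances are included in the average)."""
    return mean_pairwise_distance(node, node)


def energy_distance(a, b):
    """Sample energy distance: 2 E|X - Y| - E|X - X'| - E|Y - Y'|."""
    return (2 * mean_pairwise_distance(a, b)
            - gini_mean_difference(a)
            - gini_mean_difference(b))


def impurity_decrease(left, right):
    """Parent impurity minus the proportion-weighted impurities of the children."""
    parent = left + right
    p_left = len(left) / len(parent)
    p_right = len(right) / len(parent)
    return (gini_mean_difference(parent)
            - p_left * gini_mean_difference(left)
            - p_right * gini_mean_difference(right))


def split_score(left, right):
    """Product of the descendants' proportions and their energy distance."""
    n = len(left) + len(right)
    return (len(left) / n) * (len(right) / n) * energy_distance(left, right)


# Illustrative bivariate responses (made-up values) for two candidate child nodes.
left = [(0.0, 1.0), (1.0, 0.5), (0.5, 0.0)]
right = [(4.0, 3.0), (5.0, 4.5)]

# The two quantities agree up to floating-point error.
print(impurity_decrease(left, right), split_score(left, right))
```

Because the two quantities coincide under these assumptions, choosing the split that maximizes the score is the same as choosing the split that most reduces the assumed impurity, which parallels maximizing between-sample variation in the analysis of variance.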

Keywords

multivariate nonparametric regression, energy bagging tree, energy distance, generalized Gini’s mean difference

2010 Mathematics Subject Classification

Primary 62G08, 62H20. Secondary 62P12.

Published 4 November 2015