Statistics and Its Interface

Volume 3 (2010)

Number 4

Group variable selection via a hierarchical lasso and its oracle property

Pages: 557 – 574

DOI: https://dx.doi.org/10.4310/SII.2010.v3.n4.a13

Authors

Nengfeng Zhou (Consumer Credit Risk Solutions, Bank of America, Charlotte, North Carolina, U.S.A.)

Ji Zhu (Department of Statistics, University of Michigan, Ann Arbor, Michigan, U.S.A.)

Abstract

In many engineering and scientific applications, predictor variables are naturally grouped; in biological applications, for example, assayed genes or proteins can be grouped by biological role or pathway. Common statistical analysis methods such as ANOVA, factor analysis, and functional modeling with basis sets also exhibit natural variable groupings. Existing successful group variable selection methods have the limitation of selecting variables in an “all-in-all-out” fashion, i.e., when one variable in a group is selected, all other variables in the same group are also selected. In many real problems, however, such as gene-set selection, we may want to keep the flexibility of selecting variables within a group. In this paper, we develop a new group variable selection method that not only removes unimportant groups effectively, but also keeps the flexibility of selecting variables within a group. We also show that the new method offers the potential for achieving the theoretical “oracle” property.

Keywords

group selection, lasso, oracle property, regularization, variable selection

Published 1 January 2010