Statistics and Its Interface
Volume 9 (2016)
Number 4
Special Issue on Statistical and Computational Theory and Methodology for Big Data
Guest Editors: Ming-Hui Chen (University of Connecticut); Radu V. Craiu (University of Toronto); Faming Liang (University of Florida); and Chuanhai Liu (Purdue University)
Model diagnostics in reduced-rank estimation
Pages: 469 – 484
DOI: https://dx.doi.org/10.4310/SII.2016.v9.n4.a7
Author
Abstract
Reduced-rank methods are very popular in high-dimensional multivariate analysis for conducting simultaneous dimension reduction and model estimation. However, the commonly used reduced-rank methods are not robust, as the underlying reduced-rank structure can be easily distorted by only a few data outliers. Anomalies are bound to exist in big data problems, and in some applications they themselves could be of primary interest. While naive residual analysis is often inadequate for outlier detection due to potential masking and swamping, robust reduced-rank estimation approaches could be computationally demanding. Under Stein’s unbiased risk estimation framework, we propose a set of tools, including leverage scores and generalized information scores, to perform model diagnostics and outlier detection in large-scale reduced-rank estimation. The leverage scores give an exact decomposition of the so-called model degrees of freedom to the observation level, which leads to exact decompositions of many commonly used information criteria; the resulting quantities are thus named information scores of the observations. The proposed information score approach provides a principled way of combining the residuals and leverage scores for anomaly detection. Simulation studies confirm that the proposed diagnostic tools work well. A pattern recognition example with handwritten digit images and a time series analysis example with monthly U.S. macroeconomic data further demonstrate the efficacy of the proposed approaches.
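As a rough illustration of the ingredients named in the abstract, the sketch below fits a classical reduced-rank regression and computes ordinary least-squares leverages, whose sum equals the number of predictors. This is an assumption-laden stand-in: the function names, the simulated data, and the use of OLS hat-matrix leverages are all hypothetical choices for illustration, and they are not the SURE-based leverage or information scores proposed in the article.

```python
import numpy as np

def reduced_rank_fit(X, Y, rank):
    """Classical reduced-rank regression (illustrative, not the article's estimator):
    project the OLS coefficients onto the top right singular directions of the OLS fit."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    fitted_ols = X @ B_ols
    _, _, Vt = np.linalg.svd(fitted_ols, full_matrices=False)
    V_r = Vt[:rank].T                      # q x rank
    return B_ols @ V_r @ V_r.T             # p x q coefficient matrix of rank <= `rank`

def ols_leverage(X):
    """Diagonal of the OLS hat matrix, computed from a reduced QR factorization.
    These per-observation values sum to the column rank of X."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q ** 2, axis=1)

# Simulated data (hypothetical sizes): n observations, p predictors, q responses, true rank r.
rng = np.random.default_rng(0)
n, p, q, r = 50, 5, 4, 2
X = rng.standard_normal((n, p))
B_true = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))

B_rr = reduced_rank_fit(X, Y, r)
lev = ols_leverage(X)                                  # observation-level leverages
residual_norms = np.linalg.norm(Y - X @ B_rr, axis=1)  # observation-level residual sizes
```

In this simplified setting the leverages split the OLS degrees of freedom across observations; the article's contribution is an analogous exact decomposition for the reduced-rank estimator, combined with residuals into information scores.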
Keywords
big data, information score, model diagnostics, multivariate regression, outlier detection, reduced-rank estimation
2010 Mathematics Subject Classification
Primary 62M10. Secondary 62J12.
Published 14 September 2016