Modeling the upper tail of the distribution of facial recognition non-match scores

Hunter, Brett D.; Cooley, Daniel; Givens, Geof H.; Beveridge, J. Ross

doi:10.4310/SII.2017.v10.n4.a13

Statistics and Its Interface

Volume 10 (2017)

Number 4

Pages: 711 – 725

DOI: https://dx.doi.org/10.4310/SII.2017.v10.n4.a13

Authors

Brett D. Hunter (Department of Statistics, George Mason University, Fairfax, Virginia, U.S.A.)

Daniel Cooley (Department of Statistics, Colorado State University, Fort Collins, Colo., U.S.A.)

Geof H. Givens (Givens Statistical Solutions LLC, Fort Collins, Colorado, U.S.A.)

J. Ross Beveridge (Department of Computer Science, Colorado State University, Fort Collins, Colo., U.S.A.)

Abstract

In facial recognition applications, the upper tail of the distribution of non-match scores is of interest because existing algorithms classify a pair of images as a match if their score exceeds some high quantile of the non-match distribution. We develop a general model for the non-match distribution above $u_{\tau}$, the $(1-\tau)$th quantile, borrowing ideas from extreme value theory. We call this model the $\mathrm{GPD}_{\tau}$, as it can be viewed as a reparameterized generalized Pareto distribution (GPD). This novel model treats $\tau$ as fixed and allows us to estimate $u_{\tau}$ in addition to the parameters describing the tail. Inference for both $u_{\tau}$ and the $\mathrm{GPD}_{\tau}$ scale and shape parameters is performed via M-estimation, where our objective function combines the quantile regression loss function and the $\mathrm{GPD}_{\tau}$ density. By parameterizing $u_{\tau}$ and the $\mathrm{GPD}_{\tau}$ parameters in terms of available covariates, we gain an understanding of how these covariates influence the tail of the distribution of non-match scores. A simulation study shows that our method is able to estimate both the parameters describing the covariates' influence and high quantiles of the non-match distribution. We apply our method to a data set of non-match scores and find that covariates such as gender, use of glasses, and age difference have a strong influence on the tail of the non-match distribution.
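The building blocks named in the abstract can be illustrated with a minimal Python sketch, under stated assumptions. It shows (a) the quantile regression check loss, whose minimizer over a constant is an empirical $(1-\tau)$ quantile; (b) a tail log density equal to $\log\tau$ plus a GPD log density of the excess $y - u_{\tau}$, which is what $P(Y > u_{\tau}) = \tau$ implies for a GPD tail; and (c) the resulting high quantile $q_p = u_{\tau} + (\sigma/\xi)\{(p/\tau)^{-\xi} - 1\}$ for $0 < p \le \tau$. The covariate links in `tail_params` (linear $u_{\tau}$, log-linear $\sigma$, shared $\xi$) and all function names are hypothetical. The sketch does not reproduce the authors' reparameterization, nor how they combine and weight the two pieces in their M-estimation objective.

```python
import math


def check_loss(r, p):
    """Quantile-regression check loss rho_p(r) = r * (p - 1{r < 0})."""
    return r * (p - (1.0 if r < 0 else 0.0))


def fit_threshold(y, tau):
    """Constant u minimizing sum_i rho_{1-tau}(y_i - u).

    The sum is convex and piecewise linear with kinks at the data points,
    so scanning the observed values finds a minimizer (an empirical
    (1 - tau) quantile). O(n^2); adequate for a small illustration.
    """
    p = 1.0 - tau
    return min(y, key=lambda u: sum(check_loss(yi - u, p) for yi in y))


def gpd_tau_logpdf(y, u, sigma, xi, tau):
    """Log density of Y at y >= u, assuming P(Y > u) = tau and that the
    excess Y - u given Y > u is GPD(sigma, xi): log tau + GPD log density."""
    z = (y - u) / sigma
    if z < 0:
        return -math.inf  # below the threshold: not part of the tail model
    if abs(xi) < 1e-12:  # xi -> 0 limit: exponential excesses
        return math.log(tau) - math.log(sigma) - z
    t = 1.0 + xi * z
    if t <= 0:
        return -math.inf  # beyond the upper endpoint when xi < 0
    return math.log(tau) - math.log(sigma) - (1.0 / xi + 1.0) * math.log(t)


def tail_quantile(p, u, sigma, xi, tau):
    """Score exceeded with probability p (0 < p <= tau) under the tail model."""
    if not 0.0 < p <= tau:
        raise ValueError("need 0 < p <= tau")
    if abs(xi) < 1e-12:
        return u + sigma * math.log(tau / p)
    return u + sigma / xi * ((p / tau) ** (-xi) - 1.0)


def tail_params(x, beta, gamma, xi):
    """Hypothetical covariate links: u_tau linear in x, log(sigma) linear
    in x, xi shared across covariate values."""
    u = sum(b * xj for b, xj in zip(beta, x))
    sigma = math.exp(sum(g * xj for g, xj in zip(gamma, x)))
    return u, sigma, xi
```

For a covariate vector `x`, calling `tail_params(x, beta, gamma, xi)` and passing the result to `tail_quantile` gives that covariate setting's estimated match threshold at false-match rate `p`. Setting `p = tau` returns $u_{\tau}$ itself, which is a quick consistency check on the formula.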

Keywords

generalized Pareto, M-estimation, quantile regression

Published 30 May 2017