Statistics and Its Interface

Volume 9 (2016)

Number 3

A choice model with a diverging choice set for POI data analysis

Pages: 355 – 363

DOI: https://dx.doi.org/10.4310/SII.2016.v9.n3.a9

Authors

Xiaoling Lu (Center for Applied Statistics, Data Mining Center, School of Statistics, Renmin University of China, Beijing, China)

Junlong Zhao (School of Mathematics and Systems Science, Beihang University, Beijing, China)

Yu Chen (Guanghua School of Management, Peking University, Beijing, China)

Hansheng Wang (Guanghua School of Management, Peking University, Beijing, China)

Abstract

A point of interest (POI) is a geographical location that might be of interest to the public. POIs provide a convenient way to register people’s locations through mobile devices, which generates POI data. POI data contain accurate location information and are extremely valuable for location based services (LBS). Accordingly, principled statistical methods that can be used for regression and/or prediction are required. To partially fill this theoretical gap, we propose a conditional logit approach for POI choice analysis. This new model is a natural extension of the classical choice model (McFadden, 1974, 1978) but with two key characteristics. First, POIs located far away from the current position are less likely to be selected as the next POI choice. As a result, the distance (or an appropriate transformation of it) between the current position and the next POI candidate is an important predictor and should be included in the model. Second, the classical choice model considers a finite choice set. By contrast, the new model studies a diverging choice set, mainly because the total number of POI locations in practice is typically large. The diverging choice set makes maximum likelihood estimation (MLE) computationally expensive. To alleviate computational costs, we further propose a constrained maximum likelihood estimation (CMLE) method. Compared with MLE, CMLE utilizes only those POIs located within a reasonable distance of the current position. This restriction leads to a significant reduction in computation at a reasonable efficiency loss. To demonstrate the finite sample performance of the method, numerical studies based on both simulated and real datasets are presented.
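The idea of restricting the choice set by distance can be illustrated with a minimal sketch. It is not the authors' implementation: the utility form beta * log(1 + distance), the function names, the radius parameter, the rule of skipping visits whose chosen POI lies outside the radius, and the toy coordinates are all assumptions made here for illustration. The sketch evaluates a conditional logit log-likelihood over the full choice set and over a distance-restricted one.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(beta, visits, pois, radius=None):
    """Conditional logit log-likelihood with distance as the only predictor.

    visits: list of ((x, y), chosen_index) pairs: current position, next POI.
    pois:   list of (x, y) POI coordinates.
    radius: if given, only POIs within this distance enter the choice set
            (an illustrative stand-in for the constrained estimator).
    """
    total = 0.0
    for (cx, cy), chosen in visits:
        dists = [math.hypot(px - cx, py - cy) for px, py in pois]
        if radius is None:
            candidates = list(range(len(pois)))
        else:
            candidates = [j for j, d in enumerate(dists) if d <= radius]
            if chosen not in candidates:
                continue  # assumption: skip visits whose choice is out of range
        # Assumed utility: beta * log(1 + distance); beta < 0 penalizes far POIs.
        utils = {j: beta * math.log1p(dists[j]) for j in candidates}
        total += utils[chosen] - log_sum_exp(list(utils.values()))
    return total


# Hypothetical toy data: four POIs, three observed moves to a nearby POI.
pois = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
visits = [((0.1, 0.0), 0), ((0.9, 0.1), 1), ((0.0, 1.5), 2)]

full = log_likelihood(-1.0, visits, pois)              # all POIs as candidates
near = log_likelihood(-1.0, visits, pois, radius=3.0)  # nearby POIs only
```

In this sketch the restricted version sums over fewer candidates per visit, which is where the computational saving would come from when the number of POIs is large. In practice either function would be passed to a numerical optimizer to estimate beta.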

Keywords

choice model, constrained maximum likelihood estimation, diverging choice set, location based service, point of interest

Published 27 January 2016