Statistics and Its Interface

Volume 12 (2019)

Number 2

Incorporating deep features in the analysis of tissue microarray images

Pages: 283 – 293

DOI: https://dx.doi.org/10.4310/SII.2019.v12.n2.a9

Authors

Donghui Yan (Department of Mathematics, University of Massachusetts Dartmouth, Massachusetts, U.S.A.)

Timothy Randolph (Fred Hutchinson Cancer Research Center, Seattle, Washington, U.S.A.)

Jian Zou (Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, Massachusetts, U.S.A.)

Peng Gong (Department of ESPM, University of California at Berkeley; and Department of Earth System Science, Tsinghua University, Beijing, China)

Abstract

Tissue microarray (TMA) images are increasingly used in cancer studies and in the validation of biomarkers. TACOMA, a cutting-edge automatic scoring algorithm for TMA images, is comparable to pathologists in terms of accuracy and repeatability. Here we consider how this algorithm may be further improved. Inspired by the recent success of deep learning, we propose to incorporate representations learnable through computation. We explore representations of a group nature obtained through unsupervised learning, e.g., hierarchical clustering and recursive space partitioning. Information carried by clustering or spatial partitioning may be more concrete than the labels when the data are heterogeneous, and could help when the labels are noisy. The use of such information can be viewed as regularization in model fitting. This approach is motivated by two major challenges in TMA image scoring, heterogeneity and label noise, and by the cluster assumption in semi-supervised learning. Using this information on TMA images of breast cancer, we have reduced the error rate of TACOMA by about 6%. Further simulations on synthetic data provide insights into when such representations are likely to help. Although we focus on TMAs, learnable representations of this type are expected to be applicable in other settings.
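
As a rough illustration of what a representation "of a group nature" could look like, the sketch below partitions feature vectors by recursive space partitioning and appends each point's cell membership as extra features. This is not the procedure used in the paper: the midpoint split on the widest coordinate, the function names `partition_ids` and `augment`, and the toy data are all assumptions made for illustration only.

```python
def partition_ids(points, max_leaf=2):
    """Assign each point the id of its cell in a recursive space partition.

    Hypothetical scheme: split the widest coordinate at its midpoint until a
    cell holds at most `max_leaf` points.
    """
    if not points:
        return []
    ids = [0] * len(points)
    next_id = 0
    n_dims = len(points[0])

    def split(indices):
        nonlocal next_id

        def spread(d):
            values = [points[i][d] for i in indices]
            return max(values) - min(values)

        # Split along the coordinate with the widest spread in this cell.
        dim = max(range(n_dims), key=spread)
        values = [points[i][dim] for i in indices]
        cut = (min(values) + max(values)) / 2.0
        left = [i for i in indices if points[i][dim] <= cut]
        right = [i for i in indices if points[i][dim] > cut]

        # Stop when the cell is small enough or cannot be split further.
        if len(indices) <= max_leaf or not left or not right:
            for i in indices:
                ids[i] = next_id
            next_id += 1
            return
        split(left)
        split(right)

    split(list(range(len(points))))
    return ids


def augment(points, ids):
    """Append a one-hot encoding of the cell id to each feature vector."""
    n_cells = max(ids) + 1
    return [
        list(p) + [1.0 if c == cell else 0.0 for c in range(n_cells)]
        for p, cell in zip(points, ids)
    ]


# Toy data (made up): two well-separated groups in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
ids = partition_ids(points)
features = augment(points, ids)
```

The augmented vectors in `features` could then be passed to any classifier in place of the original ones. Points that fall in the same cell share the added coordinates, which is one simple way group membership can inform model fitting independently of the labels.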

Keywords

tissue microarray images, automatic scoring, hierarchical clustering, recursive space partitioning, deep representation learning

2010 Mathematics Subject Classification

62P10

Received 1 December 2017

Published 11 March 2019