Statistics and Its Interface
Volume 17 (2024)
Number 2
Special issue on statistical learning of tensor data
Multi-way overlapping clustering by Bayesian tensor decomposition
Pages: 219 – 230
DOI: https://dx.doi.org/10.4310/23-SII790
Authors
Abstract
The development of modern sequencing technologies provides great opportunities to measure gene expression of multiple tissues from different individuals. The three-way variation across genes, tissues, and individuals makes statistical inference a challenging task. In this paper, we propose a Bayesian multi-way clustering approach to cluster genes, tissues, and individuals simultaneously. The proposed model adaptively trichotomizes the observed data into three latent categories and uses a Bayesian hierarchical construction to further decompose the latent variables into lower-dimensional features, which can be interpreted as overlapping clusters. With a Bayesian nonparametric prior, i.e., the Indian buffet process, our method determines the number of clusters automatically. The utility of our approach is demonstrated through simulation studies and an application to the Genotype-Tissue Expression (GTEx) RNA-seq data. The clustering results reveal some interesting findings about depression-related genes in the human brain, which are consistent with biological domain knowledge. The detailed algorithm and some numerical results are provided in the online Supplementary Material (https://intlpress.com/site/pub/files/supp/sii/2024/0017/0002/sii-2024-0017-0002-s001.pdf).
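To illustrate how an Indian buffet process prior leaves the number of clusters open, the following minimal Python sketch draws a binary membership matrix from the standard IBP generative scheme. It is not the model or sampler of the paper: the function name `sample_ibp`, the concentration parameter `alpha`, and the sizes used are assumptions introduced here for illustration only.

```python
import numpy as np

def sample_ibp(num_items, alpha, rng):
    """Draw a binary item-by-cluster matrix from the Indian buffet process.

    Hypothetical helper for illustration; not taken from the paper.
    """
    counts = []  # counts[k] = number of earlier items assigned to cluster k
    rows = []
    for i in range(1, num_items + 1):
        # Item i joins existing cluster k with probability counts[k] / i.
        row = [int(rng.random() < m / i) for m in counts]
        # Item i also opens Poisson(alpha / i) brand-new clusters.
        num_new = rng.poisson(alpha / i)
        row.extend([1] * num_new)
        # Update the counts of old clusters, then append the new ones.
        counts = [m + z for m, z in zip(counts, row)] + [1] * num_new
        rows.append(row)
    # Pad the ragged rows into a rectangular 0/1 matrix.
    Z = np.zeros((num_items, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

rng = np.random.default_rng(0)
Z = sample_ibp(num_items=20, alpha=2.0, rng=rng)
print("number of clusters drawn:", Z.shape[1])
print("clusters per item:", Z.sum(axis=1))
```

The number of columns is random rather than fixed in advance, and a row may contain several ones, which is what allows an item to belong to more than one cluster.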
Keywords
Bayesian nonparametric prior, gene expression data, Indian buffet process, low-rank tensor, mixture model
2010 Mathematics Subject Classification
Primary 62H30. Secondary 62F15.
The first two authors contributed equally to this work.
Received 27 September 2022
Accepted 9 March 2023
Published 1 February 2024