Statistics and Its Interface
Volume 2 (2009)
Number 3
Support vector machines with disease-gene-centric network penalty for high dimensional microarray data
Pages: 257 – 269
DOI: https://dx.doi.org/10.4310/SII.2009.v2.n3.a1
Abstract
With gene pathways and networks increasingly available, and with accumulating knowledge of genes whose variants predispose to disease (disease genes), we propose a disease-gene-centric support vector machine (DGC-SVM) that directly incorporates these two sources of prior information into microarray-based classifiers for binary classification. DGC-SVM aims to detect genes that cluster together and around key disease genes in a gene network. To this end, we propose a penalty over suitably defined groups of genes; a hierarchy is imposed on the undirected gene network to facilitate the definition of these groups. DGC-SVM uses the hinge loss penalized by the sum of the $L_{\infty}$-norms taken over the groups. Simulation studies show that, compared with the standard SVM and the SVM with an $L_1$-penalty (L1-SVM), DGC-SVM not only detects more disease genes along pathways but also captures disease genes that may affect the outcome only weakly. Two real-data applications demonstrate that DGC-SVM improves gene selection while retaining the predictive performance of the standard SVM and L1-SVM. The proposed method has the potential to be an effective classification tool for microarray data, one that encourages the selection of genes lying along paths to, or clustering around, known disease genes.
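For orientation, the penalized criterion described in the abstract can be sketched as follows. The notation is an assumption, since the abstract does not fix any: $x_i$ is the expression profile of sample $i$, $y_i \in \{-1, +1\}$ its class label, $\beta_0$ and $\beta$ the intercept and gene coefficients, $\mathcal{G}$ the collection of gene groups derived from the network hierarchy, $\beta_g$ the coefficients of the genes in group $g$, and $\lambda \ge 0$ a tuning parameter. Any group-specific weights used in the full paper are not stated in the abstract and are omitted here.

$$
\min_{\beta_0,\,\beta}\; \sum_{i=1}^{n} \bigl[\,1 - y_i\,(\beta_0 + x_i^{\top}\beta)\,\bigr]_{+} \;+\; \lambda \sum_{g \in \mathcal{G}} \lVert \beta_g \rVert_{\infty},
\qquad \lVert \beta_g \rVert_{\infty} = \max_{j \in g} \lvert \beta_j \rvert .
$$

Under this sketch, a group's penalty term depends only on its largest absolute coefficient, so the remaining genes in the group can carry nonzero coefficients up to that size at no additional cost. This is consistent with the stated aim of selecting genes that cluster together in the network.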
Keywords
DAG, gene expression, gene network, grouped penalty, hierarchy, penalization
Published 1 January 2009