Communications in Information and Systems

Volume 19 (2019)

Number 1

Virus classification based on Q-vectors

Pages: 81 – 94

DOI: https://dx.doi.org/10.4310/CIS.2019.v19.n1.a5

Authors

Hui Zheng (Department of Mathematics, Statistics, and Computer Science, University of Illinois, Chicago, Illinois, U.S.A.; and AbbVie Inc.)

Jie Yang (Department of Mathematics, Statistics, and Computer Science, University of Illinois, Chicago, Illinois, U.S.A.)

Rong L. He (Department of Biological Sciences, Chicago State University, Chicago, Illinois, U.S.A.)

Stephen S.-T. Yau (Department of Mathematical Sciences, Tsinghua University, Beijing, China)

Abstract

Based on a Markov model, we propose a new alignment-free method for sequence analysis, the Q-vector (QV). It incorporates the length information of viral sequences and can reflect the relationship between low mers and high mers. Compared with the $k$-mer and composition vector methods, the QV method is significantly more efficient and accurate in classifying viral genomes. By combining the distance matrices derived from the QV and the natural vector, we define a new distance matrix for classifying viral genomes that reduces the classification errors even further. We also construct phylogenetic trees based on the new distance.
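
For orientation, the $k$-mer baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of a generic alignment-free $k$-mer frequency vector with a pairwise Euclidean distance matrix; it is not the QV method, whose definition is not given in this abstract. The function names `kmer_frequencies` and `distance_matrix`, and the choice of Euclidean distance, are assumptions made for the example.

```python
from itertools import product
import math


def kmer_frequencies(seq, k):
    """Return the normalized k-mer count vector of a DNA sequence over A, C, G, T."""
    alphabet = "ACGT"
    # Fixed lexicographic ordering of all 4**k possible k-mers.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing ambiguous bases such as N
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def distance_matrix(vectors):
    """Pairwise Euclidean distances between feature vectors (assumed metric)."""
    n = len(vectors)
    return [[math.dist(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


# Toy sequences, not real viral genomes.
toy_sequences = ["ACGTACGTAC", "ACGTTTGTAC", "GGGGCCCCGG"]
vectors = [kmer_frequencies(s, 2) for s in toy_sequences]
D = distance_matrix(vectors)
```

A distance matrix of this kind is the usual input to a clustering or tree-building step; the sketch stops at the matrix and makes no claim about how the paper combines or uses its own distances.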

This work is supported by the National Natural Science Foundation of China (grant #91746119).

Published 18 April 2019