A Virus Taxonomy Classification Framework
Tree visualisation features in ViCTreeView
The ViCTree pipeline is currently set up for the Densovirinae subfamily. Trees and alignments are updated when new sequences are submitted to GenBank and the newly added sequences form a new cluster that has not previously been identified.
The Densovirinae subfamily analysis is based on non-structural protein 1 sequences. Twenty-one seed sequences were used to carry out the initial as well as all subsequent analyses for the subfamily-level taxonomic classification of the viruses.
A clustering threshold of 1.0 was applied to cluster sequences that were 100% identical. An optional parameter, -u, specifies a file with a list of protein accession numbers that are accepted by the ICTV as representative species for the family or subfamily. This option allows users to keep the static branches consistent as the phylogeny expands.
A multiple sequence alignment and pairwise distances are calculated for the sequences included in the final set after the clustering step. Sequence metadata is collected for this set of sequences, including the genome accession, scientific name, taxonomy ID, taxonomy lineage, genus and NCBI URL. This information enables users to customise the labels for the tips of the tree in the visualisation module. It also provides a link-out to the NCBI genome sequence page for each sequence represented in the tree. A tree with rapid bootstrap analysis and best-scoring maximum likelihood search is generated using RAxML with the PROTGAMMAJTT model (default).
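The pairwise distance step can be illustrated with a minimal sketch. It assumes a simple p-distance (the percentage of differing residues over columns where neither sequence has a gap); the distance method actually used by the pipeline is not stated here, and the function names and toy sequences below are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def p_distance(seq_a: str, seq_b: str, gap_chars: str = "-.") -> float:
    """Return the percentage of differing residues between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Compute p-distances for every unordered pair of sequence IDs."""
    return {
        (id_a, id_b): p_distance(alignment[id_a], alignment[id_b])
        for id_a, id_b in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    # Toy alignment with made-up sequences, for illustration only.
    toy = {"seqA": "MKT-LLVA", "seqB": "MKTALLVG", "seqC": "MRT-LIVA"}
    for pair, dist in distance_matrix(toy).items():
        print(pair, round(dist, 1))
```

Expressing the distances as percentages keeps them directly comparable with percentage-based classification thresholds.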
The tree, the pairwise distance matrix and the metadata tables are then automatically submitted to the ViCTreeView module of the pipeline for visualisation.
Pre-optimised BLAST parameters for a hit length of 100 (-l 100) and a coverage of 50 (-c 50) were specified.
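As a rough illustration of how a hit length and coverage filter of this kind behaves, the sketch below keeps a hit only if it meets both cut-offs. It assumes that coverage means the aligned length as a percentage of the query length; how ViCTree itself computes coverage is not described here, and the `Hit` record and `passes_filters` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Minimal, hypothetical record for one BLAST hit."""
    subject_id: str
    alignment_length: int  # aligned residues
    query_length: int      # full length of the query sequence


def passes_filters(hit: Hit, min_length: int = 100, min_coverage: float = 50.0) -> bool:
    """Keep a hit only if it satisfies both the length and the coverage cut-off.

    Coverage is taken as alignment length / query length * 100 (an assumption).
    """
    coverage = 100.0 * hit.alignment_length / hit.query_length
    return hit.alignment_length >= min_length and coverage >= min_coverage


if __name__ == "__main__":
    hits = [
        Hit("hit_1", 300, 500),  # long enough, 60% coverage
        Hit("hit_2", 90, 120),   # too short
        Hit("hit_3", 120, 600),  # long enough, but only 20% coverage
    ]
    print([h.subject_id for h in hits if passes_filters(h)])
```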
CD-HIT clustering parameters were determined using the current pairwise distance criteria used for the classification of the species and genus within the virus family or subfamily.
For the Densovirinae subfamily, species- and genus-level classification is based on pairwise distances of 15% and 30% respectively, where pairwise distances are expressed as percentages. Hence, sequences below these thresholds were clustered together using a CD-HIT identity criterion of 1.0.
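The two thresholds can be read as a simple decision rule on a pairwise distance. The sketch below is one possible reading, assuming that a distance below 15% places two sequences in the same species and a distance below 30% in the same genus; the strict comparison at the boundaries and the function name are assumptions.

```python
def shared_rank(distance_percent: float,
                species_threshold: float = 15.0,
                genus_threshold: float = 30.0) -> str:
    """Classify a pair of sequences from their pairwise distance (in percent).

    Boundary handling (strictly below the threshold) is an assumption.
    """
    if distance_percent < 0:
        raise ValueError("distance must be non-negative")
    if distance_percent < species_threshold:
        return "same species"
    if distance_percent < genus_threshold:
        return "same genus"
    return "different genus"


if __name__ == "__main__":
    for d in (4.2, 22.0, 41.5):
        print(d, shared_rank(d))
```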
This step also helps to reduce the complexity of the tree by choosing the longest or a user-defined (-u) representative for the clusters generated by CD-HIT.
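The choice of a cluster representative can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's own code: the accession list is assumed to hold one accession per line, a user-listed accession is assumed to take priority over the longest sequence, ties are broken alphabetically, and all names and accessions are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def parse_accessions(lines: Iterable[str]) -> set[str]:
    """Read one accession per line, ignoring blank lines (assumed file layout)."""
    return {line.strip() for line in lines if line.strip()}


def pick_representative(members: dict[str, int], preferred: set[str]) -> str:
    """Pick one representative for a cluster.

    members maps accession -> sequence length. A user-listed accession wins;
    otherwise the longest sequence is chosen. Ties go to the alphabetically
    first accession so that the result is deterministic.
    """
    if not members:
        raise ValueError("cluster has no members")
    listed = sorted(acc for acc in members if acc in preferred)
    if listed:
        return listed[0]
    # max() returns the first maximal item, so sorting first fixes tie order.
    return max(sorted(members), key=lambda acc: members[acc])


if __name__ == "__main__":
    cluster = {"ACC_A": 650, "ACC_B": 700, "ACC_C": 700}  # made-up accessions
    user_list = parse_accessions(["ACC_A\n", "\n"])
    print(pick_representative(cluster, user_list))  # user-defined choice
    print(pick_representative(cluster, set()))      # longest sequence
```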
-m parameter in ViCTree.
Filename | Contents |
---|---|
Seed set | All protein sequences used as the seed set for the Densovirinae example. |
Final set | The final set of sequences selected after the BLAST and clustering steps. This set is used for multiple sequence alignment. |
Alignments | Protein sequence alignments for the final set of sequences. |
Pairwise distance matrix | The pairwise distance matrix for the final set of sequences. |
Phylogenetic tree | Phylogenetic tree generated by the ViCTree pipeline in Newick format. |
Using the ViCTree pipeline, we identified all previously classified genera and species in the subfamily Densovirinae (Cotmore et al., 2014), as well as six new species that have been submitted to the ICTV for approval: five new species in the genus Ambidensovirus and one new species with no assigned genus. Further details of these species are provided in the table below.
Name of new species | Representative isolate | GenBank accession number | Genus |
---|---|---|---|
Asteroid ambidensovirus 1 | Sea star-associated densovirus | KM052275 | Ambidensovirus |
Decapod ambidensovirus 1 | Cherax quadricarinatus densovirus | KP410261 | Ambidensovirus |
Hemipteran ambidensovirus 2 | Dysaphis plantaginea densovirus 1 | FJ040397 | Ambidensovirus |
Hemipteran ambidensovirus 3 | Myzus persicae densovirus 1 | AY148187 | Ambidensovirus |
Hymenopteran ambidensovirus 1 | Solenopsis invicta densovirus | KC991097 | Ambidensovirus |
Orthopteran densovirus 1 | Acheta domestica mini ambidensovirus | KF275669 | Unassigned |
The following figure shows a snapshot of the Densovirinae tree generated by the ViCTree analysis, visualised in the ViCTreeView module of the framework.
The ViCTree framework is developed by:
Sejal Modha (@sejmodha), Anil Thanki (@anilthanki) and Joseph Hughes (@josephhughes).