A Virus Taxonomy Classification Framework
Case Study: Densovirinae Example
Tree visualisation features in ViCTreeView
ViCTree can be installed on any UNIX/Linux machine. The pipeline has been tested on multi-core Ubuntu machines.
Fork the repository to your own GitHub account. This is required for the visualisation because the tree data is fetched from the user's GitHub account. For more information, see "Setting up GitHub repository for visualisation" on this page.
ViCTree can be downloaded from GitHub using git clone.
A sample command to clone ViCTree is:
git clone --recursive https://github.com/yourUserName/ViCTree.git
Note: Please replace the URL with your ViCTree repository URL to enable automated GitHub data upload.
The following programs must be available in the $PATH environment variable.
Note: Do not forget to run bash setup.sh after downloading eutils to ensure all dependencies are installed properly. This is required to run xtract utils.
Binary executables of a number of tools are available to download from the GitHub repository, with their corresponding licenses, for Mac OS X and Linux.
Note: The program versions described above have been tested and work with the ViCTree pipeline. Later versions of these programs should also work. Please contact the developers about version compatibility issues.
Download the ViCTree package or clone ViCTree from this page, and make sure all scripts are available in $PATH.
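Before the first run it can help to confirm that the scripts and dependencies really are visible in $PATH. The following is a minimal sketch, not part of ViCTree itself: check_tools is a hypothetical helper name, and the executable names to check depend on how each program was installed on your machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with ViCTree): report which of the
# executables named on the command line can be found in $PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found:   %s\n' "$tool"
        else
            printf 'missing: %s\n' "$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_tools ViCTree.sh xtract checks the pipeline script and the xtract utility mentioned above. Append the executable names of the other dependencies as they are called on your system.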
Usage: ViCTree [OPTIONS]
-t Taxa ID - INT (Required)
-s Seedset in FASTA format (Required)
-l Hit Length for BLAST - INT (Required)
-c Coverage for BLAST - INT (Required)
-h This helpful message
-m Specify model for RAxML (Default is PROTGAMMAJTT)
-i Identity for clustering sequences using cdhit
-n Output name of the virus family or sub-family (Required e.g. Densovirinae)
-p Number of threads
-u User-defined list of accession numbers to be set as cluster representatives
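To run the ViCTree pipeline you will need to identify two mandatory inputs: the NCBI taxonomy ID of the group of interest (-t) and a seed set of protein sequences in FASTA format (-s).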
It is recommended to use the protein accession number as the header of each seed sequence so that metadata can be retrieved from NCBI as part of the pipeline.
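A quick way to review the headers of a seed set before launching the pipeline is to list the first token of every FASTA header line. This is only an illustrative sketch: fasta_ids is a hypothetical helper name, and it does not verify that the identifiers exist in NCBI.

```shell
# Hypothetical helper (not shipped with ViCTree): print the first
# whitespace-delimited token of every FASTA header, without the leading '>'.
# Argument: <fasta_file>
fasta_ids() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}
```

Running fasta_ids on the seed file, for example fasta_ids txid40120_seeds.fa, prints one identifier per line, which makes it easy to spot headers that are not plain protein accession numbers.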
The following command will launch the ViCTree analysis pipeline for the Densovirinae sub-family.
ViCTree.sh -t 40120 -s txid40120_seeds.fa -l 100 -c 50 -p 10 -i 1.0 -n Densovirinae
Set the user_id, repo_name and branch parameters to match your forked repository in the following lines of the file index.html in the ViCTreeView sub-directory.
var user_id = "josephhughes"
var repo_name = "ViCTree"
var branch = "master"
var dir = "ViCTreeView/data"
Note: If you have forked the repository without changing the name then just update the user_id to your username in this file.
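The user_id change can also be scripted. The sketch below assumes the line in index.html has exactly the form shown above (var user_id = "..."). The function name set_user_id and the username exampleuser are hypothetical, and editing the file by hand works just as well.

```shell
# Hypothetical helper (not shipped with ViCTree): replace the quoted value
# on the `var user_id = "..."` line of the given file.
# Arguments: <file> <github_username>
set_user_id() {
    local file=$1 user=$2 tmp
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temporary file, then copy it back so the
    # permissions of the original file are preserved.
    sed "s/^\([[:space:]]*var user_id = \)\"[^\"]*\"/\1\"$user\"/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    local status=$?
    rm -f "$tmp"
    return "$status"
}
```

A call such as set_user_id ViCTreeView/index.html exampleuser leaves the repo_name, branch and dir lines untouched.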
When the pipeline is run for the first time, a folder named after the taxID is created to hold all the output files generated by the pipeline.
The main output files generated by the pipeline include:
Filename | Contents |
---|---|
taxID.fa | All protein sequences downloaded from NCBI for the specified taxID |
taxID_final_set.fa | The final set of sequences used for Multiple Sequence Alignment |
taxID_tree.nhx | The final tree generated and rerooted by RAxML |
taxID_clustalo_dist_mat.csv | The pairwise distance matrix file |
taxID_metadata | The metadata file for each sequence from taxID_final_set.fa that exists in NCBI |
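After a run, the presence of the files in the table can be checked with a short loop. This is a sketch under one stated assumption: that taxID in the file names stands for the numeric ID passed with -t (for example 40120). check_outputs is a hypothetical helper name.

```shell
# Hypothetical helper (not shipped with ViCTree): report which of the main
# output files exist and are non-empty.
# Arguments: <output_dir> <taxid>
# Assumption: "taxID" in the table is the numeric ID given with -t.
check_outputs() {
    local dir=$1 taxid=$2 suffix status=0
    for suffix in .fa _final_set.fa _tree.nhx _clustalo_dist_mat.csv _metadata; do
        if [ -s "$dir/$taxid$suffix" ]; then
            printf 'ok:      %s\n' "$taxid$suffix"
        else
            printf 'missing: %s\n' "$taxid$suffix"
            status=1
        fi
    done
    return "$status"
}
```

For the Densovirinae example this would be called as check_outputs 40120 40120, assuming the output folder sits in the current directory.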
ViCTreeView is a visualisation plugin for ViCTree. It is a customised phylogenetic tree visualisation plugin developed using D3.js. It reads input files from the data directory of this repository and displays the phylogenetic tree.
ViCTreeView has a scroll bar at the top that corresponds to the pairwise distances between the nodes of the tree. Users can select any value between 0 and 100, and clusters within the tree are highlighted based on the selected value.
The ViCTree framework is developed by:
Sejal Modha (@sejmodha), Anil Thanki (@anilthanki) and Joseph Hughes (@josephhughes).