PhyloBot: A Web Portal for Automated Phylogenetics, Ancestral Sequence Reconstruction, and Exploration of Mutational Trajectories

doi:10.1371/journal.pcbi.1004976

Fig 1.

Summary of PhyloBot automated pipeline.

A user begins by uploading a collection of orthologous protein sequences in a FASTA-formatted text file. PhyloBot reads the sequence collection and launches its automated analysis pipeline, which includes sequence alignment, phylogenetic model-fitting, tests of branch support, ancestral sequence reconstruction, and prediction of functional genetics. Upon completion, the results can be viewed in a web browser.
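As a rough illustration of the pipeline's input step, the sketch below reads a FASTA-formatted string into a dictionary of sequences. It is a minimal sketch and not PhyloBot's own parser; the function name `parse_fasta` and the two short example sequences are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {header: sequence}.

    Assumption: the full header line (minus '>') is used as the key.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence data may wrap across several lines.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Hypothetical input, invented for illustration only.
example = """>seq_one
MKVLA
GTR
>seq_two
MKILA
"""
sequences = parse_fasta(example)
```

Here `sequences["seq_one"]` joins the wrapped lines into a single string, which is the form that downstream alignment tools generally expect.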


Table 1.

Software incorporated in the PhyloBot analysis pipeline.

PhyloBot uses several existing software tools at various stages in its automated analysis pipeline.


Fig 2.

Screenshots from the PhyloBot web portal.

(A) The front page of the portal provides a control panel to create new jobs and to check the status of existing jobs. In this image, a user has five jobs; three of them are 100% complete and the other two are in progress. (B) A user can view detailed status for every job they create. The status page provides controls to start, stop, reset, and delete the job, in addition to displaying the job’s settings and the job’s current status.


Fig 3.

Example of alignment robustness analysis.

In this simple example, orthologous amino acid sequences from five species were aligned using three different methods for multiple sequence alignment: Muscle, MSAProbs, and MAFFT. (A) PhyloBot maps the aligned position of every character across all alignments. Shown in red is the character map for the amino acids aligned into site 3 of the Muscle alignment. In the MSAProbs sequence alignment, these same residues are split across sites 3 and 4. In the MAFFT alignment, these residues are split across sites 3, 4 and 5. (B) PhyloBot displays the character map as pie charts expressing site identity relative to the Muscle alignment. PhyloBot will also show these maps relative to MSAProbs and MAFFT alignments.
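The character map described in this caption can be sketched as follows. This is a minimal sketch under stated assumptions, not PhyloBot's implementation: alignments are plain dictionaries of gapped strings, `-` is the gap character, and the function names and the toy alignments are hypothetical.

```python
from collections import Counter


def residue_columns(aligned_seq, gap="-"):
    """Return the 1-based alignment column of each ungapped residue, in order."""
    return [col for col, ch in enumerate(aligned_seq, start=1) if ch != gap]


def site_map(ref_aln, other_aln, ref_site, gap="-"):
    """Fraction of residues at `ref_site` of `ref_aln` found in each column of `other_aln`."""
    counts = Counter()
    for name, ref_seq in ref_aln.items():
        if ref_seq[ref_site - 1] == gap:
            continue  # this sequence has no residue at the reference site
        # Position of the residue within the ungapped sequence (0-based).
        residue_index = sum(1 for ch in ref_seq[:ref_site - 1] if ch != gap)
        counts[residue_columns(other_aln[name], gap)[residue_index]] += 1
    total = sum(counts.values())
    return {col: n / total for col, n in counts.items()} if total else {}


# Toy alignments invented for illustration; they are not the data in the figure.
muscle = {"a": "MKVL", "b": "MKVL", "c": "MKIL", "d": "M-VL", "e": "M-IL"}
msaprobs = {"a": "MKV-L", "b": "MKV-L", "c": "MKI-L", "d": "M--VL", "e": "M--IL"}

site3 = site_map(muscle, msaprobs, ref_site=3)
```

In this toy case the five residues at site 3 of the reference alignment split between columns 3 and 4 of the second alignment, which is the kind of proportion a pie chart per site would display.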


Fig 4.

Example of ancestral node robustness analysis.

In this small example with protein sequences from five species, maximum likelihood phylogenies were inferred using four different evolutionary models (JTT+GAMMA, JTT+CAT, LG+GAMMA, and LG+CAT) based on three different sequence alignment methods (Muscle, MSAProbs, and MAFFT). The resulting ML phylogenies disagree in their topologies, and an ancestral node in one tree may not exist in other trees. For example, shown in red is the phylogenetic node corresponding to the most-recent ancestor of H. sapiens, M. musculus, and G. gallus, with X. tropicalis and T. teleost as the outgroup. This ancestral node is not inferred to exist when using some combinations of models and methods. Specifically, the alternate phylogenies support an evolutionary hypothesis in which the sequences from G. gallus and X. tropicalis are sister to each other. PhyloBot gathers this information about all reconstructed ancestral nodes, in order to assess the extent to which an ancestor’s existence is robust across different models and methods.
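One simple way to check whether an ancestor exists across trees is sketched below. It is a simplification and not necessarily how PhyloBot matches nodes: rooted trees are written as nested tuples, an ancestor is identified only by the set of its descendant taxa, and the assignment of topologies to alignment/model combinations is invented for illustration.

```python
def clades(tree):
    """Return every clade (frozenset of leaf names) in a rooted nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def node_support(trees, ingroup):
    """Report which trees contain the clade `ingroup`, and the fraction that do."""
    target = frozenset(ingroup)
    present = {label: target in clades(tree) for label, tree in trees.items()}
    return present, sum(present.values()) / len(present)


# Topology with the amniote ancestor, and the alternative in which
# G. gallus and X. tropicalis are sister to each other.
with_node = ((("H. sapiens", "M. musculus"), "G. gallus"),
             ("X. tropicalis", "T. teleost"))
without_node = ((("H. sapiens", "M. musculus"),
                 ("G. gallus", "X. tropicalis")), "T. teleost")

# Hypothetical assignment of topologies to (alignment, model) pairs.
trees = {
    ("Muscle", "JTT+GAMMA"): with_node,
    ("Muscle", "LG+CAT"): with_node,
    ("MAFFT", "LG+GAMMA"): with_node,
    ("MSAProbs", "LG+CAT"): without_node,
}

present, fraction = node_support(trees, ["H. sapiens", "M. musculus", "G. gallus"])
```

The resulting fraction is one possible summary of how robust an ancestor's existence is to the choice of alignment method and evolutionary model.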


Fig 5.

Screenshots from the PhyloBot ancestral library viewer.

The images shown come from the Ancestral Library computed for the CMGC protein family [31]. (A) The library viewer displays an interactive tree for exploring reconstructed protein ancestors. Users select a maximum likelihood tree by alignment method and evolutionary model, and then click on ancestral nodes within that tree. (B) PhyloBot gathers summary statistics about every ancestral node. Shown here is the support summary for ancestral Node 401 in the CMGC family, reconstructed using msaprobs and PROTCATLG. The histogram bins the sequence sites of Node 401 according to their amino acid probability support. In this case, a majority of sites have support of 0.9 or greater. The line graph plots the probability of the maximum likelihood amino acid residue, along with the probabilities of the second-best and third-best reconstructed residues; it is a quick way to see which protein domains were reconstructed with strong support. In this example, an unstructured region in the C-terminus was reconstructed with low support. (C) PhyloBot shows details about every site in every reconstructed ancestor. Shown here is the probability support by site for Node 401 in CMGC. Users can optionally map these data to extant sequences. For example, here a user selected Homo sapiens CDK6. In the table, the first column gives the sequence site in the MSAProbs alignment; the second column gives the site number and best amino acid state in the reconstructed ancestor Node 401; the third column gives the site number and amino acid state in Homo sapiens CDK6; and the fourth column gives the full probability distribution of all amino acid states reconstructed at that site in Node 401.
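The support summary in panel (B) can be approximated with the sketch below. It is a minimal sketch, assuming each site is stored as a dictionary of amino acid states and their probabilities; the function names and the five example sites are hypothetical and are not values from Node 401.

```python
def top_states(site_probs, n=3):
    """Return the n most probable (state, probability) pairs at one site."""
    return sorted(site_probs.items(), key=lambda kv: kv[1], reverse=True)[:n]


def support_histogram(sites, n_bins=10):
    """Count sites by the probability of their best state, in equal-width bins."""
    counts = [0] * n_bins
    for site_probs in sites:
        best_p = max(site_probs.values())
        # A probability of exactly 1.0 falls into the last bin.
        index = min(int(best_p * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up per-site distributions for illustration only.
sites = [
    {"K": 0.98, "R": 0.02},
    {"L": 0.95, "I": 0.04, "V": 0.01},
    {"G": 0.91, "A": 0.09},
    {"S": 0.62, "T": 0.30, "A": 0.08},
    {"E": 0.35, "D": 0.33, "Q": 0.32},
]

histogram = support_histogram(sites)
ranked = [top_states(site) for site in sites]
```

The last entry of `histogram` counts sites with support of 0.9 or greater, and `ranked` holds the best, second-best, and third-best states that a per-site line graph would plot.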
