Table 1.
Data set description: Subtypes and sources.
Number of sequences of different subtypes and from different sources. LA data contained 6 sequences from Cuba: 3 of subtype G and 3 CRF19.
Table 2.
Data set description: Locations.
Number of sequences from different locations.
Fig 1.
reconstructed for the three data sets: A1+CRF19 (top left), G+CRF19 (top right), and D+CRF19 (bottom centre). Nodes and branches are coloured by location (ACR performed by PastML, MPPA+F81), as explained in the legend at the bottom left: Cuba is red, African countries are brown apart from the DRC (violet), European countries are green, Asian countries are blue, American countries (Brazil and the USA) are orange, and Australia is yellow. Colourstrips around the trees show sample subtypes: A1 (light-blue), D (yellow), G (light-orange), CRF19 (red), and A1-A3 recombinant (blue; this sequence was used in the A1+CRF19 data set because the part of it that aligns with the A1-part of CRF19 is fully A1). CRF19 sequences form a cluster in all data sets. Subtype misannotations lead to several D- and A1-annotated sequences within the corresponding CRF19 clusters, and to one CRF19-annotated sequence (CU1437–17) outside of the CRF19 cluster in the D+CRF19 tree. Dates (with CIs, reconstructed by LSD2) and countries (with marginal probabilities, reconstructed by PastML) of the CRF19 cluster MRCAs and their non-CRF19 (i.e. A1, D or G) parent nodes are shown. The dates of non-recombinant Cuban clusters are also shown.
Table 3.
Dates and CIs/HPD of the MRCA of CRF19 and its non-CRF19 parent, estimated on different data sets.
Fig 2.
Phylogeography (province level) of CRF19.
ACR and visualisation are done with PastML [18] (MPPA+F81). Branches are coloured by province (the colour code is explained below); thinner gray branches correspond to a province change. The root is unresolved between Villa Clara and La Habana. The oldest resolved node is indicated with its date and corresponds to Villa Clara (marginal probability 0.92). The dates and marginal probabilities of the main introductions from Villa Clara to La Habana are also marked.
Fig 3.
ACR for the transmission mode of CRF19.
ACR is done with PastML [18] (MPPA+F81); the visualisation is performed with iTOL [19]. Nodes and branches are coloured by transmission mode (green for HET, beige for MSM). Thinner gray branches correspond to a state change, and gray nodes have unresolved states (MSM or HET). The colourstrip around the tree shows the gender of the sampled individuals (red for female, blue for male); it reveals several transmissions from the MSM community to female HET individuals.
Table 4.
Statistics on SDRMs present in at least one treatment-naive CRF19 sequence.
Fig 4.
ACR for the presence of RT:M41L DRM in the CRF19-infected individuals.
ACR is done with PastML [18] (MPPA+F81); the visualisation is performed with iTOL [19]. Nodes and branches are coloured by RT:M41L status (violet for resistant, i.e. the mutation is present; light-green for sensitive). The colourstrip around the tree shows the treatment status of the sampled individuals (dark-green for treatment-naïve, lilac for treatment-experienced).
Fig 5.
Possible transmission scenarios for two samples.
An example internal node that has two tip children, A and B, and their sampling and diagnostics dates are shown on the left. As the internal node corresponds to a transmission, three scenarios are possible: (middle left) A is the donor, therefore the transmission must have happened before the recipient (B) got diagnosed; (middle right) B is the donor, therefore the transmission must have happened before the recipient (A) got diagnosed; (right) the donor was another individual that was not sampled and does not appear in the tree, therefore that transmission must have happened before both A and B got diagnosed. In practice, however, we cannot know which of the three scenarios is the correct one and therefore have to pick the more recent diagnostics date (B, Sep 2014) as the upper-bound constraint for the internal node.
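The reasoning in this caption can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names are hypothetical, and the diagnostics date used for A (as well as the day of the month for B) is an invented placeholder, since the caption only states that B's date, Sep 2014, is the more recent one.

```python
from datetime import date


def scenario_bounds(diag_a: date, diag_b: date) -> dict:
    """Latest possible transmission date under each of the three scenarios."""
    return {
        "A is the donor": diag_b,                  # before the recipient B was diagnosed
        "B is the donor": diag_a,                  # before the recipient A was diagnosed
        "unsampled donor": min(diag_a, diag_b),    # before both A and B were diagnosed
    }


def internal_node_upper_bound(diag_a: date, diag_b: date) -> date:
    """Upper bound that holds whichever scenario is the true one.

    Since the scenario is unknown, only the loosest of the three bounds
    is safe, which is the more recent of the two diagnostics dates.
    """
    return max(scenario_bounds(diag_a, diag_b).values())


# Hypothetical dates: A's date and both days of the month are placeholders.
diag_a = date(2012, 3, 1)
diag_b = date(2014, 9, 1)
bound = internal_node_upper_bound(diag_a, diag_b)
```

Taking the maximum over the scenarios is what makes the constraint conservative: a tighter bound would be correct only if the true scenario were known.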
Fig 6.
To reconstruct the ancestral states for characters annotated at diagnostics dates (such as provinces of Cuba), we added additional nodes to the time-scaled trees at times of diagnostics, named those nodes (on the middle panel, the diagnostics nodes, e.g. a, correspond to the tips with the same names but capitalised, e.g. A), and associated the input annotations for PastML with them. We then reconstructed ancestral and tip characters with PastML based on those annotations (see text for calculation details).
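The node-insertion step described here could look roughly like the sketch below. It uses a minimal hand-written tree class rather than any real phylogenetics library, and the names `Node`, `tips` and `add_diagnostics_nodes` are hypothetical. One detail is an assumption made for this sketch and is not taken from the caption: a diagnostics time that falls outside the branch leading to the tip is clamped to that branch.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass(eq=False)  # identity-based equality, so list.index() finds the exact node
class Node:
    name: str
    time: float  # node date as a decimal year
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def tips(node: Node) -> Iterator[Node]:
    """Yield all tips below (or equal to) the given node."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from tips(child)


def add_diagnostics_nodes(root: Node, diagnostics_time: Dict[str, float]) -> Node:
    """Insert one extra node per tip, placed at that tip's diagnostics time.

    The new node is named like the tip but in lower case (tip "A" -> node "a"),
    and is the node that would carry the input annotation.
    """
    for tip in list(tips(root)):
        parent = tip.parent
        if parent is None or tip.name not in diagnostics_time:
            continue
        # Assumption of this sketch: keep the new node on the parent-tip branch.
        t = min(max(diagnostics_time[tip.name], parent.time), tip.time)
        diag = Node(tip.name.lower(), t)
        parent.children[parent.children.index(tip)] = diag
        diag.parent = parent
        diag.add_child(tip)
    return root


# Toy tree with hypothetical dates.
root = Node("root", 2000.0)
root.add_child(Node("A", 2010.0))
root.add_child(Node("B", 2012.0))
add_diagnostics_nodes(root, {"A": 2008.5, "B": 2011.0})
```

After this step, each annotation (for example a province) would be attached to the lower-case node rather than to the tip, so that the state is fixed at the diagnostics time and the tip state itself is left to be reconstructed.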
Fig 7.
To reconstruct the ancestral character states, resistant (violet) or sensitive (light-green), for a DRM (e.g. RT:M41L), we cut the time-scaled tree at the date of introduction to Cuba of the first ARV that can provoke this DRM (for example AZT, used in Cuba since 1987, for the DRM RT:M41L) (left panel). This yields the pre-treatment-introduction tree (upper part of the tree) and a forest of post-treatment-introduction subtrees (bottom part). For the trees in the forest we then marked their roots as sensitive (middle left panel). We performed the ACR with PastML on the forest (middle right panel) and combined the results with the all-sensitive annotation for the pre-treatment-introduction tree nodes (right panel).
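The tree-cutting step can be illustrated with a small Python sketch. It is not the authors' code: the `Node` class and the `cut_at` function are hypothetical, the toy tree and its dates are invented, and the sketch assumes that the root is older than the cut date. The ACR itself (done with PastML in the paper) is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    name: str
    time: float  # node date as a decimal year
    children: List["Node"] = field(default_factory=list)
    state: Optional[str] = None  # "sensitive", "resistant", or None if unknown


def cut_at(root: Node, cut_time: float) -> Tuple[Node, List[Node]]:
    """Split a time-scaled tree at cut_time.

    Returns the pre-cut tree (all nodes annotated as sensitive) and a forest
    of post-cut subtrees, each hanging from a new root placed at cut_time
    and marked as sensitive.
    """
    if root.time >= cut_time:
        raise ValueError("this sketch assumes the root predates the cut")
    forest: List[Node] = []

    def visit(node: Node) -> Node:
        kept = Node(node.name, node.time, state="sensitive")
        for child in node.children:
            if child.time < cut_time:
                kept.children.append(visit(child))
            else:
                # The branch crosses the cut: start a new subtree at cut_time.
                forest.append(
                    Node("cut_" + child.name, cut_time, [child], state="sensitive")
                )
        return kept

    return visit(root), forest


# Toy tree with hypothetical dates; 1987.0 stands for the start of AZT use.
t1, t2, t3 = Node("T1", 1995.0), Node("T2", 1986.5), Node("T3", 2000.0)
tree = Node("root", 1980.0, [Node("X", 1985.0, [t1, t2]), t3])
pre_tree, forest = cut_at(tree, 1987.0)
```

Fixing the new roots to the sensitive state encodes the idea that the mutation could not have been selected before the drug was in use; the reconstruction then only has to decide the states inside each post-cut subtree.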