Fig 1.
Inclusion of the 5’ LTR region in the HIV sequence database.
(a) Completeness of the 5’ LTR region in the recognized Subtype Reference Alignments provided in the HIV sequence database. As of August 17, 2021, among the 129 currently identified representatives of groups, subtypes, sub-subtypes, and CRFs, only A1, A4, A6, B, L, CRF02_AG, CRF03_A6B, CRF04_cpx, CRF06_cpx, CRF08_BC, CRF09_cpx, CRF11_cpx, CRF12_BF, CRF26_A5U, CRF27_cpx, CRF30_0206, CRF60_BC, O, and P included the 5’ LTR region. (b) Summary of HIV-1 sequence entries in the Los Alamos National Laboratory HIV sequence database as of August 17, 2021.
Fig 2.
ML phylogenetic tree built from the 76 screened solo 5’ LTR sequences.
The phylogenetic tree showed distinct subtype clusters, all supported by high bootstrap values. The best-fitting nucleotide substitution model, as determined in MEGA X, was GTR+G+I. Tree topologies were searched using subtree-pruning-and-regrafting level 3 (SPR level 3), and the initial tree was generated automatically (default NJ/BioNJ). Node confidence was assessed with the bootstrap test (500 replicates); values below 50% are not shown. Red squares represent the 20 references identified from gold-standard sequences.
Fig 3.
ML phylogenetic analyses of 302 full-length genomes and of the 5’ LTRs extracted from them.
(a) Maximum likelihood phylogenetic analysis in MEGA X based on 302 full-length sequences containing 5’ LTR sequences. Sequences clustered in subtype-specific branches with very high bootstrap values. (b) Maximum likelihood phylogenetic analysis in MEGA X based on the 5’ LTRs extracted from the 302 full-length sequences. Node confidence in both trees was assessed with the bootstrap test (500 replicates); values below 50% are not shown. Colored ranges represent different groups, subtypes, sub-subtypes, and CRFs. Red squares represent the 20 provisional references identified from gold-standard sequences.
Table 1.
5’ LTR reference sequence statistics.
Fig 4.
Phylogenetic tree of the 83 definitive screened reference representatives of HIV-1 5’ LTRs.
Representatives of each confirmed group, subtype, sub-subtype, and CRF were aligned, and a phylogenetic tree was constructed using the maximum likelihood method in MEGA X. All major distinct clusters had reliable bootstrap support (500 replicates); values below 50% are not shown. Tips are labeled with group/subtype/sub-subtype/CRF, sampling country, sampling year, sequence name, and accession number. Colored ranges represent different groups, subtypes, sub-subtypes, and CRFs. Red squares represent the 20 references identified from gold-standard sequences. Purple triangles represent the 22 references identified from solo 5’ LTRs. Blue dots represent the 41 references identified from 5’ LTR sequences with complete genomes.
Fig 5.
Recombination characterization of the 5’ LTR of the CRF02_AG references.
The recombination and phylogenetic analyses confirm that the 5’ LTRs of all 5 CRF02_AG representatives are recombinants sharing the same mosaic pattern (including 02_AG.LR.-.POC44951.AB485636, a gold-standard representative in the Subtype Reference Alignments). (a) Updated genome maps of the CRF02_AG 5’ LTR. The upper panel shows the original mosaic structure from the Los Alamos HIV sequence database. The lower panel shows the updated recombination details of the 5’ LTR. The standard representatives are marked by different colors, as indicated. (b) Phylogenetic confirmation of the recombination pattern by sub-region trees built from extracted fragments. The left panel shows the phylogenetic relationships for the region spanning HXB2 nt 1–277 and 424–634, together with all 5’ LTR representatives classified in the current work. The right panel shows the phylogenetic relationships for the region spanning HXB2 nt 278–423, together with the same representatives. The trees were constructed using the maximum likelihood method implemented in MEGA X. Values at the nodes indicate the percentage of bootstrap replicates supporting the cluster to the right. Colored ranges represent different groups, subtypes, sub-subtypes, and CRFs. Red triangles represent recombinants.
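The fragment extraction behind panel (b) can be sketched as follows. This is only an illustration, assuming each sequence has already been brought into HXB2 coordinates (one character per HXB2 position, 1-based and inclusive); the function names and the toy sequence are hypothetical and do not reflect the tooling actually used in the study.

```python
# HXB2 coordinates taken from the caption (1-based, inclusive).
FLANKING = [(1, 277), (424, 634)]   # region used for the left tree
CENTRAL = [(278, 423)]              # region used for the right tree


def extract_region(seq, ranges):
    """Concatenate the 1-based inclusive coordinate ranges of seq."""
    return "".join(seq[start - 1:end] for start, end in ranges)


def split_ltr(seq):
    """Return (flanking fragment, central fragment) for one HXB2-numbered 5' LTR."""
    return extract_region(seq, FLANKING), extract_region(seq, CENTRAL)


# Toy sequence (not real data): letters mark which segment each base came from.
toy = "A" * 277 + "C" * 146 + "G" * 211
flanking, central = split_ltr(toy)
```

Each fragment set would then be aligned and used to build its own tree, so that a recombinant clusters with different references in the two trees.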
Fig 6.
Reliability test of the newly established 5’ LTR classification system: 5’ LTR assignment of clinical isolates from China.
(a) The ML phylogenetic tree was built using 19 amplified HIV-1 5’ LTR sequences together with the 83 HIV-1 5’ LTR references. These 19 5’ LTRs contained no recombination and clustered well, with strong bootstrap support. Black dots represent the 16 5’ LTRs whose subtype assignment was consistent with that of the three other regions (gag, pol, and env). Black squares represent the 3 5’ LTRs whose subtype assignment was inconsistent with that of the three other regions. Colored ranges represent different groups, subtypes, sub-subtypes, and CRFs. (b) The recombination pattern shared by the two 5’ LTR recombinants (HB010151 and HB010161) and its confirmation by sub-region trees. The two recombinants share a common mosaic pattern. The sub-region phylogenetic analysis confirmed that the extracted segments have the subtype assignments indicated by the recombination analysis. Green squares represent the recombinants, the blue triangle represents the major parent, and the yellow dot represents the minor parent. (c) The recombination pattern of HB030133 and its confirmation by sub-region trees. The sub-region phylogenetic analysis confirmed that the extracted segments have the subtype assignments indicated by the recombination analysis. The green square represents the recombinant, the blue triangle represents the major parent, and the yellow dot represents the minor parent.
Table 2.
Subtype assignments of the four regions in the 22 amplified samples.
Fig 7.
Venn diagram representation of the significant changes in transcription factors caused by recombination in the 5’ LTR.
(a) Changes in transcription factors resulting from recombination between CRF07_BC and subtype B in the 5’ LTR (recombinants HB020068, HB070040, and HB070063). Green indicates transcription factors specific to subtype B. Blue indicates transcription factors specific to CRF07_BC. The overlap indicates transcription factors shared by subtype B and CRF07_BC. (b) Changes in transcription factors between the recombinants HB010151 and HB010161 and the parent strain HB010165 in the 5’ LTR. Yellow, pink, and cyan indicate transcription factors specific to HB010151, HB010161, and HB010165, respectively. The yellow-pink overlap indicates transcription factors shared by HB010151 and HB010161; the pink-cyan overlap, those shared by HB010161 and HB010165; and the yellow-cyan overlap, those shared by HB010151 and HB010165. The overlap of all three colors indicates transcription factors shared by HB010151, HB010161, and HB010165. (c) Changes in transcription factors between the recombinant HB030133 and its parent strain HB070068 in the 5’ LTR. Purple indicates transcription factors specific to HB030133. Wheat indicates transcription factors specific to HB070068. The overlap indicates transcription factors shared by HB030133 and HB070068.
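The regions of a three-set diagram such as panel (b) correspond to plain set operations on the per-strain transcription factor lists. A minimal sketch, assuming each strain's factors are available as a set of names; the function name and the placeholder names TF1 to TF5 are hypothetical and are not the study's data.

```python
def venn3_regions(a, b, c):
    """Return the seven disjoint regions of a three-set Venn diagram."""
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "a_b_only": (a & b) - c,   # shared by a and b, absent from c
        "a_c_only": (a & c) - b,
        "b_c_only": (b & c) - a,
        "all": a & b & c,          # shared by all three
    }


# Placeholder sets for illustration only.
recombinant_1 = {"TF1", "TF2", "TF3"}
recombinant_2 = {"TF2", "TF3", "TF4"}
parent = {"TF3", "TF4", "TF5"}

regions = venn3_regions(recombinant_1, recombinant_2, parent)
```

Because the seven regions are disjoint, their sizes sum to the number of distinct factors across the three strains, which is a quick consistency check on the counts drawn in a diagram.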