Human T-Cell Lymphotropic Virus Type 1 Subtype C Molecular Variants among Indigenous Australians: New Insights into the Molecular Epidemiology of HTLV-1 in Australo-Melanesia

Background HTLV-1 infection is endemic among people of Melanesian descent in Papua New Guinea, the Solomon Islands and Vanuatu. Molecular studies reveal that these Melanesian strains belong to the highly divergent HTLV-1c subtype. In Australia, HTLV-1 is also endemic among the Indigenous people of central Australia; however, the molecular epidemiology of HTLV-1 infection in this population remains poorly documented. Findings Studying a series of 23 HTLV-1 strains from Indigenous residents of central Australia, we analyzed coding (gag, pol, env, tax) and non-coding (LTR) genomic proviral regions. Four complete HTLV-1 proviral sequences were also characterized. Phylogenetic analyses implemented with both Neighbor-Joining and Maximum Likelihood methods revealed that all proviral strains belong to the HTLV-1c subtype with a high genetic diversity, which varied with the geographic origin of the infected individuals. Two distinct Australian clades were found, the first including strains derived from most patients whose origins are in the North, and the second comprising a majority of those from the South of central Australia. Time divergence estimation suggests that the speciation of these two Australian clades probably occurred 9,120 years ago (38,000–4,500). Conclusions The HTLV-1c subtype is endemic to central Australia where the Indigenous population is infected with diverse subtype c variants. At least two Australian clades exist, which cluster according to the geographic origin of the human hosts. These molecular variants are probably of very ancient origin. Further studies could provide new insights into the evolution and modes of dissemination of these retrovirus variants and the associated ancient migration events through which early human settlement of Australia and Melanesia was achieved.


Introduction
The Human T-lymphotropic virus type 1 (HTLV-1) is the first described human oncoretrovirus [1]. HTLV-1 infection is associated with a universally fatal malignancy, adult T-cell leukemia/lymphoma (ATLL), and with inflammatory disorders, the prototype of which is HTLV-1 associated myelopathy/tropical spastic paraparesis (HAM/TSP) [2]. HTLV-1 infects at least 5 to 10 million people worldwide [3]. It is widely distributed, with substantial clusters of high endemicity in certain geographic areas and ethnic groups in Southwestern Japan, sub-Saharan Africa, South America, the Caribbean basin and smaller endemic foci in Iran and Australo-Melanesia [3]. Seven main molecular HTLV-1 subtypes are currently recognized, predominantly from nucleotide sequence analysis of the LTR region. These are the Cosmopolitan subtype (a) that has spread worldwide, five African subtypes (b, d-g) and an Australo/Melanesian subtype (c), which is found only in Oceania [4,5,6,7,8]. The limited horizontal transmission of HTLV-1 and its clustering in certain ethnic/geographic foci have encouraged the use of its very slow in vivo genetic drift as a means of studying the origin and modes of dissemination of this retrovirus as well as the movements of ancient infected populations [4,9,10].
Characterization of HTLV-1c variants was initially performed by Yanagihara et al. among a small group of hunter-horticulturalists, the Hagahai, who live in the fringe highlands of Papua New Guinea (PNG) [11,12,13] and among people of Melanesian descent in the Solomon Islands [13,14]. Subsequently, efforts have been made to characterize new HTLV-1c subtype isolates in the neighboring territories of Australia [15,16] and the Vanuatu archipelago [9,17]. Nevertheless, our understanding of the molecular virology of the HTLV-1c subtype remains largely based on partial genome sequences of the gp21 env gene and LTR regions [9,10,13]. Indeed, only a single complete HTLV-1c subtype nucleotide sequence has been published to date, the original MEL5 human isolate from the Solomon Islands [18]. Previous studies indicate that HTLV-1 is also endemic to central Australia where high HTLV-1 seropositivity rates have been documented among Indigenous adults admitted to the sole regional hospital [19,20]. Indeed, cases of ATL, HAM/TSP and infective dermatitis have now been described and an association between HTLV-1 infection and bronchiectasis has also been reported among Indigenous Australians [15,21,22,23,24,25]. Interestingly, many clinical cases arise from the same central Australian region suggesting that environmental and/or viral factors may contribute to the etiology of HTLV-1 related diseases in this population. Unfortunately, only partial HTLV-1 nucleotide sequences are available for a single HTLV-1 strain from an Indigenous Australian [15], precluding any understanding of genetic variability in this population. Establishing a large HTLV-1 sequence database is thus essential for any study of the epidemiology and pathogenicity of HTLV-1c subtype.
The aim of the present study was therefore to describe the HTLV-1 genotypes infecting Indigenous central Australian residents and to correlate the results of the HTLV-1 nucleotide sequence variability with the geographic origin of the individuals living within this vast region of approximately 1,000,000 km².

Studied populations and data collection
Our work was performed using HTLV-1 isolates obtained from a large series of patients who were initially enrolled in HTLV-1 pathogenesis studies between October 2007 and August 2010 [22]. Plasma and peripheral blood buffy coat (PBBC) samples were obtained from 23 HTLV-1 infected patients who presented to Alice Springs Hospital, predominantly with bronchiectasis. Presumed place of origin was determined from language group and/or place of residence in infancy (figure 1). Numerous Indigenous languages are spoken in this region; however, for the purpose of the present study, these were divided into two groups according to the predominant geographic areas in which they are spoken: i) Northern (comprising the Ngarrkic language groups) and ii) Southern (comprising both Arandic and Western Desert language groups) (table 1). Also included in the present study were PBBC samples from four Natives of the Vanuatu archipelago. These were collected by us during work in Vanuatu between 2003 and 2005, as has been described previously [9,17].

Ethics statement
Written informed consent was given by all patients for their blood to be used for the pathogenicity studies, which included the molecular characterization of the HTLV-1 viral strains. The Central Australian Human Research Ethics Committee (CAHREC) approved this study (CAHREC Ref: 2011.11.01).

HTLV-1 serologic analyses and molecular screening
The plasma and PBBC samples were transferred to Institut Pasteur, Paris, and stored at −80°C until HTLV-1 analysis. Plasma HTLV-1 antibodies were tested by a particle agglutination (PA) technique (Serodia HTLV-1, Fujirebio, Tokyo, Japan) and by an indirect immunofluorescence assay (IFA) using the HTLV-1-transformed human T cell line MT2. All samples were also tested by Western blot assay (WB) (HTLV Blot 2.4, MP Biomedicals Asia Pacific Pte. Ltd., Singapore).
To obtain the 522-bp fragment of the gp21 env gene, 1 µg of DNA from the Aus-GN strain was subjected to 2 series of PCR as previously described [27].

Author Summary
The Human T-lymphotropic virus type 1 (HTLV-1) infects at least 5-10 million persons worldwide. In Oceania, previous studies have shown that HTLV-1 is present in a few ancient populations from remote areas of Papua New Guinea, the Solomon Islands, the Vanuatu archipelago and central Australia. The latter comprise one of the most socioeconomically disadvantaged groups within any developed country. Characterization of the few available HTLV-1 viruses from Oceania indicates that these belong to a specific HTLV-1 genotype, the Australo-Melanesian c subtype. In this study, we provide details for 23 HTLV-1 viruses derived from the Indigenous population of central Australia, a vast remote area of 1,000,000 km². We reveal considerable genetic diversity of HTLV-1c subtype viruses and the existence of two HTLV-1c clades within which a high degree of genetic diversity was also apparent. These newly described HTLV-1c clades clustered according to the geographic origin of their human hosts. Indigenous Australians from the North of central Australia harbor HTLV-1c subtype viruses that are distinct from those of individuals from regions to the South. These data suggest that HTLV-1 was probably introduced to Australia during ancient migration events and was then confined to isolated Indigenous communities in central Australia.

Phylogenetic analyses
Both strands of each PCR product were sequenced, and the ClustalW algorithm (MacVector 6.5 software, Oxford Molecular) was implemented to align forward and reverse sequences of each segment to derive a consensus sequence of the full LTR (758-bp) region, a fragment of the gp21 env gene (522-bp), colinearized gag-tax (2,346-bp) and gag-pol-env-tax (7,567-bp) genes (figure 2). Phylogenetic trees were generated from multiple alignments of the LTR region and gp21 env together with the colinearized gag-tax and gag-pol-env-tax genes. Included in the phylogenetic analyses were the 23 new proviral sequences from Australia and the four novel sequences from Vanuatu (ESH18, ESW44, EM5, PE376) that were characterized in the present study together with appropriate sequences of previously characterized strains from PNG (MEL1, MEL2 and MEL7), the Solomon Islands (MEL3 to MEL6 and MEL8 to MEL10), Vanuatu (PE376, VAN54, VAN136, VAN251, VAN335) and Australia (MSHR-1). Additional representative sequences of the HTLV-1 a, b, d-g subtypes available in Genbank were also included.
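As an illustration of what "colinearized" means here, the following minimal Python sketch joins per-gene aligned fragments end to end, in a fixed gene order, for every strain. It is not the procedure used in this study, which relied on MacVector and ClustalW; the function name `colinearize`, the strain labels and the toy sequences are all hypothetical.

```python
from typing import Dict, List


def colinearize(fragments: Dict[str, Dict[str, str]],
                gene_order: List[str]) -> Dict[str, str]:
    """Concatenate aligned per-gene fragments for each strain.

    fragments maps strain name -> {gene name -> aligned sequence}.
    gene_order fixes the order in which genes are joined (e.g. gag, then tax).
    """
    joined = {}
    for strain, genes in fragments.items():
        missing = [g for g in gene_order if g not in genes]
        if missing:
            raise KeyError(f"{strain} lacks fragment(s): {', '.join(missing)}")
        joined[strain] = "".join(genes[g] for g in gene_order)

    # All concatenated sequences must have equal length to remain an alignment.
    if len({len(seq) for seq in joined.values()}) > 1:
        raise ValueError("concatenated sequences differ in length")
    return joined


# Toy data only (hypothetical strains and sequences).
toy = {
    "strain_A": {"gag": "ATG", "tax": "CCA"},
    "strain_B": {"gag": "ATA", "tax": "CCG"},
}
combined = colinearize(toy, ["gag", "tax"])
```

Fixing the gene order for all strains keeps homologous positions in the same columns, which is what allows the concatenated fragment to be analyzed as a single alignment.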
The sequences were aligned using the DAMBE program (version 4.2.13) [28]. Absence of saturation of the alignment was confirmed by 2 methods: likelihood mapping (model TN93; non uniform substitution) with Tree-Puzzle software (version 5.2) and the test of Xia and Xie [28] with the DAMBE program. The final alignment was submitted to the Modeltest program (version 3.6) and the best model was selected according to the Akaike information criterion. This was then applied to phylogenetic analyses using the PAUP program (version 4.0b10) to infer trees according to both Neighbor-Joining (NJ) and Maximum Likelihood (ML) methods. To test the robustness of the tree topologies, 1,000 bootstrap replicates were performed. Numbers applied to the nodes of the tree (bootstrap values) indicate frequencies of occurrence for 100 trees. The quartet puzzling algorithm included in the Tree-Puzzle software was applied for the maximum likelihood method [29].
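Two of the steps named above, model selection by the Akaike information criterion (AIC) and bootstrap resampling, can be sketched generically in Python. This is not a reimplementation of Modeltest or PAUP: the function names, log-likelihood values and toy alignment below are hypothetical, and the parameter counts cover only substitution-model parameters.

```python
import random
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion, AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits: Dict[str, Tuple[float, int]]) -> str:
    """Return the model with the lowest AIC.

    fits maps model name -> (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


def bootstrap_columns(alignment: Dict[str, str],
                      rng: random.Random) -> Dict[str, str]:
    """Build one bootstrap pseudo-replicate by resampling alignment
    columns with replacement; the alignment length is preserved."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


# Hypothetical fits: (log-likelihood, free substitution parameters).
fits = {"JC69": (-1050.0, 0), "HKY85": (-1000.0, 4), "GTR": (-999.0, 8)}
chosen = best_model(fits)

# One pseudo-replicate of a toy alignment (hypothetical sequences).
toy_alignment = {"s1": "ACGT", "s2": "ACGA"}
replicate = bootstrap_columns(toy_alignment, random.Random(0))
```

In a full analysis, a tree would be inferred from each pseudo-replicate, and the bootstrap value of a node would be the percentage of replicate trees in which that grouping recurs.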

Divergence time estimation
In order to estimate the divergence time between the different clades, we initially performed a typical molecular clock analysis [9]. This method was not conclusive. Indeed, the sequences seem to be too short (when considering the low mutation rate for HTLV-1) to be informative under the different models. We therefore estimated the divergence time using the previously reported mutation rates for HTLV-1 [30]. A theoretical ancestral sequence was initially determined for each monophyletic clade, and the number of mutation events required to generate the reported current sequences was then calculated. Finally, this average number of mutation events was converted into a time estimate using the known HTLV-1 mutation rate [30]. Although the technique is rough, the estimated date for the Vanuatu/Solomon node is 7,440 years BP (31,000-3,700), which is consistent with the date previously proposed (i.e. 10,000 years ago) [9].
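The arithmetic behind this kind of estimate can be sketched as follows, assuming the rate is expressed in substitutions per site per year, so that time equals the per-site number of mutations divided by the rate. The function names and all numerical inputs are hypothetical; they are not the values used in this study or reported in [30].

```python
from typing import Tuple


def divergence_time_years(mean_mutations: float,
                          n_sites: int,
                          rate_per_site_per_year: float) -> float:
    """Time since the inferred ancestor, in years.

    mean_mutations: average number of mutation events separating the
        current sequences from the theoretical ancestral sequence.
    n_sites: length of the compared region in nucleotides.
    rate_per_site_per_year: assumed substitution rate.
    """
    if n_sites <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("n_sites and rate must be positive")
    return (mean_mutations / n_sites) / rate_per_site_per_year


def divergence_interval(mean_mutations: float,
                        n_sites: int,
                        rate_low: float,
                        rate_high: float) -> Tuple[float, float]:
    """Return (oldest, youngest) dates: the slowest rate gives the
    oldest date and the fastest rate gives the youngest."""
    oldest = divergence_time_years(mean_mutations, n_sites, rate_low)
    youngest = divergence_time_years(mean_mutations, n_sites, rate_high)
    return oldest, youngest


# Hypothetical inputs: 5 mutations over 1,000 sites at 1e-6 per site per year.
point = divergence_time_years(5, 1000, 1e-6)
oldest, youngest = divergence_interval(5, 1000, 5e-7, 2e-6)
```

Because the date is inversely proportional to the rate, uncertainty in the rate translates directly into a wide and asymmetric interval around the point estimate.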

Results
The 23 HTLV-1 infected individuals from Australia included ten women (mean age 49.7 years, range 27-70) and 13 men (mean age 42.2 years, range 16-67) (table 1). The four samples from Vanuatu were obtained from 3 women (mean age 59 years, range 40-76) and a 61-year-old man. All plasma samples exhibited a complete HTLV-1 pattern in WB and the presence of HTLV-1 provirus was investigated in the DNA of these 27 individuals.

Gp21 env gene fragment analyses
The primary purpose of our work was to study the molecular relationship between the new Australian HTLV-1 proviral strains and those from HTLV-1 infected individuals in Australia and the neighboring islands whose sequences have been previously published. For most strains, the only available sequences in the Genbank database are the 522-bp fragment of the gp21 env gene. We therefore compared the gp21 env gene fragments of seven proviral sequences from Australia, including five new proviral strains (Aus-CS, Aus-DF, Aus-NR, Aus-GN and Aus-GM) and two previously characterized sequences (MSHR-1 and Aus-RDJ) (Genbank: M92818 and JX891480, respectively) [15,25], with HTLV-1 proviral strains from PNG (MEL1, MEL2 and MEL7), Vanuatu (EM5, VAN54, VAN136, VAN251 and PE376) and the Solomon Islands (MEL3 to MEL6 and MEL8 to MEL10) [9,13,18,31]. Phylogenetic analyses performed with both NJ and ML methods clearly demonstrate the existence of three subgroups: "Papua New Guinean", "Solomon/Vanuatu" and "Australian", within the HTLV-1c subtype. Furthermore, inside the Australian subgroup, which comprises all the 7 Australian proviral strains, two distinct clades are now observed. The first clade includes strains derived from 3 patients (Aus-CS, Aus-DF and Aus-NR) whose origins are in the North of central Australia plus the two published sequences (MSHR-1 and Aus-RDJ), and the second comprises 2 patients (Aus-GN and Aus-GM) from the South of central Australia (figures 3A and 3B).

Long Terminal Repeat region analyses
Complete LTR proviral sequences were obtained for all 27 samples by PCR amplification of both LTR-gag and Px-LTR fragments. Alignment of the 746-bp LTR fragments for these 27 strains revealed no significant deletion or insertion in comparison to the HTLV-1 ATK-1 reference strain. Within-group comparisons of the 23 new Australian HTLV-1 strains indicate that they are closely related to each other (range of nucleotide similarity, 99.5%-100%), though quite divergent from the LTR strains from Vanuatu (nucleotide similarity range, 94.5%-95.3%) and from the known HTLV-1c subtype prototype strain from the Solomon Islands (MEL5) (nucleotide similarity range, 95.3%-95.5%).
Using both NJ and ML methods, the phylogenetic analyses of the LTR region revealed two distinct subgroups within the Australo-Melanesian HTLV-1c subtype. The first group includes the strains from the Solomon Islands and Vanuatu, and the second comprises the Australian strains (figures 4A and 4B). Unfortunately, no complete LTR proviral sequence from PNG is available in the sequence databases. However, phylogenetic analysis based on a partial 627-bp fragment of the LTR region, including the PNG-1 strain from Papua New Guinea (Genbank: M85207), clearly confirmed the existence of three distinct clades within the HTLV-1c subtype (data not shown) [32]. Phylogenetic analyses using both gp21 env and LTR fragments were consistent with the existence of two Australian HTLV-1c clades. The nature of this relationship was further clarified by comparing larger genomic fragments using both ML and NJ methods.

Colinearized gag-tax genomic fragment analyses
A comparison of the concatenated and aligned 2,346-bp fragment of the gag-tax genes revealed a high degree of nucleotide homology among the Australian strains (range, 98.9%-100%) and a comparable degree of divergence of the Australian strains relative to those from both Vanuatu (range, 94.1%-97.5%) and the Solomon Islands (range, 94.4%-96.9%). Additional phylogenetic analyses using both NJ and ML methods of the colinearized gag-tax (2,346-bp) genomic fragment confirmed the tree topology derived from the gp21 env and LTR analyses and demonstrated the existence of an Australian subgroup that was highly supported phylogenetically (bootstrap value ≥ 99%) (figures 5A and 5B). Interestingly, the Australian subgroup can be further subdivided into two clades, for which bootstrap values are also statistically significant (≥ 91%). The first includes HTLV-1 strains derived from most patients of Northern origin (6/7) and the second comprises a majority of individuals (14/16) from the South. Furthermore, a high genetic diversity exists within both Australian clades with sub-clades also supported by high bootstrap values (≥ 90%).

Colinearized gag-pol-env-tax genomic fragment analyses
These analyses were performed using the complete proviral sequences obtained from four Australian samples: Aus-CS, Aus-NR, Aus-DF and Aus-GM (Genbank: KF242506, JX891479, KF242505 and JX891478, respectively). In addition, two of these complete proviral sequences were selected as the representative prototypes of each Australian clade ("Northern" clade, Aus-NR; "Southern" clade, Aus-GM). The general genomic organization of these two prototypic sequences is similar to that of HTLV-1 prototypes ATK-1 and MEL5 strains (Genbank: J02029 and L02534, respectively). The overall range of nucleotide divergence of the first complete Australian strains from the prototypes ATK-1 and MEL5 was 7.8-8% and 3.3-3.4%, respectively. The nucleotide homology between these two Australian prototypic sequences was 98.9% (102 differences over 9,046-bp).
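The 98.9% figure follows directly from the reported counts: 1 − 102/9,046 ≈ 0.9887. A minimal Python sketch of this calculation, together with a direct comparison of two aligned sequences, is given below. The function names are hypothetical, and counting every mismatching column, gaps included, as a difference is an assumption rather than the documented behavior of the software used in this study.

```python
def identity_from_differences(n_differences: int, length: int) -> float:
    """Percent identity given a count of differing positions."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * (1 - n_differences / length)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: any column where the two characters differ, including
    a gap opposite a nucleotide, counts as one difference.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and equal in length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Counts reported in the text for the two Australian prototypes.
prototype_identity = round(identity_from_differences(102, 9046), 1)

# Toy aligned pair (hypothetical sequences): 9 of 10 positions match.
toy_identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
```

A similarity range such as 98.9%-100% would be obtained by applying the pairwise function to every pair of strains in a group and taking the minimum and maximum.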
Phylogenetic analyses using both NJ and ML methods of the colinearized gag-pol-env-tax (6,000-bp) genomic fragments, including HTLV-1, HTLV-2 and HTLV-3 representative sequences available in Genbank, confirmed the existence of an Australian subgroup that was highly supported phylogenetically (figures 6A and 6B).

Time divergence of the Australian clades
Finally, we estimated the time of divergence for the various Australian strains using the evolution rate of the HTLV-1 LTR region, which has previously been determined by Lemey [30]. These calculations suggest that divergence between the Vanuatu/Solomon and Australian subgroups occurred 20,400 years ago (85,000-10,200) and that speciation of the two Australian clades followed 9,120 years ago (38,000-4,500). The estimated date for the Vanuatu/Solomon node in the present study was 7,440 years ago (31,000-3,700), which is consistent with our previous estimate (10,000 years ago) [9].

Discussion
The origin of most HTLV-1 subtypes appears to be linked to ancient and multiple episodes of interspecies transmission between STLV-1-infected non-human primates (NHPs) and humans [5,33,34,35]. Indeed, Old-World NHPs constitute a large reservoir for different lineages of STLV-1, and the virus is considered as transmissible to humans through body fluid contacts [27,36,37,38,39]. The very high homology between some STLV-1 and HTLV-1 strains, particularly the b and d-f subtypes, suggests that interspecies transmission to humans is probably ongoing in some areas of West and central Africa and results from close contacts during the hunting or butchering of NHPs [8,37,38,40,41,42].
Despite the presence of STLV-1-infected NHP species in Asia, there is no evidence of recent interspecies transmission in this area [43]. Furthermore, monkeys have never been endemic to the Australo-Melanesian region, indicating that interspecies transmission of STLV-1 to humans could not have occurred in these islands [44]. Therefore, HTLV-1c is likely to have been acquired by the ancestors of the Indigenous peoples of Australo-Melanesia as a result of interspecies transmission from NHPs during their migration through South-East Asia and prior to reaching the highlands of Papua New Guinea [5,10,43,45]. The subsequent migratory movements of this ancestral population then resulted in the radiation of HTLV-1c throughout the Australo-Melanesian region.
HTLV-1c Subtype Variants in Indigenous Australians. PLOS Neglected Tropical Diseases | www.plosntds.org
A first wave of migration led to the progressive colonization of the Solomon Islands, followed by the Vanuatu archipelago and finally, New Caledonia and the neighboring Melanesian islands. Consistent with the common origin of these Melanesian populations are our analyses performed on gp21 env, the LTR and the colinearized gag-tax and gag-pol-env-tax genes, which confirm that HTLV-1 strains from the Solomon Islands and Vanuatu belong to the same subgroup. Based on a combination of paleo-anthropological data and genomic DNA analyses, it is believed that the initial human settlement of the Solomon archipelago dates from the Paleolithic period, ca. 30,000 years ago [46], while Vanuatu was settled much later, during the Neolithic period, ca. 10,000 years ago [47]. In previous phylogenetic and molecular-clock analyses, we suggested that the HTLV-1c proviral strains from the Indigenous people of Vanuatu and the Solomon Islands emerged from a common ancestor ~10,000 years ago [9], which is consistent with data presented here (ca. 7,440 yrs ago).
A second wave of migration is likely to have occurred from PNG to Australia. Indeed, it is thought that the occupation of Sahul, the continent formed when glacio-eustatically lowered sea levels exposed dry land connections between Australia and Papua New Guinea, may have commenced 45,000 years ago and continued until the end of the Pleistocene period 12,000 years ago [48,49,50]. Recently, Rasmussen and colleagues presented evidence derived from the gene flow between populations, which indicates that present-day Indigenous Australians are descendants of the earliest humans to occupy Australia and that they represent one of the oldest continuous populations outside Africa [51].
In the present study, we reveal a high degree of genetic diversity among the HTLV-1c subtype proviral strains that infect the Indigenous people of central Australia. At least two different HTLV-1c genetic clades exist in this Indigenous population and these cluster according to the geographic origin of their human hosts. Thus, it is possible to propose that the common ancestor of the modern Australian HTLV-1 strains arrived in Australia when a group originating from the ancestral HTLV-1 infected population migrated from PNG and settled in Australia. Subsequently, the Australian population split (ca. 9,000 yrs ago), leading to continued viral evolution among small, isolated clan groups of Indigenous people dwelling in the remote desert regions of central Australia, and this resulted in the speciation of the two Australian clades. The broad ethno-geographic distinction between the Indigenous human hosts of these Australian clades is particularly interesting given that considerable movement of Indigenous people has resulted from a century of European dominance in this region. Thus, Northern Ngarrkic speaking clan groups were moved to the South while Western Desert groups moved to the East, in each case toward ration supply centers that were established nearer the major regional center of Alice Springs [52]. Prior to colonization, contact between these groups is likely to have been minimal. The data presented here therefore probably describe the molecular epidemiological expression of both long-term evolution and more recent human movements that were driven by colonization.
Further studies, which characterize the HTLV-1c proviral strains that infect other Indigenous populations elsewhere in Australia and Oceania, will provide new insights into the origin of these retroviruses, potentially enhancing our understanding of the pathogenicity, evolution and modes of dissemination of these HTLV-1c variants and their human hosts.