Phylogenetic clustering networks among heterosexual migrants with new HIV diagnoses post-migration in Australia

Background It is estimated that approximately half of new HIV diagnoses among heterosexual migrants in Victoria, Australia, were acquired post-migration. We investigated the characteristics of phylogenetic clusters in notified cases of HIV among heterosexual migrants. Methods Partial HIV pol sequences obtained from routine clinical genotype tests were linked to Victorian HIV notifications with the following exposures listed on the notification form: heterosexual sexual contact, injecting drug use, bisexual sexual contact, male-to-male sexual contact or heterosexual sexual contact in combination with injecting drug use, and unknown exposure. Those with heterosexual sexual contact as the only exposure were the focus of this study, with the other exposures included to better understand transmission networks. Additional reference sequences were extracted from the Los Alamos database. Maximum likelihood methods were used to infer the phylogeny and the robustness of the resulting tree was assessed using bootstrap analysis. Phylogenetic clusters were defined on the basis of bootstrap support and genetic distance. Results HIV pol sequences were available for 332 of 445 HIV notifications attributed to only heterosexual sexual contact in Victoria from 2005–2014. Forty-three phylogenetic clusters containing at least one heterosexual migrant were detected, 30 (70%) of which were pairs. The characteristics of these phylogenetic clusters varied considerably by cluster size. Pairs were more likely to be composed of people living with HIV from a single country of birth (p = 0.032). Larger clusters (n≥3) were more likely to contain people born in Australia/New Zealand (p = 0.002), migrants from more than one country of birth (p = 0.013) and viral subtype B, the most common subtype in Australia (p = 0.006). Pairs were significantly more likely to contain females (p = 0.037) and less likely to include HIV diagnoses with male-to-male sexual contact reported as a possible exposure (p<0.001) compared to larger clusters (n≥3). Conclusion Migrants appear to be at elevated risk of HIV acquisition, in part due to intimate relationships between migrants from the same country of origin, and in part due to risks associated with the broader Australian HIV epidemic. However, there was no evidence of large transmission clusters driven by heterosexual transmission between migrants. A multipronged approach to prevention of HIV among migrants is warranted.


Introduction
Heterosexual sexual transmission of HIV accounts for approximately 20% of new HIV notifications in Australia annually, a proportion that remained stable from 2008-2017, with migrants making up approximately 40% of these notifications [1–6]. Although many HIV infections among those migrating from high prevalence countries to low prevalence countries are likely to be acquired before migration, migrants may also be vulnerable to acquiring HIV in the destination country [7,8]. Results from a study of HIV in heterosexual migrants in the United Kingdom (UK) from 2004-2010 using CD4 cell counts at diagnosis, and from a subsequent study of migrants diagnosed at 57 clinics across Europe between 2013 and 2015 using a combination of clinical and self-report data, indicated that 33% and 63% of diagnosed HIV cases, respectively, were likely to have been acquired post-migration [9,10]. We previously applied a CD4 count model to data from HIV notifications with heterosexual exposure to HIV in the Australian state of Victoria, with the results suggesting that approximately 50% of migrants were likely to have acquired their HIV post-migration to Australia [3].
Molecular epidemiology, including phylogenetic analysis, is increasingly being used to monitor and characterise HIV transmission and to inform public health and prevention responses [11–14]. Molecular epidemiology has also been used to demonstrate increases in non-B subtypes in Australia and several European countries [15–23], and to estimate the number of infections likely to have been acquired post-migration in some cases [22]. However, these studies have not examined the characteristics of phylogenetic clusters, beyond using them to determine acquisition post-migration. A mathematical model of HIV infection in the Netherlands found that highly assortative sexual mixing between migrants who came from the same region of origin resulted in higher HIV prevalence among migrants [24]. We used phylogenetic analysis and subtyping of routine HIV pol sequencing (undertaken as part of clinical care) to better understand heterosexual HIV risk and transmission among migrants and to explore the role of networks in transmission.

Study population
The key population of interest was migrants in the Australian state of Victoria who were newly diagnosed with HIV and reported heterosexual sexual contact as the route of transmission.
Victoria is Australia's second most populous state and has the second highest number of people diagnosed with HIV and new HIV diagnoses [25,26].
We included all migrants with new HIV diagnoses aged ≥18 years notified to the Victorian Department of Health and Human Services, where the likely exposure to HIV was recorded as heterosexual sexual contact. People born in any country other than Australia or New Zealand were classified as migrants, irrespective of citizenship status, and Australian and New Zealand-born individuals were classified as non-migrants. Consistent with previous analyses, New Zealanders were grouped with Australian-born people [3]. In addition, the following notifications were used to facilitate characterisation of phylogenetic clusters among cases classified as heterosexually acquired: all notifications in non-migrants aged ≥18 years attributed to heterosexual sexual contact and all notifications attributed to bisexual sexual contact or injecting drug use (including those attributed to injecting drug use and male-to-male sex [MSM]) in those aged ≥18 years irrespective of migrant status.

Data sources
The Victorian Department of Health and Human Services receives notifications of HIV from both the laboratories performing the test and the diagnosing doctors, as mandated by the Public Health and Wellbeing Act 2008 and its associated regulations. Notification data were obtained from 1 January 2000 (as very little sequencing was done prior to 2000) to 30 June 2014. From 2000-2004 relatively few HIV notifications were sequenced. Therefore, although these sequences were included in the phylogenetic analysis to allow for possible clustering with later notifications, notifications from 2005 onward were the focus of the analysis.

Patient characteristics
Patient and associated diagnosis characteristics including date of diagnosis, patient demographics (sex and date and country of birth), clinical characteristics at the time of diagnosis (CD4 count, any reported symptomatology), HIV testing history (date and result of the previous HIV test), possible route/s of exposure, likely country of exposure and year of arrival to Australia (for people born outside of Australia or New Zealand) were extracted from notifications data. Multiple HIV exposures can be selected on the notification form: for the purpose of these analyses, exposures were categorised as heterosexual sex (where heterosexual sexual contact was the only reported exposure), bisexual (where MSM and heterosexual sexual contact were both listed on the form), heterosexual sex with injecting drug use, MSM with injecting drug use and other/unknown.

HIV sequence data
Sanger sequencing of the HIV pol gene is used in routine clinical practice to determine antiretroviral drug susceptibility and HIV subtype. Pre-treatment HIV pol sequences were identified for the period 2000-2014 in the Victorian Infectious Diseases Reference Laboratory (VIDRL) database and the Burnet Institute Clinical Research Laboratory HIV database, where clinical samples from patients with HIV infection receiving their clinical care at The Alfred hospital (a Victorian HIV care service) were routinely sequenced as part of HIV genotype testing during this period. Both laboratories participated in quality assurance programs. Data were linked with HIV notifications by matching name codes (the first two letters of cases' first name and surname) and date of birth, which were recorded in all three datasets.
Reference sequences were selected from the Los Alamos database using a stratified random sample, with strata defined by country of birth and viral subtype. Where available, three sequences were selected from each country of birth and viral subtype combination represented in the study sample. The final sample included 177 reference sequences.
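The stratified selection of reference sequences described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the record fields (`country`, `subtype`, `id`), the function name and the fixed random seed are assumptions made for illustration, and the toy candidate list stands in for sequences downloaded from the Los Alamos database.

```python
import random
from collections import defaultdict


def stratified_sample(records, per_stratum=3, seed=0):
    """Draw up to `per_stratum` records at random from each
    (country, subtype) stratum. Field names are hypothetical."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["country"], rec["subtype"])].append(rec)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        # "Where available": take fewer than per_stratum if the stratum is small.
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Toy candidate pool: five sequences in one stratum, two in another.
candidates = (
    [{"id": f"za{i}", "country": "ZA", "subtype": "C"} for i in range(5)]
    + [{"id": f"th{i}", "country": "TH", "subtype": "CRF01_AE"} for i in range(2)]
)
selected = stratified_sample(candidates)
```

In this toy example the first stratum is capped at three sequences and the second contributes both of its two sequences, which mirrors the "where available" wording in the text.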

Classifying notifications on the basis of the probable timing of HIV acquisition relative to migration
For our analyses, evidence of probable place of HIV acquisition was classified as "strong", "medium" or "weak" on the basis of testing history, laboratory evidence of recent infection and CD4 count at diagnosis. Those individuals with a clinic and/or laboratory-confirmed previous negative HIV test post-migration to Australia and those with laboratory evidence of recent infection (group IV indeterminate Western blot and HIV detected by virological assay) post-migration to Australia were classified as having strong evidence of HIV acquisition post-migration. Those with a self-reported previous negative HIV test post-migration to Australia were classified as having medium evidence of HIV acquisition post-migration.
Weak evidence of place of acquisition was classified on the basis of CD4 count at diagnosis. We adopted the formulae used in the UK's HIV & AIDS New Diagnoses & Deaths Database [3,9], which estimate time since HIV acquisition based on modelled estimates of the median CD4 count at HIV acquisition and the median rate of CD4 decline after diagnosis. We used the known CD4 count at diagnosis from our dataset to estimate the time between HIV exposure and diagnosis, which we subtracted from the date of diagnosis to estimate the date of acquisition, together with upper and lower confidence bounds for this estimate [3,9]. If the year of arrival in Australia was before the lower confidence bound of the estimated year of acquisition, an HIV case was classified as having weak evidence of acquisition post-migration. All other cases, including those where the confidence interval around the estimated year of acquisition included the year of arrival, were classified as having weak evidence of having acquired their infection before migration [3].
For notifications in migrants that could not be classified using any of the methods above (no evidence of recent infection, no negative test in Australia and either CD4 count or year of migration not available), the location of HIV acquisition was classified as unknown.
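The classification hierarchy described in this section can be summarised in a short Python sketch. It is illustrative only: the function and argument names are hypothetical, and the CD4-based estimate of the year of acquisition is taken as an input (its lower confidence bound) rather than computed, because the underlying formulae are given in the cited references and not reproduced here.

```python
def classify_acquisition(confirmed_negative_post_arrival=False,
                         recent_infection_post_arrival=False,
                         self_reported_negative_post_arrival=False,
                         arrival_year=None,
                         acquisition_year_lower=None):
    """Return (evidence strength, probable timing of acquisition).

    `acquisition_year_lower` is the lower confidence bound of the
    CD4-based estimate of the year of acquisition (hypothetical input).
    """
    # Strong: confirmed negative test or laboratory evidence of recent
    # infection after arrival in Australia.
    if confirmed_negative_post_arrival or recent_infection_post_arrival:
        return ("strong", "post-migration")
    # Medium: self-reported negative test after arrival.
    if self_reported_negative_post_arrival:
        return ("medium", "post-migration")
    # Unknown: the CD4-based rule cannot be applied.
    if arrival_year is None or acquisition_year_lower is None:
        return ("unknown", "unknown")
    # Weak: arrival precedes the whole confidence interval.
    if arrival_year < acquisition_year_lower:
        return ("weak", "post-migration")
    # All other cases, including intervals that contain the arrival year.
    return ("weak", "pre-migration")
```

For example, a migrant who arrived in 2006 with an estimated acquisition interval starting in 2008 would be classified as having weak evidence of post-migration acquisition, whereas an interval starting in 2005 would give weak evidence of pre-migration acquisition.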

Sequencing methods
A 1035 base-pair product spanning the entire coding region for protease (PR) and the first 246 codons of reverse transcriptase (RT) was amplified from HIV-1-specific RNA derived from 500 μL of plasma and sequenced using the ABI Prism reagents, hardware and software (Applied Biosystems, Foster City, CA; HXB2 co-ordinates of the sequence dataset are 2253-3287). The methods have been described in detail previously [15]. HIV strains were initially subtyped based on their pol sequences using the Stanford HIV database (http://hivdb.stanford.edu/). Subtype assignment was confirmed by submitting sequences to the Los Alamos database (http://www.hiv.lanl.gov) and the NCBI HIV genotyping tool (http://www.ncbi.nlm.nih.gov/projects/genotyping/). If subtype assignment was unclear due to potential recombination, the Recombinant Identification Program (https://www.hiv.lanl.gov/content/sequence/RIP/RIP.html) and the Jumping Profile Hidden Markov Model (http://jphmm.gobics.de/) were used to assign the subtype.

Phylogenetic analysis
Sequences were aligned using the BioEdit tool (mbio.ncsu.edu/BioEdit). Codons associated with drug resistance were removed to avoid clustering due to convergent evolution (hivdb.stanford.edu/s/who). The nucleotide substitution model was selected based on the Akaike Information Criterion. Maximum likelihood methods were used to construct a phylogenetic tree using MEGA 6.0 [27] based on the general time reversible nucleotide substitution model with gamma distribution and a proportion of invariable sites. The robustness of the resulting tree was assessed using bootstrap analysis with 1000 replicates.
Phylogenetic clusters were defined on the basis of genetic distance (<0.045) and bootstrap support (>0.95) using the command line version of ClusterPicker version 1.2. Genetic distance was defined using ClusterPicker's "ambiguity" method which is the p-distance for A, C, T, G sites and sites with IUPAC ambiguity codes. The characteristics of phylogenetic clusters were visualised using Pajek 5.07.
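To make the two cluster criteria concrete, the following Python sketch computes a p-distance between two aligned sequences and checks the thresholds stated above. It is not ClusterPicker's implementation: treating two IUPAC codes as a match whenever their base sets overlap, and skipping gaps, are assumptions made for illustration, as are the function names.

```python
# IUPAC nucleotide codes mapped to the bases they can represent.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def p_distance(seq_a, seq_b):
    """Proportion of compared sites that differ between two aligned sequences.

    Assumption: ambiguity codes count as a match when their base sets
    overlap; gaps and unrecognised symbols are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in IUPAC or b not in IUPAC:
            continue
        compared += 1
        if IUPAC[a].isdisjoint(IUPAC[b]):
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def meets_cluster_criteria(genetic_distance, bootstrap_support,
                           max_distance=0.045, min_support=0.95):
    """Apply the thresholds from the text: distance <0.045 and support >0.95."""
    return genetic_distance < max_distance and bootstrap_support > min_support
```

Under these assumptions, `p_distance("ACGR", "ACGA")` is 0 because R (A or G) is compatible with A, while `p_distance("ACGT", "ACGA")` is 0.25.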

Statistical analysis
Chi-squared (categorical variables) and Kruskal-Wallis tests (numeric variables) were used to assess differences in notification characteristics between migrants and non-migrants. Potential associations between phylogenetic cluster size and the composition of the clusters were assessed using Fisher's exact (categorical variables) and Kruskal-Wallis tests (numeric variables). The probability that phylogenetic clusters containing migrants from the same country of origin would be observed by chance was assessed by calculating the binomial probability of same country clusters being observed given that the largest group of migrants migrating from the same country was 45 and 876 sequences were included in the phylogenetic analysis.
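The chance calculation can be sketched as follows. The exact formula is not spelled out in the text, so this sketch rests on two labelled assumptions: that the probability of a same-country pair is approximated by the squared share of the largest single-country group (45 of 876 sequences), and that the number of same-country pairs is compared against a binomial distribution over the 30 pairs reported in the Results. The function names are hypothetical.

```python
from math import comb


def same_country_pair_probability(largest_group, total_sequences):
    """Assumed approximation: both members of a random pair fall in the
    largest single-country group."""
    share = largest_group / total_sequences
    return share ** 2


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


p_pair = same_country_pair_probability(45, 876)
# Assumed inputs: 12 same-country pairs observed among 30 pairs.
tail = binomial_upper_tail(12, 30, p_pair)
```

Under these assumptions `p_pair` rounds to 0.003 and the tail probability falls well below 0.001. Using the largest group gives an upper bound on the chance probability for any single country, so the comparison is conservative.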
We assessed factors associated with phylogenetic clustering using univariable and multivariable logistic regression, with a backward selection approach used to build the multivariable model, which initially included variables whose p-value was <0.25 in univariable regression. The cut-off for statistical significance for all analyses was p<0.05.

Ethics
Approval for the study including a waiver of informed consent for use of retrospective data was granted by The Alfred Office of Ethics and Research Governance (project 218/14).

Results
There were 445 new HIV diagnoses in Victoria between 2005 and 2014 in people reporting heterosexual sex as the only exposure (Fig 1). Of these, 332 (75%) had pol sequence data available. The characteristics of new HIV diagnoses with sequence data available are described in Table 1. Compared to individuals born in Australia/New Zealand, migrants were on average younger at notification (median 36 years vs 42 years, p = 0.013) and less likely to have evidence of newly acquired infection (9% vs 21%, p = 0.003). Among migrants, the most common region of birth was sub-Saharan Africa (35%) followed by South-East Asia (22%). The HIV-1 subtype distribution was substantially different among migrants (22% B, 41% C, 24% CRF01_AE, 13% other) compared to Australian/New Zealand-born individuals (63% B, 12% C, 21% CRF01_AE, 4% other; p<0.001, Table 1). Availability of HIV partial pol sequence did not differ by age, sex, country of birth (Australian/New Zealand born vs migrant), or CD4 count.

Phylogenetic clustering
Overall, 876 sequences were included in the phylogenetic analysis, including reference sequences (Figs 1 and 2). Of these, 258 (29%) were in 92 phylogenetic clusters. Of the 332 notifications from 2005 onward where heterosexual sex was listed as the only exposure and pol sequences were available, 206 (62%) were from migrants and 126 (38%) were Australian/NZ born, with 118 (36%) samples in 65 phylogenetic clusters. Of these 118 samples identified in a cluster, 66 (56%) were migrants and 52 (44%) were Australian/NZ-born notifications. Based on multivariable analysis, newly acquired infection, region of birth ("Other" compared to sub-Saharan Africa) and B-subtype were independent predictors of phylogenetic clustering (S1 Table).
Of the 65 phylogenetic clusters identified, 43 contained at least one migrant who reported heterosexual sex as the only mode of transmission (Fig 3). The size of the clusters ranged from 2-10 with 30 clusters (70%) consisting of pairs (cluster size = 2) (Fig 3). The characteristics of the pairs were considerably different to those of the larger clusters (Table 2, Fig 3). Compared to clusters larger than two, pairs were more likely to include at least one female (87% vs. 54% of clusters larger than 2, p = 0.037) and more likely to be composed of migrants from the same country of origin (40% vs 0% of clusters larger than 2; p = 0.032). Larger clusters were more likely to include bisexual/MSM-IDU (70% vs. 17% of pairs; p<0.001), people born in Australia/New Zealand (including people from multiple countries of origin) (92% vs 40% of pairs; p = 0.002) and people from multiple countries of origin that do not include Australia (62% vs. 21% of pairs, p = 0.013). The proportion of clusters that included heterosexual migrants with any (p = 0.890) or strong (p = 1.000) evidence of HIV acquisition after migration did not differ by cluster size (Table 2).
PLOS ONE
Of the 43 phylogenetic clusters including 67 migrants who reported heterosexual sex as their only mode of transmission, 22 individuals (33%) were in 12 clusters with only migrants from the same country (Tables 2 & 3). Of the 22, HIV acquisition was estimated to have occurred before migration for nine and after migration for seven, and was unknown for six (Table 3). Given that there were 876 sequences in the analysis and the largest group migrating from the same country was 45, the probability of observing a same-country pair by chance was p = 0.003. Therefore, the observations of 12 same-country pairs overall and seven same-country pairs with evidence of infection acquired after migration were statistically significant (both p<0.001). Nonetheless, these observed infections in single-country clusters accounted for a minority (7/29; 24%) of all Australian-acquired infections among migrants.

Discussion
This study examined the characteristics of phylogenetic clusters involving heterosexual migrants living in Victoria, Australia from 2005-2014. The majority (70%) of the 43 clusters that included at least one migrant with heterosexual risk were pairs, and all clusters with more than one migrant from a single country of origin were pairs. The number of same-country migrant pairs that we observed was significantly greater than expected by chance, including among pairs with evidence of post-migration acquisition of HIV. Whilst this suggests that sexual networks among migrants from the same country of origin do contribute to the risk of HIV infection after migration, the overall number of same country-of-origin clusters was small and confined to pairs rather than larger clusters. In addition, the risk of heterosexually acquired infection among migrants appears to be partially attributable to sex between migrants from the same country (within couples) and partially attributable to risk from sex with those born in Australia/New Zealand, including those reporting male-to-male sex.
The eight clusters consisting of four or more notifications included mainly men, people born in Australia/NZ, migrants from multiple countries of origin, and those reporting male-to-male sex as a possible route of HIV acquisition. These clusters were also mainly viral subtype B, the predominant HIV subtype transmitted through male-to-male sex in Australia [5]. These findings suggest that migrants in larger clusters are predominantly acquiring their HIV after migration and that HIV risk is occurring through sexual contact with people born in Australia/New Zealand and may include male-to-male sex. Further work is required to see how well-connected sequences from migrants are with those from the Australian gay community. It is possible that some of the reported heterosexual exposures in these clusters were due to male-to-male sexual exposure that was not reported due to social desirability bias, which may be higher among migrants from countries or cultures where male-to-male sex is stigmatised. A recent study of Australian HIV diagnoses in migrants found that diagnoses attributable to male-to-male sex appeared to be increasing in migrants from several regions of origin, although this was based only on a comparison between two periods rather than an analysis of trends [6].
To the best of our knowledge, the only published study of HIV transmission networks in migrants is a mathematical modelling study of the heterosexual HIV epidemic in the Netherlands, which investigated the effects of sexual mixing patterns on the epidemic [24]. Modelled estimates based on empirical data on sexual mixing and sexual behaviour from cross-sectional surveys suggested that the majority of HIV infections in migrants from high prevalence countries were acquired in the Netherlands, but the epidemic was very stable due to low levels of risk behaviour among migrants and highly assortative mixing within migrants from the same region. That study focussed only on migrants from high and medium HIV prevalence regions, whereas our study included all heterosexual migrant notifications. Sexual mixing and sexual behaviour among migrants to Australia may differ from those among migrants to the Netherlands. Nonetheless, a key finding of the modelling study was that the heterosexual epidemic in the Netherlands was stable due to low levels of risk behaviour among migrants from high prevalence countries. This finding is consistent with the findings of our study in that phylogenetic clusters consistent with heterosexual transmission alone were small (all n≤3 and most n = 2).
The results of our study suggest that the risk of HIV acquisition among heterosexual migrants to Australia post-migration comes from a variety of sources: for some it arises within country-of-origin networks, including transmission before or after migration, and for others it follows migration and is related to the predominantly MSM Australian HIV epidemic. The proportion with evidence of acquisition after migration did not vary with cluster size, highlighting that HIV risk for migrants after migration is hybrid, arising both from small clusters, which are likely to represent couples that are often composed of migrants from the same country of birth, and from risks associated with the Australian epidemic, which is largely driven by male-to-male sex. Within-country-of-origin transmission networks accounted for a minority of transmission, and the infections that occurred post-migration may have been preventable by timely diagnosis and ART treatment of partners living with HIV. Unfortunately, previous findings from Australia and other high income countries are that migrants are at increased risk of delayed diagnosis [3,28–32] and delayed ART treatment [33]. While there remain structural barriers to HIV testing and care for migrants in Australia, including a lack of access for some migrants to universal health coverage [33], qualitative research has also found that diverse groups of migrants to Australia perceive HIV risk in Australia to be low [34]. In addition to changes to eligibility for universal health coverage that include migrants, non-stigmatising and culturally appropriate education about the risk of HIV and STIs in Australia in migrant groups may also be needed.

Limitations
Analysis of partial pol sequences obtained through routine clinical practice continues to be a key strategy for population genomic studies in HIV [11]. However, while phylogenetic clustering implies closely related infection, it does not necessarily imply direct transmission. It is possible that there were other infections in the transmission networks that have not yet been diagnosed, were diagnosed in another state or country and therefore were not notified in Victoria, were diagnosed prior to the study period, or did not have a pol sequence available. It is possible that some cases that were classified as having acquired HIV infection post-migration to Australia were exposed to HIV whilst travelling back to their country/region of birth or other regions abroad, which was not measured in this study. Furthermore, it was not possible to describe the possible transmission networks of those who were not in phylogenetic clusters.

Conclusions
To the best of our knowledge, this is the first study to investigate the characteristics of possible HIV transmission networks specifically among heterosexual migrants. We found that post-migration risk of HIV infection in heterosexual migrants was attributable in part to sexual networks among migrants from the same country of origin (mostly pairs), but also attributable to larger sexual networks that included those born in Australia and New Zealand and those reporting male-to-male sex. A multipronged approach to prevention is warranted, including promoting timely diagnosis and treatment of HIV-infected migrants and non-stigmatising, culturally appropriate education about HIV risk in Australia for migrants.
Supporting information S1