Estimating the Timing of Mother-to-Child Transmission of the Human Immunodeficiency Virus Type 1 Using a Viral Molecular Evolution Model

Background Mother-to-child transmission (MTCT) is responsible for most pediatric HIV-1 infections worldwide. It can occur during pregnancy, labor, or breastfeeding. Numerous studies have used coalescent and molecular clock methods to understand the epidemic history of HIV-1, but the timing of vertical transmission has not been studied using these methods. Taking advantage of the constant accumulation of HIV genetic variation over time and using longitudinally sampled viral sequences, we used a coalescent approach to investigate the timing of MTCT. Materials and Methods Six-hundred and twenty-two clonal env sequences from the RNA and DNA viral population were longitudinally sampled from nine HIV-1 infected mother-and-child pairs [range: 277–1034 days]. For each transmission pair, timing of MTCT was determined using a coalescent-based model within a Bayesian statistical framework. Results were compared with available estimates of MTCT timing obtained with the classic biomedical approach based on serial HIV DNA detection by PCR assays. Results Four children were infected during pregnancy, whereas the remaining five children were infected at time of delivery. For eight out of nine pairs, results were consistent with the transmission periods assessed by standard PCR-based assay. The discordance in the remaining case was likely confused by co-infection, with simultaneous introduction of multiple maternal viral variants at the time of delivery. Conclusions The study provided the opportunity to validate the Bayesian coalescent approach that determines the timing of MTCT of HIV-1. It illustrates the power of population genetics approaches to reliably estimate the timing of transmission events and deepens our knowledge about the dynamics of viral evolution in HIV-infected children, accounting for the complexity of multiple transmission events.


Introduction
The prevention of mother-to-child transmission (MTCT) of the human immunodeficiency virus (HIV) is one of the main global health issues and is one of the World Health Organization's key priorities. HIV-1 still infects about two million children worldwide, with 330,000 new infections reported in children in 2011 (Report on the Global AIDS Epidemic 2012, Geneva, World Health Organization). Without preventive intervention, probability of MTCT is 15-30% in high-income countries and up to 25-40% in sub-Saharan Africa, mainly due to differences in infant feeding practices [1]. In non-breastfeeding populations, and in the absence of chemoprophylaxis and targeted obstetrical prevention measures, transmission occurs in utero in approximately one third of the cases (mostly late in the third trimester of pregnancy) and intrapartum in the remaining two thirds [1,2].
Application of various phylogenetic approaches to HIV-1 has led to reconstruction of HIV-1 evolutionary history within and between hosts with good precision [3]. At the epidemiological level, history and origin of the HIV-1 epidemic have been successfully explored through molecular phylogeny [4][5][6], and these approaches also provided insights into clusters of HIV-1 transmission [7][8][9]. Moreover, application of these molecular evolutionary processes at the intra-host scale were implemented to investigate HIV-1 compartmentalization dynamics, disease progression, and adaptive response to drug therapy [10][11][12][13][14]. If traditional phylogenetic analyses establish evolutionary relatedness, recent developments that incorporate sample date information have permitted new approaches [15]. Specifically, the introduction of molecular clock in phylogenetic inference allowed more precise investigations into the evolutionary processes in order to infer molecular phylodynamics of HIV-1 [16]. Additionally, longitudinally obtained sequences from patients have been more recently used to investigate transmission direction and to estimate date of infection [17][18][19][20][21].
Aside from the descriptions of rare early in utero transmission events [22], the timing of transmission has been mostly estimated using partially indirect dynamic models [2]. Such probabilistic approaches exclusively rely on the virus detection at successive sampling time points after delivery [23]. Molecular variability constitutes valuable additional information, and viral sequences sampled at different time points can be used to investigate the rate and population dynamics of intra-host HIV-1 evolution [6,[24][25][26][27][28][29][30]. We hypothesized that this approach could provide a good approximation of the timing of MTCT by estimating the time of most recent common ancestor (TMRCA) of viruses in the child. In this study, we used a coalescent approach within a Bayesian statistical framework to investigate the timing of MTCT. Using time series of cellular HIV-1 DNA and plasma HIV-1 RNA env sequences from nine mother-child pairs, we compared these quantitative results with the classic biomedical definition based on serial PCR that more qualitatively discriminates between transmission periods (i.e. in utero vs. intrapartum).

Ethics Statement
The initial protocol and its amendments were approved by the ethics committees of the Thai Ministry of Public Health, Chiang Mai University, and the Harvard School of Public Health. All study sites complied with regulations of the Department of Health and Human Services for the protection of research subjects. Women were enrolled at 28 weeks' gestation if they provided written informed consent.

Biological material
Nine mother-infant pairs, enrolled in the Perinatal HIV Prevention Trial-1 (PHPT-1) in Thailand were studied [31].The infants were not breastfed. HIV-1 infection status was determined by PCR-based HIV-1 proviral DNA detection, using DNA extracted from peripheral blood collected longitudinally within 48 hours of birth, at six weeks, four months, and six months of age. We considered the infant to have been infected in utero if the first test (using blood collected within 48 hours of birth) was positive and intrapartum if the first test result was negative but the following test results were positive [23]. HIV-1 env gene sequences were obtained from the mothers at delivery and from sequential samples collected at different times in their babies starting from the first positive PCR result up to several months or several years after birth (table 1). Extraction, amplification, cloning, and sequencing of env genes were done as previously described [32]. Briefly, genomic DNA was extracted from peripheral blood and viral RNA was extracted from plasma samples. A 1.2 kb fragment covering almost the entire HIV-1 env gp120 gene (from upstream V1 to downstream V5) was amplified by nested polymerase chain reaction (PCR) or PCR following reverse transcription (RT-PCR) using subtype-specific primers [32]. For each sample, several independent PCRs were pooled before cloning and sequencing in order to be representative of the viral diversity. All sequences were deposited in GenBank under accession numbers HM121341 to HM121962. A preliminary phylogenetic analysis using reference sequences from all the major HIV-1 clades showed that all individuals were infected by CRF01_AE strains, the predominant lineage in Thailand.

Data analyses
Sequences were aligned with the MAFFT v7.13 software [33] in both multiple and pairwise alignments, using an HIV-B and a consensus of HIV-1 CRF01_AE sequence as reference sequences. The resulting alignments were visually inspected with JALVIEW v2.0 [34]. The partition of genetic variation (diversity) was investigated through a molecular analysis of variance (AMOVA; [35]) implemented in ARLEQUIN v3.0 [36] to assess which fraction of genetic variation was accounted by various factors: mother-and-child pairs, sampling times, and DNA vs. RNA, (table 2). For the whole dataset of nine mother-and-child pair, a maximum likelihood (ML) tree was constructed using PHYML v2.4 [37] with a GTR+Gamma mutational model as assessed by the mutational model selection procedure implemented in HYPHY v2.2 [38,39]. Branch support was evaluated with 1000 bootstrap replicates ( Figure 1).
Finally, to assess the impact of putative recombination on our phylogenetic analyses, we applied the method of Genetic Algorithm implemented in GARD [40] looking for potential topological tree incongruence (i.e. the impact of recombination on tree reconstruction) and tested for significance using Kishino Hasegawa (KH) topological incongruence analysis [41].

Coalescent Bayesian Markov Chain Monte Carlo (MCMC) estimates of the transmission timing
We used the BEAST package v1.7 [42] to estimate MTCT timing with a combination of probability models of gene trees and Bayesian statistical framework. Analysis were conducted under the best-fitting set of molecular, coalescent and demographic models (i.e. GTR+Gamma and an extended Bayesian skyline plot), as indicated by a Bayes Factor .20 (Table S1) [43,44]. For each mother infant pair, three chains were run assuming an uncorrelated lognormal relaxed molecular clock that allows the rate of evolution to vary across the tree [15]. Three independent chains were run for each pair with similar prior but different initial states, checking that such replicates reached similar results in order to assess convergence. Mixing and convergence (i.e. effective sample size greater than 200 for all parameters) were assessed in Tracer v1.5. All chains reached stationarity before 50,000,000 steps (except for pair 0779, which analysis had to be rerun with 100,000,000 steps for convergence). The Logcombiner v1.7.5 tool was then used to combine the chains. The first 10% of the chains were discarded (burn in period). Subsequently, the chain was regularly sampled to get an average estimate of parameter values and associated uncertainty [95% highest posterior density (HPD)]. Maximum clade credibility tree was selected with the software TREEANNOTATOR v1.7.4, after a 10% burn in. Trees were visualized in FIGTREE v.1.3.1.

MTCT timing according to reference diagnostic procedures
The timing of MTCT was estimated using qualitative PCR assays for HIV-1 DNA detection in sequential samples collected from the infants. According to these data, four infants were infected during pregnancy (pairs 0858, 1005, 1110, 1224; in utero transmission), whereas five infants were infected at delivery or perinataly (pairs 0779, 0939, 1021, 1333, 1391; intrapartum transmission) (table 1).

Sequences analysis
A mean of 6 sequential samples per infant was included in the study (range = 2-8) with a mean follow-up period of 664 days (range: 277-1034 days; table 1). A total of 622 env sequences were analyzed, with a mean number of 12 clones per time-point (n = 4-32) and a mean of 69 clones for each mother-infant pair (range: 29-101). Among them, 61.4% were issued from cellular DNA, and 38.6% were issued from plasma viral RNA (382 and 240 sequences, respectively; Table 1).
Population structure was explored through an analysis of the molecular variance (AMOVA). The genetic variability between viral sequences (diversity) was explained by order of importance by: (i) mother-child pair, (ii) mother vs infant within each pair, (iii) the temporal stratification of the data, and (iv) the DNA or RNA origin of the sequences (Table 2). For instance, the percentage of molecular variance attributed to the nature of the sequences' origin (i.e. either cellular DNA or plasma viral RNA) was 3.9%, compared to 10.1% and 79.0% for the temporal stratification of the data and the nature of the pair, respectively. This finding suggests that the origin of the sequences (i.e. cellular DNA or plasma viral RNA) had moderate influence on the results.
Since recombination could strongly influence the population structure of sequences dataset, sequences were screened for recombination before phylogenetic analysis using GARD [45]. Though we found evidenced of single recombination events for most of the pairs, topological incongruence analysis using the KH test was not significant, indicating the signal for recombination was likely due to substitution rate variation, rather than differing phylogenetic history [41].

Phylogenetic analysis
As expected, the phylogeny of the entire data set ( Figure 1) showed well supported (monophyletic) subtrees for each motherinfant pair, since each child viral sequence was directly and uniquely derived from the corresponding mother. Each motherinfant tree can therefore be considered separately ( Figure 2). Two topological patterns corresponding to two different types of transmission events could be distinguished: eight of nine pairs supported a single viral transmission event with child sequences constituting a well-differentiated (monophyletic) subtree ( Figure 2:  pairs 0779, 1021, 1333, 1391 0858, 1005 and 1110) and were supported by high posterior probabilities (Figure 2). In these cases,  a unique viral ancestor was likely transmitted from the mother to child. In contrast, viral lineages in the infants were intermixed with the mother lineages for pairs 0939, and 1224 ( Figure 2). Such polyphyly of the child subtrees can be a consequence of (i) a poor phylogenetic resolution or (ii) transmission of several maternal variants. According to this second hypothesis, the TMRCA estimate would trace back to some point during the course of the maternal viral evolution. This kind of pattern would then provide support for multiple transmission variants that may have occurred during a single event (co-infection), or as several successive events (super-infections). Biological data obtained from the reference diagnostic PCR-based procedure may help to distinguish between these possibilities. Infection of infant 0939, which occurred perinatally, might be related to a single transmission event. Indeed, transmission of multiple variants during this time-limited period is more likely to have occurred In contrast, infection of infant 1224, which occurred in utero, might have resulted from either a co-infection by several variants during a simple event or super-infections due to several sequential events ( Figure 3).

Coalescent Bayesian Markov Chain Monte Carlo (MCMC) estimate of transmission timing
For the nine pairs, the posterior distribution of the child viral population TMRCA was inferred. The resulting estimate of the TMRCA closely corresponded to the transmission timing according to standard assays ( Figure 3). The genealogical inferences were consistent with the biological diagnosis based on timing of HIV-1 proviral detection for eight of nine pairs. The 95% HPD of possible infant root age overlapped birth when viral transmission occurred intrapartum according to the reference biological diagnosis (Figure 3: four bottom grey boxes) and excluded it when transmission occurred in utero (Figure 3: four top open boxes). Variance of TMRCA largely varied among the pairs. Though the inferred TMRCA results were consistent for five pairs (0779, 1021, 1333, 1391 and 1005; HPD range extending to roughly two months), others showed a greater variance. This variance extended to almost five months for pairs 0858 and 1224. Pair 1110 showed an extremely large range extending close to birth but was documented with only 29 sequences, including 62% of sequences issued from cellular DNA. Of note, pair 0939 range excluded birth and was discordant with the biomedical diagnostic assay. However, the corresponding phylogenetic tree suggested infection with multiple viral variants (Figure 2). The 95% HPD largely preceded birth, extending even before pregnancy, thereby reinforcing the probable infection by several maternal variants.  The MRCA of the viral variants should then trace back to the mother's viral population and may even precede pregnancy. If the timing assumed by the biomedical diagnosis was correct, it is possible that several maternal variants were transmitted at birth in child 0939. Considering the ''children-only'' dataset, parameter estimates were still consistent with those obtained with the entire dataset. Inferred TMRCA were mutually highly consistent for 7 of the 9 pairs and in a lesser extent for pairs 1224, 1333 who were also distinct by the shortest duration of follow-up (respectively 277 and 350 days) (Figure 4). This finding was expected considering that the child TMRCA could be estimated regardless of the maternal sequence data (Figure 4).

Discussion
HIV-1 pediatric infection is a challenging health issue worldwide, and a primary goal of HIV-1 prevention programs is to limit acquisition of HIV-1 during mother-to-child transmission [1]. Although it has been shown that MTCT of HIV-1 can occur both during pregnancy and at delivery [2], the exact timing of MTCT remains difficult to determine precisely with biomedical methods based on serial PCR and indirect probabilistic approaches [23]. To our knowledge, time-measured Bayesian MCMC analyses of longitudinally sampled sequences have never been used to investigate this question. The well-documented sample of longitudinally collected sequences from nine HIV-1 infected mother-infant pairs included in this study represents a unique opportunity to investigate MTCT timing and to validate Bayesian inference.
The maternal and infant viral populations within each pair were significantly differentiated, thus reflecting a drastic bottleneck associated with transmission [32,[46][47][48][49][50][51][52][53][54][55][56]. This bottleneck and the subsequent molecular evolution within the infected child provided the basis for our proposed approach. Tree topologies suggested different types of transmission: six pairs showed well-supported monophyletic subtrees claiming for a single transmission event of a single viral variant. In contrast, the data suggested transmission of several variants, as indicated by several separate branches, for the three remaining pairs. Transmission time estimates using a time-scaled Bayesian phylogenetic approach were shown to be reliable. Despite the short time scales involved (from 277 to 1034 days depending of the mother-infant pair), they were qualitatively highly consistent with the reference diagnostic procedures based on timing of HIV-1 proviral DNA detection. In eight of nine cases, our inferences of the timing of transmission corroborated the PCR results, including four in utero and four intrapartum infected infants. These results validate the time-scaled Bayesian approach to date MTCT of HIV-1. The single discordant case concerned a pair for which TMRCA was not consistent with the biomedical diagnosis that argued for intrapartum transmission. Transmission of multiple maternal variants was strongly suggested by the phylogenetic analysis of this mother-infant pair. The introduction of several variants in a single transmission event or closely related events could lead to large uncertainty in the estimated timing of transmission, leading the ancestry among viral lineages to trace further back in time in the mother's population. However, it remains difficult to distinguish between this hypothesis supported by the PCR results and a possible misclassification by the standard diagnostic assay. The observed TMRCA variance largely varied among the pairs and could be partially linked to the time of transmission; the more recent the transmission, the more precise the estimate. Hence, TMRCA estimates were consistent (corresponding to a greater coefficient of variance) for five pairs, particularly for the intrapartum infected children. Interestingly, a wide confidence interval was observed for two infants (pair 1110, and pair 1224 albeit at a lesser extent), considered to have been infected in utero according to the PCR results. This uncertainty might be explained by both the transmission of several variants for the two cases and a low number of available sequences for pair 1110.
We must consider that the transmission of several variants might be attributed either to introduction of several maternal variants in a single transmission event (co-infection) or to several successive introductions of maternal variants (super-infection) that could occur during pregnancy (in utero) or during pregnancy and then at delivery (in utero and intrapartum). Discrimination between these two possibilities remains a challenging objective. Unlike phylogenetic methods that follow a lineage backwards to coalescence, probabilistic models were implemented to detect if transmission occurred because of one or several variants [57]. To our knowledge, superinfection during pregnancy has never been modeled through time. A model introducing migration and divergence processes between the viral populations of the mother and its child might distinguish co-infection from superinfection.
This study was limited by the inclusion of sequences issued from both cellular DNA and plasma viral RNA, which may reflect different tempos of HIV-1 evolution. However, our AMOVA analysis suggested that the origin of the sequences (i.e. cellular DNA or plasma viral RNA) did not play a major role on the observed molecular variance. Additionally, the use of a relaxed clock model partially accommodates for the potential different evolution rate of sequences present in cellular DNA and plasma viral RNA. The reappearance of archived viral population must be considered [58,59]. For example, the phylogenetic tree of pair 1391 showed that proviral DNA sequences sampled at time t = 755 days in the child were extremely close to RNA sequences present earlier in his plasma, at 47 and 184 days after birth. This finding suggests the possibility of continuous re-emergence of variants from putative viral reservoirs in children. If DNA reservoirs contribute to the RNA pool recurrently, it appears relevant to consider both DNA and RNA sequences to avoid features of HIV-1 evolution. The potential bias introduced by the maternal sequences must be considered also. They were issued from cellular DNA since RNA sequences were not available for most mothers as a consequence of the low plasma viral load at delivery in response to antiretroviral prophylaxis during pregnancy [31]. However, the inferred TMRCA using RNA-derived sequence only were still consistent with the full analysis when considering the seven mother-infant pairs for which the number of sequences issued from plasma RNA was sufficient. Sampling times lead to a local rescaling of rates of substitutions through the local clock model and branching. Even if it constraints the relative nodes positions along the trees, the time estimate of the child's root provided confidence in the TMRCA inference and was confirmed by the congruence with the biomedical assays. Another limitation may be linked to the technical strategy used. Indeed, the sequences were not obtained using the single genome amplification technology that is considered as a gold standard since it avoid recombination events during the PCR from bulk DNA and it limits the non proportional representation of target sequences due to template resampling [60][61][62]. However, and as discussed above, the concordance between the molecular evolution approach and the reference biomedical diagnosis suggest that this limitation did not have a drastic effect.

Conclusions
This study has benefited from the temporal anchoring of HIV-1 sequences to infer transmission timing and evolutionary rates in mother-infant pairs. Thanks to the validation based on the determination of the transmission period through the reference diagnostic procedure, our study offered a unique opportunity to validate the Bayesian coalescent framework on documented cases. It confirms that time series of viral sequences can provide powerful insights to draw inference about transmission timing. Such theoretical approaches could also provide insight into multiple infections processes (i.e. coinfection or superinfection) not readily detectable with the current diagnostic assays.

Supporting Information
Table S1 Bayes factor of the strict molecular clock (M1) and the uncorrelated lognormal clock (M0) models for each pair. Bayesian Analysis were conducted under the bestfitting set of molecular, coalescent and demographic models as indicated by a Bayes Factor (B 01 ).20 (DOCX)