Patrilineal Perspective on the Austronesian Diffusion in Mainland Southeast Asia

The Cham people are the major Austronesian speakers of Mainland Southeast Asia (MSEA) and the reconstruction of the Cham population history can provide insights into their diffusion. In this study, we analyzed non-recombining region of the Y chromosome markers of 177 unrelated males from four populations in MSEA, including 59 Cham, 76 Kinh, 25 Lao, and 17 Thai individuals. Incorporating published data from mitochondrial DNA (mtDNA), our results indicated that, in general, the Chams are an indigenous Southeast Asian population. The origin of the Cham people involves the genetic admixture of the Austronesian immigrants from Island Southeast Asia (ISEA) with the local populations in MSEA. Discordance between the overall patterns of Y chromosome and mtDNA in the Chams is evidenced by the presence of some Y chromosome lineages that prevail in South Asians. Our results suggest that male-mediated dispersals via the spread of religions and business trade might play an important role in shaping the patrilineal gene pool of the Cham people.


Introduction
The Austronesian language family is one of the largest and most widespread language families. It is spoken by more than 350 million people on islands from Madagascar to Easter Island [1,2]. Nevertheless, the languages in this family have a rather limited distribution on the mainland. Chamic, the representative language of the family, is spoken by the Cham people. In Mainland Southeast Asia (MSEA), Chamic exists as a ''linguistic enclave'', because it is surrounded by non-Austronesian-speaking groups (e.g. Mon-Khmers) [3,4,5]. Many studies investigate the diffusion of Austronesian in MSEA by tracing the origin of the Cham people. The ''Out-of-Taiwan'' hypothesis regards the Cham ancestors as the Austronesian immigrants from Island Southeast Asia (ISEA) and immigration is dated to around 500 BC [4,6,7]. Before the arrival of the Austronesian immigrants, southern Vietnam appears to have been occupied by the local Austro-Asiatic speakers, especially Mon-Khmers [8]. There is a high chance of admixture between the Chams and Mon-Khmer groups. Previously linguistic analyses of the Chamic report that some loan-words from Mon-Khmer languages form indigenous cultural contributions [4,6]. The ''Nusantao Maritime Trading and Communication Networks'' hypothesis states that cultural diffusion through trading and communication networks played an important or even dominant role in the ethnogenesis of the Cham [9]. Because the origin of the Cham people is open to debate, the demographic history of the Austronesians in Southeast Asia requires further investigation.
Analyses of mitochondrial DNA (mtDNA) variation of the Cham population resolve a closer relationship with populations in MSEA rather than with those from ISEA, and this occurs despite that recent gene flow from ISEA [10]. This result suggests that the origin of the Cham people likely involves the massive assimilation of local Mon-Khmer populations, and this is accompanied with language shift. Thus, the Austronesian diffusion in MSEA appears to mediated mainly by cultural diffusion [10]. Because mtDNA data only offer a maternal perspective, only half of the story is known. Does patrilineal history reveal the same story? We address this question by evaluating non-recombining region of the Y chromosome (NRY) markers, including 26 single-nucleotide polymorphisms (Y-SNPs) and eight short tandem repeats (Y-STRs), in 59 male Cham individuals whose matrilineal histories are known [10]. For comparison, the NRY markers of 76 Kinh, 25 Lao, and 17 Thai males were also surveyed ( Figure 1; Table 1).

Population structure
Genetic relationships between the Cham and other Southeast Asian populations were discerned with the aid of additional published Y-chromosomal datasets ( Figure 1; Table 1). We employed a principal component analysis (PCA) based on the NRY haplogroup distribution frequencies of 45 populations (Table S2) to show the overall clustering pattern of the populations. Populations from eastern ISEA (EISEA) and from Laos formed two clusters in the first PC ( Figure 3) and this pattern was mainly owed to haplogroups C-M216, K-P131*, and O-M95* ( Figure S1). The second PC resolved a close affinity between the Kinh and Vietnamese (most likely, the Kinh) populations with those from mainland southern China due to the high frequency of haplogroup O-M88 ( Figure S1). The Cham population showed a close affinity to some but not all populations from western ISEA (WISEA; Figure 3). The clustering pattern revealed by PC1 and PC2 was statistically significant (P,0.05) in AMOVA based on the same profiles of haplogroup distribution frequencies (Table S2). Nevertheless, in terms of the linguistic affinities, the difference between Austronesian (i.e. Cham and WISEA populations) and non-Austronesian (i.e. other MSEA populations) was not statistically significant according to AMOVA (p = 0.08). We incorporated data for eight common Y-STRs (DYS19, DYS389-I, DYS389-II, DYS390, DYS391, DYS392, DYS393, and DYS439) from additional populations in MSEA [11,12], Multidimensional scaling (MDS) based on R ST genetic distances for these Y-STRs did not associate the Chams with populations from WISEA ( Figure 4).

Admixture in the Cham population
The origin of the Chams could not be simply explained as a demic diffusion of Austronesian immigrants from WISEA. The genetic patterns between the Cham and other Southeast Asian populations, as detected in PCA and MDS, suggested a more complex history. The complex demographic process likely involved genetic admixture with local non-Austronesian speakers in MSEA. Therefore, we performed the admixture analysis [13,14] to quantify the proportion of genetic contribution from WISEA and MSEA to the Chams ( Table 2). The patrilineal contribution from WISEA to the Chams (0.37595) was less than that from MSEA (0.62405). Comparatively, the Vietnamese (most likely, the Kinh) population from southern Vietnam had a dominant proportion of the MSEA contribution (0.842972;  Table 1

Haplotype diversity analyses
To discern the relationship between the Y-STR haplotypes in the Chams and other Southeast Asians, median-joining networks [15] were constructed using eight common Y-STRs for each of the 11 haplogroups found in the Cham population ( Figure 5). In the networks of haplogroups O-M95* and P-P27.1, some haplotypes were exclusively shared between the Cham and ISEA populations.  (Table S3). These patterns would suggest that these Chams lineages had an in situ origin from MSEA. Among the 48 haplotypes identified in the Chams, 11 and 18 were shared with those in ISEA and MSEA, respectively (Table S3). Nevertheless, the counts for shared haplotype did not differ significantly (twotailed Fisher's exact test, P = 0.303; Table S4). Moreover, six haplotypes belonging to haplogroup O-M95* were shared by both ISEA and MSEA groups. The exact origin of these lineages in the Chams remains elusive.
To trace the source of the exotic South Asian prevailing components, we incorporated published data (Table S5) from India [16,17], Pakistan [16], and West Asia [18,19] and reconstructed median-joining networks of haplogroups R-M17 and R-M124 ( Figure 6). All haplotypes in the Chams were scattered in the networks, which implied that these lineages had an origin via recent gene flow rather than deeply rooted ancestry. Two Cham lineages of R-M124 were shared the same haplotype with those from North India. This observation suggested that North India might be the original source of the R-M124 lineages in the Chams. The relationships among lineages of R-M17 were complex in the network, which suggested multiple geographic/ ethnic sources for the R-M17 lineages in the Chams.

Discussion
Integrating the information from two uniparentally inherited markers (NRY and mtDNA) is a powerful means of disentangling the human population histories [20], and especially for elucidating sex-biased migrations and social-cultural effects [21]. Compared with our previous study for mtDNA variation in the Chams [10], the current assessment for NRY variation facilitates a better understanding into the origin of the Cham people. Both NRY and mtDNA haplogroup profiles (Figure 7) suggest that, in general, the Chams are indigenous to Southeast Asia. Characteristic East and Southeast Asian lineages, viz., NRY haplogroups O-P191 and C-M217, together with mtDNA haplogroups B, F, M7, and R9, accounted for the majority of the patrilineal (,67.8%) and matrilineal (,60.1%) gene pools of the Chams, respectively. Some ancient Southeast Asian components (NRY haplogroups: C-M216*, F-M213*, and K-P131*; mtDNA haplogroups: M*, N*, and R*) were also identified in the Chams.
The origin of the Chams appears to be much more complex, at least based on the results of PCA, MDS, AMOVA, and haplotype (near-) matching analyses. Recent gene flow from ISEA is detected in the patrilineal pool of the Chams, most likely via the dispersal of Austronesian speakers. Further, the Cham population also contains a significant amount of local genetic contributions from non-Austronesian populations in MSEA. This pattern corresponds with our previous study based on mtDNA [10]. Taken together, the origin of the Chams is mainly a result of admixture between the Austronesian immigrants from ISEA with the indigenous populations (most likely, Mon-Khmers) in MSEA.
South Asian NRY haplogroups R-M17, R-M124, and H-M69 [16,22] are common in the Chams (,18.6%; Figures 2) yet no mtDNA haplotypes are known [10]. Male South Asians contribute to the genetic makeup of Chams, but not South Asian females. The existence of these South Asian patrilineal lineages was in good accordance with the archaeological and historical records. The dominant religion of the Cham people is known to have been Hinduism (overwhelmingly Shaivism) and their culture was deeply influenced by that of India [3,7]. Both Indian and Cham people appear to have played important roles in Southeast Asian maritime trade [23,24]. Contact between the two peoples makes gene flow between them inevitable. The discordance between NRY and mtDNA contributions in the Chams (Figure 7) is well explained by the male-mediated dispersals, most likely through the spread of religions and business trade. In particular, the admixture between alien males and local females is compatible with the matrilocal residence in the Cham people [25,26]. Patrilineal genetic structuring differs between the Chams and Kinhs. For instance, in contrast to the Chams, frequently the Kinhs have lineages (8/76, ,10.5%) from the characteristic Chinese haplogroup O-M7 [27] yet only one lineage from the South Asian haplogroup R-M17 (Figure 7). In addition to the Sinicized cultures, substantial Chinese assimilation into the Kinh people via immigration is suggested for northern Vietnam [3,7]. Thus, the different ethnohistories of the Chams and Kinhs are reflected by their unique mtDNA and NRY patterns.
In summary, this study expands our knowledge on the complex history of the Austronesian diffusion in MSEA. Further improvements to the resolution of the NRY tree [20,28] will help to unravel the story of the Cham people. This initiative will also benefit from the employment of genome-wide autosomal markers [29,30,31]. In the future, a comprehensive study involving extensive sampling will pinpoint more details about the demographic history, such as the source and route for migration, the timing for admixture and expansion.

Samples and data collection
Blood samples of 177 unrelated males were collected from four populations (Table 1; Figure 1). Among them, samples from 59 Cham individuals were collected from Binh Thuan province, southern Vietnam. Binh Thuan was part of the Cham principality of Panduranga, the last Cham territory that had been annexed by   Table 1. Because of severe genetic drift [27], populations Taiwan, Nias, and Mentawai that were resolved as the outliers in the initial analyses and were excluded. doi:10.1371/journal.pone.0036437.g004 Nguyen Vietnam in 1832 AD [3,7], and it was said to harbor a significant number of Chamic speakers [7]. The mtDNA data of the Cham, Kinh, and Thai populations were previously reported [10,32]. This study was approved by the Institutional Review Board of Kunming Institute of Zoology. All subjects were interviewed to obtain informed written consent before sample collection.
Comparative NRY data from southern China and Southeast Asia ( Figure 1; Table 1) were taken from previously published literature [11,12,27,33]. We uniformed all Y-SNPs and Y-STRs data into the same resolution to include as more populations as possible. This truncation of some data caused the NRY haplogroups collapsed into 16 clusters (Table S2), and Y-STRs were reduced to eight loci (DYS19, DYS389-I, DYS389-II, DYS390, DYS391, DYS392, DYS393, and DYS439). Additional data of haplogroups R-M17 and R-124 were collected from published South and West Asian datasets [16,17,18,19] (Table S5).

DNA extraction and genotyping
Genomic DNA was extracted by the standard phenol/ chloroform methods. Seventeen Y-SNPs (Table S1) were genotyped by the GenomeLab TM SNPstreamH (Beckman Coulter). We used three panels of multiplex PCR reactions following manufacturer's recommendation (Protocol S1). The primers for multiplex PCR and single base extension reactions were designed by Autoprimer software (Beckman Coulter) [34]. To improve the resolution of phylogeny, we further screened nine Y-SNPs by direct sequencing some individuals (Table S1). The PCR amplification and sequencing primers were previously reported [35]. Using described methods [36,37,38], we genotyped eight Y-STRs (DYS19, DYS389-I, DYS389-II, DYS390, DYS391, DYS392, DYS393, and DYS439) on an ABI 3730 DNA Analyzer (Applied Biosystems). For DYS389-I and DYS389-II, we used the genotyped data of DYS389-I, and DYS389-II minus DYS389-I in our analyses.

Data analysis
Arlequin 3.5 (http://cmpg.unibe.ch/software/arlequin35/) was used to calculate AMOVA and R ST distances [39]. Principal component analysis (PCA) and multidimensional scaling (MDS) were performed using SPSS 13.0 software (SPSS). In PCA, the original haplogroup frequency data were transformed to standardize against the different effect of genetic drift on haplogroups of different frequencies [40]. Admix 2.0 (http://web.unife.it/progetti/ genetica/Isabelle/admix2_0.html) was used to estimate the level of admixture of MSEA and WISEA groups in the Cham and Vietnamese populations [13,14]. The average haplogroup frequencies of MSEA and WISEA were taken for the two parental populations, respectively. Median-joining networks [15] of Y-STRs within certain haplogroups were constructed with NETWORK 4.6 (http://www.fluxus-engineering.com/network_terms.htm).   Protocol S1 The primers and protocols for the SNPs genotyped by the GenomeLab TM SNPstreamH. (DOC)