Cranio-morphometric and aDNA corroboration of the Austronesian dispersal model in ancient Island Southeast Asia: Support from Gua Harimau, Indonesia

The Austronesian language is spread from Madagascar in the west, Island Southeast Asia (ISEA) in the east (e.g. the Philippines and Indonesian archipelagoes) and throughout the Pacific, as far east as Easter Island. While it seems clear that the remote ancestors of Austronesian speakers originated in Southern China, and migrated to Taiwan with the development of rice farming by c. 5500 BP and onto the northern Philippines by c. 4000 BP (the Austronesian Dispersal Hypothesis or ADH), we know very little about the origins and emergence of Austronesian speakers in the Indonesian Archipelago. Using a combination of cranial morphometric and ancient mtDNA analyses on a new dataset from Gua Hairmau, that spans the pre-Neolithic through to Metal Period (5712—5591cal BP to 1864—1719 cal BP), we rigorously test the validity of the ADH in ISEA. A morphometric analysis of 23 adult male crania, using 16 of Martin’s standard measurements, was carried out with results compared to an East and Southeast Asian dataset of 30 sample populations spanning the Late Pleistocene through to Metal Period, in addition to 39 modern samples from East and Southeast Asia, near Oceania and Australia. Further, 20 samples were analyzed for ancient mtDNA and assigned to identified haplogroups. We demonstrate that the archaeological human remains from Gua Harimau cave, Sumatra, Indonesia provide clear evidence for at least two (cranio-morphometrically defined) and perhaps even three (in the context of the ancient mtDNA results) distinct populations from two separate time periods. The results of these analyses provide substantive support for the ADH model in explaining the origins and population history of ISEA peoples.


Introduction
The Austronesian language is spread from Madagascar in the west, Island Southeast Asia (ISEA) in the east (e.g. the Philippines and Indonesian archipelagoes) and throughout the Pacific, as far east as Easter Island. Austronesian language dispersal models have been proposed by Blust and Bellwood [1][2][3] and Bellwood has gone on to test these using archaeological evidence as a proxy for human movement between 5000 and 1000 years ago [1,[4][5][6][7]. The most widely recognized model that the remote ancestors of Austronesian speakers originated in Southern China, and migrated to Taiwan with the development of rice farming by c. 5500 BP and onto the northern Philippines by c. 4000 BP, is now broadly accepted [6,7].
The subsequent Austronesian language speaking dispersals, from the Neolithic through to later Metal periods, throughout Island Southeast Asia, including Malaysia and Indonesia, and into the Pacific, referred to as the "Out of Taiwan" model or Austronesian Dispersal Hypothesis (ADH), are similarly well attested to archaeologically [8]. Notwithstanding, unlike the case in Mainland Southeast Asia (MSEA) [9,10], human skeletal remains have not played a substantive role in human mobility debates in ISEA. The principle reason is the hitherto poor preservation of human remains from key localities in the region. For instance, Niah Cave, is the largest Neolithic mortuary site in ISEA, from which more than a hundred human skeletal remains have been reported, including the earliest dated in the region: the 'deep skull' [11,12]. Unfortunately, the very poor preservation of these remains, particularly the crania, has hampered morphological analysis.
Recent excavations at the Gua Harimau cave site in southeastern Sumatra, provide an assemblage spanning the pre-Neolithic to Metal periods and an opportunity to assess the ADH model. The aim of this paper is to test the ADH using a combination of cranial morphometrics and ancient mtDNA analyses on the Gua Harimau remains. Comparisons will be made with appropriate modern and archaeological samples to better understand ancient patterns of genetic exchange and human mobility patterns in the region.

Gua Harimau in context
The cave site of Gua Harimau is located in Padang Bindu, Oku district, in southeastern Sumatra, Indonesia (Fig 1). The cave, which formed several tens of meters above the present alluvial plain, opens towards the southeast. The width of the main entrance is c. 30m and the average horizontal depth is c. 15m. Since 2012, a substantive area of the floor of the single chambered cave has been excavated, recovering 84 individual human skeletons dating from the pre-Neolithic through to the Neolithic, Bronze and Iron Ages (Fig 2) or 5712-5591cal BP to 1864-1719 cal BP (Table 1). Some level of continuity in artefact types and floral/faunal remains from the Neolithic through to Bronze and Iron ages suggests continuity in occupation over this period of time.
The main feature differentiating the Metal Period burials is the presence of bronze and/or iron artefacts. Neolithic burials can be identified by way of characteristic paddle impressed and incised ceramics. Somewhat intriguingly, cord marked ceramics were also identified among some Neolithic burials. The pre-Neolithic layer, dated to between 5712-4434 cal BP, is characterized by the presence of unifacial pebble tools, similar to the almond-shaped tools often found in Hoabinhian cultural assemblages in MSEA, as well as a significant number of flakes. The deeper late Pleistocene layer, dated using charcoal to 13,055 +/-120 14 C years [13] or 14,061-13,312 cal BP (95.4%) (OxCal v4.3, IntCal 13, Bronk Ramsey [14]), is characterized by numerous flakes originating from a range of materials, including obsidian. No human remains have been recovered from this basal layer.

Cranial morphometric analysis
The excavation and analysis of Gua Harimau Cave was carried out with the permission of Dr H. Kuryana Azis (Head of the Local Government), Mr Pak Paisol (Head of the Local Tourism and Cultural Office) and Padang Bindu village. The male crania from Gua Harimau were analyzed to be consistent with standard recording protocols and generally male dominated comparative data sets available. Of the original 84 individuals, 23 were suitable for morphometric analysis and these are divided into Early and Late Gua Harimau (see representative crania in Fig 3). The Early Gua Harimau sample consists of two individuals buried in a flexed position, assigned to the pre-Neolithic period (individuals No. 74 and 79). Individual No. 74 was directly dated to between 4785-4434 cal BP. No. 79 could not be directly dated, but it was stratigraphically located between individuals No. 74 and No. 80 which was directly dated to 5712-5591cal BP. The Late Gua Harimau sample includes 20 individuals buried in a supine, extended position and one disturbed burial, the position of which could not be determined (individual No. 53), assigned to the Metal period, dated to between 2424-2333 and 1864-1719 cal BP. While three Neolithic burials could be dated (see Table 1), ranging between 3136-2953 and 2711-2379 cal BP, no Neolithic crania were sufficiently preserved to enable morphometric analysis.
The cranial data set included a subset of 16 measurements (Martin's method numbers M1, M8, 9, M17, M43(1), M43c, M45, M46b, M46c, M48, M51, M52, M54, M55, M57, M57a), that represent the most commonly available measurements among the comparative samples. The cranio-metric affinities of the comparative samples were assessed using Q-mode correlation coefficients [15], based on the above 16 cranial measurements. The comparative archaeological cranial series are listed in Tables 2 and 3 and included a total of 64 individuals from both archaeological and contemporary contexts in East, and Southeast Asia and the Pacific. The dataset includes individuals from the late Pleistocene, early to mid-Holocene, Neolithic (Continued) (defined as early farming populations [16]), and Bronze and Iron Ages through to the proto-Historic and Historic periods. Space precludes a review of each sample in the dataset, however, the references in Tables 2 and 3 provide details on the majority of samples used in this analysis.
To aid the interpretation of any phenotypic affinities between the samples, Neighbor Net Split tree diagrams were generated using the software package Splits Tree Version 4.0, applied to the distance (1-r) matrix of Q-mode correlation coefficients (r) [68].

Mitochondrial DNA analysis
Tooth enamel forms a natural barrier to exogenous DNA contamination, and DNA recovered from teeth appears to lack most inhibitors of the enzymatic amplification of ancient DNA (aDNA) [69]. In addition, because recent research reveals that the temporal bone is an ideal region from which to analyze aDNA, samples were taken from both teeth and the temporal bone in this analysis [70]. In total, 20 samples (two pre-Neolithic, three Neolithic and 15 Metal Period) from well preserved teeth and temporal bones were selected for DNA analysis. A list of all samples used in this analysis are presented in Table 4 (see Results) along with their determined haplogroups.
Authentication methods for DNA extraction. Mitochondrial DNA (mtDNA) analyses were performed at the National Museum of Nature and Science, Tokyo, Japan, and at Yamanashi University, which have laboratories dedicated to aDNA analysis. Standard protocols were employed to avoid contamination, including the separation of pre-and post-PCR experimental areas, UV irradiation of equipment and benches, negative extraction, and PCR controls [71].
To prevent contamination from post-excavation handling, all samples were rinsed with DNA-decontamination agents (DNAaway; Molecular Bio Products, San Diego, CA, USA) or 13% bleach solution (Nacalai Tesque Inc., Kyoto, Japan), and then washed thoroughly with distilled water before drying. Next, tooth samples were encased in Exafine silicone rubber (GC, Tokyo, Japan). The tip of the root of each tooth was removed via a horizontal cut using a cutting disk, and the dentin within the dental pulp cavity was powdered and removed through the root tip using a dental drill [72]. Powdered samples were then decalcified using 0.5 M EDTA (pH 8.0) at room temperature overnight, samples were then decalcified for a further 48 hours in a fresh EDTA solution. Decalcified samples were lysed in 500 μl of Fast Lyse (Genetic ID, Fairfield, IA, USA) with 30 μl of 20 mg/ml Proteinase K at 60˚C for four hours. DNA was extracted from lysate using a FAST ID DNA Extraction Kit (Genetic ID) in accordance with the protocol described by Adachi et al. [73]. Data analysis and genotyping of mtDNA. mtDNA SNPs were detected using the amplified product length polymorphism (APLP) method [74,75]. This method has been applied in  aDNA analyses previously and has yielded successful results [71,76]. In this study, 81 haplogroup-diagnostic SNPs and three deletion/insertion polymorphisms, and a 9 base-pair repeat variation in the non-coding cytochrome oxidase II/tRNALys intergenic region were analyzed using the multiplex APLP method and the primer sets described by Kakuda et al. [77]. Polymorphic sites included in this analysis are known to cover most haplogroup-defining mutations found in East and Southeast Asian mtDNAs. The constitution of the PCR reaction mixture, thermal conditions, and method for separating and detecting PCR products were undertaken following Kakuda et al. [73]. In addition to APLP analysis, next-generation sequencing (NGS) technology and the mtDNA capture method were applied to individuals, whose haplogroups were tentatively determined or ambiguously identified by APLP analysis, to determine the mtDNA haplogroup or haplotype more precisely. Libraries were prepared using 8 μl of DNA extracts and using the established protocols following Shinoda or Meyer and Kircher [78,79]. For some of the badly degraded ancient DNA, Multiplex PCR kit (QIAGEN) was used instead of AccuPrime Pfx kit (Life Technologies) in the first round of PCR amplification. The TreSeq DNA LT Set A or HT (Illumina) barcode for was used for indexing. Bait preparation and mtDNA enrichment for libraries were conducted following the protocol of Maricic et al. [80] and sequenced on an Illumina MiSeq platform (MiSeq Reagent Kit 150 or 300 Cycles) with 75 or 150 cycles paired-end run.
Raw sequence reads were processed following a modified protocol of Shinoda et al. [79]. After adapter trimming and merging of paired reads with AdapterRemoval v2 (-trimnstrimqualities-minquality 25-minlength 35-collapse), the merged reads were mapped to a human reference genome (hg19) using the Burrows-Wheeler Aligner (BWA) (version 0.7.8) aln option (-l 1000) [81,82]. Cross-contaminants among samples sequenced on the same sequence run were removed using the process outlined by Kanzawa-Kiriyama et al. [83]. The reads mapped to NUMT or mitochondrial genome were retrieved and remapped to the human mitochondrial genome (revised Cambridge Reference Sequence: rCRS) with the same criteria applied when mapping to hg19. PCR duplicates were removed using Picard MarkDuplicates (version 1.119) (http://broadinstitute.github.io/picard/), and only reads with mapping quality !20 were retained [84]. A mpileup file (-Q 30) was compiled using SAMtools (version 1.0), calculating coverage of width and average depth. The resulting bam file was also applied to Genome Analysis Toolkit (GATK) HaplotypeCaller [85] (-stand_emit_conf 10) to call SNPs and indels [81]. The sites with low depth (<3) and high mismatch to consensus sequences (>30 percent) were masked, allowing the manual determination of the mtDNA haplogroup based on PhyloTree-Build 17 [86]. Some SNPs that were characteristic of the haplogroup but masked because of low depth were manually re-identified. We also determined the mtDNA haplogroup by using HaploGrep2 as double check of the haplotyping [87].
We investigated the degree of terminal C to T misincorporation using PMDtools and read length distribution, both of which are characteristic of ancient DNA [88]. In order to estimate contamination frequency, we used Schmutzi software (contDeam.pl-lengthDeam 40-library double) [89].

Cranial morphometric analysis
Basic statistics for the early and late Gua Harimau male series are presented in the S1 Appendix. Fig 4 presents the results of the Net Split analysis, applied to the distances of the Q-mode correlation coefficients based on 16 cranial measurements. Essentially, this unrooted network tree exhibited a straightforward dichotomization of the comparative group into two major clusters: (1) Northeast and East Asians, and several sets of Southeast Asians, ranging from the Neolithic to contemporary periods, occupy the upper left of the tree. The contemporary Southeast Asians are scattered adjacent to this cluster. (2) The Australo-Papuans, Veda of Sri Lanka, Nicobarese, Andaman Islanders, and early Holocene Southeast Asians, including Hoabinhian samples (who are morphologically quite distinct from Northeast and East Asians) form another major separate tree cluster on the lower right of the tree. It is quite interesting that Early Gua Harimau, a subset of the pre-Neolithic samples, is closely connected with the late Pleistocene and early Holocene populations, including Hoabinhian, early Bac Son, and Con Co Ngua. These samples form a mega cluster together with the Australo-Papuan and Gua Cha Malay series (Hoabinhian).
In the context of the ADH, it is notable that the Late Gua Harimau series (a Metal Period subset) exhibits close affinities with the Taiwan Formosa (Bunun), Sumatra, Moluccas, Philippines, and Celebes Island series. These current Austronesian speakers also closely cluster with  Table 4 presents the results of APLP analysis employed in the identification of mtDNA haplogroups. Of the 36 individuals considered in this analysis, we could successfully assign 13 mtDNAs of the individuals to the smallest named haplogroups and macro-haplogroups that can be identified by the APLP system employed in the present study.

Mitochondrial DNA analysis
In order to determine mtDNA haplogroups more precisely, from the highly fragmented DNA, we used NGS and the mtDNA capture method to investigate these tentative assigned sequences and one N.D. (not determined) sample (individual No. 79), which was the oldest of all the samples. Results of the NGS analysis are presented in Table 5. There were sufficient mtDNA reads to determine the mtDNA haplogroup or haplotype for 11 individuals. The lengths of the mtDNA fragments were very short, which is characteristic of aDNA (S2 Fig). Terminal C to T and G to A misincorporations were observed in all 11 individuals (S1 Fig). Contamination was less than two percent, so we expect that many fragments were endogenous human mtDNA. Thus, it is clear that the extracted solutions contained authentic human DNA.
Complete or nearly complete mitochondrial genome sequences were determined from these 11 individuals at 7.1~109.1-fold coverage. Each haplogroup and additional SNPs and indels from the hg node are also presented in Table 5. Although individual No. 14 was classified into M7b1a, the DNA fragments were relatively long and had little misincorporation (S1, S2 and S3 Figs). In addition, length of the DNA fragments having M7b1a specific mutations are relatively longer than other reads, especially C/T or G/A damaged reads at 3 bases sequence termini, which are considered authentic DNA [90]. Therefore, we considered it to be modern human DNA contamination. For No.12, while APLP analysis assigned its mtDNA to haplogroup N9a, NGS data was insufficient for evaluating the authenticity of this mtDNA. While APLP analysis can quickly and efficiently determine a haplogroup with low cost, the NGS analysis can verify the authenticity of a haplogroup or haplotype.

Pre-Austronesian indigenous populations
Before assessing the ADH, it is necessary to review the evidence for the earliest human populations in the region. The date for anatomically modern human colonization of MSEA and ISEA is attested by way of assemblages excavated in Tam Pa Ling in Laos, Niah in Malaysia, and Tabon in the Philippines, ranging from 47,000 to 30,000 years BP [91][92][93][94]. Of these, the Niah and the Tabon series were excavated from sites now occupied by Austronesian speakers, and in the context of the ADH can be seen as representative of pre-dispersal indigenous populations. However, the poor preservation of such remains limit any attempts to assess their relationship to each other or later series in the region.  A189G,  T310TC,  T450C,  A3397G,  G3483A,  C3600A,  A5484G,  C6164T,  A7271G,  T9833C,  G9966A,  C10777T,  G11150A,  T14178C,  T14311C T152C,  T310TC,  GCA513G,  C3817T,  T4336C,  T4823C,  G8592A,  A9285G,  G11176A,  C12378T,  G15172A,  T16229C   It is not until the late Pleistocene to early/mid-Holocene (often referred to the Hoabinhian in MSEA), c. 23,000-8000 BP [95][96][97][98], that we have a robust sample of ostensibly pre-ADH crania. Key specimens derive from cave sites in Vietnam and Malaysia (for instance, Lang Gao, Lang Bon, Pho Binh Gia, Lang Cuom, Cua Gi, Mai Da Nuoc and Mai Da Dieu in Vietnam and Gua Cha on the central Malay Peninsula [10, 21-24, 26, 30, 96]). As shown in Fig 4, all the available Hoabinhian specimens are consistently defined as having close Australo-Papuan affinities in terms of their cranio-metrically expressed morphology. While the focus of this analysis is on male crania (see Methods), female material (e.g., Hang Cho, Gua Gunung Runtuh, and Moh Khiew) have also demonstrated remarkable cranial and dental similarities to Australian and/or Melanesian samples, suggesting a close biological affinity [99][100][101]. The network tree diagram (Fig 4) further indicates that some Pleistocene and early Holocene samples from China (Liujiang and Zenpiyang from Guangxi) share morphological similarities with MSEA Hoabinhian samples. Furthermore, the cranial traits characterizing these early indigenous inhabitants in the region (for instance, in northern Vietnam), were retained through the subsequent pre-Neolithic Da But Culture (c. 6700-4500 BP), clearly suggesting that such preagricultural foraging communities were likely direct lineal descendants of Hoabinhian foragers.
The earliest reliably dated anatomically modern humans in the region have been found in Southeast Asia, suggesting the initial colonization of the region via India, rather than north and inland through Siberia (see discussion in Buckley and Oxenham [102]). Moreover, these first colonists shared a common ancestry with the earliest settlers of continental Sahul (Australian and New Guinea). Indeed, there is a long history of scholarship [9,10] suggesting morphological similarities, with implied genetic affinities, between Australian Aboriginals, Papuans, Melanesians and (poorly preserved) pre-Neolithic populations in Southeast Asia (e.g., Tabon in Philippines and Niah, Gua Cha, Guar Kepha, and Gua Kerbau in Malaysia). The current analysis of a more extensive cranial dataset finds further support for close affinities between early Southeast Asians, including Hoabinhian samples and Australian and Papuan-Melanesian groups, as well as the Andaman and Nicobar Indians. These observed close biological ties linking Sahul, early mainland Southeast Asia, and Eastern India, strongly suggest that the first anatomically modern human colonizers of this region migrated to the southern rim of Eurasia and then dispersed into late Pleistocene Sundaland (Southeastern Asia), including what is now ISEA. Pre-Austronesian indigenous populations may, in turn, share a common ancestry with early Hoabinhian populations in MSEA and present-day Australian Aboriginal, Papuan, and Melanesian peoples. In fact, as depicted in Fig 4, the pre-Neolithic samples from Gua Harimau (Early Gua Harimau) show a close affinity with these early settlers of MSEA and Sahul, or the first anatomically modern humans in the region.

Austronesian dispersal
The cranio-metric analysis (see Fig 4) demonstrates a close association between the Late Gua Harimau (Metal period) and contemporary Taiwan (Bunun), Sumatra, Moluccas, Philippines, and Celebes Island samples. The morphological affinities between these series suggests a significant level of genetic interaction among neighboring inhabitants of ISEA in the past. The clustering of the Hoa Diem sample with the aforementioned series is worth discussing in more detail.
The large mortuary site at Hoa Diem, located in Khanh Hoa Province in central Vietnam, is interesting in terms of its assumed ancestry to the Chamic people of the same region. The excavation of this site has produced a large number of jar burials and associated mortuary ceramics that are strikingly similar to those from Kalanay Cave in the Philippines [52].
Similarities in material culture between the Philippines and central coastal Vietnam, as well as cranial morphometric clustering of Indonesian (Late Gua Harimau) and coastal Vietnam (Hoa Diem) populations collectively suggest substantive connections and interactions among Island populations bordering the South China Sea during the Iron Age.
These major Neolithic demographic transitions (NDT) in the region are often referred to in terms of the two-layer-model, whereby the first layer refers to late Pleistocene occupation of East and Southeast Asia, with the second layer being characterized by the NDT and the arrival of the ancestors of contemporary Austroasiatic (MSEA) and Austronesian (ISEA and the Pacific) speakers. Modern day Australians, Papuans and Melanesians represent the direct descendants of the first layer populations, while the descendants of the second layer include the somewhat heterogenous populations characterizing the Neolithic through to modern times.
The results from the cranial morphometric analysis in this study clearly supports the twolayer-model for both MSEA and ISEA by demonstrating close morphological associations between widely dispersed pre-NDT samples (or first layer populations) in the broader region. For instance, the early Holocene Qihedong series from Fujian Province, China, and Early Gua Harimau sample from Sumatra, Indonesia, cluster together within the Australo-Papuan group (see Fig 4). On the other hand, evidence for the spread of second layer (or NDT populations) can be identified by way of the close affinities between the Late Gua Harimau series, a number of Austronesian speaking assemblages from ISEA, and the Neolithic Southern Chinese sample from Xitou.
Turning to the genetic evidence, in disagreement with the Austronesian Dispersal Hypothesis (ADH), or Out-of-Taiwan model, is Cox and colleagues work [123,124] which argued for a significant genetic cline across ISEA and the Pacific, ostensibly traced back to incoming populations from MSEA. Cox et al. [124] concluded that the phenotypic gradient likely reflects a mixing of two major ancestral source populations; one descended from the initial occupants of the region who were related to modern Melanesians, and the other related to Asian immigrants since the Neolithic period. Other research has also rejected the idea of large-scale demographic movement during the Neolithic, advocating for local evolutionary processes in the context of evidence for a common genetic heritage derived from the late Pleistocene colonization of Sundaland [125,126]. As for the Austronesian expansion into mainland Southeast Asia, mtDNA analysis of Austronesian-speaking Cham individuals in central Vietnam suggests that cultural, rather than genetic, links were more a factor in this case [127]. Other DNA studies have argued that Southeast Asia was a major geographic source of East Asian populations, within which the roots of all present-day East Eurasians are historically united via a single primary wave of migration into the region [128,129].
In this study we were able to determine mtDNA haplogroups for 11/36 samples. Three individuals (Nos. 9, 38 and 60) were identified as a subgroup of haplogroup B, a common haplogroup in ISEA that is comprised of two main clades: B4 and B5. Most of these B lineages in ISEA fall within haplogroup B4, while B5 is relatively rare. The bulk of B4 in ISEA is B4a with its major branch, B4a1, including the so-called 'Polynesian motif'. Although Hill et al.'s [125] mtDNA analysis indicated that the dispersal of haplogroup B4a1 was triggered by postglacial flooding in the late Pleistocene or early Holocene, B4a1a has a similar distribution to that of Austronesian speakers. Gua Harimau individual No. 9 was assigned to haplogroup B4a1a, suggesting their ancestry may be Austronesian.
Arguably, lineages of haplogroup B are largely the result of a second wave of dispersal of proto-Austronesian speakers. The ancestral forms of haplogroups B4b, B4c, and B5b are found in South Chinese populations, a mainland origin, and subsequent dispersal into ISEA. The B4c haplogroup has been found in samples of ancient Negrito hair, potentially indicating a diffusion of this haplogroup from the mainland [130].
Two Gua Harimau individuals (No. 38 and 60) were classified into sub-haplogroups of B4 using whole-mtDNA sequence analysis: B4c1b2a2. Haplogroup B4c was found to have an age between 32,000 BP and 25,000 BP, with sub-haplogroup B4c1 originating between 27,000 BP and 24,000 BP, B4c1b2 to between 16,000 BP and 14,000 BP, with the origin of B4c1b2a2 dating to the Neolithic [131]. According to the DNA Database in Japan, haplogroup B4c1b2a is found in South China (Liaoning and Zhejiang provinces), as well as among aboriginal Taiwanese, the Philippines, and Indonesia. Given this demographic distribution, sub-haplogroup B4c1b2a appears to be the group associated with the Austronesian expansion during the Neolithic and/or post-Neolithic periods.
Haplogroup E is common in ISEA [125], and is frequently carried by aboriginal Taiwanese, however, it is otherwise almost absent in China and the Pacific. It has been proposed as a potential hypothetical genetic marker of Austronesian-speaking people [126]. Notwithstanding, others have attributed the origin of this haplogroup to an early Holocene population expansion originating within ISEA, which is inconsistent with the Neolithic agriculturallydriven population dispersal hypothesized in the ADH model [125,126]. In fact, there are two major subclades, E1 and E2. Of these, E1 comprises two additional subclades, E1a and E1b, the former almost entirely restricted to Taiwan and ISEA, while the latter is found predominantly in the ISEA but absent in Taiwan.
Previously, haplogroup E itself dates to over 25,000 BP and lineages within haplogroup E have dates ranging from 6,000 BP to 16,000 BP, while a recent study based on ancient DNA calibration and Bayesian dating suggests that haplogroup E probably arose 8,136-10,933 ya (95% highest posterior density, HPD) and the majority of E lineages show a coalescence at 5-8 kya with a higher mean probability at about 6 kya [132]. According to this new time frame, Ko et al. (2014) reconstructed a history of early Austronesians arriving in Taiwan in the north 6,000 ya, spreading rapidly to the south, and leaving Taiwan~4,000 ya to spread throughout ISEA, Madagascar, and Oceania. Based on the demographic distribution and new time depth, E1a1a is a candidate for the presumed out-of-Taiwan dispersal. Spatial frequency distribution and diversity suggest that this haplogroup arose within ISEA, while some of its subclades subsequently spread to Taiwan [126]. This haplogroup probably evolved within the descendants of the Austronesian-speaking groups originating from Taiwan.
Relatively poor mtDNA preservation of Gua Harimau individual No. 4 (Metal period, c. 2,196-1,786 BP) makes identification of its sub-haplogroup difficult. Notwithstanding, individual No. 4's sequences were tentatively classified as E1a1a based on diagnostic coding site changes. The greater diversity of haplogroup E in ISEA compared to Taiwan is consistent with the expansion of populations from the south [125,126]. However, E1a1a has a lower diversity in Philippine and Sulawesi populations than it does among Taiwanese aboriginals, despite making up a larger proportion of these populations [126]. While haplogroup E may be a marker of postglacial expansion, clades within this haplogroup, such as E1a1a, possibly reflect the impact of later population events [133]. The most plausible explanation for this observation is that the diffusion of the haplogroup E1a1a in the Gua Harimau population occurred after the Neolithic expansion.
Haplogroup Y2a1 was observed in Gua Harimau individuals No. 23 and 24. This haplogroup is also common in ISEA and shared by Philippine, Taiwanese aboriginal, and other ISEA populations [133]. Y2 has a slightly higher frequency in the Philippines compared with surrounding groups. The existence of this haplogroup suggests a genetic link between ISEA and Gua Harimau populations.
The two somewhat common and widespread Southeast Asian mtDNA haplogroups are B and R9, the latter encompassing haplogroup F, with F1a being widespread in Southeast Asia. The "Early Train" hypothesis [134] claimed large scale late Pleistocene/early Holocene dispersals from MSEA into Sunda, and helps explain the distribution of F1a. Gua Harimau individual No. 25 belongs to haplogroup F1a1a1, which has been observed in high frequencies in MSEA, suggesting a link between Gua Harimau and the mainland. There are several rare ancient haplogroups within macrohaplogroup N and its sub-haplogroups R and M in ISEA. The C14 AMS dating of Gua Harimau individual No. 26 (group R) places it at c. 3000 BP, or prior to major settlement by Metal period migrants, while individual No. 27 (group M) dates to the metal period at c. 2000 BP. It seems likely that these haplogroups are relics of the original Pleistocene inhabitants of ISEA. This view is based on evidence from the persistence of mtDNA ostensibly characterizing the earliest settlers of the region. Indeed, as discussed above in the context of cranial morphometric analysis, the pre-Neolithic indigenous Gua Harimau population can potentially trace their maternal ancestry back to the first anatomically modern settlers of ISEA. It is noteworthy that the Gua Harimau gene pool consists of Austronesian (B4a1, B4c, E1a and Y2), mainland (F1a), and putative indigenous (R Ã and M Ã ) forms. The mtDNA analysis is limited in estimating the composition of the three lineages making up the Gua Harimau population as well as the manner in which they genetically changed over time at the site. Nuclear genome analyses are required in order to gain further detail.

Conclusions
The archaeological human remains from Gua Harimau cave, Sumatra, Indonesia provide evidence for at least two (cranially defined) and perhaps three (in the context of the ancient mtDNA results) distinct populations from two separate time periods, thus supporting the ADH or two-layer-model. The cranial data indicate that the pre-Neolithic occupants of (Early) Gua Harimau, who cluster with the Australo-Papuan series, were subsequently replaced by a population with close cranial affinities to present-day Austronesian speakers, including Taiwanese aboriginals, who possess Northeast Asian features to a certain extent. Further, it is apparent that the Neolithic Southern Chinese, represented by Xitou in Fujian Province, share close cranial affinities with both Austronesian speaking samples and the (Late) Gua Harimau series, supporting the view that their remote homeland was somewhere in Southern China. The results from the mtDNA is not consistent with the view (based on DNA studies of modern populations, [123,125]) of a single origin, stretching back into the Pleistocene, for the Gua Harimau population. While the two-later-model is well supported for MSEA [9,10,16], this study now provides substantive support for the value of the two-layer-model in also explaining the population history of ISEA. GH14 includes all mapped reads, and GH14 damaged includes the reads having C/T or G/A changes at 3 bases of sequence termini. White circle indicates the reads having mutations relating to haplogroup M7b1a. Those reads are relatively longer than other reads, and we considered that these are contaminants. (JPG)