Estimating genetic kin relationships in prehistoric populations

Archaeogenomic research has proven to be a valuable tool to trace migrations of historic and prehistoric individuals and groups, whereas relationships within a group or burial site have not been investigated to a large extent. Knowing the genetic kinship of historic and prehistoric individuals would give important insights into social structures of ancient and historic cultures. Most archaeogenetic research concerning kinship has been restricted to uniparental markers, while studies using genome-wide information were mainly focused on comparisons between populations. Applications which infer the degree of relationship based on modern-day DNA information typically require diploid genotype data. Low concentration of endogenous DNA, fragmentation and other post-mortem damage to ancient DNA (aDNA) makes the application of such tools unfeasible for most archaeological samples. To infer family relationships for degraded samples, we developed the software READ (Relationship Estimation from Ancient DNA). We show that our heuristic approach can successfully infer up to second degree relationships with as little as 0.1x shotgun coverage per genome for pairs of individuals. We uncover previously unknown relationships among prehistoric individuals by applying READ to published aDNA data from several human remains excavated from different cultural contexts. In particular, we find a group of five closely related males from the same Corded Ware culture site in modern-day Germany, suggesting patrilocality, which highlights the possibility to uncover social structures of ancient populations by applying READ to genome-wide aDNA data. READ is publicly available from https://bitbucket.org/tguenther/read.

An individual's genome is a mosaic of different segments inherited from our various archaeology, the analysis of IBD has the potential to provide an independent means to 25 test kinship behavior and social organization [22], but current methods would be 26 restricted to exceptionally well-preserved samples. In forensic science and practice, the 27 dominant approach has been to type several short tandem repeat (STR) markers, which 28 in most cases provide sufficient information for relatedness assessment, but the STRs 29 might be hard to type in degraded samples [23]. In addition to nuclear STRs, 30 mitochondrial and Y-chromosome haplogroups have been widely used to infer family 31 relationships (e.g. [15,16,24,25]), although they can only exclude certain direct 32 relationships since most mitochondrial and Y-chromosome haplogroups are relatively 33 common among unrelated individuals. These uniparental markers can be typed from 34 degraded samples, and can be used to exclude maternal or paternal relationships, but 35 not to infer the actual degree of relationship. Genome-wide data, however, can be 36 obtained from degraded samples at a higher success rate than STRs and it can be used 37 to confidently identify individuals [26]. 38 Single Nucleotide Polymorphism (SNP) data can be obtained from genotyping 39 experiments (e.g. SNP arrays or RAD sequencing), targeted capture [27], and 40 whole-genome shotgun sequencing (e.g. [28,29]). The field of ancient DNA has 41 developed rapidly over the last few years and allowed pivotal studies of the population 42 history of Europe [27][28][29][30][31][32][33][34][35][36][37] and the peopling of the Americas [36,38,39]. However, both 43 whole-genome shotgun sequencing (e.g. [29,31,32]) and genome-wide SNP capture 44 (e.g. [27,33]) usually achieve coverages <1x per informative site for most individuals 45 which makes diploid genotype calls at all sites virtually impossible. Methods to infer 46 relationships, however, rely on such ideal data to identify IBD blocks which is a major 47 limitation for applying these methods to ancient DNA data. 48 However, even low coverage data contain information about the degree of 49 relationship. To utilize this information, we developed READ (Relationship Estimation 50 from Ancient DNA), a heuristic method to infer family relationships up to second 51 degree from samples with extremely low coverage. The method is tested on publicly 52 available data with known relationship, which we sub-sample to resemble the properties 53 of degraded samples. We also apply our method to a number of ancient samples from 54 the literature and confidently classify individual pairs as being related. 55

56
Method Outline 57 The input for READ are a set of TPED/TFAM files [8] containing genotype calls for a 58 population. The biallelic SNP sites in that file would usually be from some externally 59 ascertained SNP panel (e.g. Human Origins array or 1000 genomes) and all SNPs are 60 2/19 assumed for be pseudo-haploid (i.e. one randomly sampled allele per individual and site) 61 as the low coverage in aDNA studies normally does not allow to call heterozygous 62 genotypes. We then divide the genome into non-overlapping windows of 1 Mbps each 63 and for each pair of individuals calculate the proportion of non-matching alleles inside 64 each window P 0. Similar to [40,41], the genome-wide distribution of P 0 is then 65 normalized using the average P 0 of an unrelated pair of individuals which accounts for 66 effects of SNP ascertainment and population diversity. Depending on the normalized 67 proportion of shared alleles, each pair of individuals is classified as unrelated, 68 second-degree (i.e. nephew/niece-uncle/aunt, grandparent-grandchild or half-siblings), 69 first-degree (parent-offspring or siblings) or identical individuals/identical twins (Fig. 1). 70 As a method with the goal to classify pairs of individuals, READ always outputs the 71 best fitting degree of relationship. To assess the certainty of each categorization, the 72 distance to the classification cutoffs are expressed as multiples of the standard error of 73 the mean (Z). 74  Simulations based on modern data with known relationship 75 READ's performance was tested on 1,326 individuals of 15 different populations from 76 the phase 3 data of the 1000 genomes project [42]. A total of 86,336 pairwise 77 comparisons were tested. The rates of false positives (unrelated individuals classified as 78 related) and false negatives (related individuals classified as unrelated) are highly 79 dependent on the amount of data available for pairwise comparison. READ showed an 80 overall good performance with false negative and false positive rates below four percent 81 for as little as 1,000 overlapping SNPs ( Fig. 2A). Fig. 3 shows how these SNP numbers 82 would relate to sequencing coverages of the two individuals compared. The proportion

Individual 1 SNPs
of related individuals that were classified as related but not to the correct degree 84 ("Wrong degree") increased with lower numbers of overlapping SNPs. Separating the 85 error rates between first and second degree relatives shows that most of this increase is 86 due to first degree relatives classified as second degree relatives when the number of 87 SNPs is low (Fig. 2B). False positive rates are low for both degrees of relationship and 88 false negative rate is below one percent for first degree relatives ( Fig. 2B and C). The 89 rate of false negatives is considerably high for second degree relatives and it increases up 90 to 39% for low numbers of SNPs (Fig. 2C). 1% [44][45][46] and careful data curation as well as filtering (see Discussion) should be able 105 to minimize the impact of other sources of genotyping errors.

107
To investigate READ's performance on empirical aDNA data, we analyzed a large 108 published genotype data set of 230 ancient Eurasians from the Mesolithic, Neolithic and 109 Bronze Age periods [33]. In accordance with the original publications [27,29,33], READ 110 inferred RISE507 and RISE508 to be the same individual and all nine known 111 relationships were correctly identified as first degree relatives (Table 1). In addition to 112 those, READ identified one additional pair of first degree relatives as well as six new  Fig. 2). The maximum number of SNPs is set to 1,156,468, identical to what has been used in the simulations and similar to the 1.2 million SNPs used in the empirical data set [33]. The calculation assumes a Poisson distribution of sequencing coverage across the genome [43].
Combining the information obtained from radiocarbon dating, READ, and 116 uniparental haplotypes can help to narrow down the possible form of relationship. For 117 instance, I0111 (female) and I1530 (male) are inferred (using READ) to be first degree 118 relatives, which means they are either full-siblings, mother/son or father/daughter. The 119 shared mitochondrial haplogroup (H3ao) makes father/daughter less likely (but not 120 impossible), and the slightly older radiocarbon date for I0111 (2475-2204 calBCE versus 121 2345-2198 calBCE [33]) makes mother/son more likely than siblings while not excluding 122 the latter.

123
READ identified an unknown pair of first degree relationship between two Srubnaya 124 individuals (I0360 and I0354). Notably, Mathieson et al (2015) [33] have excluded I0354 125 since she was an outlier compared to other Srubnaya individuals. The shared 126 mitochondrial haplogroup (U5a1) and the slightly older age of I0354 make her the 127 putative mother of I0360. The classification of I0360 and I0354 as first degree relatives 128 is probably genuine considering that READ has very low false positive rates. If this 129 prediction was a false positive, it would be very likely that they are at least second 130 degree relatives as the fraction of unrelated individuals wrongly classified as first 131 degrees is extremely low (Fig. 2B). Furthermore, a highly distinct genetic background of 132 one of the individuals should rather cause false negatives and not false positives, which 133 increases the likelihood that the two individuals are in fact related. I0354 could have 134 been a recent migrant to the region who produced offspring (I0360) with a local male, 135  Particularly interesting is a group of five related males from the Corded Ware site in 138 Esperstedt, Germany (Table 1, Fig. 5). Mathieson et al (2015) [33] described two first 139 degree relationships between I1540 and I1541 as well as between I1541 and I1538. 140 Notably, READ missed the second degree relationship between I1540 and I1538, which 141 is likely to be a false negative as the false negative rate for second degree relatives is 142 known to be substantial with low amounts of data (Fig. 2C) and the value for that pair 143 (0.91) is only 1.2 standard errors above the threshold for second degree relatives (0.9). 144 Identical radiocarbon dates do not help to indicate a chronological order, but based on 145 their Y-chromosomes (all likely R1a, S1 Table), one can suggest that they represent a 146 paternal line of ancestry. I1540 is classified as R1a1, but the Y-chromosomal marker 147 this call is based on (L120) is missing in individuals I1538 and I1541, so they could all 148 carry the same haplotype. In addition to these three individuals, I1534 is a second 149 degree relative of I1538 and I1541, who was carrier of R(xR1b) but a more detailed 150 classification was not possible due to the low coverage. I0104, who is a second degree 151 relative to I1541, might also carry the same Y-chromosome as I1534, I1538, I1540 and 152 I1541, but that cannot be determined due to low coverage in those individuals. 153 Generally, the data would be consistent with all five individuals carrying the same 154 Y-haplotype as there are no contradicting calls for R1a defining markers (S1 Table), 155 which would suggest paternal relationship among them. In total, 13 Corded Ware 156 individuals from Esperstedt were genotyped, nine of them were males. It is notable that 157 all five related Esperstedt individuals discussed here were males and only one pair of 158 related Corded Ware individuals from Esperstedt involved a female (I1539 and I1532; 159 Table 1).

191
Applying READ to aDNA data Several methods to estimate the degree of 192 relationship between pairs of individuals have been developed. For genome-wide diploid 193 data with low error rates, they successfully infer relationships up to 11th degree [11].

194
Since such data cannot be obtained from degraded samples, a loss in precision was 195 expected. Estimation of second degree relationships (i.e. niece/nephew-aunt/uncle, 196 grandparent-grandchild, half-siblings) is sufficient to identify individuals belonging to a 197 core family which were buried together. We can show that obtaining as little as 2,500 198 overlapping common SNPs is enough to classify up to second degree relationships from 199 effectively haploid data. The biggest limitations when using such low numbers of SNPs 200 is the high rate of false negatives for second degree relatives. READ can be considered 201 as a conservative tool that avoids false positives by having a relatively high false 202 negative rate which can be decreased substantially with more data. Missing some second 203 degree relationships seems preferable over wrongly inferring relationships for unrelated 204 individuals. A consequent advantage of our method is that it is very unlikely that first 205 degree relatives are classified as unrelated but some second degree relatives might be 206 wrongly classified as unrelated. Shared uniparental haplotypes or a test result close to 207 8/19 the threshold (e.g. less than two standard errors difference) could raise such suspicions 208 and might motivate additional sequencing of the samples in question. The amount of 209 overlapping SNPs depends on the genome coverage of both individuals (Fig. 3; e.g. two 210 0.1x individuals will have approximately the same amount of overlapping data as a 211 0.05x and a 0.2x individual or a 0.01x individual and a 1x individual). The number of 212 SNPs required for a positive classification as first degree can be obtained by shotgun 213 sequencing all individuals to an average genome coverage of 0.1x (Fig. 3), which is in 214 reach for most archaeological samples displaying some authentic DNA. More data would 215 be beneficial to avoid false negatives in the case of second degree relatives. Recently 216 developed methods for modern DNA, which use genotype-likelihoods to handle the 217 uncertainty of low to medium coverage data require 1-3x genome coverage to estimate 218 third degree relationships [47][48][49]. Such approaches are promising for well-preserved 219 samples but these coverages might not be within reach for most aDNA studies. Other 220 methods specifically designed for ancient DNA data either require larger population 221 sample sizes than READ [50], large reference data sets [41,51] or are not directly 222 designed to identify relatives and estimate their degrees [52].

223
READ does not explicitly model aDNA damage and it only considers one allele at 224 heterozygous sites. This implies that a careful curation of the data is required to avoid 225 errors due to low coverage, short sequence fragments, deamination damage, sequencing 226 errors and potential contamination. We recommend a number of well established 227 filtering steps when working with low coverage aDNA data [27-33, 53, 54]. To avoid spuriously mapping microbial contamination [55,56]. All downstream analysis should be 232 restricted to reads and bases with mapping and base qualities of 30 or higher to reduce 233 the potential effects of mismapping and sequencing errors [56,57]. To further reduce the 234 effect of sequencing errors, most aDNA studies only consider biallelic SNPs known to be 235 polymorphic in other populations, and call pseudo-haploid genotypes by randomly 236 sampling one read covering that position. Deamination damage can be avoided during 237 the data generation by enzymatic repair of damages [58], or later by computational 238 rescaling of base qualities before SNP calling [59] or by excluding all transition SNPs.

239
For humans, millions of polymorphic transversion sites are known across the 240 genome [42] still leaving substantial amounts of data for analyzing such data sets.

241
Furthermore, a range of methods exist to estimate human contamination of a particular 242 sample [60][61][62][63][64] and the analysis could be restricted to fragments displaying 243 characteristic damage to filter contamination [65,66]. Finally, each study could simulate 244 data exactly resembling the empirical data analyzed (fragment sizes, damages, 245 contamination) to evaluate how these factors would affect the downstream analysis [56]. 246 An important part of the READ pipeline is the normalization step. This step makes 247 the classification independent of within population diversity, SNP ascertainment and 248 marker density. This property, however, requires at least one additional and unrelated 249 individual from the same population and ideally the same data type to avoid batch  Fig. 6 shows that most (but 254 not all) groups from similar cultural and geographical backgrounds have relatively 255 similar normalization scores, but caution should be taken as strong misspecification of 256 the normalization value can cause false negatives or false positives (see Results section). 257 In practice, the relationships are not known a priori. For our data analysis, we assumed 258 that the median across all pairs of individuals from a population of more than four 259 9/19 samples represents a proxy of an unrelated pair (as the number of pairs is n(n−1) 2 ; e.g. 260 10 pairs for a sample size of 5), which we also set as the default mode for READ. The 261 implementation of READ offers other options as well since the median would not work 262 in cases like parent-child-trios (two first degree relationships, one unrelated), where the 263 maximum of all three comparisons should be used for normalization. Other methods 264 normalize by obtaining allele frequency data for a whole population [47,51], which 265 seems less feasible than obtaining just one unrelated individual (or a pair of unrelated 266 individuals from a surrogate population). Furthermore, prehistoric populations are 267 quite differentiated from modern groups [31,37,53]  radiocarbon dating allowed us to infer how two individuals were related to each other. 278 Additional information such as osteological data on the age of the samples or 279 stratigraphic information as burial location or depth could further help to assess the 280 direction of a kinship. Of particular interest was a group of five males from Esperstedt 281 in Germany who were associated with the Corded Ware culture -a culture that arose 282 after large scale migrations of males [68] from the east [27,29]. Around 50 Corded Ware 283 burials (Fig. 5B), six of them stone cists, were excavated near Esperstedt in the context 284 of road constructions in 2005 [27,69]. Characteristic Corded Ware pottery was found in 285 the graves and all male individuals had been buried on their right hand site [69]. 286 Interestingly, the central individual of the group of related individuals (I1541, Fig. 5A) 287 was buried in a stone cist approximately 700 meters from the graves of the other four 288 individuals which were all close to each other (Fig. 5B) [69]. The close relationship of 289 this group of only male individuals from the same location suggest patrilocality and 290 female exogamy, a pattern which has also been found from Strontium isotopes at 291 another Corded Ware site just 30 kilometers from Esperstedt [15] and suggested for the 292 Corded Ware culture in general [70]. This represents just one example of how the 293 genetic analysis of relationships can be used to uncover and understand social structures 294 in ancient populations. More data from additional sites, cultures and species other than 295 humans will offer various opportunities for the analysis of relationships based on 296 genome-wide data.

298
Approach to detect related individuals 299 Our approach is based on the methodology used by GRAB [14] which was designed for 300 unphased and diploid genotype or sequencing data. This approach divides the genome 301 into non-overlapping windows of 1 Mbps each and compares for a pair of individuals the 302 alleles inside each window. Each SNP is classified into three different categories: IBS2 303 when the two alleles are shared, IBS1 when only one allele is shared and IBS0 when no 304 allele is shared. The program calculates the fractions for each category (P 2, P 1 and P 0) 305 per window and, based on certain thresholds, uses them for relationship estimation.

306
GRAB can estimate relationships from 1st to 5th degree, but it has not been tested 307

10/19
with data from different SNP panels or populations [14]. 308 We assume that our input data stems from whole genome shotgun sequencing of an 309 ancient sample resulting in low coverage sequencing data. Therefore, we only expect to 310 observe one allele per individual and site which is either shared or not shared between 311 the two individuals. READ does not model aDNA damage, so it is expected that the 312 input is carefully filtered, e.g. by restricting to sites known to be polymorphic, by 313 excluding transition sites or by rescaling base qualities before SNP calling [59].

314
Analogous to GRAB [14], we partition the genome in non-overlapping windows of 1 315 Mbps and calculate the proportions of haploid mismatches and matches, P 0 and P 1, for 316 each window. Since P 0 + P 1 = 1, we can use P 0 as a single test statistic. The average 317 P 0 is calculated from the genome-wide distribution. To reduce the effect of SNP 318 ascertainment, population diversity and potential batch effects, each individual pair's 319 average P 0 scores are then normalized by dividing all values by the average 320 non-normalized P 0 score from an unrelated pair of individuals from the same 321 population ascertained in the same way as for the tested pairs. Such a normalization 322 step is not implemented in GRAB [14] suggesting that GRAB might be sensitive to 323 ascertainment bias and general population diversity. The normalization sets the 324 expected score for an unrelated pair to 1 and we can define classification cutoffs which 325 are independent of the diversity within the particular data set. We define three 326 thresholds to identify pairwise relatedness as unrelated, second-degree (i.e.

339
READ is implemented to classify pairs of individuals in certain categories, so it will 340 always output the best fitting degree of relationship. As a measurement of confidence of 341 that classification, we estimate the standard error of the mean of the distribution of 342 normalized P 0 scores and calculate the distance to the cutoffs in multiples of the 343 standard error (similar to a Z score).

344
Relationship Estimation from Ancient DNA (READ) was implemented in Python 345 2.7 [71] and GNU R [72]. The input format is TPED/TFAM [8] [42]. We used vcftools version 0.1.11 [73] to extract autosomal 352 biallelic SNPs with a minor allele frequency of at least 10% (1,156,468 SNPs in total -353 similar to the aDNA data set used for the empirical data analysis [33]; see below) and 354 to convert the data to TPED/TFAM files. The data set contains pairs of individuals 355 that were reported as related, 851 of them as first degree relationships and 74 as second 356 degree. We randomly sub-sampled 1000, 2500, 5000 and 50000 SNPs and also randomly 357 picked one allele per site in order to mimic extremely low coverage sequencing of ancient 358 11/19 samples. READ was then applied to these reduced data sets and the median of all 359 average P 0s per population was used to normalize scores assuming that this would 360 represent an unrelated pair. Individual pairs with known relationship and their degree 361 of relatedness are shown in S2 Table and S3 Table. Additionaly, we introduced different 362 error rates to the data to assess the possible effects of sequencing and mapping errors, 363 contamination and post-mortem damage. Error rates were introduced by randomly  [33]) with more than four individuals separately.

381
Shotgun sequencing data was also analyzed separately from SNP capture data to avoid 382 batch effects. The median of all average P 0s per group was used for normalization 383 assuming that this would represent an unrelated pair. Mathieson et al (2015) [33] report 384 nine pairs of related individuals and they infer all of them to be first degree relatives 385 without providing details on how they were classified. Y-chromosome haplotypes of the 386 five individuals shown in Fig. 5A were checked using samtools [75] (applying a minimum 387 mapping and base quality of 30) and marker information for the haplotypes R1a and 388 R1b from the International Society of Genetic Genealogy (http://www.isogg.org, 389 accessed January 16, 2017). The results are shown in S1 Table. 390 Supporting Information 391 S1