Subtypes of Native American ancestry and leading causes of death: Mapuche ancestry-specific associations with gallbladder cancer risk in Chile

Latin Americans are highly heterogeneous regarding the type of Native American ancestry. Consideration of specific associations with common diseases may lead to substantial advances in unraveling disease etiology and in disease prevention. Here we investigate possible associations between the type of Native American ancestry and leading causes of death. After an aggregate-data study based on genome-wide genotype data from 1805 admixed Chileans and 639,789 deaths, we validate an identified association with gallbladder cancer relying on individual data from 64 gallbladder cancer patients, with and without a family history, and 170 healthy controls. Native American proportions were markedly underestimated when the two main types of Native American ancestry in Chile, originating from the Mapuche and Aymara indigenous peoples, were combined. Consideration of the type of Native American ancestry was crucial to identify disease associations. Native American ancestry showed no association with gallbladder cancer mortality (P = 0.26). By contrast, each 1% increase in the Mapuche proportion represented a 3.7% increased mortality risk by gallbladder cancer (95%CI 3.1–4.3%, P = 6×10⁻²⁷). Individual-data results and extensive sensitivity analyses confirmed the association between Mapuche ancestry and gallbladder cancer. Increasing Mapuche proportions were also associated with an increased mortality due to asthma and, interestingly, with a decreased mortality by diabetes. The mortality due to skin, bladder, larynx, bronchus and lung cancers increased with increasing Aymara proportions. The described methods should be considered in future studies on human population genetics and human health. Complementary individual-based studies are needed to apportion the genetic and non-genetic components of associations identified from aggregate data.

Introduction
Differences in disease prevalence between Latinos and other populations are well established. Latin Americans and Latino Americans show in general higher incidence rates of gallbladder and stomach cancer, asthma and diabetes, and lower incidences of breast and prostate cancer than non-Hispanic Whites and African Americans [1][2][3][4][5][6][7][8][9][10]. Recently much attention has been paid to differences in disease susceptibility according to the individual proportions of Native American, European and African ancestry [11,12]. On average, Colombians and Puerto Ricans have higher percentages of African ancestry than Mexicans and Chileans, and this potentially translates into differential disease risks, with important implications for health policy [13]. This article focuses on a finer level of genetic heterogeneity, namely the type of Native American ancestry. Even if two persons from Mexico and Chile have identical Native American proportions, say 50%, their Native American ancestry originates from different native peoples, potentially resulting in unequal disease susceptibilities and prevalences. So far, Latino heterogeneity related to the type of Native American ancestry has not been considered in disease prevention and management programs. The present results may contribute to changing this situation.
The genome of modern Chileans is the result of genetic admixture between Native Americans from two major indigenous peoples, the Mapuche and the Aymara, Spaniards who reached Chile in the mid-sixteenth century, African slaves who arrived in the seventeenth century, and subsequent migrations in the nineteenth and twentieth centuries, mainly from Europe. Recent publications on genetic variability in Chile highlight the relevance of examining Native American ancestry to understand population history [13][14][15][16]. Based on genotype data from 313 Chileans, Eyheramendy et al. confirmed previously reported larger contributions of European men and Native American women, and attributed the increased Native American proportion in the south of the country to the late occupation of this territory by non-indigenous immigrants [14]. The predominance of Native American ancestry in the south of Chile was corroborated by Ruiz-Linares et al., who utilized 1561 Chilean genotypes to investigate the geographic variation in ancestry in a large Latin American study [13].
The goal of the present study is markedly different. We focus here on the relationship between top causes of mortality and the two main types of Native American ancestry in Chile. We first conduct an aggregate-data study based on 1805 subjects, and then validate an identified association between Mapuche ancestry and gallbladder cancer using individual data from 64 patients and 170 healthy controls with and without a family history. In total, the present study relies on genome-wide single nucleotide polymorphism data from 2,039 admixed Chileans and 639,789 country-wide registered deaths between 2005 and 2011. It aims to illustrate the need to consider fine-scale Latino heterogeneity to advance the understanding of disease etiology and to personalize healthcare. Our investigation benefited from 1) the geography of Chile, a country that extends over 4,300 km from north to south but only 200 km from east to west and shows large regional differences in disease-specific mortality rates, 2) the genetic architecture of Chileans, which is largely composed of European and Native American ancestry and presents a low (3%) African component, and 3) the clear separation of the two investigated subcomponents of Native American ancestry, which vary greatly from northern to southern Chile.

Results
Fig 1D. The third principal component explained 0.3% of genetic variability and separated the Mapuche from the Aymara Native American subcomponents. A larger genetic variability was observed in the Mapuche compared to the Aymara reference group (dot dispersion in Fig 1B and 1D).
Mapuche and Aymara individuals showed the lowest genetic distance among all four reference groups (Weir & Cockerham's Fst = 0.038). According to ADMIXTURE recommendations, the large number of SNPs available (more than 300,000 after quality control and merging with reference individuals) was more than sufficient to estimate ancestry proportions [17]. Fig 1E shows the correlation between Native American proportions estimated using HGDP, and the sum of Mapuche and Aymara proportions when four reference groups (African, European, Mapuche and Aymara) were considered. The correlation between the sum of Mapuche and Aymara proportions, and the sum of the two corresponding Native American proportions when the number of ancestral populations was set to four in an unsupervised ADMIXTURE analysis with three reference panels (African, European and HGDP) is shown in Fig 1F. Irrespective of the admixture estimation method (un/supervised) and the assumed number of ancestral populations (three/four), use of HGDP instead of Mapuche and Aymara reference individuals resulted in an underestimation of the Native American ancestry. Table 1 shows the estimated average Native American (HGDP reference), Mapuche and Aymara ancestry components in the aggregate-data study, and possible ancestry differences by age, gender, educational level, socioeconomic status, salary and each of the 15 Chilean regions. Complete results for African and European proportions are available as supplementary material. The mean Native American ancestry component in the 1805 individuals from the present aggregate-data study amounted to 40% (95%CI 37%-43%). This proportion was smaller than the sum of average Mapuche (40%, 95%CI 37%-42%) and Aymara (8%, 95%CI 4%-12%) proportions, confirming the necessity of suitable surrogates for ancestry estimation which reflect the actual composition of the study population.
The average European and African percentages estimated using four reference groups were 49% (95%CI 47%-52%) and 3% (95%CI 2%-3%), respectively. Increasing Native American ancestry was associated with a lower socioeconomic status, and the largest Native American proportions were found in the north and in the south of Chile (Fig 2). The overall picture was markedly different when the Mapuche and Aymara subcomponents of Native American ancestry were separated. Males included in the aggregate-data study showed a 2% larger Mapuche component than participating women due to the larger proportion of men enrolled from the south compared to women. Increasing Mapuche ancestry was associated with a lower educational level and with a lower socioeconomic status. For example, the average Mapuche component was 40% - 6% = 34% in the ABC1 group (high middle class), compared to 40% in the D/E strata (semi- and unskilled manual occupations, unemployed and lowest grade occupations). Mapuche ancestry showed large regional differences, with the highest proportion (54%) in the De los Ríos region, and the lowest percentage (30%) in the De Arica y Parinacota region (Fig 2). Opposite results were noticed for the Aymara ancestry component. Males included in the aggregate-data study showed a 2% lower Aymara component than women. Aymara ancestry differences by educational level and socioeconomic status did not reach statistical significance. The highest Aymara component was found in the De Arica y Parinacota (29%) and De Tarapacá (28%) regions, and the lowest proportion (6%) in the De los Ríos region.
In addition to regional estimates of African, European, Native American, Mapuche and Aymara ancestry components, Fig 2 shows regional mortality rates due to gallbladder cancer in Chile. The correlation between Mapuche proportions and gallbladder cancer mortality rates was striking. Table 2 provides estimates of the strength of association between the two leading causes of death in Chile (diseases of the circulatory system and neoplasms) and Native American, Mapuche and Aymara proportions. Corresponding results for all investigated disease categories, African and European ancestry components are provided as supplementary material. Since around 500 ICD10-categories were investigated, the following description of aggregate-data results highlights associations with probability values under 0.05/500 = 0.0001. The necessity of separating the Mapuche and the Aymara subcomponents due to possible masking of contrary effects was evident. For example, Native American ancestry did not show evidence of association with mortality by multiple valve diseases (P = 0.98), but a 1% increase in Mapuche ancestry translated into a 5.7% increased risk of death due to diseases in this category (P = 2×10⁻⁶). Increasing Mapuche ancestry was also associated with increasing mortality risks due to atrial fibrillation and flutter, hypertensive heart disease, sequelae of cerebrovascular disease and myocardial infarction, and with a decreasing risk of death by other cerebrovascular diseases. Increasing Aymara ancestry was associated with a lower mortality risk due to the majority of diseases of the circulatory system. The advantage of separating the Mapuche and Aymara ancestry subcomponents was also obvious for neoplasms. For example, Native American ancestry did not show any association with gallbladder cancer mortality (P = 0.26).
By contrast, a 1% increase in the Mapuche proportion represented a 3.7% increased mortality risk (P = 6×10⁻²⁷), and a 1% increase in Aymara ancestry translated into a 2.7% lower risk of death due to gallbladder cancer. Increasing Mapuche ancestry was associated with an increasing mortality due to gallbladder, esophagus and stomach cancers, malignant neoplasms without specification of site and myelodysplastic syndromes, and with a decreasing risk of bladder, larynx, skin, and bronchus and lung cancers. Opposite associations were noticed for Aymara ancestry, e.g. the mortality risk due to skin, bladder, larynx, bronchus and lung cancers increased with increasing Aymara proportions. Standardized mortality ratios for diseases of the respiratory and digestive systems, and for endocrine, nutritional and metabolic diseases per 1% increase in the Native American, Mapuche and Aymara proportions are provided in Table 3. For example, a 1% increase in the Mapuche proportion translated into a 4.7% increased mortality risk due to asthma, but also into a 1.3% decreased mortality risk due to diabetes.
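The multiple-testing threshold quoted for the aggregate-data results (0.05/500 = 0.0001) corresponds to a Bonferroni correction. The following minimal sketch reproduces that arithmetic and applies it to three P values reported in the text; the function name and the dictionary labels are illustrative assumptions, not part of the original analysis.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Around 500 ICD10 categories were investigated at an overall level of 0.05.
threshold = bonferroni_threshold(0.05, 500)

# P values quoted in the text (labels are illustrative only).
p_values = {
    "Mapuche / multiple valve diseases": 2e-6,
    "Mapuche / gallbladder cancer": 6e-27,
    "Native American / gallbladder cancer": 0.26,
}

# True when an association passes the corrected threshold.
highlighted = {name: p < threshold for name, p in p_values.items()}
```

Under this threshold both Mapuche associations are highlighted, whereas the combined Native American component is not.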
Results from the stepwise forward model selection to identify the most significantly associated ancestry components are shown in Fig 3. Native American ancestry was not selected for any disease. Mapuche ancestry showed the most significant associations with mortality rates due to 12 ICD10-disease categories, including asthma and gallbladder cancer, and the Aymara component was selected for 19 categories. No model selection resulted in the simultaneous inclusion of both the Mapuche and the Aymara proportions.
Panels A-C in Fig 4 show results from an exploratory genetic principal component analysis of genotypes of individuals included in the validation study. In agreement with results based on aggregate-data, the first principal component (7.1% of genetic variability) distinguished African from non-African ancestry, the second principal component (2.7% of variability) separated European and Native American ancestry components, and the third principal component (0.4% of genetic variability) mirrored the Mapuche-Aymara ancestry axis. The genotypes of some individuals who developed gallbladder cancer (GBC, filled dots) were markedly similar to the genotypes of Mapuche reference individuals. Hazard ratios estimated in the validation study are shown in Table 4. The validation collective included 186 women and 48 men who showed similar risks of being diagnosed with gallbladder cancer (P = 0.66). Each 1% increase in the Mapuche proportion translated into a 2% increased risk of being diagnosed with gallbladder cancer (P = 0.04). Taking into account family structure and integrating the Chilean gallbladder cancer incidence rates in a Cox proportional hazards survival model, as implemented in the Mendel software, resulted in a hazard ratio equal to 1.02 (P = 0.05).
Fig 2. Maps with average regional African, European, Native American, Aymara and Mapuche proportions, and regional mortality rates due to gallbladder cancer in Chile. https://doi.org/10.1371/journal.pgen.1006756.g002
Sensitivity analyses identified no outliers with noticeable influence on regional estimates of the ancestry components; outlier plots are provided as supplementary material. Use of resampling techniques to account for the differential number of individuals per region in the aggregate-data study showed a marginal impact on the estimated standardized mortality ratio for gallbladder cancer; the increased mortality risk per 1% Mapuche proportion estimated by resampling was 3.3% (95% CI 2.5 to 4.1%).

Exclusion of variants of likely European descent in Mapuche reference individuals in order to artificially increase the minimum Native American proportion decreased the average Mapuche percentage in the aggregate-data study from 40% to 37%; the corresponding principal component analysis plots can be found in the supplementary material. The artificial increase of the Native American proportion in Mapuche reference individuals resulted in a stronger relationship between Mapuche ancestry and mortality due to gallbladder cancer based on aggregated data (4.1% increased mortality risk per 1% increase in the Mapuche percentage), and it showed no effect on the hazard ratios estimated in the validation study. Inclusion of hospitalization rates due to cholecystectomy as additional explanatory variable in a multiple Poisson regression model resulted in a 3.0% increased mortality risk due to gallbladder cancer (95% CI 2.3 to 3.7%) per 1% increase in the Mapuche percentage.
Table 3. Total number of deaths and standardized mortality ratios (SMR) due to respiratory, digestive and endocrine diseases by 1% increase in the Native American (HGDP), Mapuche and Aymara proportions.

As mentioned above, part of the validation collective (around half of the dataset) was previously investigated by Koshiol et al. and it did not include affected families [18]. The average Mapuche proportion in this subset of the validation collective amounted to 47%, and the corresponding hazard ratio for gallbladder cancer per 1% increase in the Mapuche proportion was 1.026. The second half of the validation collective was newly recruited and it included the four families depicted in Fig 4D. The average Mapuche proportion in the second subset of the validation study was lower (40%), and the corresponding hazard ratio was higher (1.035) than in the complete validation collective. Although the difference between hazard ratios in the second half and in the complete collective did not reach statistical significance, the increased hazard of gallbladder cancer after inclusion of affected families adds consistency to our findings.

Discussion
We report here for the first time on the relationship between leading causes of death and the type of Native American ancestry. Fortunately, awareness of the importance of accounting for genetic diversity within ethnic groups in public health is rising fast [19,20]. The need to estimate ancestry components using reference individuals who reflect the actual composition of the study population was also a relevant finding of the present investigation. Previous studies have demonstrated the need to account for fine-scale ancestry patterns to identify possible associations with biomedical traits [11,12,21]. Among Latinos, Puerto Ricans show higher mortality rates due to asthma than Mexican Americans [22][23][24][25]. Considering this difference, which could be attributed to the increased African component of Puerto Ricans, or to sub-continental differences in European ancestry with Mexicans, may be critical to develop individualized medicine approaches tailored to asthmatics from different ethnic groups. Another novelty of the present study was the identification of the Mapuche-Aymara ancestry axis as the third major component of genetic variability in Chile. The low genetic distance between Mapuche and Aymara reference individuals (Fst = 0.038) contrasts with larger genetic differences observed among other Native American indigenous peoples [21]. The geography and the settlement history of Chile, which was principally populated from north to south, likely explain this low genetic distance. Investigation of the subcomponents of Native American ancestry may help to refine previously identified associations with human disease. For example, combination of the Mapuche and Aymara subcomponents of Native American ancestry in Chile resulted in an overall underestimation of the Native American percentage. We identified an association between asthma mortality and Mapuche ancestry but, due to opposite effects for Mapuche and Aymara proportions, the association between Native American ancestry and asthma did not reach statistical significance.
In agreement with previous studies, the first principal component of genetic variability among Chileans separated African from non-African contributions, and Chileans showed on average a low percentage of African ancestry [13,14]. The bulk of the study population was dispersed between Native American and European reference individuals, and this dispersion constituted the second component of genetic variability. Examination of the Mapuche and Aymara proportions permitted identification of novel associations that are relevant from both a social and a public health context. Increasing Mapuche ancestry was associated with a lower educational level and a lower socioeconomic status in Chile. From the point of view of public health, increasing Mapuche proportions were associated with increased mortalities due to certain types of cancer (gallbladder, esophagus and stomach tumors, malignant neoplasms without specification of site and myelodysplastic syndromes). Increasing Aymara percentages corresponded to increased mortality risks by bladder, larynx, skin, and bronchus and lung neoplasms. Although the relative contribution of socioeconomic factors, access to health services and therapy response to mortality differences in Chile needs further investigation, results from the present study suggest that finer gradation of Latino and Native American ancestry may have important implications for prevention and disease management, not only in Chile but in other parts of the world.
The possible confounding of some of the identified associations between ancestry proportions and disease-specific mortality risks by geographical, socioeconomic and cultural factors was a limitation of present results based on aggregate-data. Regarding geography, the association between Aymara ancestry and non-melanoma skin cancer was probably related to the high ultraviolet radiation in the north of Chile [26]. The relationship between Aymara proportions and mortality due to lung and bladder cancers could be attributed to a large extent to the high concentration of arsenic in drinking-water in the north of Chile [27]. Although confounding by socioeconomic factors cannot be ruled out, access to individual information on educational level, socioeconomic status and salary represented a clear advantage of the present aggregate-data study over plain ecological investigations [28]. This permitted adjustment of the estimated mortality risk ratios for possible confounders at both the individual (Table 1) and the regional levels (for example, hospitalization rates due to gallbladder removal as surrogate of access to the health system in sensitivity analyses). Culture and ancestry often go together, but it should be kept in mind that our study relied on admixed people with a continuous gradient of ancestry. In other words, two hypothetical individuals with 25% and 30% Mapuche proportions would probably present a similar physical appearance and possibly be exposed to similar culture-related environments. Yet according to our findings, their mortality risks due to gallbladder cancer would be markedly different, precisely 5 × 3.7% = 18.5%. Summing up, present aggregate-data results need future investigations based on individual data to apportion the different components of the complex relationship between genetic ancestry and human disease.
Consideration of individual ancestry proportions may boost the personalization of prevention and disease management programs [19][20][21]. Less healthy lipid profiles have been found in Mexican Americans than in European Americans, and this information has been translated into tailored prevention programs for associated diseases [29]. Hispanics and European Americans show higher rates of hepatitis C viral clearance after treatment than African Americans, and precision medicine has recognized the need for clinical trials with African American participants to decrease treatment disparities [30]. In agreement with our findings (Supplementary S2 Table), European ancestry has been associated with cardiovascular disease, and European ancestry-specific susceptibility variants for heart failure have been identified, adding plausibility to reported associations based on aggregate-data [31,32]. It is relatively well known that race and ethnic group are inaccurate predictors of disease risk and response to treatment, but this imprecision has been largely attributed to the differential ancestry percentages. For example, Latinos from Mexico show average Native American (African) proportions equal to 56% (5%), compared to 29% (11%) for Colombian Latinos [13]. The present study goes one step further and demonstrates the added value of considering ancestral groups which reflect the actual composition of the study population. The average Native American proportions of Chileans in the De Arica y Parinacota and the De los Ríos regions were similar (around 58%), but the Mapuche and Aymara percentages were markedly different, 54% Mapuche and 6% Aymara in the De los Ríos compared to 30% Mapuche and 29% Aymara in the De Arica y Parinacota regions, resulting in considerably different disease-specific mortality rates and associated healthcare needs.
Gallbladder cancer is a major health problem in Chile, where the disease represents the second leading cause of cancer death in women after breast cancer. While the association between Native American ancestry and gallbladder cancer is well established, an added value of the present findings lies in the refinement and accurate quantification of the strength of this association [1,18,33-36]. We found that the mortality risk due to gallbladder cancer increased by 3.7% (95% CI of 3.1 to 4.3%) per 1% increase in the Mapuche proportion. The statistical analysis of validation data from sporadic and familial gallbladder cancer cases, and from unaffected individuals, was consistent with this association: a 1% increase in the Mapuche component corresponded to a 2% increased risk of developing gallbladder cancer. Resampling results suggested that the relationship between Mapuche ancestry and gallbladder cancer was robust against the differential number of individuals per region in the aggregate-data study. Sensitivity analyses revealed a minor contribution of the differential access to the health system to this association, and hinted that the use of Mapuche reference individuals with larger Native American proportions may result in stronger association effects.
These results may have important repercussions at the population and individual levels given the large variability of Mapuche proportions in the Chilean population. In the present study, the first (third) quartiles of Mapuche proportions were 28% (43%) in the aggregate-data, and 38% (49%) in the validation dataset, respectively. At the population level, the Chilean government financially supports gallbladder removal for individuals between ages 35 and 49 years to prevent gallbladder cancer; prophylactic cholecystectomy could also be indicated in other high risk populations [37,38]. The difference in Mapuche proportions between the De los Ríos and the De Arica y Parinacota regions translates into an expected 24 × 3.7% ≈ 89% difference in gallbladder cancer mortality, which may point to the necessity of intensified prevention measures in southern regions of the country. At the individual level, people with relatives affected by gallbladder cancer could opt for estimating their risk of developing gallbladder cancer relying on individual ancestry percentages, and make medical decisions based on this information. Future studies are needed to develop and validate risk prediction models which incorporate genetic ancestry and other risk factors in order to personalize gallbladder cancer prevention.
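The expected mortality difference quoted above multiplies the ancestry difference by the per-point effect. Because the standardized mortality ratios come from a Poisson regression and are reported as exp(β), a reader may also want the compounded (multiplicative) version of the same calculation. The sketch below compares both; it is an illustration under the assumption of a log-linear effect across the whole range, and the function names are hypothetical.

```python
def linear_increase(pct_per_point: float, points: float) -> float:
    """First-order approximation: percent increase = effect per point x points."""
    return pct_per_point * points


def compounded_increase(pct_per_point: float, points: float) -> float:
    """Percent increase when the per-point ratio is applied multiplicatively."""
    ratio_per_point = 1.0 + pct_per_point / 100.0
    return (ratio_per_point ** points - 1.0) * 100.0


# Difference in Mapuche proportion between De los Rios (54%) and
# De Arica y Parinacota (30%), with a 3.7% increase per percentage point.
points = 54 - 30
approx = linear_increase(3.7, points)          # the product used in the text
compounded = compounded_increase(3.7, points)  # assumes a log-linear effect
```

The two values agree closely for small ancestry differences and diverge as the difference grows, so the linear product is best read as a conservative approximation.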
In summary, admixed Chileans show two main types of Native American ancestry: Mapuche and Aymara. Increasing Mapuche proportions are specifically associated with an increased mortality due to asthma and gallbladder cancer, and with a decreased mortality due to diabetes. Increasing Aymara percentages correspond to increased risk of death by bladder, larynx, skin, and bronchus and lung neoplasms. These results suggest that finer gradation of Native American ancestry has important implications for unraveling disease etiology and for disease prevention.

Ethics statement
Ethics approval was obtained from the Medical Faculties of the Universidad de Chile (approval #123-2012) and the Pontificia Universidad Católica de Chile (#11-159), and from Universidad de Tarapacá and University College London as previously described in Ruiz-Linares et al. [13]. All participants provided written informed consent prior to participation. Structured questionnaires applied to volunteers of the aggregate-data and validation studies are available upon request.

Reference individuals, and subjects in the aggregate-data and validation studies
Surrogates of African and European ancestry were 87 Yorubans in Ibadan, Nigeria, and 80 Utah residents with Northern and Western European ancestry from the 1000 Genomes Project [39]. Native American reference individuals were 64 samples from the Americas in the Human Genome Diversity Project (HGDP) [40]. Nine Mapuche and nine Aymara individuals were selected to represent the two largest indigenous peoples in Chile based on the three following criteria: four grandparental Mapuche or Aymara surnames, estimated Native American proportion of at least 74% for Mapuche and at least 99% for Aymara reference individuals, and mitochondrial DNA haplogroups consistent with Mapuche (haplogroup C or D) or Aymara (haplogroup B) descent [41]. Table 1 shows demographic characteristics of the 1805 individuals in the aggregate-data study. The median age of study participants was 27 years and 39% were women. Two percent belonged to the ABC1 socioeconomic group (high middle class) and 27% to the D/E stratum (semi- and unskilled manual occupations, unemployed and lowest grade occupations). The Chilean regions with the largest numbers of participants were De Arica y Parinacota (44% of subjects), Metropolitana de Santiago (17%) and Del Biobío (9%). About 2/3 of recruited subjects were professional soldiers, with a relatively large proportion of men born in the south of Chile (Del Maule, Del Biobío and De la Araucanía regions). Traditionally, the majority of professional soldiers are recruited from among the middle classes, with minorities from the social elite (officer corps) and the lowest socioeconomic groups (illiterate citizens), representing well the general Chilean population. Part of the aggregate-data collective has been previously described by Ruiz-Linares et al. [13].
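The three selection criteria for Mapuche and Aymara reference individuals can be expressed as a simple filter. The sketch below is only an illustration of those criteria as stated in the text; the `Candidate` record, its field names and the example samples are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sample_id: str
    people: str                        # "Mapuche" or "Aymara"
    grandparental_surnames: int        # grandparents with a surname of that people
    native_american_proportion: float  # estimated proportion, between 0 and 1
    mtdna_haplogroup: str              # major mitochondrial haplogroup letter


# Thresholds and haplogroups as quoted in the text.
MIN_NATIVE_AMERICAN = {"Mapuche": 0.74, "Aymara": 0.99}
CONSISTENT_HAPLOGROUPS = {"Mapuche": {"C", "D"}, "Aymara": {"B"}}


def is_reference(c: Candidate) -> bool:
    """True when a candidate meets all three selection criteria."""
    return (
        c.grandparental_surnames == 4
        and c.native_american_proportion >= MIN_NATIVE_AMERICAN[c.people]
        and c.mtdna_haplogroup in CONSISTENT_HAPLOGROUPS[c.people]
    )


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("M1", "Mapuche", 4, 0.80, "C"),
    Candidate("M2", "Mapuche", 3, 0.90, "D"),  # fails the surname criterion
    Candidate("A1", "Aymara", 4, 0.995, "B"),
    Candidate("A2", "Aymara", 4, 0.95, "B"),   # fails the ancestry threshold
]
references = [c.sample_id for c in candidates if is_reference(c)]
```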
The individual-data study to validate the observed association between Mapuche ancestry and gallbladder cancer included baseline and genotype information from 234 persons, including 64 gallbladder cancer cases, four families with multiple affected members (N = 16 subjects) and 154 healthy individuals without a family history of GBC matched to cases by gender and region. Part of this validation collective has been previously described by Koshiol et al. [18]. Table 4 shows selected characteristics of individuals in the validation study, and Fig 4D depicts the four family pedigrees.
Aggregated mortality and incidence data were obtained from the Chilean Department of Statistics and Health Information (www.deis.cl). We considered 639,789 deaths in total registered between 2005 and 2011. Causes of death were grouped according to the tenth version of the International Classification of Diseases (www.who.int/classifications/icd/, ICD10). Chile is divided into fifteen regions, which are the first-level administrative division. Age-standardized mortality and incidence rates were calculated for each region with respect to the Chilean population census data from 2002. Only groups of diseases causing at least 100 deaths in Chile between 2005 and 2011 were considered, resulting in around 500 investigated ICD10-categories.
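The text does not state which standardization method was applied. As one common possibility, the sketch below shows direct age standardization, in which age-specific regional rates are weighted by the age distribution of a standard population (here, the role played by the 2002 census). The function name, the age bands and all numbers are illustrative assumptions.

```python
from typing import Sequence


def age_standardized_rate(
    deaths: Sequence[float],
    person_years: Sequence[float],
    standard_population: Sequence[float],
    per: float = 100_000,
) -> float:
    """Directly standardized rate: age-specific rates weighted by a standard population."""
    if not (len(deaths) == len(person_years) == len(standard_population)):
        raise ValueError("all inputs must have one entry per age band")
    total_standard = sum(standard_population)
    rate = 0.0
    for d, n, w in zip(deaths, person_years, standard_population):
        rate += (d / n) * (w / total_standard)
    return rate * per


# Toy example with two age bands (made-up numbers).
asr = age_standardized_rate(
    deaths=[10, 20],
    person_years=[10_000, 5_000],
    standard_population=[3, 1],  # three quarters young, one quarter old
)
```

In this toy example the age-specific rates are 100 and 400 per 100,000, and the weights 0.75 and 0.25 combine them into a standardized rate of about 175 per 100,000.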

Genotyping and quality control
Blood samples were collected by certified phlebotomists and trained nurses. DNA was extracted following standard laboratory procedures. Participants in the aggregate-data study were genotyped with Illumina's Human610-Quad beadchip and participants in the validation study with Illumina's OmniExpress array. Both arrays include at least 600,000 genome-wide single nucleotide polymorphisms (SNPs). Intentional duplicates and arrays with more than 5% missing genotypes were excluded. Genetic variants were filtered to exclude non-autosomal polymorphisms, variants with a missing call rate over 5%, and variants with a minor allele frequency under 5%. After linkage disequilibrium-pruning at r² higher than 0.1, more than 35,000 variants were used for the subsequent genetic principal component and ancestry analyses.
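The sample and variant filters described above can be illustrated on a small genotype matrix. The sketch below assumes genotypes coded as minor-allele counts (0, 1, 2) with `None` for a missing call; it covers only the missingness and allele-frequency filters, not the exclusion of non-autosomal variants or linkage disequilibrium pruning, and it is not the pipeline used by the authors.

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[int]  # 0, 1 or 2 allele copies, None when the call is missing


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of missing calls; an empty sequence counts as fully missing."""
    if not calls:
        return 1.0
    return sum(g is None for g in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among the observed calls."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1.0 - freq)


def quality_control(
    matrix: List[List[Genotype]],
    max_missing: float = 0.05,
    min_maf: float = 0.05,
) -> Tuple[List[int], List[int]]:
    """Return indices of samples (rows) and variants (columns) that pass QC."""
    # Step 1: drop arrays (samples) with too many missing genotypes.
    kept_samples = [i for i, row in enumerate(matrix) if missing_rate(row) <= max_missing]
    # Step 2: among the kept samples, drop variants by call rate and allele frequency.
    kept_variants = []
    n_variants = len(matrix[0]) if matrix else 0
    for j in range(n_variants):
        column = [matrix[i][j] for i in kept_samples]
        if missing_rate(column) <= max_missing and minor_allele_frequency(column) >= min_maf:
            kept_variants.append(j)
    return kept_samples, kept_variants


# Toy matrix: 4 samples x 4 variants (made-up data).
toy = [
    [0, 1, 2, 0],
    [1, 1, 2, 0],
    [None, None, 2, 0],  # half of the calls missing, so the sample is dropped
    [2, 1, 2, 0],
]
samples, variants = quality_control(toy)
```

In the toy data the third sample is removed for missingness, and the last two variants are removed because they are monomorphic among the remaining samples.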

Genetic principal component analysis and estimation of ancestry components in the aggregate-data and validation studies
Genetic principal component analyses were conducted using the eigenstrat function available at www.popgen.dk/software/index.php/Rscripts [42]. The ADMIXTURE software was used for supervised estimation of individual African, European, Native American, Mapuche and Aymara ancestry components relying on the above-described references [17]. In particular, Native American surrogates were either the 64 American HGDP samples, or the nine Mapuche and nine Aymara selected references, both combined and as two separate groups.

Statistical analyses
The relationship between genetic ancestry and aggregated mortality data was investigated by multiple linear regression to estimate the expected regional ancestry proportions, followed by multiple Poisson regression to quantify the association between regional mortality rates and expected ancestry components [28]. Expected regional ancestry proportions were estimated using the model X = Zd + ε, where each ancestry proportion X depends on the product of a design matrix Z times a fixed-effect vector d, which includes an intercept and the explanatory variables age, gender, educational level, socioeconomic status, salary and region, using the categories listed in Table 1, and ε denotes the residual error. A stepwise forward model selection was carried out to identify the fixed factors most significantly associated with each ancestry component, fixing the significance level for entrance and for staying in the model to 0.1.
After building the linear regression model, multiple Poisson regression, log(E[Y]) = Zα + βX, was used to estimate standardized mortality ratios per 1% increase in ancestry proportions (exp(β)), treating repeatedly measured (once per year) disease-specific 2002-standardized mortality rates as response variable Y, the expected regional ancestry proportion X as the exposure of interest, and an intercept, gender and region as explanatory variables (region-level design matrix Z multiplied by the fixed-effect vector α), assuming a standard variance component covariance structure. In summary, the relationship between genetic ancestry and aggregated mortality data was adjusted for potential confounders at both the individual level (linear regression) and the regional level (Poisson regression).
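To make the interpretation of exp(β) concrete, the following minimal sketch fits a Poisson log-linear model with a single covariate by Newton-Raphson. It is a simplified illustration under stated assumptions: the covariates gender and region and the variance component covariance structure are omitted, the function name is hypothetical, and the toy data are constructed so that each additional 1% of ancestry multiplies the expected value by 1.037.

```python
import math


def fit_poisson_loglinear(x, y, max_iter=50, tol=1e-12):
    """Newton-Raphson fit of log E[y] = a + b * x; returns (a, b)."""
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Hypothetical regional ancestry proportions (%) and noise-free expected values.
ancestry_percent = [5, 10, 20, 30, 40, 50]
expected = [10.0 * 1.037 ** xi for xi in ancestry_percent]
a_hat, b_hat = fit_poisson_loglinear(ancestry_percent, expected)
ratio_per_percent = math.exp(b_hat)
```

In this toy setting `ratio_per_percent` recovers the factor built into the data, which is how a coefficient β on the log scale translates into a mortality ratio per 1% increase in the ancestry proportion.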
Individual-level data to validate the association between Mapuche ancestry and gallbladder cancer were analyzed by univariate Cox regression. Gallbladder-cancer-free survival was defined as the time interval between birth and diagnosis with gallbladder cancer; subjects unaffected at the time of interview were considered censored. Gender, educational level, region and individual Mapuche proportion were taken into account as explanatory variables using the categories listed in Table 4.
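The survival definition above can be written down as a small data-preparation sketch. It only shows how the time and event indicator would be derived for each subject; the class and field names are hypothetical, ages are assumed to be recorded in years, and the Cox model itself is not fitted here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Subject:
    age_at_interview: float
    age_at_diagnosis: Optional[float] = None  # None if unaffected at interview


def survival_record(subject: Subject) -> Tuple[float, int]:
    """Return (time since birth in years, event indicator)."""
    if subject.age_at_diagnosis is not None:
        # Case: time runs from birth to diagnosis, event observed.
        return subject.age_at_diagnosis, 1
    # Unaffected at interview: censored at the age at interview.
    return subject.age_at_interview, 0


case = survival_record(Subject(age_at_interview=60.0, age_at_diagnosis=55.0))
control = survival_record(Subject(age_at_interview=48.0))
```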

Sensitivity analyses
In order to examine the robustness of regional ancestry estimates against possible outliers, subjects were excluded one by one, and the corresponding estimated regional ancestry proportions were visually inspected.
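The leave-one-out idea can be sketched as follows. For brevity a plain regional mean stands in for the regression-based expected regional proportion, which is an assumption of this sketch; the function name and the toy values are hypothetical.

```python
def leave_one_out_means(values):
    """Regional mean recomputed with each individual excluded in turn."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two individuals")
    total = sum(values)
    return [(total - v) / (n - 1) for v in values]


# Hypothetical individual Mapuche proportions in one region; the last one is an outlier.
proportions = [0.30, 0.32, 0.28, 0.90]
loo = leave_one_out_means(proportions)
spread = max(loo) - min(loo)
```

A large spread among the leave-one-out estimates, as in this toy example, points to a regional estimate that depends heavily on a single individual.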
The impact of the differential number of individuals per region (e.g., De Arica y Parinacota N = 794, De Magallanes y de la Antártica Chilena N = 5) on the estimated standardized mortality ratios for gallbladder cancer was evaluated by resampling. Regional ancestry estimates were assumed to be normally distributed with means equal to the expected regional ancestry proportions, and variances proportional to the corresponding standard errors. After drawing 50,000 samples from the corresponding normal distributions, standardized mortality rates were estimated and summarized by the median, 2.5th and 97.5th percentiles.
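The resampling scheme can be illustrated with a simplified sketch. Several assumptions are made here and are not the authors' implementation: the standard error is used directly as the standard deviation of each normal distribution, an ordinary least-squares slope on log mortality rates stands in for the Poisson model, and all regional values are hypothetical.

```python
import math
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def resample_ratio(means, std_errors, log_rates, n_draws=50_000, seed=1):
    """Median, 2.5th and 97.5th percentiles of exp(slope) across draws."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_draws):
        # One plausible set of regional ancestry proportions.
        drawn = [rng.gauss(m, s) for m, s in zip(means, std_errors)]
        ratios.append(math.exp(ols_slope(drawn, log_rates)))
    ratios.sort()

    def pick(q):
        return ratios[round(q * (n_draws - 1))]

    return pick(0.5), pick(0.025), pick(0.975)


# Hypothetical regional Mapuche proportions (%), standard errors and log mortality rates.
means = [5.0, 20.0, 35.0, 50.0]
std_errors = [0.2, 0.5, 1.0, 3.0]
log_rates = [math.log(10.0) + math.log(1.037) * m for m in means]
median, lower, upper = resample_ratio(means, std_errors, log_rates, n_draws=5_000)
```

Regions with few genotyped individuals have larger standard errors and therefore contribute more variability to the resampled ratios, which is what the percentile interval is meant to capture.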
Estimated Native American proportions were relatively low for some Mapuche reference individuals (minimum 74%). In order to investigate the sensitivity of results to these low proportions, we filtered out variants of likely European descent in Mapuche reference individuals and reran statistical analyses. In detail, we calculated 80% confidence intervals (CIs) of the minor allele frequency in European, HGDP and Mapuche reference individuals. Variants with overlapping 80% CIs in European and HGDP reference groups, and with overlapping 80% CIs in European and Mapuche reference individuals were excluded from subsequent sensitivity analyses. This filtering resulted in more than 18,000 selected variants after LD pruning, which artificially increased the Native American proportion of Mapuche reference individuals to at least 98%.
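The confidence-interval filter can be sketched as below. Two assumptions are labeled explicitly: a Wald (normal approximation) interval is used, since the text does not name the interval method, and a variant is excluded when the European interval overlaps either the HGDP or the Mapuche interval, although the text could also be read as requiring both overlaps. Function names and allele counts are hypothetical.

```python
from statistics import NormalDist


def wald_ci(minor_count, n_alleles, level=0.80):
    """Wald confidence interval for an allele frequency (assumed method)."""
    p = minor_count / n_alleles
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * (p * (1 - p) / n_alleles) ** 0.5
    return max(0.0, p - half), min(1.0, p + half)


def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def keep_variant(eur, hgdp, mapuche):
    """Each argument is (allele_count, n_alleles) for one reference group."""
    ci_eur, ci_hgdp, ci_map = wald_ci(*eur), wald_ci(*hgdp), wald_ci(*mapuche)
    # Assumed reading: drop the variant if the European interval overlaps either one.
    return not (overlaps(ci_eur, ci_hgdp) or overlaps(ci_eur, ci_map))


# 64 HGDP and nine Mapuche references give 128 and 18 alleles; 200 European alleles is hypothetical.
informative = keep_variant((10, 200), (64, 128), (9, 18))
uninformative = keep_variant((100, 200), (64, 128), (9, 18))
```

With only nine Mapuche reference individuals the Mapuche intervals are wide, so a filter of this kind tends to retain only variants with strongly differentiated allele frequencies.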
In order to disentangle possible mortality-ancestry associations attributable to collinearity among the ancestry components (the sum of African, European, Mapuche and Aymara proportions equals 100%), we conducted a stepwise forward model selection to identify the most significantly associated ancestry components. The significance level for entry and for staying in the model was set to 0.1. Model selection results were visualized in a Venn diagram.
To apportion the relative contributions of Mapuche ancestry and access to the Chilean health system, 2002-standardized hospitalization rates due to gallbladder removal were included in the multiple Poisson regression model as an additional explanatory variable (10 cholecystectomies per 10^5 person-years), and standardized mortality ratios per 1% increase in ancestry proportions were re-estimated.
The inclusion of families in the validation study introduced a dependency among observed variables: for example, estimated ancestry proportions tend to be similar within families. In addition, consideration of actual incidence rates of gallbladder cancer may improve the accuracy of cancer-free survival estimates. The MENDEL software was used to take into account family relationships and real incidence rates of gallbladder cancer in Chile in validation analyses [43,44].
Data were analyzed using ADMIXTURE version 1.23, plink version 1.90b3s for Linux, the R software environment for statistical computing and graphics (versions 3.2.2 for Linux and 3.1.3 for Windows), MENDEL version 14.5, and SAS version 9.4 [42][43][44][45]. Computer code to reproduce all described analyses is provided as supplementary material.

S1 Source Code (SAS). Estimation of average Native American, Mapuche, Aymara, European and African proportions, and differences in the ancestry components by several phenotypes relying on a multivariate linear regression analysis (Tables 1 and S1). Dependent variable is the Mapuche proportion. Independent variables are age class, gender, educational level, socioeconomic status, salary and region. (DOCX)
S2 Source Code (SAS). Analysis of the relationship between genetic ancestry and aggregated mortality data. Ancestry estimates and phenotype info from the aggregate-data study are used to estimate expected regional ancestry proportions by multiple linear regression. Dependent variable is the Native American (HGDP), Mapuche or Aymara proportion. Independent variables were selected using a stepwise forward model selection to identify those most significantly associated with the ancestry components and included age, gender, educational level, socioeconomic status, salary and region. Significance level for entrance and for staying in the model was fixed to 0.1. (DOCX)
S3 Source Code (SAS). Estimation of standardized mortality ratios (SMR) due to a disease by 1% increase in the Native American (HGDP), Mapuche, Aymara, European and African proportions (Tables 2 and 3, S2-S13). The focus is on the relationship between genetic ancestry and aggregated mortality data. In SAS program 2, ancestry estimates and phenotype info from the aggregate-data study had been used to estimate expected regional ancestry proportions by multiple linear regression. Now, the association between regional mortality rates and expected ancestry components is quantified by multiple Poisson regression. Repeatedly measured (once per year) disease-specific 2002-standardized mortality rates are used as response variable, gender and region as explanatory variables. A standard variance component covariance structure is assumed. Here, only the association between gallbladder cancer mortality ratios and expected ancestry components is considered. Generalization to other diseases is straightforward. (DOCX)
S4 Source Code (SAS). Estimation of hazard ratios of being diagnosed with gallbladder cancer in the validation study (Table 4). Univariate Cox regression is used. The time interval between birth and diagnosis with gallbladder cancer defines the gallbladder-cancer-free survival time. Every unaffected subject at the time of interview is censored. Explanatory variables are gender, educational level, region and individual Mapuche proportion (complete validation study and subgroups defined in sensitivity analyses). (DOCX)
S5 Source Code (SAS). Validation pedigrees file according to Mendel standards. The MENDEL software is used to take into account family relationships and the incidence of gallbladder cancer in Chile in validation analyses. Beforehand, data for the validation study have to be made compatible with MENDEL's pedigree file definitions. (DOCX)
S6 Source Code (Mendel). Survival analyses with Mendel. The MENDEL software is used to take into account family relationships and the incidence of gallbladder cancer in Chile in validation analyses. (DOCX)
S7 Source Code (SAS). Outlier analyses. In order to examine the robustness of regional ancestry estimates against possible outliers, subjects were excluded one by one, and the corresponding estimated regional ancestry proportions were visually inspected (S3 Fig). The program below just considers the situation for one specific regional Mapuche ancestry proportion, but can be easily adjusted to all other cases. To save time, parallel computing is recommended. Please note that regional ancestry proportion estimates were assumed to be independent from each other, i.e. only individuals from the same region influence the respective regional ancestry estimate. (DOCX)
S8 Source Code (SAS). Sensitivity analyses with resampling: Effect of the different number of individuals per region on the estimated standardized mortality ratios for gallbladder cancer. Regional ancestry estimates are assumed to be normally distributed with means equal to the expected regional ancestry proportions, and variances proportional to the corresponding standard errors. We consider only the case with Mapuche regional ancestry estimates. Application to other regional estimates is straightforward. Please note that regional ancestry estimates had been estimated with SAS program 2. To save time, parallel computing is recommended. (DOCX)
S9 Source Code (SAS). Sensitivity analysis to apportion the relative contributions of Mapuche ancestry and access to the Chilean health system to gallbladder cancer mortality. 2002-standardized hospitalization rates due to gallbladder removal (cholecystectomy) were included in the multiple Poisson regression model as an additional explanatory variable, and standardized mortality ratios per 1% increase in Mapuche ancestry proportions were re-estimated. Please note that gallbladder cancer mortality rates had been computed with S3 Source Code and regional Mapuche ancestry estimates with S2 Source Code. (DOCX)
S10 Source Code (R). Sensitivity analysis to the Native American proportions of Mapuche reference individuals (S4 and S5 Figs). Estimated Native American proportions were relatively low for some Mapuche reference individuals (minimum 74%). In order to investigate the sensitivity of results to these low proportions, we filtered out variants of likely European descent in Mapuche reference individuals and reran statistical analyses. In detail, we calculated 80% confidence intervals (CIs) of the minor allele frequency in European, HGDP and Mapuche reference individuals. Variants with overlapping 80% CIs in European and HGDP reference groups, and with overlapping 80% CIs in European and Mapuche reference individuals were excluded from subsequent sensitivity analyses. The following code illustrates the computation of CIs for the Mapuche reference; adaptation to calculate the corresponding European and HGDP CIs is straightforward. (DOCX)