High-density LD-based structural variations analysis in ten Native and Mestizo Mexican populations

Adriana Griselda Mateos-Valenzuela; Mirvana Elizabeth González-Macías; Carlos Villa-Angulo; Diana Helena Reyes-Godoy; Juan Carlos Fernandez-Lopez; Rafael Villa-Angulo

doi:10.1371/journal.pone.0333193

Abstract

The main objective of this study was to perform a genome-wide characterization of Structural Variations (SVs) based on the deviation from the expected short-range Linkage Disequilibrium (LD) between Single Nucleotide Polymorphisms (SNPs) in 10 Native and Mestizo Mexican populations. We used a panel of 785,663 SNP genotypes sampled from 383 individuals, of whom 71 belonged to Native populations and 312 belonged to Mestizo populations. The total number of variations found among all populations was 4,375, involving an average of 19,438 SNPs per population, which corresponds to 3.14% of the average number of SNPs per population. The mean SV size varied from 2,845 to 8,646 kb across populations (with a mean SV size of 6,161 kb over all populations), with an average of 50.14 SNPs per SV. By grouping all variations across all populations in the sample, we defined 506 regions, of which 54 (11%) were shared by all 10 populations. The total number of genes covered by these variations was 8,443. Among these genes, we identified some specifically related to health in the Mexican population, such as FTO and ABCA1, associated with obesity, adipose tissue function, and fat distribution in the Mexican population, and ELMO1, associated with susceptibility to diabetic nephropathy and type 2 diabetes, among others. In summary, our results add new evidence in support of the hypothesis that SVs based on the deviation from the expected short-range LD between SNPs capture the structure and the demographic history of populations, and represent potential targets for association of SVs with population-specific diseases.

Citation: Mateos-Valenzuela AG, González-Macías ME, Villa-Angulo C, Reyes-Godoy DH, Fernandez-Lopez JC, Villa-Angulo R (2025) High-density LD-based structural variations analysis in ten Native and Mestizo Mexican populations. PLoS One 20(9): e0333193. https://doi.org/10.1371/journal.pone.0333193

Editor: Nancy Monroy-Jaramillo, INNN: Instituto Nacional de Neurologia y Neurocirugia Manuel Velasco Suarez, MEXICO

Received: June 16, 2025; Accepted: September 10, 2025; Published: September 25, 2025

Copyright: © 2025 Mateos-Valenzuela et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The genotype dataset is available under the terms of a data transfer agreement, which protects the privacy of the participants in the transfer of genetic data, by contacting INMEGEN (http://www.inmegen.gob.mx/).

Funding: The author(s) received no specific funding for this work.

Competing interests: There are no conflicts of interest to declare.

Introduction

The current Mexican population is mainly composed of two groups: Native and Mestizo. The Native group corresponds to Indigenous Mexican communities that have remained largely unadmixed for centuries, owing to their cultural beliefs and their geographic distance from the urban settlements of the modern Mestizo population. The Mestizo group results from the admixture between Native and foreign populations, which originated from several historical events, including Spain’s conquest of America in the 15th century and the arrival of African and Asian settlers in Mexico.

The Mexican Mestizo population has grown steadily throughout history, becoming the dominant group. This growth is mainly attributed to intermarriage between Indigenous women and European men, a process that began soon after the arrival of the conquistadors. Native groups, on the other hand, declined significantly after the arrival of Europeans due to factors such as social disruption and disease; outbreaks of smallpox, measles, and typhoid between 1521 and 1580 led to a notable drop in their populations. Additionally, some Native communities settled predominantly in specific regions (the center and southeast of the country) that ensured the availability of water and rich soil for cultivating crops such as corn, beans, and cacao, adapting to and surviving in distinct environmental conditions [1,2].

The 2020 population census in México, conducted by the National Institute of Statistics and Geography (INEGI), showed that the total Mexican population in 2020 was 126,014,026 individuals, with around 78% belonging to the Mestizo population, 19.4% identified as Indigenous groups (Natives), 2% of Afro-descendant heritage, and 0.6% belonging to other groups [3].

Over the past decade, demographic studies in México complemented with genetic information have revealed a high level of genetic diversity, with an uneven distribution of allele frequencies that differs according to the geographic region analyzed [4–7]. These studies demonstrated that the evolution of the Mexican population has been shaped by a complex interplay of historical, epidemiological, social, cultural, demographic, and economic events.

Even though Mexican population groups have shared territory, cultural traditions, and language for centuries, notable genetic differences have been found within the same geographic regions, reflected in contrasting physical and genetic traits. Recent studies have therefore analyzed the genetic structure of Native and Mestizo Mexican populations to establish patterns in genetic variations, of Native ancestry or derived from admixture, that could be associated with complex medical conditions as well as with physical and physiological characteristics. The analysis of genetic variability is helping to characterize the diversity of the Mexican population and its impact on health. To date, however, studies of Mexican genome diversity have mainly focused on Single Nucleotide Variants (SNVs), small insertions and deletions (indels), heterozygosity, and genetic differentiation based on the fixation index [2,4,6–8]. Variations spanning more than a single genomic position have not been analyzed genome-wide. High-density SNP markers evenly distributed across the genome enable the detection of regions with significant LD deviation from the expected value, which have been interpreted as short-range genomic variations and could help future studies assess association with other types of SVs [9]. In this work, we used a genome-wide panel of 785,663 SNPs, sampled from 383 individuals from 3 Native and 7 Mestizo Mexican populations, provided by the National Institute of Genomic Medicine (INMEGEN). We used these data to inspect the genome-wide distribution of structural variations based on short-range LD patterns for all the populations in the sample.

Materials and methods

Description of the data

The dataset analyzed in this study was provided by the Mexican government’s National Institute of Genomic Medicine, INMEGEN (http://inmegen.gob.mx). The data consisted of a panel of 785,663 SNP genotypes sampled from 383 individuals belonging to 3 Native and 7 Mestizo Mexican populations. Populations and individuals were distributed as follows: the Native populations were Maya (30 individuals), Tepehuano (20 individuals), and Zapoteca (21 individuals), while the Mestizo populations were Guanajuato (48 individuals), Guerrero (50 individuals), Sonora (48 individuals), Tamaulipas (17 individuals), Veracruz (50 individuals), Yucatan (49 individuals), and Zacatecas (50 individuals). Taking into account the geographic region in which the populations are settled, we grouped the Mestizos into two clusters: one called Central-Coast Mestizo, encompassing the Tamaulipas, Guerrero, and Veracruz populations, and another called Non-Central-Coast Mestizo, encompassing the Guanajuato, Sonora, Zacatecas, and Yucatan populations.

These data were previously analyzed in research conducted and published by Silva-Zolezzi et al. and Moreno-Estrada et al. [4,8]. The genotype dataset is available under the terms of a data transfer agreement, which protects the privacy of the participants in the transfer of genetic data, by contacting INMEGEN (http://www.inmegen.gob.mx/). Written informed consent was obtained from all participants under the research/ethics approval number (2007/06) issued by the INMEGEN Ethics Commission. Data were accessed on 16/03/2022.

Quality control filters

Quality control filters were applied to the data to guarantee consistent quality across all samples. All SNPs with a Minor Allele Frequency (MAF) < 0.05 were removed, as were all SNPs that deviated from Hardy-Weinberg equilibrium (P-value < 0.0001). The initial number of SNPs was 7,856,630 across the 10 populations, and after filtering, the final number of SNPs was 6,195,350, which represents 78.85% of the initial information.
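The two filters can be illustrated with a minimal sketch that works on genotype counts for a single biallelic SNP. The function names are hypothetical, and the Hardy-Weinberg check here is a 1-degree-of-freedom chi-square goodness-of-fit test, which is an assumption for illustration; the text does not state which test was used to obtain the P-values.

```python
import math

def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from genotype counts (AA, AB, BB) at one biallelic SNP."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square goodness-of-fit test for HWE
    (assumed test; an exact test could be substituted)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

def passes_qc(n_aa, n_ab, n_bb, maf_min=0.05, hwe_p_min=1e-4):
    """Keep a SNP only if MAF >= 0.05 and the HWE P-value >= 0.0001."""
    return (minor_allele_frequency(n_aa, n_ab, n_bb) >= maf_min
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p_min)
```

For example, a SNP with genotype counts 25/50/25 passes both filters, while one with counts 50/0/50 (no heterozygotes) fails the Hardy-Weinberg filter.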

LD measure

The LD of every pair of SNPs in each chromosome, within each population, was estimated using the squared Pearson correlation coefficient (r²):

$$ r^2 = \frac{(p_{11} - p_1 q_1)^2}{p_1 \, p_2 \, q_1 \, q_2} $$

where p₁ and p₂ are the minor and major allele frequencies of SNP1, respectively, and q₁ and q₂ are the minor and major allele frequencies of SNP2, respectively. p₁₁ corresponds to the frequency of observing both minor alleles in the same individual throughout the entire population. In addition, to avoid errors due to the sample size of each population, the following correction was applied to the LD:

where n corresponds to the number of haplotypes in the sample [10]. The r² values were estimated with the PLINK tool [11].
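As a worked illustration of the quantities defined above, the following sketch computes r² from the minor allele frequencies p₁ and q₁ and the joint frequency p₁₁, and then applies a sample-size adjustment. The adjustment shown, (r² − 1/n)/(1 − 1/n), is a commonly used form and is an assumption here, since the correction formula itself is not reproduced in this text. The function names are hypothetical, and this is not the PLINK implementation.

```python
def r_squared(p11, p1, q1):
    """r^2 between two biallelic SNPs.

    p11: frequency of observing both minor alleles together
    p1:  minor allele frequency of SNP1 (major is p2 = 1 - p1)
    q1:  minor allele frequency of SNP2 (major is q2 = 1 - q1)
    """
    p2, q2 = 1.0 - p1, 1.0 - q1
    d = p11 - p1 * q1            # disequilibrium coefficient D
    return d * d / (p1 * p2 * q1 * q2)

def corrected_r_squared(r2, n_haplotypes):
    """Assumed sample-size adjustment: (r2 - 1/n) / (1 - 1/n)."""
    inv_n = 1.0 / n_haplotypes
    return (r2 - inv_n) / (1.0 - inv_n)
```

With p₁ = q₁ = 0.3, a joint frequency p₁₁ = 0.3 corresponds to complete correlation (r² = 1), whereas p₁₁ = 0.09 = p₁q₁ corresponds to independence (r² = 0).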

SV based on short-range LD

To estimate SVs across the entire genome, we implemented the definition of short-range LD-based SV given by Salomon-Torres et al. [9]. It is defined as follows:

For each chromosome within each population, calculate the short-range (≤ 100 kb) LD (r²), sort the SNP pairs by distance, and obtain a set of LD means (which we call expected means) using bins of 5 kb. Inspect all SNPs, from smallest to largest, looking for segments of at least 1 kb consisting of at least 3 adjacent SNPs such that, for each SNP, the r² values with its neighbors within a 100 kb range to the right of that SNP are all greater than, or all smaller than, their corresponding expected means, and the P-values from a t-test for equality of means are significant after a Benjamini-Hochberg multiple testing correction. In addition, to account only for homogeneously distributed regions, consider only SNPs having at least 15 SNP neighbors within a 100 kb range. Call these segments High-Density SVs based on short-range LD [9].
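The first two steps of this definition can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the study: the function names and the (distance, r²) input layout are hypothetical, and the t-test, the multiple testing correction, and the assembly of adjacent SNPs into segments of at least 1 kb are omitted.

```python
from collections import defaultdict

BIN_BP = 5_000       # bin width used for the expected means
MAX_BP = 100_000     # short-range limit

def bin_index(distance_bp):
    """Map a pair distance in (0, MAX_BP] to a 5 kb bin index 0..19."""
    return min((distance_bp - 1) // BIN_BP, MAX_BP // BIN_BP - 1)

def expected_means(pairs):
    """pairs: iterable of (distance_bp, r2). Returns {bin: mean r2}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for distance_bp, r2 in pairs:
        if 0 < distance_bp <= MAX_BP:
            b = bin_index(distance_bp)
            sums[b] += r2
            counts[b] += 1
    return {b: sums[b] / counts[b] for b in sums}

def deviation_direction(neighbors, expected, min_neighbors=15):
    """Return +1 if every neighbor r2 exceeds its bin's expected mean,
    -1 if every one is below it, and 0 otherwise or when the SNP has
    fewer than min_neighbors neighbors in range.

    neighbors: (distance_bp, r2) pairs to the right of one SNP.
    The significance test of the definition is not performed here.
    """
    in_range = [(d, r) for d, r in neighbors if 0 < d <= MAX_BP]
    if len(in_range) < min_neighbors:
        return 0
    diffs = [r - expected[bin_index(d)] for d, r in in_range]
    if all(x > 0 for x in diffs):
        return 1
    if all(x < 0 for x in diffs):
        return -1
    return 0
```

A candidate segment would then be a run of at least 3 adjacent SNPs sharing the same non-zero direction, subject to the significance and size criteria stated above.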

Correction for multiple testing

A Benjamini-Hochberg correction for multiple testing [12] was applied to the set of P-values resulting from the t-tests of each SNP with its neighbors within a range of 100 kb, in order to control the False Discovery Rate. The approach is as follows: first, all P-values are sorted from smallest to largest. Denote the i-th smallest P-value by P_(i), for each i between 1 and m (where m is the total number of P-values). Then, starting from the largest P-value P_(m), compare each P_(i) with 0.5 × i/m. Continue as long as P_(i) > 0.5 × i/m. Let k be the first index at which P_(k) is less than or equal to 0.5 × k/m, and declare the differences corresponding to the smallest k P-values as significant.
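The step-up procedure described above can be written compactly as in the sketch below. The function name is hypothetical, and the threshold multiplier is left as the parameter alpha rather than fixed to a particular value.

```python
def benjamini_hochberg(p_values, alpha):
    """Return a list of booleans marking which P-values are declared
    significant by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the P-values, ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # number of P-values declared significant
    for rank in range(m, 0, -1):          # start from the largest P-value
        if p_values[order[rank - 1]] <= alpha * rank / m:
            k = rank                      # first rank meeting the bound
            break
    significant = [False] * m
    for idx in order[:k]:
        significant[idx] = True
    return significant
```

Note that all of the k smallest P-values are declared significant, even if some of them individually exceed their own rank-specific bound.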

Principal components analysis (PCA)

Using the number of SVs found on each chromosome in each population, 22-dimensional vectors were generated, and a PCA was applied to look for differentiation between populations [13]. PCA was performed using the R software [14].
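To make the procedure concrete, the sketch below projects a matrix of per-chromosome SV counts (one row per population) onto its first principal components. The study used R; this pure-Python version is only an illustration and makes several assumptions: columns are centered but not scaled, components are obtained by power iteration with deflation, and all function names are hypothetical.

```python
import math

def center_columns(rows):
    """Subtract each column mean; rows = populations, columns = chromosomes."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(x):
    """Sample covariance matrix of centered data x."""
    n, d = len(x), len(x[0])
    return [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1)
             for b in range(d)] for a in range(d)]

def top_component(cov, iterations=500):
    """Leading eigenvector of a symmetric matrix by power iteration."""
    d = len(cov)
    v = [float(j + 1) for j in range(d)]   # asymmetric start vector
    norm = math.sqrt(sum(t * t for t in v))
    v = [t / norm for t in v]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break  # zero matrix: no variance left to explain
        v = [t / norm for t in w]
    return v

def pca_scores(rows, n_components=2):
    """Project each row onto the first n_components principal axes."""
    x = center_columns(rows)
    cov = covariance(x)
    d = len(cov)
    components = []
    for _ in range(n_components):
        v = top_component(cov)
        cov_v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        lam = sum(v[a] * cov_v[a] for a in range(d))   # explained variance
        components.append(v)
        # Deflate: remove the variance explained by v before the next pass.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return [[sum(r[j] * v[j] for j in range(d)) for v in components]
            for r in x]
```

In the setting described above, `rows` would hold 10 vectors of 22 counts, and the two returned columns would correspond to PC1 and PC2. The sign of each component is arbitrary, as in any PCA.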

Results

MAF distribution

To investigate how informative the SNPs sampled in the 10 populations were, we computed the distribution of MAF across all chromosomes and all populations. Fig 1 shows the averaged proportions, grouping populations as Native (Zapoteca, Tepehuano, Maya), Central-Coast Mestizo (Tamaulipas, Guerrero, Veracruz), and Non-Central-Coast Mestizo (Guanajuato, Sonora, Zacatecas, Yucatan). The Native group presented notably more monomorphic SNPs (~23%) than the Mestizo groups (~5%), while the Mestizo groups presented consistently larger proportions of polymorphic SNPs in the rest of the bins.

Fig 1. MAF distribution. Average proportions of SNPs of various frequencies by population group (including the intervals' upper limit).

https://doi.org/10.1371/journal.pone.0333193.g001

S1 File shows MAF proportions from each population. As we can see, the 3 Native populations presented consistently lower polymorphic proportions (between 71% and 83%), compared to Mestizo populations (between 88% and 96%). However, a substantial fraction of loci are informative in all study populations.

The average MAF values range from a maximum of 0.227 (Sonora) to a minimum of 0.177 (Zapoteca), a difference of 0.05, which is 10% of the complete scale of 0.0 to 0.5. The average MAF decrease between populations is 1.1% (S2 File). Monomorphic SNPs were excluded from the following analyses.

Extent of LD

The average number of SNPs per population after quality control filters was 619,535. These SNPs were used to evaluate the extent of LD within a range of 100 kb. The pairwise LD correlation coefficient r² was computed for every pair of SNPs within a range of 100 kb in each chromosome, within each population. Fig 2 shows the average of r² values using 5 kb bins. As we can see, the decline in LD as a function of distance is rapid, such that r² averages ~0.10 over 100 kb. Native populations show uniformly higher LD values relative to the other populations. At shorter distances (5 kb), Zapoteca, Tepehuano, and Maya show higher r² values (~0.49 on average), while the rest of the populations show an average of ~0.37. At longer distances (100 kb), Zapoteca and Tepehuano show consistently higher values, with an average of ~0.156. The populations of Guanajuato, Guerrero, Sonora, Veracruz, Yucatan, and Zacatecas show the lowest values, with an average of 0.075, while the Maya and Tamaulipas populations show intermediate values, with an average of 0.115 (S3 File).

Fig 2. LD. Genome-wide LD decay in all populations.

https://doi.org/10.1371/journal.pone.0333193.g002

Estimation of SVs based on short-range LD

SVs based on short-range LD were estimated using the definition from [9]. The algorithm is described in the Materials and methods section. Table 1 details the SV characteristics for all populations. In summary, the total number of SVs found among the 10 populations was 4,375, involving an average of 19,437.6 SNPs per population, which corresponds to 3.14% of the average number of SNPs per population. The mean SV size varied from 2,844.93 to 8,646.07 kb across populations (with a mean SV size of 6,161.1 kb over all populations), with an average of 50.14 SNPs per SV.

Table 1. SV statistics genome-wide across all populations.

https://doi.org/10.1371/journal.pone.0333193.t001

The Maya, Tepehuano, and Zapoteca populations showed the largest numbers of SVs, with 582, 678, and 730, respectively, while Tamaulipas, Guanajuato, and Zacatecas showed the smallest numbers, with 288, 325, and 342 SVs, respectively. The average distance covered by SVs genome-wide was 2.4 Mb. The largest SV, 96.2 Mb in size, was found in the Guanajuato population, while the smallest SV, 1.12 kb in size, was found in the Zapoteca population.

The average number of SVs per chromosome was 19.93. The chromosome with the highest average of SVs was chromosome 2, with 37.6 SVs, while the chromosome with the smallest average of SVs was chromosome 19, with 4 SVs.

To investigate how close the populations are in terms of variability, given the number of identified SVs, we constructed for each population a vector of 22 fields, where each field contained the number of SVs on one chromosome (S4 Table). PCA [13] was applied to these vectors. Fig 3 shows a plot of PC1 vs PC2.

Fig 3. PCA plot. PCA on SVs per chromosome vectors shows a clear differentiation between population groups.

https://doi.org/10.1371/journal.pone.0333193.g003

As we can see, the plot of PC1 vs PC2 shows a clear differentiation between the groups we investigated, based on the number of SVs per chromosome genome-wide. For PC1, the three Native populations have positive loadings, while all Mestizo populations have negative loadings. In fact, the three main subgroups are clearly differentiated. The Maya and Zapoteca populations appear in the first quadrant of the plot (positive PC1 and PC2 loadings). The Non-Central-Coast Mestizo subgroup appears in the second quadrant (negative PC1 and positive PC2 loadings). The Central-Coast Mestizo subgroup appears in the third quadrant (negative PC1 and PC2 loadings). Finally, the Tepehuano population appears in the fourth quadrant (positive PC1 and negative PC2 loadings). This result supports the previously stated hypothesis [6] that the demographic and adaptive processes that occurred in these groups shaped their genetic architecture.

To identify genes involved in our defined SVs, we inspected NCBI (https://www.ncbi.nlm.nih.gov/) for genes overlapping the defined SVs. For each defined SV on each chromosome, across all populations, we checked whether a gene overlapped the SV, and for each overlapping gene we checked whether SVs in the rest of the populations were overlapped by the same gene. S5 Table presents, for each chromosome, the genes, the start and end position of each gene, the gene ID, the gene function, and the populations with SVs covered by the gene. In total, we identified 8,443 genes covering SVs across all populations. Of these, 790 genes contained SVs in all populations, 2,149 genes contained SVs in only one population, and 5,504 genes contained SVs in more than one but fewer than ten populations. When inspecting by group, we found that 26 genes contained SVs across all Mestizo populations, while 14 genes contained SVs across all Native populations.
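The overlap bookkeeping described above can be sketched as a simple interval test. This is an illustrative sketch with hypothetical function names and a toy data layout; the inspection in the study was carried out against NCBI annotations, and closed intervals are an assumption here.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True when two closed intervals on the same chromosome share a base."""
    return start_a <= end_b and start_b <= end_a

def populations_per_gene(genes, svs_by_population):
    """genes: {name: (chrom, start, end)}
    svs_by_population: {population: [(chrom, start, end), ...]}
    Returns {name: set of populations with an SV overlapping the gene}."""
    hits = {}
    for name, (g_chrom, g_start, g_end) in genes.items():
        pops = {
            pop
            for pop, svs in svs_by_population.items()
            if any(c == g_chrom and overlaps(g_start, g_end, s, e)
                   for c, s, e in svs)
        }
        if pops:
            hits[name] = pops
    return hits
```

Counting the genes whose population set has size 10, size 1, or an intermediate size yields the three categories reported above.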

The last analysis was to look within our defined SVs for regions that were consistent between populations. We found 506 regions, of which 54 (11%) were shared by all 10 populations, while 90 (18%) were found in just 1 population (S6 Table).
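One plausible way to group SVs into shared regions is to merge overlapping SV intervals across populations and record which populations contribute to each merged region. The sketch below follows that approach; the merging rule is an assumption, since the text does not detail how regions were delimited, and the function name is hypothetical.

```python
def merge_into_regions(svs):
    """svs: list of (chrom, start, end, population).
    Returns a list of (chrom, start, end, populations), where overlapping
    SVs on the same chromosome are merged into a single region."""
    regions = []
    for chrom, start, end, pop in sorted(svs):
        last = regions[-1] if regions else None
        if last and last[0] == chrom and start <= last[2]:
            last[2] = max(last[2], end)   # extend the current region
            last[3].add(pop)
        else:
            regions.append([chrom, start, end, {pop}])
    return [tuple(r) for r in regions]
```

Under this rule, a region shared by all populations is one whose population set contains all 10 populations, and a population-specific region is one whose set contains a single population.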

Discussion

In this work, we implemented an intuitive and simple definition of SV based on the deviation from the expected short-range LD between SNPs, recently introduced by Salomon-Torres et al. [9]. In that work, the authors studied SVs in the cattle genome and concluded that the short-range LD patterns captured by these SVs summarize enough genetic information to discern the relatedness of breeds given the geographic regions in which they are evolving. Here, we studied SVs in 10 Native and Mestizo Mexican populations, and our results add new evidence in support of the same conclusion. The SVs inferred from our data differentiated the populations well when applied to clustering analysis, suggesting that they are shaped by population structure and demographic history, as Avila-Arcos et al. [5] have previously stated.

From the MAF analysis, the three Native populations presented consistently lower polymorphic proportions (between 71% and 83%) compared to Mestizo populations (between 88% and 96%). Within the Native group, the Zapoteca population had the highest proportion of monomorphic SNPs, while also showing consistently higher LD than the other populations within the 100 kb range and the highest number of SVs genome-wide. This result may be explained by the fact that the Zapoteca have historically been the most isolated people, owing to geographic and cultural barriers, and are therefore the ethnic group with the lowest genetic exchange.

The total number of SVs found in our study was 4,375, a much lower quantity than those reported in previous studies of variations in Native and Mestizo Mexican populations. The reason for this difference is that previous studies have mainly focused on SNV-type variants, while our study focused on short-range LD genomic variations, defined by regions consisting of at least three adjacent SNPs that present significant LD deviation from the expected value. Avila-Arcos et al., Romero-Hidalgo et al., and Aguilar-Ordoñez et al. [5–7], for example, reported 120,735, 332,272, and 8.68 million variants of the SNV type, respectively, in Native and Indigenous Mexican populations. The difference between the number of our SVs and the number of SNVs found in previous studies is in accordance with Chiang et al. [15], who mention that approximately 5,000–10,000 SVs can be found in the human genome, since they are less abundant than SNVs, which can exceed 4 million.

The PCA defined a strong axis of variation separating Native from Mestizo populations along PC1. When PC1 and PC2 are considered together, two Mestizo clusters are clearly visible. One cluster is formed by the Tamaulipas, Guerrero, and Veracruz populations, all situated on the central coast of Mexico, an area with a rich and complex history. Before the arrival of the European conquerors, the central coast was home to diverse Native ethnic groups, such as the Huastecos, Otomies, Nahuas, Totonacos, and Mixtecos. After the conquest and the beginning of miscegenation, the populations of all these ethnic groups decreased dramatically. The other cluster is formed by the Guanajuato, Zacatecas, Yucatan, and Sonora populations, all situated outside the central coast area and with a slightly different history after the conquest, since miscegenation started mainly in the central coast area and spread out to the rest of Mexico. This PCA result reflects the role that geographic location, degree of isolation, and interbreeding within Native populations and between Native and foreign populations have played in shaping the diversity of the modern Mexican population. It agrees with the study published by Sohail et al. [16], who analyzed the genetic and environmental factors of current Mexican populations and reported that ancestry differences in the Mexican population are present mainly in the center and south of the country, the area with the highest American ancestry, in addition to gene flow along the Atlantic coastal corridor, which determines the genetic influence between the center-south and south-east Mexican populations [16].

For their part, the three Native populations appear separated when PC1 and PC2 are considered together. The Zapoteca population appears at the top right of the plot in Fig 3, with loadings between 60 and 80 for PC1 and between 5 and 7.5 for PC2, while the Tepehuano population appears at the bottom right, with loadings of 60 for PC1 and between 12.5 and 15 for PC2. The Maya population appears close to the top center, with loadings between 0 and 40 for PC1 and between 7.5 and 10 for PC2. If this separation is taken as indicative of genetic differences between Native populations due to population structure and demographic history, then our result agrees with that reported by Moreno-Estrada et al. [4], who analyzed SNVs in 20 Mexican Indigenous groups and reported that they differentiate into three main areas according to their geographic location, with the Tepehuano among the northern groups and the Zapoteca and Maya among the southern groups. In addition, Romero-Hidalgo et al. [6] reported that Native groups are formed by three main ancestral components: a northern, a southern, and a Mayan component, where the Tepehuano population shows the highest proportion of the northern component, the Zapoteca population shows the highest proportion of the southern component, and the Maya population shows its own component. This separation was also reported across all original inhabitants of the American continent (generally called Native Americans) by Raghavan et al. [17], who, using ancient and modern genome-wide data, found that the ancestors of all present-day Native Americans entered the Americas as a single migration wave from Siberia no earlier than 23 thousand years ago, and that this migration was followed by a diversification of ancestral Native Americans leading to the formation of northern and southern branches. Other anthropological studies conducted by González et al. [18] indicated that the first human vestiges in México date to approximately 12,000 years ago, in the central zone of Mexico and the Yucatán peninsula.

Next, we inspected NCBI for genes involved in our defined SVs. In total, we identified 8,443 genes covering SVs across all populations. Of these, 790 genes contained SVs in all populations, 2,149 genes contained SVs in only one population, and 5,504 genes contained SVs in more than one but fewer than ten populations. When inspecting by group, we found that 26 genes contained SVs across all Mestizo populations, while 14 genes contained SVs across all Native populations. In addition to the genes reported by other studies [7,19], we identified genes such as FTO and ABCA1 in the 10 populations. These genes have been related to obesity, adipose tissue function, and fat distribution in the Mexican population [20,21]. Another gene was ELMO1, which is associated with susceptibility to diabetic nephropathy and type 2 diabetes, as documented by Aguilar-Ordoñez et al. [7]. In addition, we identified the gene BRCA2 in the Guerrero, Maya, and Zapoteca populations, and the gene IKBKB in the Guanajuato, Sonora, Tamaulipas, Tepehuano, and Yucatan populations. These genes were previously reported by Moreno-Estrada et al. [4] and Aguilar-Ordoñez et al. [7] as associated with the development of breast and ovarian cancer in the Mexican population. Additionally, we found genes located in SVs of only one population: MCHR1, implicated in the neural regulation of food consumption, found only in the Zacatecas population; SLC30A8, which confers a predisposition to non-insulin-dependent diabetes, found only in the Guanajuato population; and IGF2BP2, which plays an important role in metabolism and is associated with susceptibility to diabetes, found only in the Tamaulipas population.

Conclusion

We present the first genome-wide characterization of SVs based on the deviation from the expected short-range LD between SNPs in Native and Mestizo Mexican populations. The total number of variations found among all populations was 4,375, involving an average of 19,438 SNPs per population, which corresponds to 3.14% of the average number of SNPs per population. The mean SV size varied from 2,845 to 8,646 kb across populations (with a mean SV size of 6,161 kb over all populations), with an average of 50.14 SNPs per SV. By grouping all variations across all populations in the sample, we defined 506 regions, of which 54 (11%) were shared by all 10 populations. The total number of genes covered by these variations was 8,443. Among these genes, we identified some specifically related to health in the Mexican population, such as FTO and ABCA1, associated with obesity, adipose tissue function, and fat distribution in the Mexican population, and ELMO1, associated with susceptibility to diabetic nephropathy and type 2 diabetes, among others. Finally, our results add new evidence in support of the hypothesis that SVs based on the deviation from the expected short-range LD between SNPs capture the structure and the demographic history of populations, and represent potential targets for association of SVs with population-specific diseases. In further analyses, the inclusion of phenotypic data will be necessary to establish associations of SVs with physical traits or disease-related outcomes.

Supporting information

S1 File. MAF Distribution and Minor Allele Frequency distribution.

https://doi.org/10.1371/journal.pone.0333193.s001

(PDF)

S2 File. Average minor allele frequencies (MAF) per population in the study.

https://doi.org/10.1371/journal.pone.0333193.s002

(PDF)

S3 File. Total average of r² per population and LD decay using 5 kb bins.

https://doi.org/10.1371/journal.pone.0333193.s003

(PDF)

S4 Table. Table of Number of SVs per chromosome in each population.

https://doi.org/10.1371/journal.pone.0333193.s004

(XLSX)

S5 Table. Table of Genes involved in Structural Variations.

https://doi.org/10.1371/journal.pone.0333193.s005

(XLSX)

S6 Table. Table of Structural Variation Regions.

https://doi.org/10.1371/journal.pone.0333193.s006

(XLSX)

Acknowledgments

We thank INMEGEN, which provided the data used in this analysis. The Scientific, Ethics, and Biosecurity Review Boards of the National Institute of Genomic Medicine approved this study. We are grateful to the National Council for Humanities, Sciences, and Technologies of México (CONAHCYT) for supporting Adriana Griselda Mateos-Valenzuela with a scholarship for doctoral studies.

References

1. Rubi-Castellanos R, Martínez-Cortés G, Muñoz-Valle JF, González-Martín A, Cerda-Flores RM, Anaya-Palafox M, et al. Pre-Hispanic Mesoamerican demography approximates the present-day ancestry of Mestizos throughout the territory of Mexico. Am J Phys Anthropol. 2009;139(3):284–94. pmid:19140185
2. Martínez-Cortés G, Salazar-Flores J, Fernández-Rodríguez LG, Rubi-Castellanos R, Rodríguez-Loya C, Velarde-Félix JS, et al. Admixture and population structure in Mexican-Mestizos based on paternal lineages. J Hum Genet. 2012;57(9):568–74. pmid:22832385
3. Instituto Nacional de Estadística y Geografía. Principales resultados del Censo de Población y Vivienda 2020. Estados Unidos Mexicanos. 2022;173. Available from: https://www.inegi.org.mx/contenidos/productos/prod_serv/contenidos/espanol/bvinegi/productos/nueva_estruc/702825198060.pdf
4. Moreno-Estrada A, Gignoux CR, Fernández-López JC, Zakharia F, Sikora M, Contreras AV, et al. Human genetics. The genetics of Mexico recapitulates Native American substructure and affects biomedical traits. Science. 2014;344(6189):1280–5. pmid:24926019
5. Avila-Arcos C, Mcmanus KF, Sandoval K, Rodríguez-Rodríguez JE, Villa-Islas V, Martin AR. Population history and gene divergence in native Mexicans inferred from 76 human exomes. Molecular Biology and Evolution. 2020.
6. Romero-Hidalgo S, Ochoa-Leyva A, Garcíarrubio A, Acuña-Alonzo V, Antúnez-Argüelles E, Balcazar-Quintero M, et al. Demographic history and biologically relevant genetic variation of Native Mexicans inferred from whole-genome sequencing. Nat Commun. 2017;8(1):1005. pmid:29044207
7. Aguilar-Ordoñez I, Pérez-Villatoro F, García-Ortiz H, Barajas-Olmos F, Ballesteros-Villascán J, González-Buenfil R, et al. Whole genome variation in 27 Mexican indigenous populations, demographic and biomedical insights. PLoS One. 2021;16(4):e0249773. pmid:33831079
8. Silva-Zolezzi I, Hidalgo-Miranda A, Estrada-Gil J, Fernandez-Lopez JC, Uribe-Figueroa L, Contreras A, et al. Analysis of genomic diversity in Mexican Mestizo populations to develop genomic medicine in Mexico. Proc Natl Acad Sci U S A. 2009;106(21):8611–6. pmid:19433783
9. Salomon-Torres R, Matukumalli LK, Van Tassell CP, Villa-Angulo C, Gonzalez-Vizcarra VM, Villa-Angulo R. High density LD-based structural variations analysis in cattle genome. PLoS One. 2014;9(7):e103046. pmid:25050984
10. Villa-Angulo R, Matukumalli LK, Gill CA, Choi J, Van Tassell CP, Grefenstette JJ. High-resolution haplotype block structure in the cattle genome. BMC Genet. 2009;10:19. pmid:19393054
11. Purcell S. PLINK. http://pngu.mgh.harvard.edu/purcell/plink/. 2007.
12. Benjamini Y, Drai D, Elmer G, Kafkafi N, Golani I. Controlling the false discovery rate in behavior genetics research. Behav Brain Res. 2001;125(1–2):279–84. pmid:11682119
13. Jolliffe IT. Principal component analysis. 2nd ed. New York: Springer-Verlag New York. 2002.
14. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. 2021.
15. Chiang C, Scott AJ, Davis JR, Tsang EK, Li X, Kim Y, et al. The impact of structural variation on human gene expression. Nat Genet. 2017;49(5):692–9. pmid:28369037
16. Sohail M, Palma-Martínez MJ, Chong AY, Quinto-Cortés CD, Barberena-Jonas C, Medina-Muñoz SG, et al. Mexican Biobank advances population and medical genomics of diverse ancestries. Nature. 2023;622(7984):775–83. pmid:37821706
17. Raghavan M, Steinrücken M, Harris K, Schiffels S, Rasmussen S, DeGiorgio M, et al. Genomic evidence for the Pleistocene and recent population history of Native Americans. Science. 2015;349(6250).
18. Gonzalez S, Jiménez-López JC, Hedges R, Huddart D, Ohman JC, Turner A, et al. Earliest humans in the Americas: new evidence from México. J Hum Evol. 2003;44(3):379–87. pmid:12674097
19. Ramirez-Garcia SA, Cabrera-Pivaral CE, Huacuja-Ruiz L, Flores-Alvarado LJ, Pérez-García G, González-Rico JL, et al. Implications in primary health care of medical genetics and genomic in type 2 diabetes mellitus. Rev Med Inst Mex Seguro Soc. 2013;51(3):e6-26. pmid:23883470
20. Villalobos-Comparán M, Antuna-Puente B, Villarreal-Molina MT, Canizales-Quinteros S, Velázquez-Cruz R, León-Mimila P. Interaction between FTO rs9939609 and the Native American-origin ABCA1 rs9282541 affects BMI in the admixed Mexican population. BMC Med Genet. 2017;18(1):1–6.
21. Chama-Avilés A, Flores-Viveros KL, Cabrera-Ayala JA, Aguilar-Galarza A, García-Muñoz W, Haddad-Talancón L, et al. Identification and association of single nucleotide polymorphisms of the FTO gene with indicators of overweight and obesity in a young Mexican population. Genes (Basel). 2023;14(1):159. pmid:36672899

Supporting information

S1_File. MAF Distribution and Minor Allele Frequency distribution.

S2_File. Average minor allele frequencies (MAF) per population in the study.

S3_File. Total average of r² per population and LD decay using 5 kb bins.

S4_Table. Table of Number of SVs per chromosome in each population.

S5_Table. Table of Genes involved in Structural Variations.

S6_Table. Table of Structural Variation Regions.
