Fig 1.
Inversion features and genotyping strategies.
Horizontal lines represent the standard (Std, top) and inverted (Inv, bottom) arrangements. Genes are depicted as grey arrows indicating the direction of transcription, with the disrupted gene shown in red. Blue vertical arrows mark the two inversion breakpoints (BP1 and BP2). The black bar below the Std chromosome indicates the sequence included in the analyzed fosmid, which contains BP1 in the inverted orientation. Primers used for PCR genotyping are shown as small black arrows at the corresponding breakpoint of each arrangement. The three tag SNP alleles are shown next to the corresponding chromosome. Sanger chromatograms show the sequence of the inversion breakpoints in the Inv chromosome.
Table 1.
HsInv0379 genotyping and frequency in different populations.
Number of individuals genotyped by PCR across inversion breakpoints, tag SNPs in high linkage disequilibrium with the inversion, and sequence reads spanning inversion breakpoints (BreakSeq). The number of unrelated individuals genotyped is also given together with genotype counts and allelic frequencies for each analyzed population.
Fig 2.
Geographical distribution of HsInv0379 inversion.
Inversion frequencies are shown for 27 populations, comprising 2,667 unrelated individuals analyzed by PCR, tag SNPs, and reads spanning inversion breakpoints. Blue corresponds to the Std arrangement and red to the Inv arrangement. Inversion frequencies and the origin of the different populations are listed in Table 1.
Fig 3.
ZNF257 gene structure and expression changes in inversion carriers.
A. The two most reliable transcripts of the ZNF257 gene are represented as dark and light-colored boxes corresponding to coding and non-coding exons, respectively, with their sizes and encoded proteins shown to the right. The longer transcript includes an extra 121-bp alternative non-coding exon in its 5’ UTR and in theory encodes a protein 32 aa shorter, because the methionine (Met) used for translation initiation shifts from the last three nucleotides of the first exon to the third exon, which is common to both isoforms. The position of BP2 is indicated by a vertical arrow. The domains of the longest protein (UniProt Q9Y2Q1) are represented by boxes below: green for the Krüppel-associated box (KRAB) domain, yellow for C2H2-type zinc fingers, and grey for degenerate zinc finger structures. B. Box plots of qPCR expression levels for ZNF257 and the seven genes with significant or marginally significant differences between LCLs of 15 Std/Std (dark blue) and 11 Std/Inv (light blue) individuals. For each gene, expression values have been normalized by the average of the Std/Std individuals, with each sample represented by a grey point and outliers by open circles. Horizontal black lines within each box indicate median values. *, P < 0.05; ***, P < 0.001.
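The normalization used in panel B can be illustrated with a minimal sketch. It assumes expression values and genotype labels are available as parallel lists; the function name `normalize_to_reference` and the input layout are hypothetical and not taken from the original analysis.

```python
import numpy as np

def normalize_to_reference(expression, genotypes, reference="Std/Std"):
    """Divide every expression value by the mean of the reference genotype group.

    Hypothetical helper: `expression` and `genotypes` are parallel sequences,
    one entry per sample.
    """
    expr = np.asarray(expression, dtype=float)
    is_ref = np.asarray(genotypes) == reference  # boolean mask of reference samples
    return expr / expr[is_ref].mean()
```

After this scaling, the Std/Std group has a mean of 1 for every gene, so values for Std/Inv samples can be read directly as fold changes relative to Std/Std.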
Table 2.
Analysis of RNA-Seq gene-expression changes between Std/Std and Std/Inv individuals by qPCR.
For each gene, the fold change and differential expression p-value (P) between inversion genotypes are given both for the eight samples also analyzed by RNA-Seq and for all 26 samples, with the RNA-Seq results shown for comparison (a line separates genes with FDR < 0.05 from those with lower significance). R² represents the percentage of gene-expression variation explained by the inversion genotype according to an additive model, and R the Pearson correlation coefficient between the expression levels of ZNF257 and each analyzed gene.
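As an illustration of the two summary statistics in this table, the sketch below computes the percentage of variance explained by an additive genotype model and a Pearson correlation. It assumes genotypes are coded as the number of Inv alleles (0, 1 or 2) and uses an ordinary least-squares fit; the coding and the function names are assumptions, and the original analysis may have used different software.

```python
import numpy as np

def additive_r2_percent(inv_allele_count, expression):
    """Percentage of expression variance explained by a linear (additive) fit
    of expression on the number of Inv alleles (assumed coding: 0, 1, 2)."""
    x = np.asarray(inv_allele_count, dtype=float)
    y = np.asarray(expression, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)           # least-squares line
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 100.0 * (1.0 - ss_res / ss_tot)

def pearson_r(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    return float(np.corrcoef(x, y)[0, 1])
```

With a single predictor, the variance explained equals the squared Pearson correlation between allele count and expression, which offers a quick consistency check.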
Fig 4.
Analysis of the new fusion transcript at the HsInv0379 breakpoint.
A. RNA-Seq reads mapped to an AC construct corresponding to BP1 (vertical red arrow) in an inverted chromosome reveal a fusion transcript that is present in the four Std/Inv individuals (bottom) but not in the four Std/Std individuals (top). Boxes highlight the new exon (yellow) and the first exon of ZNF257 (green). The structure of the fusion transcript reconstructed by Cufflinks [38] is shown below the RNA-Seq profiles. Small arrows indicate the approximate position of the primers used to validate the transcript. The coordinates of the new exon in the HG19 reference genome are also indicated. B. Analysis of fusion transcript expression by RT-PCR in several individuals. C. Quantification of the fusion transcript levels by qPCR in 15 Std/Std, 11 Std/Inv, and 1 Inv/Inv individuals.
Fig 5.
Recombination and nucleotide diversity patterns in the inversion region.
A. Nucleotide diversity (π). B. Tajima’s D test statistic. C. Cumulative genetic length in 4Ner units. D. Fst values between Std and Inv chromosomes. HG19 assembly coordinates are shown on the x-axis, including the inverted region (marked by vertical dashed lines) and 1.8 Mb of flanking sequence on each side. Blue lines correspond to East Asian Std chromosomes and red lines to Inv chromosomes. All values were estimated in non-overlapping windows of 80 SNPs each using the 285 phased East Asian individuals, except the recombination rate, which was calculated between consecutive SNPs in the 24 Inv chromosomes and 24 random East Asian Std chromosomes. Tajima’s D statistic was Z-transformed to counteract the loss of low-frequency SNPs during the phasing process.
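The windowing scheme described here can be sketched as follows. This is a minimal illustration, assuming phased biallelic haplotypes stored as a 0/1 matrix (chromosomes in rows, SNPs in columns), a per-base-pair π scaled by the span of each window, and a Z-transform computed across windows; these choices and the function names are assumptions, not the estimators used in the original analysis.

```python
import numpy as np

def windowed_pi(haplotypes, positions, snps_per_window=80):
    """Nucleotide diversity per bp in non-overlapping windows of a fixed SNP count.

    Assumed inputs: `haplotypes` is a 0/1 array (chromosomes x SNPs) and
    `positions` holds the sorted SNP coordinates. A trailing partial window
    is dropped.
    """
    hap = np.asarray(haplotypes, dtype=float)
    pos = np.asarray(positions)
    n = hap.shape[0]                                   # number of chromosomes
    values = []
    for w in range(hap.shape[1] // snps_per_window):
        cols = slice(w * snps_per_window, (w + 1) * snps_per_window)
        p = hap[:, cols].mean(axis=0)                  # derived-allele frequency per SNP
        het = 2.0 * p * (1.0 - p) * n / (n - 1)        # unbiased per-site heterozygosity
        span = pos[cols][-1] - pos[cols][0] + 1        # window length in bp
        values.append(het.sum() / span)
    return np.array(values)

def z_transform(values):
    """Center a statistic on its mean and scale by its sample standard deviation."""
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)
```

Because each window holds the same number of SNPs, windows differ in physical length, which is why the sketch divides by the span of each window rather than by a fixed size.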
Fig 6.
Evolutionary history of HsInv0379 inversion.
Distribution of expected frequencies for a mutation arising 43,450 years ago under a model of human demography and different evolutionary scenarios, according to forward-in-time simulations. Violin plots show the allele frequencies in 1,000 simulations for each selection coefficient (Nes between -30 and +10). The vertical solid line corresponds to the average frequency of the inversion in East Asia (4.73%), and dotted lines mark the range of frequencies observed in actual East Asian populations (2.4–8%). The likelihood of each Nes value (the probability of obtaining the observed frequencies) is shown to the right of the graph.
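One simple way to turn simulation output into a likelihood per selection coefficient is sketched below. It assumes the likelihood is approximated by the fraction of simulations whose final frequency falls within the observed 2.4–8% range; this definition, the function name, and the input layout (a dictionary mapping each Nes value to its simulated frequencies) are assumptions and may differ from the procedure actually used.

```python
import numpy as np

def likelihood_by_coefficient(simulated_freqs, low=0.024, high=0.08):
    """Fraction of simulations per Nes value ending within [low, high].

    Assumed input: dict mapping each Nes value to a sequence of final
    allele frequencies, one per simulation run.
    """
    likelihoods = {}
    for nes, freqs in simulated_freqs.items():
        f = np.asarray(freqs, dtype=float)
        in_range = (f >= low) & (f <= high)   # runs matching the observed range
        likelihoods[nes] = float(in_range.mean())
    return likelihoods
```

Runs in which the mutation is lost (frequency 0) count against a given Nes value in this scheme, so coefficients that rarely let the allele persist receive low likelihoods.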