Combining pangenomics and population genetics finds chromosomal re-arrangements, diversified chromosome segments, copy number variations and transposon polymorphisms in wheat and rye powdery mildew

doi:10.1371/journal.ppat.1013196

Fig 1.

Worldwide dataset of wheat powdery mildew shows varying levels of genomic diversity and clustering populations.

(A) Map showing the origins of the mildew isolates used in this study. Map outlines were produced with the R packages rnaturalearth and ggplot2, based on Natural Earth data (https://www.naturalearthdata.com/downloads/50m-cultural-vectors/). (B) PCA of the SNPs of 387 isolates of B. g. tritici. Isolates from tetraploid wheat (B. g. dicocci), rye (B. g. secalis) and rye/wheat hybrid Bgtl_THUN_12 were excluded to improve resolution of the PCA for the majority of isolates. (C) Admixture plot of the 399 isolates for K = 13. (D) Nucleotide diversity of the populations with at least eight isolates, based on eight randomly chosen isolates (see Table B in S2 Appendix) and analysing sequence windows of 10 kb. P-values represent pairwise comparisons between neighboring populations, while the stars show how significantly the values differ between each population and the ISR powdery mildew population.
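The windowed nucleotide diversity in panel D can be illustrated with a minimal sketch. It assumes haploid isolates, biallelic SNPs, and that every site in a window is callable; the function name `windowed_pi` and the input layout are hypothetical, and the caption does not state which tool the authors actually used.

```python
from collections import defaultdict


def windowed_pi(snps, n_isolates, window=10_000):
    """Per-site nucleotide diversity in fixed windows (illustrative only).

    snps: iterable of (position, alt_count) for biallelic SNPs, where
          position is 1-based and alt_count is the number of haploid
          isolates carrying the alternative allele.
    Returns {window_index: pi}, with pi averaged over the window length.
    Assumption: all sites in a window are callable.
    """
    n_pairs = n_isolates * (n_isolates - 1) / 2
    diff_sum = defaultdict(float)
    for pos, alt in snps:
        ref = n_isolates - alt
        # Fraction of isolate pairs that differ at this site.
        diff_sum[(pos - 1) // window] += (alt * ref) / n_pairs
    return {w: s / window for w, s in diff_sum.items()}


# Toy input: eight isolates, matching the subsample size in panel D.
pi = windowed_pi([(5, 4), (12_000, 1)], n_isolates=8)
```

Subsampling every population to the same number of isolates, as described in the caption, keeps the number of pairwise comparisons equal across populations.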


Fig 2.

Chromosomal synteny of the 11 B. graminis isolates in the pangenome.

The coloured lines indicate genomic regions with a minimum alignment length of 100 kb. Each colour represents one of chromosomes 1 to 11. Asterisks mark isolates from populations that were shown to be hybrids. The different shades of black and grey represent the different formae speciales. The red blocks indicate centromeres and the light blue blocks indicate effector genes.


Fig 3.

Examples of chromosome collinearity and re-arrangements within B. g. tritici.

(A) Example of synteny and translocations in the genomes of three isolates (two from the parental populations, USA and CHN, with the middle genome from an isolate of the hybrid JPN powdery mildew population). (B) Example of synteny in the genomes of two isolates from the same population (CHN powdery mildew). The petrol lines on the chromosomes indicate genes. The triangle and the “NNNs” show the position of a sequence gap.


Fig 4.

Analysis of structural variation and sequence diversity on B. graminis chr-11.

(A) Synteny in chr-11 for the hybrid B. g. triticale genome and the genomes of its two parental formae speciales. The petrol lines on the chromosomes indicate genes. The colour of the isolate names corresponds to the forma specialis (black = tritici, darker grey = triticale, lighter grey = secalis). (B) Sequence conservation along chromosome 11. The heat map shows comparisons of B. graminis isolates with the reference isolate Bgt_CHE_96224 in 50 kb windows. Bgtl_THUN_12 and Bgd_ISR_211 were excluded from this analysis. (C) The red line shows the average sequence conservation for all 50 kb windows among B. g. tritici isolates, while the gray line shows sequence conservation between Bgt_CHE_96224 and Bgs_1459. Note that sequence conservation is generally lower in the right chromosome arm. (D) Ratio of sequence conservation between B. g. tritici isolates and Bgs_1459. (E-H) Dot plot comparisons of chr-11 from B. graminis isolates. B. g. secalis and B. g. dicocci both have a segment at the left end (Chr11_LE, blue box) which is absent in B. g. tritici. Furthermore, a terminal segment of ~100 kb is absent in Bgt_CHN_17_40, suggesting it contains accessory genes.


Fig 5.

Analysis of sequence variation on chromosome 11.

(A) Comparison of pangenome graphs for chromosomes 1 and 11. Approximate locations of centromeres are indicated. Note that the right arm of chromosome 11 contains a higher number of nodes and edges than the left arm and than chromosome 1 overall. Additionally, both ends of chromosome 11 contain multiple edges reflecting the presence/absence of large segments between isolates (see also Fig 4). (B) Copy number variation (CNV) along chromosomes 1 and 11. The x-axis indicates the position while the y-axis shows the number of B. graminis isolates that show copy number variation in a given gene. Circles indicate multiple copies while triangles indicate deletions. Genes were separated into effectors (red) and non-effector genes (gray). (C) Number of identified protein variants in 399 B. graminis isolates per gene, normalized for gene length. Data sets for effector and non-effector genes are shown separately, as are data for the left and right arms of chromosome 11. Boxes indicate the inter-quartile range (IQR) with the central line indicating the median and whiskers indicating the minimum and maximum without outliers, respectively. Outliers were defined as minimum – 1.5*IQR and maximum + 1.5*IQR, respectively. Whiskers indicate the standard error.


Fig 6.

Duplications/Deletions of genes using the populations and the pangenome.

(A) Number of gene duplications per population. Asterisks indicate the statistical significance of the difference between each population and the TUR powdery mildew population. (B) PCA of effector gene presence/absence summary per isolate, including the B. g. secalis isolates (left) and without them (right). (C) Estimate of the size of the core genome of non-effector genes in B. graminis. A sample curve was generated by adding isolates incrementally and counting the number of genes conserved between all of them. Because isolates differ in their levels of similarity, all possible permutations were performed. Two sample curves were generated, one where the B. g. secalis isolate was excluded (red) and one where it was included (blue). The dashed purple lines indicate the extrapolated core genome size, assuming that the sample curve is asymptotic. (D) The same analysis for candidate effector genes. (E) Summary bar plot with sizes of core and accessory genomes for the isolates analyzed. Numbers of accessory effector and non-effector genes are indicated above and to the right of the bars, respectively. (F) Expansion or shrinkage of effector families in the pangenome. Number of effector genes per family in all 11 B. graminis isolates. The code “Bgt_” is omitted from all the B. g. tritici isolates here.
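The core-genome sample curve of panels C and D can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `core_genome_curve`, the input layout, and the choice to average over permutations are assumptions, since the caption only states that all permutations were performed.

```python
from itertools import permutations
from statistics import mean


def core_genome_curve(presence):
    """Mean core-genome size after adding 1, 2, ..., n isolates.

    presence: dict mapping isolate name -> set of gene IDs present.
    Every ordering of the isolates is evaluated, so the cost grows
    factorially with the number of isolates.
    Assumption: results are summarised as the mean over orderings.
    """
    isolates = list(presence)
    sizes = [[] for _ in isolates]
    for order in permutations(isolates):
        core = set(presence[order[0]])
        sizes[0].append(len(core))
        for k, isolate in enumerate(order[1:], start=1):
            core &= presence[isolate]  # keep genes shared by all so far
            sizes[k].append(len(core))
    return [mean(s) for s in sizes]


# Toy example with three hypothetical isolates and integer gene IDs.
toy = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {1, 2, 5}}
curve = core_genome_curve(toy)
```

The last value of the curve is the number of genes shared by every isolate. The extrapolated core genome size in the figure would come from fitting an asymptotic function to such a curve, a step this sketch leaves out.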


Fig 7.

Example of a gene presence/absence polymorphism caused by unequal homologous recombination.

(A) Dot-plot alignment of the region carrying the polymorphism. Gene and TE annotation for the genome that carries the additional sequence is shown above the plot, with transcriptional orientation indicated by arrowheads. For easier visibility, the RSX_Yak TEs (red and blue) that served as recombination templates are connected by shaded areas to the corresponding points in the plot. (B) Schematic of the molecular mechanism by which gene BgtE-20036 was deleted through unequal crossing-over. (C) Multiple alignment of the sequences that served as templates. The recombinant sequence (which is found in the isolate that carries the deletion) is Rec_TE. Note that the Rec_TE sequence shares diagnostic SNPs with RSX_Yak-1 (red) in the 5’ region and with RSX_Yak-2 (blue) in the 3’ region. The region where the recombination must have occurred is indicated.


Fig 8.

TE analyses in wheat powdery mildew.

(A) Annotated copies of TE types in the different isolates of the pangenome. (B) Number of all TEs found in various distance ranges downstream and upstream of genes for all the isolates of the pangenome (see legend of Fig 8A). The bars with dotted lines refer to distance from effector genes, while the bars without lines refer to non-effector genes. (C) Boxplot of the number of TIPs per isolate for populations with eight or more isolates (only isolates with fewer than 2000 TIPs are shown; extreme outliers were excluded because they were interpreted as technical artifacts). The “X” symbol marks the population against which the statistical comparisons were made, where “ns”, “*”, “**”, “***”, and “****” indicate the significance of the difference from that population. (D) Analyses of TE insertion polymorphisms (TIPs) in the different wheat powdery mildew populations, normalized for the sample size of each population. Depicted are the 17 TE families that showed the most insertion polymorphisms in all populations. (E) Phylogenetic network of the alignment of reverse transcriptase (RT) proteins for most of the LINE TE elements, using consensus sequences from isolate Bgt_CHE_96224. The TEs in the circle include the three LINEs with the most TIPs in the populations tested.
