Fig 1.
Relative copy number (A) and size in million base pairs (Mb) (B) of families and superfamilies, each shown by the size of its rectangle. Superfamilies are denoted by color, and each family is bounded by gray lines within its superfamily. Superfamily names begin with a two-letter code: ‘DT’ denotes the order of Terminal Inverted Repeat transposons, ‘DH’ the order Helitron, ‘RL’ the order of Long Terminal Repeat retrotransposons, and ‘RI’ and ‘RS’ the non-LTR retrotransposons (LINEs and SINEs, respectively). Superfamily names beginning with ‘D’ are Class II DNA transposons, while those beginning with ‘R’ are Class I retrotransposons.
Fig 2.
Characteristics of each superfamily of TE.
Superfamilies are classified into orders and classes, as shown at the bottom of the plot. (A-D) Characteristics of the 10 most numerous families (with ≥ 10 copies) of each superfamily. Family names are listed in S1 Table. (A) TE length, (B) distance to the closest gene, (C) proportion of TE copies found within another TE, and (D) TE age. In (A), (B), and (D), family medians are shown as points, with lines spanning the lower to upper quartiles. Superfamilies are shown as colored rectangles, where the dotted line marks the median and box boundaries mark the lower and upper quartiles. In (C), families are shown as points and superfamily proportions as a barplot.
Table 1.
TE superfamilies in the maize genome.
Fig 3.
Chromosomal distribution of superfamilies and example families.
Number of insertions in 1 Mb bins across chromosome 1 for (A) TE superfamilies and (B-E) the five families with the highest copy number in each of four superfamilies: DHH (B), DTT (C), RLC (D), and RLG (E). Family names are listed in S1 Table.
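The binning described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the `(superfamily, start)` input format, and the choice to assign each insertion to a bin by its start coordinate are all assumptions.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # 1 Mb bins, as stated in the caption


def count_insertions_per_bin(insertions, bin_size=BIN_SIZE):
    """Count insertions per (superfamily, bin index) on one chromosome.

    `insertions` is an iterable of (superfamily, start) pairs, where
    `start` is a 0-based base-pair coordinate. Assigning a copy to the
    bin that contains its start coordinate is an assumption made here.
    """
    counts = Counter()
    for superfamily, start in insertions:
        counts[(superfamily, start // bin_size)] += 1
    return counts


# Hypothetical coordinates, for illustration only.
example = [
    ("RLG", 150_000),
    ("RLG", 999_999),
    ("RLG", 1_000_000),
    ("DHH", 2_500_000),
]
bin_counts = count_insertions_per_bin(example)
```

Plotting `bin_counts` against bin index for each superfamily would give one line per superfamily, in the spirit of panel (A).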
Fig 4.
Age distribution of (A) superfamilies and (B-E) the five largest families of (B) DHH, (C) DTA, (D) RLC, and (E) RLG.
Family names are listed in S1 Table. The number of insertions in 10,000-year bins is shown. Because they are rare, TE copies older than 1.1 million years are not shown. Ages are calculated from terminal branch lengths for all TEs except LTR retrotransposons, whose ages are calculated from LTR-LTR divergence. See S5 Fig for LTR retrotransposon plots with terminal branch length ages.
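As a rough illustration of how LTR-LTR divergence can be converted to an age, the sketch below assumes the commonly used relation T = K / (2μ). Here K is the divergence between the two LTRs of one copy, in substitutions per site, and μ is a substitution rate per site per year. The factor of 2 reflects that the two LTRs are identical at insertion and then diverge independently. The function name and the rate used in the example are assumptions for illustration, not values taken from this caption.

```python
def ltr_age_years(divergence, mu):
    """Estimate insertion age in years from LTR-LTR divergence.

    divergence: substitutions per site between the 5' and 3' LTR of one copy.
    mu: substitution rate per site per year (must be supplied by the user).
    Assumes the relation T = K / (2 * mu).
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    if divergence < 0:
        raise ValueError("divergence must be non-negative")
    return divergence / (2.0 * mu)


# Hypothetical inputs: 0.66% divergence and an assumed rate of 3.3e-8.
age = ltr_age_years(0.0066, mu=3.3e-8)
```

With these illustrative inputs the estimate is about 100,000 years. The result scales inversely with the assumed rate.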
Fig 5.
TEs code for proteins that are expressed, and expression varies by family across tissues.
In (A-D), families are in the same order as in Fig 2 and are listed in S1 Table. (A) Length of the longest open reading frame within the TE, measured in amino acids. (B) Proportion of copies in the family with all proteins required for transposition. (C) log10 median per-copy TE expression across tissues. (D) Tissue specificity of TE expression (τ), with low values representing constitutive expression and high values representing tissue-specific expression. (E) Per-copy TE expression across tissues (RPM, reads per million), clustered by expression level. Families with more than 10 copies are shown in rows and tissues in columns.
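For readers unfamiliar with τ, the sketch below computes the widely used tissue specificity index τ = Σ(1 − x_i / x_max) / (n − 1), where x_i is expression in tissue i and n is the number of tissues. Whether the figure was computed on raw or transformed expression values is not stated in this caption, so the input scale and the function name here are assumptions.

```python
import math


def tissue_specificity_tau(expression):
    """Compute the tau index from per-tissue expression values.

    Returns 0.0 when expression is equal in every tissue (constitutive)
    and 1.0 when expression is confined to a single tissue. Returns NaN
    when nothing is expressed, because tau is undefined in that case.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs at least two tissues")
    if any(x < 0 for x in expression):
        raise ValueError("expression values must be non-negative")
    x_max = max(expression)
    if x_max == 0:
        return math.nan
    return sum(1.0 - x / x_max for x in expression) / (n - 1)


# Hypothetical per-tissue RPM values, for illustration only.
constitutive = tissue_specificity_tau([5.0, 5.0, 5.0, 5.0])
specific = tissue_specificity_tau([0.0, 0.0, 0.0, 8.0])
intermediate = tissue_specificity_tau([1.0, 2.0, 4.0])
```

The three examples cover the two extremes described in panel (D) and one intermediate case.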
Fig 6.
TEs and their flanking sequences are regulated by their host genome.
Families are presented in the same order as in Fig 2 and are listed in S1 Table. CG methylation in the TE (A), CHG methylation in the TE (B), and CHH methylation in the TE (C). CG methylation in the 2 kb flanking the TE (D), CHG methylation in the 2 kb flanking the TE (E), and CHH methylation in the 2 kb flanking the TE (F). All methylation data are from anther tissue; other tissues are shown in S8 Fig. In (A-C), the superfamily median is shown as a dashed line, with the interquartile range in the shaded box. In (D-F), median methylation for regions up to 2 kb upstream and downstream of the TE is plotted for each family, with family size denoted by line transparency (darker lines are larger families).
Fig 7.
Features ranked by importance.
(A) Reduction in mean squared error gained by including a feature in a model, summarized into categories. (B) Correlations of each of the top 30 features with age for the five largest families in each superfamily. Features are labeled to the right, in (C). Point size is scaled by the correlation coefficient, and color shows whether the relationship is positive (blue) or negative (red). Rows without values are features that are fixed within a family and thus have no variance. (C) Reduction in mean squared error for the top 30 individual features. Colors match the categories in (A). (D) Raw correlations between age and segregating sites per base pair. (E) Model predictions for the relationship between age and segregating sites per base pair. (F) Raw correlations between age and anther CHH methylation of the TE. (G) Model predictions for the relationship between age and anther CHH methylation of the TE.