Figures
Abstract
An in silico study of Aspergillus flavus genome stability uncovered significant variations in both coding and non-coding regions. The non-coding insertions uniformly consisted of AT-rich sequences that are evolutionarily maintained, albeit distributed at widely different sites in an array of A. flavus strains. A survey of ≥ 2kb AT-rich elements (AT ≥ 70%; ATEs) in non-centromeric regions uncovered two major categories of ATEs. The first category is composed of homologous insertions at ectopic, non-allelic sites that contain homology to transposable elements (TEs; Classes B, C, D, and E). Strains differed significantly in frequency, position, and TE type, but displayed a common enrichment in subtelomeric regions. The TEs were heavily mutated, with patterns consistent with the ancestral activity of repeat-induced point mutations (RIP). The second category consists of a conserved set of novel subtelomeric ATE repeats (Classes A, G, G, H, I and J) which lack discernible TEs and, unlike TEs, display a constant polarity relative to the telomere. Members of one of these classes are derivatives of a progenitor ATE that is predicted to have undergone extensive homologous recombination during evolution. A third category of ATEs consists of ~100 kb regions at each centromere. Centromeric ATEs and TE clusters within these centromeres display a high level of sequence identity between strains. These studies suggest that transposition and RIP are forces in the evolution of subtelomeric and centromeric structure and function.
Citation: Lustig AJ (2023) Investigating the origin of subtelomeric and centromeric AT-rich elements in Aspergillus flavus. PLoS ONE 18(2): e0279148. https://doi.org/10.1371/journal.pone.0279148
Editor: Cecile Fairhead, Institut de Genetique et Microbiologie, FRANCE
Received: November 30, 2022; Accepted: January 30, 2023; Published: February 9, 2023
Copyright: © 2023 Arthur J. Lustig. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information Files.
Funding: AJL was funded in part by USDA NCA 58-6054-0-014. Arthur J. Lustig. The funders played no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Aspergillus flavus is an opportunistic pathogen in plants and humans. Its 37.5 Mb genome is separated into 8 chromosomes ranging in size from 6.5 Mb (chromosome 1) to 3.25 Mb (chromosome 8). The organization of Aspergillus genomes is unusual. Like many fungi, the genome is relatively compact containing approximately 13,500 genes in its genome [1] (on average, 1 gene per 3 kb of DNA). However, the genome is uniquely organized into gene clusters encoding proteins involved in the production of secondary metabolites, such as the toxin and carcinogen aflatoxin, as well as many medically and biologically important secondary metabolites [2–5]. While A. flavus gene organization has been well studied, the characteristics of genome stability in Aspergillus are poorly understood. The presence of well-established and characterized strains of A. flavus makes it a unique vehicle for examining factors governing evolutionary challenges to genomic stability.
Transposable elements (TEs) are a significant force in genetic instability in prokaryotic and eukaryotic organisms. TEs in filamentous fungi include both Class 1 "copy and paste" LTR and non-LTR retrotransposition, and Class 2 "cut and paste" DNA-transposon classes. These transposable elements can also promote ectopic recombination [6]. In filamentous fungi, TEs contribute to non-allelic genomic rearrangement and genome restructuring. In A. flavus, active transposon identification has been limited to a single LINE element and a gypsy-related transposon [7–11]. Similarly, only a single functional Mariner class element, Crawler, has been found in A. oryzae [12], an ecotype of A. flavus [9]. A handful of additional active TEs have been identified in other Aspergillus species [13–16].
Additional intact and mutated TEs have been characterized based on comparisons between the A. oryzae and A. nidulans sequences [8]. Among TEs found in A. oryzae are sequences homologous to the DNA Mariner class of TEs [17–19], Gypsy class LTR-retrotransposons [20–22] and non-LTR retrotransposons [23]. LTR-retrotransposons have also been inferred initially by the presence of unique solo LTRs [11,24,25]. The Mariner and non-LTR transposons are highly mutated, inactive, and are likely to be evolutionary remnants of ancestral transposition, while Gypsy candidates retain significant coding capacity.
Repeat-induced point mutation (RIP) was initially identified as a C>T transition and cytosine methylation system during Neurospora pre-meiotic stages [26]. This process is responsible for the inactivation and silencing of fungal repeated elements, including transposons [27]. C to T transitions on either strand of the repeat lead to an elevated A+T bias within Neurospora centromeres and satellite sequences. Repeated transposition into the same genomic regions, possibly coupled with recombination among these regions, can result in an expansion of these A+T-rich regions [28]. Such regions are thought to be drivers of sequence divergence in most filamentous fungi [28–30].
Subtelomeric chromatin domains are often unique. Telomeres in many organisms, including the yeasts, nucleate the formation of densely packed and specialized nucleosomal regions that form heterochromatin [31]. Genes located within this region in Schizosaccharomyces pombe and Saccharomyces cerevisiae, as well as the filamentous fungi Neurospora crassa [32] and Aspergillus nidulans [33], are subjected to a form of transcriptional repression, termed telomere silencing. These position effects are controlled by a discrete set of proteins, the silencer information regulators, involved in histone modification and heterochromatic “spreading” [34].
Recombination between repeated elements, including transposable elements, is a well-known source of rearrangement and variation during mitosis and meiosis. Some of these events can lead to dicentric rearrangements and subsequent breakage-fusion-bridge cycles. Furthermore, recombination among intra-chromatid repeated sequences can result in genome rearrangement [6,35]. Depending upon their orientation, sister-chromatid recombination results in the contraction, expansion, or inversion of repeated elements. Recombination between homologs can also generate copy number variations on each diploid homolog, a factor important in human disease [36].
Subtelomeric recombination can also play critical roles in cellular function. One example is the requirement for recombination of genes to subtelomeric regions for antigenic switching and pathogenesis in Trypanosoma [37]. Another is a role for subtelomeric recombination in the fidelity of meiotic segregation, as suggested by both cytological and molecular data [38].
Centromeres, the sites of kinetochore attachment, are responsible for proper disjunction of chromosomes in meiosis and mitosis. While the function of the centromere is strongly maintained, the sequence composition is strikingly variable in different organisms [39]. Centromere repeats in higher eukaryotes are associated with long tracts of repeated satellite DNA that are involved in centromere function and chromatid cohesion [40]. In filamentous fungi, long 30–100 kb AT-rich sequences are present in centromeric regions that are at least in part the consequence of the insertion of TEs followed by RIP [41].
Here I present an investigation of genome rearrangement in Aspergillus flavus, based initially on the characterization of large indels between genomes of multiple strains. A large fraction of inserted alleles consists of, or includes, a region of ≥ 2 kb AT-rich sequences, defined as regions of ≥70% A+T. These non-coding AT elements (ATEs) were studied further in three well-sequenced strains (NRRL 3357, CA14 and SU-16), with each strain having different ATE sizes, chromosomal positions, and degrees of repetition. Based on the comparison of these patterns in different strains, I identified three categories of AT elements. Two of these ATE categories are hot spots for transposable element integration, inactivation, and elimination in either 1) subtelomeric-enriched regions or 2) centromeric domains. ATEs in centromeric regions are highly conserved in position and identity among homologs. I also found a third novel type of subtelomeric and telomeric-adjacent ATEs, lacking detectable TEs, that undergoes frequent homologous recombination during evolution. I discuss data that support the generation of novel centromeric and subtelomeric functions from mutated TE-generated ATEs during A. flavus evolution.
Materials and methods
Strains
The 9 strains used for my initial analysis were NRRL 3357 (GCA 009017415.1, GCA 014117465.1), CA14 (GCA 014784225.2), AF36 (GCA 012897275.1), AF13 (GCA 014117485.1), K49 (GCA 012896705.1), K54A (GCA 012896555.1), Afla-Guard (GCA 012896875.1), Tox4 (GCA 012896145.1), and A9 (GCA 012895975.1) [42–46]. The data were mined from the most recent sequence version of each strain in the NCBI database. Data from four additional strains, NRRL 2999 (GCA 012897115.1), A1 (GCA 012896995.1), VCG1 (GCA 012896415.1), and VCG4 (GCA 012896275.1) [45], were generated by BLAST homology searches. In addition, SU-16 (GCA 009856665.1) was used for subsequent ATE analyses. The A. oryzae strain KBP3 (accession CP031426) and A. sojae strain SMF134 (GCA 008274985.1) [47] were also used to study the evolutionary relationships among H class elements. The CA14 assembly was reconstructed by Canu v2.2 [48] to generate the entire telomere-telomere chromosomal sequence. The resulting sequence differed from the previous assembly only by the presence of extensions at telomeres. The Canu assembly was previously successful in the generation of a complete telomere-to-telomere chromosomal sequence of NRRL 3357.
Identification of genome wide large indels
Chromosomal alignments and images were generated using Lasergene MegAlign Pro 7.2 software (DNASTAR, Inc). The MegAlign program uses a Mauve algorithm [49] to analyze alignment between homologous chromosomes in pairs of strains, excluding centromeric sequences that are only partially sequenced in many strains. Insertions larger than 6 kb were identified and the sequence exported into BLAST 2.0 and filtered using restrictive criteria (97% identity, 50% coverage) to screen for insertions in allelic sequences. Four strains used in these studies did not contain scaffold assembly gaps: NRRL 3357, AF13, CA14 and SU-16. The remaining strains required nucleotide sequence analysis of the putative insertion junctions in each pair analyzed. Putative indels containing scaffold gaps that surround the junctions were excluded from the analysis. Since scaffold gaps are more common in AT-rich regions, this analysis may underestimate the number of indels. The AT content across each insertion was determined using SnapGene (Dotmatics, Inc).
Quantitative analysis of large insertions
I used the two complete telomere-to-telomere sequences from strains NRRL 3557 and CA14 to assay the frequency of large insertions (≥ 6kb), walking across each chromosome using the MegAlign procedure. Each insertion was selected and exported into a BLAST 2.0 program, filtering for regions with 90–100% identity. Phylogenetic trees were determined by Fast Minimum Evolution with a maximum sequence difference of 0.2–0.3, as noted. CA14 has a chromosome 3/5 translocation [43]. For simplicity, I report CA14 positions as the corresponding coordinates in the NRRL 3357 genome.
AT element identification
I used three strains having the most complete sequences, NRRL 3557, CA14, and SU-16, to assay the frequency of AT elements (≥ 2kb), proceeding from left to right arm across each chromosome, using the MegAlign procedure. For AT-rich elements, BLAST 2.0 parameters were filtered, using the following restrictions: a) identity scores ≥ 80%, b) significance <e-100, c) gap scores of ≤ 2%, d) bit scores ≥ 1000, and normally no less than 25% of a 100% homology score.
This analysis eliminates weak similarities in favor of a restricted subset of the most significant identities. As a control for potential sequencing errors, I used a second independently sequenced NRRL 3357 [46] to confirm results. No differences were observed between the two sequenced NRRL 3357 genomic sequences.
Importantly, the telomere repeats of SU-16 sequences are present only on chromosomes 4L, 4R, 5L, 5R, and 6L ends. AF13, used for studies of the H class repeats, has intact telomeres at 3L, 4L, 4R, 5L, 6R, 7L, 8L and 8R. Since the left and right orientations of the chromosomes differed in the NCBI database, the arm designations are presented relative to the NRRL 3357 orientation.
Centromere AT-rich sequence analysis
The AT-rich sequence at each NRRL 3357 centromere was separated into two halves and analyzed by BLAST 2.0 against strains SU-16 and CA14. The centromeres in each of the three strains have been fully sequenced. To estimate the relative stability of centromeric sequences on homologous chromosomes, I determined the highest BLAST score between pairs of homologs, recorded both the observed identity and the associated gap score, and calculated the fraction of centromeric sequence covered by the two regions of homology.
Censor/Repbase analysis
Analyses of TE-homologous sequences in both ATEs and centromeres were conducted using the Censor program, filtering for fungal species [50]. I restricted the selection of homologies from the Censor read-out using the following parameters: a) mm/Ts (total mismatch mutations/transition mutations) of < 2.2 and b) a bit-score value of > 1000. Both partial and full-length transposable elements fulfilling these criteria were identified and further analyzed. This analysis was extended through BLAST searches for homology to AT-rich derivatives of Gypsy 1, Gypsy 2, and Gypsy 4, first identified by Censor.
RIP modeling of Gypsy element mutation into ATEs
The sequence divergence and high abundance of transition mutations between target and query sequences confounded the identification of diagnostic dinucleotide pairs in RIP [27]. Similarly, programs such as RIPper [51], used to identify large regions of sequence undergoing RIP, were not informative, due to the rather short size of the regions analyzed and the apparent evolutionary distance from the original RIP activity. I therefore modeled a situation to approximate evolutionary change by assaying the ability of RIP to convert a consensus Gypsy element, as a putative evolutionary progenitor (P), into the observed ATE sequence (A). I considered two expectations for RIP process. First, “P” and “A” differences induced by RIP should consist predominantly of transition mutations, i.e., exhibit a high ratio of transition/total mutations. Thus, the reciprocal of this value, which is a standard Censor output measurement, should be close to 1. Second, the transition mutations noted above should predominantly consist of a cytosine residue in “P” mutating to a thymidine (and the corresponding guanine to adenine mutation on the complementary strand to form “A”. The directionality of RIP-induced transition mutations from putative progenitor to ATE Gypsy homologies was estimated by [(pG> aA) + (pC > aT)]/ [(pG > aA) + (pC> aT) + (aC> pA) + aG> pA]. An unbiased direction of mutation would yield a score of 0.5, while a complete G>A and C>T set of transitions would generate a value of 1.
Measurement of Gypsy element homology at centromeric and non-centromeric locations
The levels of identity of Gypsy 1, Gypsy 2, and Gypsy 4 elements were determined by BLAST analysis using the Gypsy 1 homologs of NRRL 3357 ATEs 1–6, NRRL 3357 ATE 2–2, and CA14 ATE 3–2; the Gypsy 2 homologs of NRRL3357 ATE 1–2; and the Gypsy 4 homologs of NRRL 3357 ATE 3–1. Both identity and gap levels were recorded, and the overall mean level values and standard deviations were calculated. In the case of centromere Gypsy repeats, I determined the identity among conserved clusters of elements at similar positions on chromosome homologs in each strain. When the Gypsy element is not present within a cluster, I identified the highest scoring homology within the same centromere. Identities between centromeric non-clustered Gypsy copies were lower than the threshold used for the identification of ATEs, and they include percentages as low as 79% and gapped regions as high as 9%.
Results
Genome variations measured through indel analysis in Aspergillus flavus
The repeated elements within eukaryotes reflect the activities of both recombination and transposition. In A. flavus, there is evidence for recombination and genome rearrangement within subtelomeric regions close to the 120 bp of TTAGGGTCAACA telomere repeats [52,53]. However, little is known about the dispersion of transposable and other repeated elements that contribute to genomic stability through the evolution and divergence of this species. I approached this topic through an in silico analysis of unique and repeated element stability in A. flavus chromosomes, taking advantage of the many genome sequences available in different strains.
As a first approach to this problem, I identified regions of homology between the chromosomes of 10 sequenced Aspergillus flavus strains present at allelic positions in some strains but absent in others. I use the term "large indels" to operationally describe these regions. I conducted a pairwise visual assessment of homologous chromosomes between different strains using the Mauve alignment algorithm of MegAlign Pro 7.2. I sought indels of 6 kb or greater as an initial qualitative screen. I verified and extended the MegAlign results by selecting the region of the putative insertion and exporting the sequence directly into BLAST 2.0. The BLAST search results confirmed the insertions identified by the pairwise analysis. This search also identified the presence or absence of sequences in additional strains not initially analyzed. This analysis excluded centromeric regions that were poorly sequenced in many of the strains. I identified 54 unique insertions, ranging in size from approximately 6.1 to 797 kb at multiple sites at each chromosome [Fig 1 (drawn relative to NRRL 3357 chromosomes); S1 Table].
Non-AT (black) and AT (red) position of deletions or insertions ≥ 6 kb in the 13 strains analyzed, converted to NRRL 3357 coordinates. The data is derived from the S1 Table. Scales in kb are drawn above each chromosome image. Note that the right arm of chromosome 7 is not presented in full since the repeated rDNA genes have been refractory to DNA sequencing. One insertion was previously identified [46].
AT-rich insertions form a significant fraction of insertions in A. flavus
Base composition analysis of the indels revealed the presence of 1500 bp to 31,125 bp regions containing high AT base compositions in 29% of the 54 large insertions scattered throughout the genome (Fig 1, red arrows; S1 Table, red font; S2 Table). The length and base composition of tracts within the insertions are displayed in S3 Table.
I measured quantitatively the frequency of insertions containing either AT elements or non-AT elements in strains NRRL 3357 and CA14. Since the genomes of both strains have a complete sequence of each chromosome (except for chromosome 7R, which contains reiterated rDNA repeats), I performed a telomere-to-telomere analysis of large indels by MegAlign chromosomal walking. I identified large insertions on each chromosome with an equal distribution in both strains. Of the 39 insertions identified, 15 (38%) contained extended regions of AT-rich DNA (S4 and S5 Tables). These AT insertions were enriched within 400 kb of the telomere (8/15; 53%) (Fig 2). Continuous AT-rich sequences ranged in size from 400 bp to 14 kb (S5 Table). All non-AT insertions contained coding regions, as annotated in NRRL 3357 and CA14, that were absent from AT-rich sequences.
Indel quantification in CA14 and NRRL 3357 is summarized in this map, drawn as insertions or deletions relative to NRRL 3357. AT indels are shown as light (deletions, below) or dark red (insertions, above) arrows, and non-AT indels are shown as blue (deletions, below) or black (insertions, above) arrows. CEN 3 and CEN5 are transposed due to the translocation between chromosomes 3 and 5 in CA14. In comparisons of NRRL 3357 and CA14, I used the NRRL 3357 nomenclature for simplicity. Scales in kb are drawn above each chromosome image.
Phylogenetic comparisons reveal dispersion of and selection for AT insertions
To probe the evolutionary characteristics of AT and non-AT insertions, I conducted a phylogenetic comparison of insertions with the tree rooted in either the CA14 or the NRRL3357 strain, depending on which exhibited the inserted sequence [Table 1, S1 and S2 Figs (Figure examples are both rooted in CA14)]. Analysis of two parameters, the insertion dispersion and loss rates, led to two significant findings.
First, non-AT insertions from the genomes of related strains were most often located (in 20/24 cases) at allelic positions in non-AT insertions, when compared to either the NRRL 3357 or CA14. Only 4 out of the 24 non-AT insertions had homology on other chromosomes. In striking contrast, all AT insertions had homologs on non-homologous chromosomes.
Second, I examined the number of strains that lost either the AT or non-AT insertion. The non-AT insertions were absent (at 85% homology) in an average of 5.6 strains, and varied widely from 0 to 12. The AT insertions exhibited an average number of 6.7 strains that had lost the element, ranging from 4–8 strains among the various insertions. These data suggest that while non-allelic dispersion took place primarily in AT insertions, both insertion classes were maintained in the population at similar frequencies. This finding suggests an evolutionary constraint against the loss of the long AT-rich tracts.
AT elements are clustered in subtelomeric domains
To gain a better understanding of the organization of AT elements (ATEs) independent of indel identification, I analyzed all ATEs that were at least a) 2kb in length and b) 70% A+T, and compared results found in NRRL 3357, CA14, and SU-16 (Table 2A, 2B and 2C). I excluded from this analysis centromeric and pericentromeric sequences (within 30 kb of centromeric A+T-rich sequences). I found 31 ATEs in NRRL 3357, 27 ATEs in CA14, and 24 ATEs in SU-16. The majority of these were present within the 100 kb region adjacent to the telomere (termed subtelomeres here, as indicated by the yellow shaded entries in the Table) in the two strains that were fully sequenced from telomere to telomere (68% and 59% in NRRL 3357 and CA14, respectively). In NRRL 3357, 9 of the 31 ATEs were directly adjacent to telomeric tracts. AT elements were also enriched adjacent to the telomeres in CA14 (6 of 27 ATEs). Based on a comparison of both the numbers and coverage of ATEs within and outside of subtelomeric regions (excluding centromeric regions), I estimate an ATE subtelomeric enrichment of 26-fold in CA14 and 56-fold in NRRL 3357. Furthermore, although subtelomeres are only partially represented in SU-16, 25% of AT elements were nonetheless found in this domain. As expected, most of the insertions identified in the indel analysis between CA14 and NRRL 3357 (indicated by the hashtag in Table 2A and 2B) were also found in these more detailed analyses.
B. Approximate positions, lengths, AT contents, and classes of CA14 ATEs. C. Approximate positions, lengths, AT contents, and classes of SU-16 ATEs.
Conversely, most subtelomeric regions exhibited clustering of AT elements. Within the NRRL 3357 100 kb subtelomere, areas of at least 70% A+T and 2 kb in size were present at 12 of the 15 assayable ends but absent from both subtelomeres on chromosome 2 and the right arm of chromosome 6. In the revised CA14 sequence (see Materials and Methods), I found that most (9/15) subtelomeric regions contained ATEs. However, the ATEs in CA14 bore no discernable relationship to the ATEs structure in NRRL 3357.
Identification of multiple classes of homologous AT elements
I surveyed the homologies found between the ATEs within and among the three strains (Table 2). The homologies fell into twelve classes (A-L). Reanalysis of AT-element size in NRRL 3357 led to revised size ranges from 1.9 kb-21.2 kb. 29 of the 31 elements fall into several classes representing shared homology (Table 2A; Classes A-J, Class L), as determined by BLAST analysis, with two unique elements. AT elements in CA14 range in size from 2.0 kb-14.1 kb. 21 of the 26 ATEs fall into classes that share homology (Table 2B; Classes B-L) with five unique elements. AT-elements in SU-16 range in size from 2.0 kb-18.9 kb. Eighteen of the 25 ATEs fall into classes that share homology (Table 2C; Classes B-F, J-L), with four unique classes. These data revealed that ATE lengths and chromosomal positions varied significantly among the different genomes.
Each class shows a different degree of association with subtelomeres. Members of one such class (class H) were present adjacent to (or in one case within 3 kb of) telomeric-repeats, while others (Classes A, F, G, I and J) were present within the subtelomere but not always adjacent to the telomere. Members of a third group were enriched, but not present exclusively, at subtelomeres (Classes B-E). These latter classes were present at subtelomeres in 27/51 (52.9%) cases (compared to the predicted value of 4.44%, if distributed randomly in the genome). The full dataset of the homologies among each member of the A-L class is documented in S3 Fig.
ATEs contain homology to multiple classes of transposable elements
The homologies among ATEs could represent known classes of repeated elements. To study this further, I used the Censor program that examines potential repeated elements within the Repbase database [54]. This database characterizes candidates based on similarity, BLAST score, and the probability that mutations result from gradual evolution, by measuring the prevalence of transition versus transversion mutations. The most significant homologies found in this study represent a diverse series of transposable elements that were previously characterized as consensus sequences in the A. flavus ecotype, Aspergillus oryzae. I identified both Type 1 and Type 2 transposable element homologs (Tables 2A–2C and 3, S3 Fig). First, I identified an inactivated Type 2 Mariner 1 consensus sequence at multiple ATEs in all three strains. Indeed, the Mariner 1 class accounts for the repeated ATE classes C (S3C Fig). Two other members of the Mariner class (Mariner 5 and Mariner 2N-1) were found less frequently.
The major Type 1 LTR-retrotransposon class consists of Gypsy elements and includes both the retrotransposon coding regions and solo-LTR elements. Solo-LTRs are likely formed through homologous recombination [55], but are still indicative of previous retrotransposition. AT-rich mutated forms of the A. oryzae Gypsy elements (designated AO) were identified in A. flavus and subsequently used to screen for additional mutated forms of each Gypsy species by BLAST (designated AF). Using these methods, I identified homology to Gypsy 1/Gypsy 1 LTR in class D ATEs, to Gypsy 2/LTR1 in class B ATEs, and to Gypsy 4-AFLAV/Gypsy 4 LTR in class E ATEs, (Table 3, S3B, S3D and S3E Fig). In all cases, the presence of the relevant TE fully explains the class B-E homologies. BLAST analyses also revealed that LTR1 is an AT-rich mutated form of Gypsy 2 LTR and that AFLAV [10] is the Gypsy 4–1 homolog of A. flavus (data not shown). Another putative Type 1 retrotransposon class, LTR2, was defined solely by its LTR repeat. The composition of each class is summarized in Table 3. Each strain had comparable levels of Mariner and Gypsy homologs as measured by abundance and density (# copies/Mb ATE) (Table 4A and 4B). The level of conservation between Gypsy elements in the non-centromeric ATEs is shown in Table 4C.
A. Distribution of non-centromeric TE homologs. B. Density of TEs in centromeric and non-centromeric ATEs. C. Homology among Gypsy elements in ATEs.
The only non-LTR retrotransposon homologs were highly mutated TAD1 [56] species. Hence, retrotransposition events are evolutionarily frequent, with a different array predominating in each of the three strains.
ATE transposable elements show evidence of evolutionary RIP-like activity
The fungal repeat-induced point mutation (RIP) system inactivates functional repeated sequences, including TEs. This process involves the coupled activity of cytosine methylation and C>T transition mutations. RIP activity in some fungi can be influenced by the local context of the mutated cytosine residues and ultimately gives rise to AT-rich sequences. Hence, a region containing multiple TEs can mutate into an AT-rich region. I sought evolutionary footprints for this process to determine the nature of RIP activity in ATEs. RIPper and RIPCAL [51,57], two algorithms designed for measuring likely RIP frequencies over genomic distances, do not provide interpretable information for the relatively short and highly mutated sequences under analysis here. Hence, I modeled RIP-like activity by considering the preferential accumulation of C>T and G>A transition mutations necessary to convert a TE consensus sequence into the ATE TE homologs. For this analysis, I compared Gypsy 1, Gypsy 2, and Gypsy 4 TEs, since the reference database of these elements had intact coding regions, unlike many mutated consensus sequences present in Repbase. Two criteria were measured. First, the ratio of transitions/total mutations was measured by Censor analysis. The average Gypsy 1, Gypsy 4 and Gypsy 2 total mutations/transition mutations indicated that transversions consisted of less than 0.01% of all mutations (S6 Table). Gypsy LTRs gave similar results. Hence, all three of these classes showed clear evidence of RIP.
Second, using this model, I measured the fractions of all transitions that are the consequence of a unidirectional mutation of the consensus Gypsy elements to form the corresponding ATE Gypsy homologs (see Materials and Methods). A ratio close to 1 indicates a pattern consistent with the G to A and C to T mutational consequence of RIP. Indeed, values ranged from 0.8 to 1.0, arguing in favor of RIP activity over evolution.
Chromosome-specific centromeric ATEs are highly conserved
Filamentous fungi contain conserved features of the eukaryotic kinetochore. However, the centromeric sequences are embedded within large > 30 kb A+T-rich regions (~100 kb in A. flavus) [41]. Given the variation in the position and identity of non-centromeric ATEs in evolution, I asked whether centromeric AT-rich regions were similarly unstable (Table 5). I therefore tested the extent of contiguous homology between centromeres in NRRL 3357 and either CA14 or SU-16. I found the two longest stretches of contiguous homology in SU-16 and CA14 chromosome 1 centromeres, which are 95% identical to NRRL 3357 centromeres over 56% of the length. Centromeres of all chromosomes exhibited similar characteristics. In SU-16, the coverage ranged from 56% on chromosome 1 to 100% on chromosome 4, with identities to NRRL 3357 ranging from 88% to 99%. For CA14, the coverage ranged from 47% in chromosome 4 to 100% in chromosomes 2 and 6, with identities to NRRL 3357 varying between 92% and 99%. Given the inability to check the centromeric sequences against a sequenced repeat, I cannot rule out the presence of sequencing errors that may lower the degree of homology. No significant similarity was observed between the centromeric ATEs of non-homologs (data not shown).
Conservation of Gypsy elements on homologous chromosomes
Given the conservation of the centromeres in homologous chromosomes, I mapped the relative positions of Gypsy elements in the homologs. This analysis revealed a conserved clustering in the three major strains analyzed, although insertions and deletions of elements are still present (Figs 3 and S4). BLAST and sequence analysis revealed that Gypsy 2 and Gypsy 4 elements in these clusters were highly conserved in different strains on chromosomes 1, 4, 6 and 8 (Table 6A and 6B), S5A and S5B Fig). In contrast, centromeric Gypsy 2 and Gypsy 4 elements displayed less similarity at non-clustered unique sites (79–90%) and were often highly gapped (up to 9%), suggesting a pressure for centromeric homology early in evolution of the strains. Indeed, even Gypsy elements in non-centromeric ATEs had a generally higher average level of homology than between non-clustered centromeric Gypsy elements [compare Table 4C with Table 6A and 6B]. Gypsy 1 elements, on the other hand, were only present at one centromeric site, with two others found in pericentromeric regions (data not shown).
A. B. Homology of Gypsy 2 elements at clustered and non-clustered sites in chromosomes 4, 6, and 8.
The positions of the mutated Gypsy 2 and Gypsy 4 homologs within the centromeric domains (between the coordinates listed) were plotted and aligned among the three strains. The red, green, blue, and brown colors refer to regions of high homology as shown in Table 6. The white color refers to regions of lower homology (corresponding to the black color in Table 6A). The GC content of these regions is presented above each chromosomal plot. The coding regions flanking the centromere are depicted by the purple arrows. The sequence range displayed for each strain was edited to align centromere sequences.
Curiously, I found quantitative differences among some transposition classes in centromeric and non-centromeric within the ATE sequences (Table 4B). Specifically, there is a marked increase in the density of non-centromeric Gypsy 1 and Mariner 1 elements relative to centromeres, suggesting that non-centromeric ATEs either represent relative hot spots for these TEs, or TEs are more stable in a non-centromeric context.
Highly conserved subtelomeric repeats form unique classes of ATEs
Homologs to a consensus sequence of highly mutated Mariner 1 and Gypsy elements account for four of the repeat classes (Classes B-E). However, the homology between other termini is qualitatively different from these classes, apart from the failure to find an identifiable TE homolog. First, one class (Class H) lies directly adjacent to telomeric repeats (except in CA14 ATE 1–11, which lies 3 kb from the telomere). Five additional repeats (Classes A, F, G, I and J) are in the subtelomeric region, but not always adjacent to the telomere. None of these repeats are present within centromeric DNA.
Second, in each of these classes (Classes A, F-J), the polarity with respect to the termini is constant (Figs 4, S3A and S3F–S3J), resulting in inverted repeats on the two ends of a given chromosome. This differs from the typical random orientation of TE insertions.
Examples of the position and polarity of H class elements at the subtelomeres, including NRRL 3357 ATE 5–1 (left arm) and ATE 8–4 (right arm), AF13 Chr7 STLA (subtelomeric left arm telomere-adjacent), and CA14 ATE 1–11 (right arm), as defined by homology to NRRL 3357 ATE 4–1 (blue filled line). The nucleotide position of homology in ATE 4–1 homology is given in parentheses. The polarity in each case (shown by the arrowhead) is identical with respect to the telomere (drawn for simplicity as moving away from the telomere) regardless of chromosomal arm.
Third, Class H elements can be explained fully by the recombination of a putative progenitor with homology to NRRL 3357 ATE 4–1. This repeat is itself an imperfect direct repeat of an element most homologous to NRRL 3357 ATE 6–1. We observed multiple patterns of rearrangement of NRRL 3357 ATE 4–1 among the different strains, with some strains maintaining the ATE 4–1 structure (e.g., NRRL 3357 and AF13) and others deleted for part of the sequence (e.g., CA14, SU-16, and the A. oryzae strain KBP3) (Fig 5). The finding of ATE 4–1 in two normally separated strains suggests the conserved presence of ATE 4–1 in the evolutionary progenitor that subsequently underwent different patterns of loss and rearrangement during the formation of the various strains. Further supporting this view is the common presence of downstream TEs and homologies, both in the subtelomeric adjacent sequences of chromosome 4L in AF13 and NRRL 3357 and at other flanking sequences of H class species (S3H Fig).
The regions of homology among different H class sequences relative to the NRRL 3357 4–1 ATE. The different colors correspond to different strains: purple, AF13; green, A. oryzae strain KBP3; blue, NRRL 3357; red, CA14. The AF13 Chromosome 4L telomere is 99% identical to NRRL 3357 ATE 4–1.
To test this evolutionary relationship further, I used the BLAST Tree Widget Profile with the Fast Minimum Evolution algorithm to compare the two adjacent, but divergent, sub-repeats present in the NRRL 3357 4–1 fragment, identified by homology to NRRL ATE 6–1 (Fig 6). Comparison of the two sub-repeats to the elements scattered at different subtelomeric loci revealed that each half is evolutionarily related to different H class homologs. What is also evident is that this homology is maintained (albeit at lower levels) even in the A. oryzae and A. sojae strains of the Flavi clade. This analysis suggests that different portions of the NRRL 3357 ATE 4-1-like imperfect dimer precursor diverged during the formation of the different strains to give rise to the current array of repeats. These results argue for an evolutionarily robust recombinational activity at these telomere-adjacent loci.
BLAST Tree relationships of different strains with the left (A) and right (B) repeats of NRRL 3357 ATE 4–1 are shown anchored to the relevant repeat. The positions of the left and right repeats are depicted by the arrows in the inset (from S3H Fig) within each tree. The strains that are most distant to the left repeat tend to be closest evolutionarily to the right repeat, suggesting that each repeat served independently as an H class progenitor.
There are some similar features uniting the H class repeat with the TEs, however. Sequence alignment of H class repeats (S5B Fig) reveals that much of the variation in homology is due to preferential G>A and C>T transversion events (100% in NRRL 3357 ATE 6–1 and 80% in NRRL 3357 ATE 8–1). These biases are consistent with a RIP-like mutational process acting during the evolution of both TEs and H class repeats and hint at a common origin.
The remaining class repeats are the approximately 3.6 kb K and the 7.5 kb L repeats. Both are unrelated to subtelomeres, centromeres and any identifiable TEs. Both repeats are widely spread in A. flavus and A. oryzae strains and do not exhibit the polarity restrictions of the subtelomeric elements (S3K Fig). It therefore may represent a ‘RIPed’ copy of a hitherto undefined TE, although other possible explanations cannot be excluded.
Discussion
In this study, I take advantage of a group of sequenced genomes from 14 strains of Aspergillus flavus to identify regions of genome instability. I characterized the predominant classes of large insertions representing either coding or non-coding elements. Non-coding sequences are composed of lengthy regions of 65–93% A+T-rich DNA. Phylogenetic analysis revealed that these AT insertions are dispersed to other chromosomal sites at higher frequencies than non-AT insertions. Despite the mobility and a higher mutation rate, AT insertions are nonetheless maintained and have a similar loss rate as non-AT insertions, consistent with a selection for their maintenance.
I conducted a global analysis of AT elements of ≥ 2 kb in three evolutionary-diverse strains: NRRL 3357, CA14, and SU-16. While the number of ATEs were similar in the strains, their length, distribution, and repetition varied significantly. The strains with complete telomere-to-telomere sequences, NRRL 3357 and CA14, showed a far-greater-than-expected frequency of these elements in regions within the subtelomeric 100 kb. The other major site of ATEs is the long ~100 kb that encompasses each centromere. I also identified ATEs either adjacent or closely positioned to the telomere (Class H) and in subtelomeric regions devoid of assayable TE homology (Classes A, F-J) which, unlike TEs, displayed polarities that were constant with respect to the telomere. The involvement of some of these ATEs in recombination is supported by the analysis of the end-specific dispersion of the class H repeats.
Classes B-F contain sequences homologous to transposons and retrotransposons. These elements were identified as evolutionary derivatives of TEs by the prevalence of transition over transversion mutations [58]. Many of these TE remnants have also undergone premeiotic repeat-induced point mutation. Previous studies have found that the percentage of the genome that is subject to RIP varies considerably from the extreme of 55% (in Pyrenophora) to moderate (13.3% in Neurospora) to relatively low levels (1.3% in Aspergillus oryzae) [29]. However, concrete evidence for A. flavus RIP has been challenging to determine, due both to the paucity of intact TEs in currently evolved strains and to the far less distinct AT-rich regions in A. oryzae (and A. flavus) than in Neurospora and some other fungi [59].
Since AT-rich regions may contribute significantly to genetic instability among divergent strains, I was interested in the contribution of RIP to the ATE intra-strain genetic variability. I therefore focused on the Gypsy 1, 2 and 4 intact elements to measure the relevance of RIP activity. The Gypsy class displayed a very high frequency of C>T mutations that may have proceeded close to substrate exhaustion, thereby retaining significant homology, presumably due to the slow rate of random mutation relative to RIP. The high density of sequences homologous to TEs supports multiple cycles of insertion, deactivation, and deletion.
One of the functions of ATE formation is the protection against the damaging effects of transposition through inactivation of these elements by RIP activity. However, my analysis suggests that these regions have evolved to perform new functions, at least at telomeres and centromeres. Several arguments support this view.
First, the similarity of ATEs between the centromeres of homologous chromosomes far exceeds centromeres of non-homologs in the three strains analyzed. Similarly, Gypsy elements in conserved centromeric clusters were nearly identical to one another but were more diverged when present in other regions of the centromere. The more ancient conservation of both homolog centromeric regions and TE homology likely points to a centromere-pairing-dependent role, such as meiotic recombination, centromere cohesion, or sister chromatid recombination [39], or to a role of the repeats themselves in the formation of a topologically competent centromere, as in higher eukaryotes. I note that the divergence of centromeric A+T-rich sequence in the three A. flavus strains analyzed may have deleterious consequences. Yeast studies have demonstrated that centromeres having imperfect homology, while still permitting meiotic separation, do so at lowered efficiency [60,61]. Such a decreased efficiency could ultimately lead to reproductive isolation between strains and may be a step in the process of speciation. The high level of homology between centromeres may protect against such isolation and help to maintain the ability of different strains to undergo a productive meiosis. These elements must therefore have evolved earlier from a common progenitor strain. Gypsy 1 and Mariner elements are also present at lower frequencies at the centromeres relative to non-centromeric locations. While the presence of conserved repeats in centromeres may play a part in centromere function, an excess of TE homologs may be deleterious by providing substrates for unequal sister chromatid exchange and inverted repeats, leading to dicentric or acentric chromosome formation. This may also be the factor in the selection for deleted forms of the TEs.
Second, a set of novel repeats is present at subtelomeres. Unlike TEs that display an overall random orientation, these repeats are oriented with a specific polarity with respect to the telomere and lack any discernable homology to TEs. I provided evidence for high levels of recombination at the evolutionary level at different telomere-adjacent regions, with sequence dispersion (and partial deletion) between homologous subtelomeres within and between strains. The preferential recombination between telomere-adjacent ATEs appears to have a bias towards specific chromosomal ends and is likely to reflect a telomere-specific function arising late after the evolutionary time of transposition and RIP. For example, the Class H element is conserved in telomere-adjacent sites of Chromosomes 4L, 8L, 6L, and 5L in NRRL 3357, CA14, and AF13 and to a less extent in SU-16, A. oryzae KBP3 and A. sojae SMF strains, where they nonetheless maintain their subtelomeric location (data not shown). Indeed, strain AF13 has a 4L subtelomere nearly identical to NRRL 3357. This repeat is therefore likely to be related to a progenitor element present in all strains, which subsequently recombines with a subset of ends in a strain-independent fashion.
Apart from a potential requirement for subtelomeric recombination activity, these subtelomeric elements may also act to disrupt telomere position effects (TPE). TPE is the silencing of telomere-adjacent genes that initiates at the telomere and spreads in a centromere-proximal direction through the formation of histone-modification-dependent heterochromatin [31]. TPE is present in at least some Aspergillus species, including A. nidulans (Palmer et al., 2010). However, conventional heterochromatin formation may be prohibitive in ATEs due to histone depletion that is predicted from the behavior of even short poly(dA): poly(dT) sequences in vivo [62,63]. The specifically modified histones required for telomere position effects are likely be absent in regions containing high A+T-DNA, raising the possibility that the subtelomeric ATEs may act to disrupt the spreading of TPE, thereby insulating coding regions. An additional possibility is that the recombination of ATEs at subtelomeres is involved in some aspect of telomere maintenance, such as found in the human ALT mechanism(s) of telomere elongation [64].
Third, studies using model substrates reveal that such histone-depleted chromatin architectures would be organized into large-scale chromosomal domains. Studies in the filamentous fungus Epichloë festucae have shown that such AT-rich domains establish organizing elements of gene transcription [65]. These regions include areas of repeated element inactivation, significantly affecting the three-dimensional topology of transcription units.
In summary, I propose an ATE cooption model, in which transposition and associated RIP gave rise to ATEs, which subsequently evolved novel cellular functions at telomeres and the centromeres of homologues (and possibly other sites). One of these functions may be homologous recombination, which may be promoted at telomeres, but be more restricted at the centromeres of non-homologous chromosomes. Future studies on the genetic manipulation of unique ATEs may uncover fundamental new directions in telomeric and centromeric function, as well as in gene regulation. I further anticipate that exploitation of the A. flavus strains will be an excellent evolutionary model to provide insights into the genetic instability of subtelomeric, centromeric, and gene cluster domains.
Supporting information
S1 Fig. Phylogenetic comparisons of a CA14 non-AT insertion with other A. flavus and A. oryzae strains.
Strains were compared by the Blast Tree Widget using the Fast Minimum Evolution algorithm, filtered at 90% homology. The tree is rooted in CA14 non-AT insertion 1–11. The scale for 0.009% homology difference is shown on the bottom right.
https://doi.org/10.1371/journal.pone.0279148.s001
(PDF)
S2 Fig. Phylogenetic Comparisons of a CA14 AT Indel with other A. flavus and A. oryzae strains.
Strains were compared as described in S1 Fig. The tree is rooted in CA14 AT insertion 1–4 (S5 Table). The scale for 0.01% homology difference is shown on the bottom right.
https://doi.org/10.1371/journal.pone.0279148.s002
(PDF)
S3 Fig. Classification of repeated ATE homologies into classes A-L.
A) Class A repeat organization measured by homology to NRRL ATEs 1–1 and 3–3; B) Class B repeat organization measured by homology to NRRL ATE 1–2; C) Class C repeat organization as assayed by homology to 3357 ATE 1–4; D); Class D repeat organization as assayed by homology to 3357 ATE 1–6; E) Class E repeat organization as assayed by homology to 3357 ATEs 1–6; F) Class F repeat organization as assayed by homology to 3357 ATE 1–3; G) Class G repeat organization as assayed by homology to CA14 ATE 3–3; H) Class H repeat organization as assayed by homology to 3357 ATE 4–1 and 6–1; I) Class I repeat organization as assayed by homology to 3357 ATE 3–1 and 7–1; J) Class J repeat organization as assayed by homology to 3357 ATE 8–2; K) Class K repeat organization as assayed by homology to SU-16 ATE 5–4. L) Class L repeat organization as assayed by homology to SU-16 ATE 8–1. Each class corresponds to the classes given by the superscripts in Table 2A, 2B, and 2C. In 3B-3E, green line, homology to the relevant ATE; the orange line, Gypsy elements; dark orange line, LTR 1 and LTR 2; red line, Mariner elements; dark red line, Mariner 2N1 and Mariner 5; blue line, TAD 1 elements. The AO designation refers those TEs identified in Censor as A. oryzae derivatives. The AF designation is used for those Gypsy elements identified in A. flavus by BLAST querying for homologs to the Gypsy1 AO, Gypsy 2 AO, and Gypsy 4 AO elements. Some cryptic TEs were also identified by the homology search (e.g., the additional Mariner 1 elements in SU-16 ATE 5–3 and CA14 ATE 1–2). In Classes A and F-L, blue lines indicate homology to the indicated ATE. In the case of 3H, the number in parenthesis of the blue line indicates the position of homology within NRRL ATE 4–1. Relative directionality of homologous species in 3A and 3F-3L is indicated by the arrowhead.
https://doi.org/10.1371/journal.pone.0279148.s003
(PDF)
S4 Fig. Organization of Gypsy element homologs at chromosome 1, 6 and 8 centromeres in NRRL 3357, CA14 and SU-16.
The positions of the mutated Gypsy 2 and Gypsy 4 homologs within the centromeric domains of chromosome 6 (A), chromosome 8 (B) and chromosome 1 (C) (between the coordinates listed) were plotted and aligned among the three strains. The red, green, blue, and brown colors refer to regions of high homology as shown in Table 6A and 6B. The white color refers to regions of lower homology [corresponding to the black color in Table 6A and 6B)]. The GC content of these regions is presented above each chromosomal plot to aid in the identification of the centromere. Coding regions flanking the centromere are depicted by the purple arrows. Some chromosomes are inverted relative to left (L) and right (R) arms of NRRL 3357 chromosomes in the NCBI database, as indicated. Specifically, chromosomes 1 and 6 of CA14 are presented in inverted orientations. Chromosome 8 species are presented in inverted orientations in both CA14 and SU-16.
https://doi.org/10.1371/journal.pone.0279148.s004
(PDF)
S5 Fig.
Sequence analysis of representative H class repeats (a) and chromosome 4 centromeric Gypsy 4 elements (b). (A) BLAST search of NRRL 3357 ATE 1–4 (query) showing a portion of the homology with 3357 ATEs 6–1 (left) and 8–1 (right) (subjects). In this format, mismatches and gaps are represented as red residues. Quantification of the entire repeat in ATE 8–1 displayed 100% transition mutations, while ATE 6–1 displayed 80% transition mutations. All transversion mutations were A/T switches. (B) A portion of chromosome 4 centromeric Gypsy homolog (denoted by the green hue in Table 6 and Fig 3) homology identified in a BLAST search comparing strain NRRL 3357 (query) with strains SU-16 (A) and CA14 (B) (subjects), revealing a high level of sequence conservation.
https://doi.org/10.1371/journal.pone.0279148.s005
(PDF)
S1 Table. Positions and lengths of indels.
Positions of indels are presented using the coordinates of the inserted (top) strain in each strain a/strain b configuration. Red: AT-rich indel, Green: Reiterated AT-rich indel, White: non-AT indel, Blue: Reiterated non-AT indels. Other combinations of strains did not result in the identification of additional elements. Data, after conversion to the relative positions in NRRL 3357, are plotted in Fig 1.
https://doi.org/10.1371/journal.pone.0279148.s006
(PDF)
S2 Table. Summary analysis of 13 strains indicating the number of unique indels and the range of approximate insertion sizes.
https://doi.org/10.1371/journal.pone.0279148.s007
(PDF)
S3 Table. Tabulation of AT Insertion in various chromosomes.
% AT content, approximate insertion position relative to NRRL 3357, and total fragment length, with the % of the insertion containing the AT-rich sequence in parentheses. CHR, chromosome number. When multiple AT-rich regions are presented within the identified insert, they are separated by semi-colons.
https://doi.org/10.1371/journal.pone.0279148.s008
(PDF)
S4 Table. Summary of the total number of indels vs.
AT-indels on each chromosome in a comparison between CA14 and NRRL 3357 genomes.
https://doi.org/10.1371/journal.pone.0279148.s009
(PDF)
S5 Table. Tabulation of insertions in CA14 and NRRL 3357 genomes.
Data is derived from the indel analysis between the two strains, together with the chromosomal location, the approximate length of AT-rich regions, the site of the insertion, and the total length/AT element length % (parentheses) observed in the insertion. Consecutive numbers (separated by a semi-colon) indicate noncontiguous ATEs within the insertion.
https://doi.org/10.1371/journal.pone.0279148.s010
(PDF)
S6 Table. Mutations in Gypsy homologs are characteristic of RIP.
(left) The values of total mismatches/transition mutations (mm/Ts) between the AT-rich homologs and the Gypsy consensus sequence as determined by the Censor program are listed together with their standard deviation (SD) and sample size (n). (right) The fraction of the total transition mutations accounted for by G>A and C>T mutations within the consensus sequence was determined as described in Materials and Methods. Both full length and deleted forms of candidates were analyzed; nm, not measured.
https://doi.org/10.1371/journal.pone.0279148.s011
(PDF)
Acknowledgments
The author acknowledges Melody C. Baddoo and the Tulane Cancer Center Cancer Crusaders Next Generation Sequence Analysis Core for assistance with Next-Gen-Seq and bioinformatic analyses. I also thank Jeffrey Destruel, M.A. for technical assistance; Drs. Jeffrey Cary, Brain Mack, Mathew Lebar (Southern Regional ARS, USDA, New Orleans) and Dr. Astrid Engel for critical discussion of data; and Bonnie Hoffman for assistance with the preparation of this manuscript.
References
- 1. Gilbert MK, Mack BM, Moore GG, Downey DL, Lebar MD, Joardar V, et al. Whole genome comparison of Aspergillus flavus L-morphotype strain NRRL 3357 (type) and S-morphotype strain AF70. PLoS One. 2018;13(7):e0199169. Epub 20180702. pmid:29966003; PubMed Central PMCID: PMC6028093.
- 2. Georgianna DR, Fedorova ND, Burroughs JL, Dolezal AL, Bok JW, Horowitz-Brown S, et al. Beyond aflatoxin: four distinct expression patterns and functional roles associated with Aspergillus flavus secondary metabolism gene clusters. Mol Plant Pathol. 2010;11(2):213–26. pmid:20447271; PubMed Central PMCID: PMC4116135.
- 3. Gacek A, Strauss J. The chromatin code of fungal secondary metabolite gene clusters. Appl Microbiol Biotechnol. 2012;95(6):1389–404. Epub 20120720. pmid:22814413; PubMed Central PMCID: PMC3427479.
- 4. Brakhage AA. Regulation of fungal secondary metabolism. Nat Rev Microbiol. 2013;11(1):21–32. Epub 20121126. pmid:23178386.
- 5. Palmer JM, Keller NP. Secondary metabolism in fungi: does chromosomal location matter? Curr Opin Microbiol. 2010;13(4):431–6. Epub 20100602. pmid:20627806; PubMed Central PMCID: PMC2922032.
- 6. Daboussi MJ, Capy P. Transposable elements in filamentous fungi. Annu Rev Microbiol. 2003;57:275–99. pmid:14527280.
- 7. Pennerman KK, Gonzalez J, Chenoweth LR, Bennett JW, Yin G, Hua SST. Biocontrol strain Aspergillus flavus WRRL 1519 has differences in chromosomal organization and an increased number of transposon-like elements compared to other strains. Mol Genet Genomics. 2018;293(6):1507–22. Epub 20180811. pmid:30099586.
- 8. Galagan JE, Calvo SE, Cuomo C, Ma LJ, Wortman JR, Batzoglou S, et al. Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature. 2005;438(7071):1105–15. pmid:16372000.
- 9. Payne GA, Nierman WC, Wortman JR, Pritchard BL, Brown D, Dean RA, et al. Whole genome comparison of Aspergillus flavus and A. oryzae. Med Mycol. 2006;44(Supplement_1):S9–S11. pmid:30408940.
- 10. Hua SS, Tarun AS, Pandey SN, Chang L, Chang PK. Characterization of AFLAV, a Tf1/Sushi retrotransposon from Aspergillus flavus. Mycopathologia. 2007;163(2):97–104. Epub 20070207. pmid:17286166.
- 11. Okubara PA, Tibbot BK, Tarun AS, McAlpin CE, Hua SS. Partial retrotransposon-like DNA sequence in the genomic clone of Aspergillus flavus, pAF28. Mycol Res. 2003;107(Pt 7):841–6. pmid:12967211.
- 12. Ogasawara H, Obata H, Hata Y, Takahashi S, Gomi K. Crawler, a novel Tc1/mariner-type transposable element in Aspergillus oryzae transposes under stress conditions. Fungal Genet Biol. 2009;46(6–7):441–9. Epub 20090306. pmid:19269345.
- 13. Hey P, Robson G, Birch M, Bromley M. Characterisation of Aft1 a Fot1/Pogo type transposon of Aspergillus fumigatus. Fungal Genet Biol. 2008;45(2):117–26. Epub 20071023. pmid:18053753.
- 14. Li Destri Nicosia MG, Brocard-Masson C, Demais S, Hua Van A, Daboussi MJ, Scazzocchio C. Heterologous transposition in Aspergillus nidulans. Mol Microbiol. 2001;39(5):1330–44. pmid:11251848.
- 15. Monroy F, Sheppard DC. Taf1: a class II transposon of Aspergillus fumigatus. Fungal Genet Biol. 2005;42(7):638–45. pmid:15896988.
- 16. Carr PD, Tuckwell D, Hey PM, Simon L, d’Enfert C, Birch M, et al. The transposon impala is activated by low temperatures: use of a controlled transposition system to identify genes critical for viability of Aspergillus fumigatus. Eukaryot Cell. 2010;9(3):438–48. Epub 20100122. pmid:20097738; PubMed Central PMCID: PMC2837977.
- 17. Kapitonov VV, Jurka J. Mariner1_AO, a family of Mariner/Tc1 DNA transposons in the Aspergillus oryzae genome. Repbase Reports. 2006;6(1):29.
- 18. Kapitonov VV, Jurka J. Mariner2N1_AO, a family of nonautonomous Mariner DNA transposons in the Aspergillus oryzae genome. Repbase Reports 2006;6(1):30.
- 19. Kapitonov VV, Jurka J. Mariner5_AO, a family of Mariner DNA transposons in the Aspergillus oryzae genome. Repbase Reports. 2006;6:34.
- 20. Kapitonov VV, Jurka J. Gypsy1_AO, a family of Gypsy LTR retrotransposons in the Aspergillus oryzae genome. Repbase Reports. 2006;6(1):1–2.
- 21. Kapitonov VV, Jurka J. Gypsy2_AO, a family of Gypsy LTR retrotransposons in the Aspergillus oryzae genome. Repbase Reports. 2006;6(1):3.
- 22. Kapitonov VV, Jurka J. Gypsy4_AO, a family of Gypsy LTR retrotransposons in the Aspergillus oryzae genome. Repbase Reports. 2006;6(1):6.
- 23. Kapitonov VV, Jurka J. I-2_AO, a family of non-LTR retrotransposons in the Aspergillus oryzae genome. Repbase Reports. 2006;6(1):10.
- 24. Kapitonov VV, Jurka J. LTR1_AO, a family of LTR transposons in the Aspergillus oryzae genome. Repbase Reports. 2006;6:21.
- 25. Kapitonov VV, Jurka J. LTR2_AO, a family of LTR retrotransposons in the Aspergillus oryzae genome. Repbase Reports 2006;6(1):23.
- 26. Selker EU, Cambareri EB, Jensen BC, Haack KR. Rearrangement of duplicated DNA in specialized cells of Neurospora. Cell. 1987;51(5):741–52. pmid:2960455.
- 27. Selker EU. Repeat-induced gene silencing in fungi. Adv Genet. 2002;46:439–50. pmid:11931234.
- 28. Gladyshev E. Repeat-Induced Point Mutation and Other Genome Defense Mechanisms in Fungi. Microbiol Spectr. 2017;5(4). pmid:28721856; PubMed Central PMCID: PMC5607778.
- 29. van Wyk S, Wingfield BD, De Vos L, van der Merwe NA, Steenkamp ET. Genome-Wide Analyses of Repeat-Induced Point Mutations in the Ascomycota. Front Microbiol. 2020;11:622368. Epub 20210201. pmid:33597932; PubMed Central PMCID: PMC7882544.
- 30. van Wyk S, Wingfield BD, De Vos L, van der Merwe NA, Santana QC, Steenkamp ET. Repeat-Induced Point Mutations Drive Divergence between Fusarium circinatum and Its Close Relatives. Pathogens. 2019;8(4):298–318. Epub 20191214. pmid:31847413; PubMed Central PMCID: PMC6963459.
- 31. Ottaviani A, Gilson E, Magdinier F. Telomeric position effect: from the yeast paradigm to human pathologies? Biochimie. 2008;90(1):93–107. Epub 20070806. pmid:17868970.
- 32. Smith KM, Kothe GO, Matsen CB, Khlafallah TK, Adhvaryu KK, Hemphill M, et al. The fungus Neurospora crassa displays telomeric silencing mediated by multiple sirtuins and by methylation of histone H3 lysine 9. Epigenetics Chromatin. 2008;1(1):5. Epub 20081103. pmid:19014414; PubMed Central PMCID: PMC2596135.
- 33. Palmer JM, Mallaredy S, Perry DW, Sanchez JF, Theisen JM, Szewczyk E, et al. Telomere position effect is regulated by heterochromatin-associated proteins and NkuA in Aspergillus nidulans. Microbiology. 2010;156(Pt 12):3522–31. Epub 2010/08/21. mic.0.039255–0 [pii] pmid:20724388.
- 34. Lustig AJ. Mechanisms of silencing in Saccharomyces cerevisiae. Curr Opin Genet Dev. 1998;8(2):233–9. Epub 1998/06/04. pmid:9610415.
- 35. Mieczkowski PA, Lemoine FJ, Petes TD. Recombination between retrotransposons as a source of chromosome rearrangements in the yeast Saccharomyces cerevisiae. DNA Repair (Amst). 2006;5(9–10):1010–20. Epub 20060623. pmid:16798113.
- 36. Cardoso AR, Oliveira M, Amorim A, Azevedo L. Major influence of repetitive elements on disease-associated copy number variants (CNVs). Hum Genomics. 2016;10(1):30. Epub 20160923. pmid:27663310; PubMed Central PMCID: PMC5035501.
- 37. Li B. Keeping Balance Between Genetic Stability and Plasticity at the Telomere and Subtelomere of Trypanosoma brucei. Front Cell Dev Biol. 2021;9:699639. Epub 20210705. pmid:34291053; PubMed Central PMCID: PMC8287324.
- 38. Calderon MC, Rey MD, Cabrera A, Prieto P. The subtelomeric region is important for chromosome recognition and pairing during meiosis. Sci Rep. 2014;4:6488. Epub 20141001. pmid:25270583; PubMed Central PMCID: PMC4180820.
- 39. Bloom K, Costanzo V. Centromere Structure and Function. Prog Mol Subcell Biol. 2017;56:515–39. pmid:28840251; PubMed Central PMCID: PMC6535225.
- 40. Hartley G, O’Neill RJ. Centromere Repeats: Hidden Gems of the Genome. Genes (Basel). 2019;10(3). Epub 20190316. pmid:30884847; PubMed Central PMCID: PMC6471113.
- 41. Smith KM, Galazka JM, Phatale PA, Connolly LR, Freitag M. Centromeres of filamentous fungi. Chromosome Res. 2012;20(5):635–56. pmid:22752455; PubMed Central PMCID: PMC3409310.
- 42. Skerker JM, Pianalto KM, Mondo SJ, Yang K, Arkin AP, Keller NP, et al. Chromosome assembled and annotated genome sequence of Aspergillus flavus NRRL 3357. G3 (Bethesda). 2021;11(8):1–5. pmid:34849826; PubMed Central PMCID: PMC8496237.
- 43. Weaver MA, Mack BM, Gilbert MK. Chromosome-Level Aspergillus flavus Strain CA14 Genome Assembly. Microbiol Resour Announc. 2021;10(1):1–2. Epub 20210107. pmid:33414302; PubMed Central PMCID: PMC8407702.
- 44. Kjaerbolling I, Vesth T, Frisvad JC, Nybo JL, Theobald S, Kildgaard S, et al. A comparative genomics study of 23 Aspergillus species from section Flavi. Nat Commun. 2020;11(1):1106–18. Epub 20200227. pmid:32107379; PubMed Central PMCID: PMC7046712.
- 45. Fountain JC, Clevenger JP, Nadon B, Wang H, Abbas HK, Kemerait RC, et al. Draft Genome Sequences of One Aspergillus parasiticus Isolate and Nine Aspergillus flavus Isolates with Varying Stress Tolerance and Aflatoxin Production. Microbiol Resour Announc. 2020;9(37). Epub 20200910. pmid:32912903; PubMed Central PMCID: PMC7484062.
- 46. Fountain JC, Clevenger JP, Nadon B, Youngblood RC, Korani W, Chang PK, et al. Two New Aspergillus flavus Reference Genomes Reveal a Large Insertion Potentially Contributing to Isolate Stress Tolerance and Aflatoxin Production. G3 (Bethesda). 2020;10(10):3515–31. Epub 20201005. pmid:32817124; PubMed Central PMCID: PMC7534430.
- 47. Kim KU, Kim KM, Choi YH, Hurh BS, Lee I. Whole genome analysis of Aspergillus sojae SMF 134 supports its merits as a starter for soybean fermentation. J Microbiol. 2019;57(10):874–83. Epub 20190627. pmid:31250400.
- 48. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–36. Epub 20170315. pmid:28298431; PubMed Central PMCID: PMC5411767.
- 49. Darling AC, Mau B, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004;14(7):1394–403. pmid:15231754; PubMed Central PMCID: PMC442156.
- 50. Kohany O, Gentles AJ, Hankus L, Jurka J. Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinformatics. 2006;7:474. Epub 20061025. pmid:17064419; PubMed Central PMCID: PMC1634758.
- 51. van Wyk S, Harrison CH, Wingfield BD, De Vos L, van der Merwe NA, Steenkamp ET. The RIPper, a web-based tool for genome-wide quantification of Repeat-Induced Point (RIP) mutations. PeerJ. 2019;7:e7447. Epub 20190826. pmid:31523495; PubMed Central PMCID: PMC6714961.
- 52. Diotti R, Esposito M, Shen CH. Telomeric and Sub-Telomeric Structure and Implications in Fungal Opportunistic Pathogens. Microorganisms. 2021;9(7):1405–24. Epub 20210629. pmid:34209786; PubMed Central PMCID: PMC8305976.
- 53. Chang PK, Horn BW, Dorner JW. Sequence breakpoints in the aflatoxin biosynthesis gene cluster and flanking regions in nonaflatoxigenic Aspergillus flavus isolates. Fungal Genet Biol. 2005;42(11):914–23. Epub 20050909. pmid:16154781.
- 54. Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 2015;6:11. Epub 20150602. pmid:26045719; PubMed Central PMCID: PMC4455052.
- 55. Vitte C, Panaud O. Formation of solo-LTRs through unequal homologous recombination counterbalances amplifications of LTR retrotransposons in rice Oryza sativa L. Mol Biol Evol. 2003;20(4):528–40. Epub 20030305. pmid:12654934.
- 56. Anderson C, Tang Q, Kinsey JA. Elimination of active tad elements during the sexual phase of the Neurospora crassa life cycle. Fungal Genet Biol. 2001;33(1):49–57. pmid:11407885.
- 57. Hane JK, Oliver RP. RIPCAL: a tool for alignment-based analysis of repeat-induced point mutations in fungal genomic sequences. BMC Bioinformatics. 2008;9:478. Epub 20081112. pmid:19014496; PubMed Central PMCID: PMC2621366.
- 58. Stoltzfus A, Norris RW. On the Causes of Evolutionary Transition:Transversion Bias. Mol Biol Evol. 2016;33(3):595–602. Epub 20151125. pmid:26609078; PubMed Central PMCID: PMC7107541.
- 59. Testa AC, Oliver RP, Hane JK. OcculterCut: A Comprehensive Survey of AT-Rich Regions in Fungal Genomes. Genome Biol Evol. 2016;8(6):2044–64. Epub 20160703. pmid:27289099; PubMed Central PMCID: PMC4943192.
- 60. Kemp B, Boumil RM, Stewart MN, Dawson DS. A role for centromere pairing in meiotic chromosome segregation. Genes Dev. 2004:1946–51. pmid:15289462.
- 61. Maxfield Boumil R, Kemp B, Angelichio M, Nilsson-Tillgren T, Dawson DS. Meiotic segregation of a homeologous chromosome pair. Mol Genet Genomics. 2003;268(6):750–60. Epub 20030212. pmid:12655401.
- 62. Barnes T, Korber P. The Active Mechanism of Nucleosome Depletion by Poly(dA:dT) Tracts In Vivo. Int J Mol Sci. 2021;22(15):8232–52. Epub 20210730. pmid:34360997; PubMed Central PMCID: PMC8347975.
- 63. Lorch Y, Maier-Davis B, Kornberg RD. Role of DNA sequence in chromatin remodeling and the formation of nucleosome-free regions. Genes Dev. 2014;28(22):2492–7. pmid:25403179; PubMed Central PMCID: PMC4233242.
- 64. Draskovic I, Londono Vallejo A. Telomere recombination and alternative telomere lengthening mechanisms. Front Biosci. 2013;18:1–20. pmid:23276906.
- 65. Winter DJ, Ganley ARD, Young CA, Liachko I, Schardl CL, Dupont PY, et al. Repeat elements organise 3D genome structure and mediate transcription in the filamentous fungus Epichloe festucae. PLoS Genet. 2018;14(10):e1007467. Epub 20181024. pmid:30356280; PubMed Central PMCID: PMC6218096.