Insights into the Loblolly Pine Genome: Characterization of BAC and Fosmid Sequences

doi:10.1371/journal.pone.0072439

Figure 1.

Repeat detection methodology.

BAC sequences (103) and fosmid sequences (90,954) were analyzed for tandem repeats (TRF), interspersed repeats by homology (CENSOR against CPRD), and interspersed repeats via de novo methods (REPET). Details of the annotation process are also shown.

Table 1.

BAC and Fosmid Sequence Set Summary.

Table 2.

Summary of tandem repeats from BAC and fosmid sequences.

Table 3.

Most frequent periods for three categories of tandem repeats in conifer genomic sequence.

Table 4.

Summary of full-length repetitive content.

Figure 2.

Microsatellite density across multiple species.

Cross-species comparison of microsatellites ranging from dinucleotide to octanucleotide, as calculated by TRF (microsatellites/Mbp). The analysis included two gymnosperm BAC sets (Picea glauca, Taxus mairei) and four angiosperm genomes (Cucumis sativus, Arabidopsis thaliana, Vitis vinifera, Populus trichocarpa).
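The density unit in this caption (microsatellites per Mbp) can be illustrated with a minimal sketch. It assumes that the period size of each tandem repeat has already been extracted from TRF output into a plain list; the function name, its parameters, and the sample numbers are hypothetical and are not taken from the paper.

```python
def microsatellite_density(periods, total_bp, min_period=2, max_period=8):
    """Return microsatellites per Mbp for one sequence set.

    periods  : iterable of tandem-repeat period sizes (assumed to be
               parsed from TRF output beforehand; hypothetical input)
    total_bp : total length of the analyzed sequence in base pairs
    Only periods from dinucleotide (2) to octanucleotide (8) are counted,
    matching the range named in the caption.
    """
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    count = sum(1 for p in periods if min_period <= p <= max_period)
    return count / (total_bp / 1_000_000)


# Illustrative values only: three of the five repeats fall in the 2-8 bp range.
density = microsatellite_density([2, 3, 9, 1, 8], total_bp=2_000_000)
```

Normalizing by sequence length is what makes sets of very different sizes, such as a BAC set and a whole genome, comparable on one axis.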

Figure 3.

Distribution of homology-based repeat annotations by species.

Interspersed repeats were analyzed via a redundant similarity search (CENSOR against CPRD). The percentage in each sector represents base pair coverage over the redundant annotations. (A) Displays species coverage for full-length and partial elements. Species with contributions of less than 3% were categorized as ‘Other’. (B) Displays species coverage for full-length elements only.
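The sector percentages and the 3% cutoff for ‘Other’ can be sketched as follows. This is only an illustration of the arithmetic described in the caption, assuming annotated base pairs have already been summed per source species; the function name and the species labels are hypothetical, not data from the paper.

```python
def species_coverage(bp_by_species, threshold_pct=3.0):
    """Convert per-species annotated base pairs into percentage shares.

    Species whose share falls below threshold_pct are pooled into a
    single 'Other' entry, as the caption describes for panel (A).
    """
    total = sum(bp_by_species.values())
    if total <= 0:
        raise ValueError("total annotated base pairs must be positive")
    shares = {}
    other = 0.0
    for species, bp in bp_by_species.items():
        pct = 100.0 * bp / total
        if pct < threshold_pct:
            other += pct          # below the cutoff: pool into 'Other'
        else:
            shares[species] = pct
    if other > 0:
        shares["Other"] = other
    return shares


# Hypothetical counts: species_c contributes 2% and is pooled into 'Other'.
shares = species_coverage({"species_a": 60, "species_b": 38, "species_c": 2})
```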

Figure 4.

Distribution of transposable elements from similarity search.

A combination of the non-redundant CENSOR results from the BAC sequences (103) and fosmid sequences (90,954) was used to ascertain the major contributing classes of TEs. (A) Compares partial and full-length TE content by homology against other species. (B) Examines the full-length TE content in loblolly pine annotated in homology-based and de novo searches.

Table 5.

Filtered (full-length) vs. Unfiltered (partial and full-length) repetitive content estimates.

Figure 5.

Genomic sequence represented by the highest coverage elements.

Base pair coverage attributed to copies of the high coverage LTR TEs.

Table 6.

High coverage LTR families identified with the de novo methodology.

Figure 6.

Annotated high copy LTR repeat families.

Multiple alignments of the top ten high coverage and novel elements were performed using MUSCLE and visualized in Jalview. The final consensus sequence was exported with substitutions resolved, annotated (LTRdigest), and visualized (AnnotationSketch). (A) Multiple sequence alignment of the 24 sequences in the representative cluster of the PtOuachita family. (B) Multiple sequence alignment of the 67 sequences in the representative cluster of the PtAppalachian family. (C) Multiple sequence alignment of the 68 sequences in the representative cluster of the PtPineywoods family.
