Conceived and designed the experiments: MKS RS LEB PCR. Performed the experiments: MKS RS PC JG JS PC JJ MQ. Analyzed the data: MKS RS PC JG JS PC JJ MQ. Contributed reagents/materials/analysis tools: DR. Wrote the paper: MKS RS PCR.
Current address: Department of Botany and Microbiology, The University of Oklahoma, Norman, Oklahoma, United States of America
The authors have declared that no competing interests exist.
The perennial grass, switchgrass (
The C4 perennial grass, switchgrass (
The work reported here is part of an effort directed towards generating the genetic and genomic resources for switchgrass needed for gene discovery and breeding efforts
Here, we describe the generation and characterization of two high-quality BAC libraries using two different restriction endonucleases (
We constructed two BAC libraries, Pv_ABa and Pv_ABb, from the AP13 clone of switchgrass using
DNA from >180 randomly selected BAC clones from the Pv_ABa and Pv_ABb libraries was digested and analyzed by pulsed-field gel electrophoresis. (A, C) Representative gel pictures of
Characteristic | Pv_ABa | Pv_ABb |
Cloning vector used | plindigoBAC536 | plindigoBAC536 |
Restriction enzyme | ||
Total number of clones | 101,376 | 101,376 |
Percent empty clones | <1% | <1% |
Maximum insert size | 280 Kb | 200 Kb |
Minimum insert size | 30 Kb | 50 Kb |
Average insert size | 144 Kb | 110 Kb |
Chloroplast DNA contamination | 0.57% | 0.17% |
Mitochondrial DNA contamination | 0.79% | 0.29% |
Number of genome equivalents | 9X | 7X |
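The genome-equivalent figures in the table can be reproduced with simple arithmetic. The sketch below assumes coverage is estimated as (number of clones × average insert size) / genome size, using the 1600 Mbp effective genome size quoted in the text. It makes no correction for empty clones or organellar contamination, and the function and variable names are illustrative rather than taken from the original work.

```python
# Minimal sketch: library coverage in haploid genome equivalents.
# Assumption: coverage = clones * average insert size / genome size,
# without correcting for empty or organellar clones.

GENOME_SIZE_BP = 1_600_000_000  # effective genome size (1600 Mbp) quoted in the text


def genome_equivalents(n_clones: int, avg_insert_bp: float,
                       genome_bp: float = GENOME_SIZE_BP) -> float:
    """Return the number of genome equivalents represented by a library."""
    return n_clones * avg_insert_bp / genome_bp


# (number of clones, average insert size in bp) taken from the table above
libraries = {
    "Pv_ABa": (101_376, 144_000),
    "Pv_ABb": (101_376, 110_000),
}

coverage = {name: genome_equivalents(n, size) for name, (n, size) in libraries.items()}

for name, cov in coverage.items():
    print(f"{name}: {cov:.1f}X")
```

Rounded to whole numbers, these values agree with the 9X and 7X entries in the table.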
To assess the quality of the BAC libraries, high-density colony filters were hybridized with chloroplast- and mitochondria-specific probes spanning the whole genome of the respective organelle. Using a pool of chloroplast-specific genes, viz.,
Switchgrass BAC libraries (Pv_ABa and Pv_ABb) were screened by high-density filter hybridizations to estimate chloroplast- or mitochondrial-specific DNA and representation of single/low copy genes. Representative filter hybridization data used to estimate chloroplast (A, D) and mitochondrial (B, E) contaminants, and the library coverage based on the presence of a single copy gene,
Prior analyses suggest that switchgrass is an allotetraploid with an effective genome size of 2x = 1n = 1600 Mbp
We empirically validated the coverage using filter hybridizations with single/low copy genes (
Because BES data represent a random snapshot of a genome, they can be used to perform a genome-wide survey of structural features. We sequenced paired ends of 101,376 and 84,480 clones from Pv_ABa and Pv_ABb, respectively. After removing
The x-axis represents read length, and the number of sequences is indicated on the y-axis.
We identified a total of 50,206 SSRs from the BES, comprising 1–3 nt repeats (at least 12 nt in length) and 4–6 nt repeats (at least four tandem repeat units), which together span 870,808 bases. The density of SSRs is therefore estimated to be one SSR per 5.2 kb of sequence. The most abundant were trimeric SSRs (55%), followed by dimers (20.4%) and monomers (16.6%;
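The SSR length thresholds above can be illustrated with a small regular-expression scan. This is a simplified sketch, not the tool used in the study, and it only finds perfect repeats. It assumes that "at least 12 nt" for 1–3 nt motifs translates to at least 12, 6 and 4 tandem copies for mono-, di- and trinucleotide motifs, respectively. The names `MIN_UNITS`, `is_primitive` and `find_ssrs` are hypothetical.

```python
import re

# Minimum tandem copies per motif length, derived from the thresholds in the text:
# 1-3 nt motifs must span >= 12 nt; 4-6 nt motifs need >= 4 tandem units.
MIN_UNITS = {1: 12, 2: 6, 3: 4, 4: 4, 5: 4, 6: 4}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_ssrs(seq: str):
    """Return sorted (start, end, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_units in MIN_UNITS.items():
        # A k-mer followed by at least (min_units - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        pos = 0
        while pos < len(seq):
            m = pattern.match(seq, pos)
            if m and is_primitive(m.group(1)):
                hits.append((m.start(), m.end(), m.group(1), (m.end() - m.start()) // k))
                pos = m.end()
            else:
                pos += 1
    return sorted(hits)


# Toy sequence (invented for illustration): (AT)7 and (CAG)4 qualify, (ACGT)3 does not.
example = "CCG" + "AT" * 7 + "GGC" + "CAG" * 4 + "TT" + "ACGT" * 3 + "G"
for start, end, motif, copies in find_ssrs(example):
    print(start, end, motif, copies)
```

Skipping non-primitive motifs prevents a mononucleotide run from being reported again as a dimer or trimer.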
Based upon homology with known plant repeat elements, 279,099 repeat elements were identified from the switchgrass BES (
Type of element | Total number of elements | Total length occupied (bp) | Percentage of total sequence analyzed |
Retroelements | 178318 | 64563658 | 24.53% |
SINEs | 2306 | 348493 | 0.13% |
Penelope | 27 | 3174 | 0.00% |
LINEs | 14816 | 6018230 | 2.29% |
R2/R4/NeSL | 2 | 106 | 0.00% |
RTE/Bov-B | 3780 | 1911841 | 0.73% |
L1/CIN4 | 11001 | 4102823 | 1.56% |
LTR elements | 161196 | 58196935 | 22.11% |
Ty1/Copia | 50870 | 18995037 | 7.22% |
Ty3/Gypsy/DIRS1 | 108785 | 38941535 | 14.79% |
DNA transposons | 63616 | 14149503 | 5.38% |
hobo-Activator | 7917 | 1872711 | 0.71% |
Tc1-IS630-Pogo | 5764 | 867375 | 0.33% |
En-Spm | 22287 | 6169405 | 2.34% |
MuDR-IS905 | 11351 | 2572080 | 0.98% |
Tourist/Harbinger | 8620 | 1441495 | 0.55% |
Others | - | 1226437 | 0.47% |
Unclassified | 4424 | 804772 | 0.31% |
Total interspersed repeats: | - | 79517933 | 30.21% |
Small RNA: | 2382 | 568637 | 0.22% |
Satellites: | 1634 | 244518 | 0.09% |
Low complexity: | 28725 | 1199167 | 0.46% |
Most repeats fragmented by insertions or deletions have been counted as one element.
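The figures in the repeat table can be cross-checked against other numbers reported in the text. The sketch below assumes the total amount of sequence analyzed can be back-calculated from the "Total interspersed repeats" row (length divided by its percentage). Because the percentage is rounded, the derived values are approximate. Variable names are illustrative.

```python
# Consistency check on the repeat table (values copied from the table and text).

class_subtotals_bp = [64_563_658, 14_149_503, 804_772]  # the three class subtotal rows
total_interspersed_bp = 79_517_933
total_interspersed_pct = 30.21

# The three class subtotals should add up to the interspersed-repeat total.
class_sum = sum(class_subtotals_bp)

# Back-calculate the total sequence analyzed (approximate: the percentage is rounded).
total_bp = total_interspersed_bp / (total_interspersed_pct / 100)

ssr_count = 50_206                # SSRs reported in the text
genome_bp = 1_600_000_000         # effective genome size quoted in the text

kb_per_ssr = total_bp / ssr_count / 1000
genome_fraction_pct = 100 * total_bp / genome_bp

print(f"class subtotals sum to total: {class_sum == total_interspersed_bp}")
print(f"sequence analyzed: about {total_bp / 1e6:.0f} Mbp")
print(f"one SSR per about {kb_per_ssr:.1f} kb")
print(f"fraction of genome sampled: about {genome_fraction_pct:.1f}%")
```

Under these assumptions, the derived SSR density and genome fraction agree, to within rounding, with the one SSR per 5.2 kb and 16.4% figures given in the text.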
Similarity-based repeat detection is generally limited by the size and diversity of the available databases. To identify switchgrass-specific novel repeat elements, we carried out a self-comparison of the BES. Even with the stringent threshold requirement that each 100 bp window matches another BES with at least 90% identity, 61.2% (202,280) of the switchgrass BES matched at least one other BES (
The x-axis represents the number of matches and the y-axis shows the total number of BAC-end sequences.
To better characterize this valuable resource and provide an overview of the expanse of biological functions encoded by the switchgrass genome, we performed functional annotation and GO analysis of protein-coding signatures obtained from the BES with regard to the three major gene ontology categories, viz., molecular function, biological process and cellular locations. Out of the 330,297 BES, 5,052 could be associated with at least one GO term (
A, Cellular locations: 12 groups of gene ontology terms; B, Biological processes: 11 groups of gene ontology terms; C, Molecular functions: 15 groups of gene ontology terms.
For comparative mapping, we initially mapped switchgrass BES to rice peptides, which were subsequently mapped onto sorghum and
BES having base pair identity >75% with an E-value <1e-20 and coverage of >50% were placed onto rice peptides. The equivalent regions in sorghum and
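The placement thresholds above (identity >75%, E-value <1e-20, coverage >50%) amount to a simple filter over alignment hits. The sketch below is illustrative only. It assumes coverage means the aligned fraction of the BES (query), which the text does not specify, and the `Hit` record, the `passes` function and the example hits are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment of a BES (query) against a rice peptide (subject)."""
    query_id: str
    subject_id: str
    pct_identity: float   # percent identity of the alignment
    evalue: float
    aligned_len: int      # aligned span on the query, in bp
    query_len: int        # full length of the BES, in bp


def passes(hit: Hit, min_identity: float = 75.0,
           max_evalue: float = 1e-20, min_coverage: float = 0.5) -> bool:
    """Apply the three thresholds; coverage is assumed to be query coverage."""
    coverage = hit.aligned_len / hit.query_len
    return (hit.pct_identity > min_identity
            and hit.evalue < max_evalue
            and coverage > min_coverage)


# Invented example hits, for illustration only.
hits = [
    Hit("bes_001", "pep_A", 88.2, 1e-45, 420, 650),   # passes all three thresholds
    Hit("bes_002", "pep_B", 71.0, 1e-30, 500, 700),   # identity too low
    Hit("bes_003", "pep_C", 92.5, 1e-12, 480, 600),   # E-value too high
    Hit("bes_004", "pep_D", 85.0, 1e-25, 200, 700),   # coverage too low
]

placed = [h.query_id for h in hits if passes(h)]
print(placed)
```

If coverage was instead defined relative to the peptide, only the `coverage` line would need to change.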
Forty-seven randomly selected BACs from Pv_ABa were sequenced to essentially full-length using Sanger's method. The average size of these BACs was 153.6 kb. The distribution of SSRs and repeat elements in the full-length BAC sequences (
We compared the order of switchgrass genes and their transcriptional orientations with orthologous regions in sorghum, maize, rice and
Colored boxes along the physical location in the genome of each species represent genes and arrows in the colored boxes indicate the transcriptional orientation of each gene. Orthologous genes are given the same color and are connected by dotted lines. Grey bars represent genes from respective genomes lacking syntenic match in switchgrass. Dashed lines represent breaks in contiguity to allow larger genomic regions of the chromosomes to fit in the scale of the figures and the genes from these regions lacking syntenic match have not been plotted. The scale is shown at the bottom of each section. NCBI accession numbers for switchgrass BAC clones are given at top left of each section. Detailed information on accession numbers and gene names is given in
In assembling the tetraploid genome of switchgrass, a major challenge will be to discriminate between paralogous, orthologous and homoeologous regions. Further, repetitive regions longer than the read length and similarity between homoeologous regions may lead to misassemblies, which could require a great deal of directed sequencing to resolve accurately
Here we report construction of two BAC libraries from switchgrass accounting for ∼16 haploid genome equivalents of switchgrass with >99.9% probability of finding a particular sequence. The large insert size, high coverage and low organellar DNA contamination indicate that these libraries provide a useful resource for diverse genetic and genomic studies including genetic and physical mapping, exon trapping, isolation of closely-linked polymorphic markers, FISH analysis, as well as functional and comparative genomics studies
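The text does not say how the >99.9% probability was calculated. A common way to estimate it is the Clarke and Carbon formula, P = 1 − (1 − I/G)^N, where I is the average insert size, G the genome size and N the number of clones. The sketch below applies that formula under the assumption that the two libraries sample the genome independently. The function names are illustrative.

```python
# Sketch of the Clarke and Carbon estimate: P = 1 - (1 - I/G)^N.
# Assumption: this is one standard way to obtain such a probability;
# the source does not state which method was used.

GENOME_BP = 1_600_000_000  # effective genome size quoted in the text


def prob_missing(n_clones: int, avg_insert_bp: float, genome_bp: float = GENOME_BP) -> float:
    """Probability that a given sequence is absent from a library."""
    return (1.0 - avg_insert_bp / genome_bp) ** n_clones


def prob_recovering(n_clones: int, avg_insert_bp: float, genome_bp: float = GENOME_BP) -> float:
    """Probability that a given sequence is present in a library."""
    return 1.0 - prob_missing(n_clones, avg_insert_bp, genome_bp)


p_a = prob_recovering(101_376, 144_000)   # Pv_ABa
p_b = prob_recovering(101_376, 110_000)   # Pv_ABb

# A sequence is missed overall only if it is absent from both libraries.
p_combined = 1.0 - prob_missing(101_376, 144_000) * prob_missing(101_376, 110_000)

print(f"Pv_ABa: {p_a:.5f}  Pv_ABb: {p_b:.5f}  combined: {p_combined:.7f}")
```

With the clone numbers and insert sizes reported for the two libraries, the combined estimate exceeds 99.9%, in line with the statement above.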
Microsatellites play an important role in genome evolution and gene regulation. They have been extensively used in several research areas including linkage mapping, comparative genomics and population genetics
Transposable elements are abundant in plant genomes and play an important role in determining the size of grass genomes and driving genome evolution in response to environmental cues
In addition to the repetitive DNA fraction identified by classical analysis (30.97%), novel SREs (∼2.3%) bring the total repetitive DNA content of switchgrass to a minimum of ∼33%, which is similar to the estimated repeat content of rice in spite of the much greater genome size of switchgrass (
GC content is an important feature of a genome as indicated in several studies of prokaryotes, vertebrates and plants
Due to its large genome size, the genes in switchgrass are expected to have longer intergenic regions than those in rice and other smaller genomes. Based on BAC-end sequence analysis, the estimated gene density in switchgrass is one gene per 16.4 kb, although this varies between gene-rich and gene-poor or repetitive regions. The highest density observed among the full-length BAC sequences was one gene per 6.8 kb (AC243226) and the lowest was one gene per 59.4 kb (AC243244). Conversely, gene density in rice, sorghum and
Investigation of genomic organization and comparative mapping to other grasses using RFLP (restriction fragment length polymorphism) markers revealed several syntenic regions between the rice and switchgrass genomes.
Single-pass BAC-end sequences are generally very specific and hence can be used as markers for comparative genomic studies. The BES reported here cover 16.4% of the switchgrass genome and thus provide a reliable resource for anchoring switchgrass sequences to related grass model genomes. We picked four genomes with varying evolutionary distances, viz., sorghum, maize, rice and
Comparisons of full-length BAC sequences of switchgrass also revealed its higher similarity to sorghum followed closely by rice and then maize and
Due to the difficulty of cloning and characterizing genes in polyploids like switchgrass, rice and
It will be intriguing to investigate what makes switchgrass so different from these crops in terms of morphology, effective genome size (∼1600 Mbp; four times that of rice), ploidy level (polyploid vs diploid rice) and physiological processes (C4 vs C3 in rice and
The results reported here represent an important milestone for the advancement of functional and comparative genomic studies of switchgrass. The BAC library resources and comparative anchoring of BES will be useful for SSR marker development, saturating existing linkage maps, anchoring physical and genetic maps, and assembly of the ongoing genome sequence of switchgrass.
Leaf tissue from young plantlets of
BAC libraries were constructed at Clemson University Genomics Institute (CUGI) according to a published protocol
For Southern blot analysis, total genomic DNA was isolated from leaf tissue of
Aliquots of genomic DNA (12 µg each) were digested with four different restriction enzymes (
Approximately 180 BAC clones were randomly selected from each library and inoculated into 2 mL overnight cultures of LB medium containing 12.5 µg/mL chloramphenicol in 15 mL culture tubes. Cells were collected by centrifugation at 16,000 × g for 10 min, and BAC DNA was prepared using Qiagen's plasmid isolation kit. BAC DNA was digested with 10 U of
Essentially full-length sequences for randomly selected BAC clones were obtained at the HudsonAlpha Institute of Biotechnology (
The BES reads were obtained by Sanger's method on ABI 3730XL capillary sequencing machines at the HudsonAlpha Institute of Biotechnology. The resulting trace data were base-called using Phred V 0.020425, and vector sequences were masked using cross_match. Masked terminal vector sequences and BES shorter than 50 bp were removed. High-quality sequences were then filtered for plant-organelle genomes-specific or
We used mreps
Gene predictions from switchgrass BES were performed using Geneid v 1.4.4
To map BAC-end sequences onto grass genomes, the BES were first aligned to rice peptide sequences using BlastX. The equivalent regions in sorghum and
To produce high-quality non-redundant genomic sequences, repeat elements from full-length BAC sequences were masked using RepeatMasker 3.3.0
Genomic sequences of sorghum and
The libraries and filters have been made available to the public through the Clemson University Genomics Institute (CUGI;
We would like to thank Uffe Hellsten and Kerry Berry (JGI) for sequencing and technical advice and Malay Saha (Nobel Foundation), Michael Udvardi, Jiyi Zhang and Yuhong Tang (BESC, Nobel Foundation) for providing AP13 material. We thank Christopher Saski at Clemson University for construction of the BAC libraries.