The advent of next-generation sequencing has facilitated large-scale discovery, validation and assessment of genetic markers for high-density genotyping. The present study was undertaken to identify markers in genes putatively related to wood property traits in three Eucalyptus species. Ninety-four genes involved in xylogenesis were selected for hybridization probe-based nuclear genomic DNA target enrichment and exome sequencing. Genomic DNA was isolated from leaf tissues and used for on-array probe hybridization followed by Illumina sequencing. The raw sequence reads were trimmed, high-quality reads were mapped to the E. grandis reference sequence, and single nucleotide variants (SNVs) and insertions/deletions (InDels) were identified across the three species. The average read coverage was 216X and a total of 2294 SNVs and 479 InDels were discovered in E. camaldulensis, 2383 SNVs and 518 InDels in E. tereticornis, and 1228 SNVs and 409 InDels in E. grandis. Additionally, SNV calling and InDel detection were conducted in pair-wise comparisons of E. tereticornis vs. E. grandis, E. camaldulensis vs. E. tereticornis and E. camaldulensis vs. E. grandis. This study presents an efficient and high-throughput method for developing genetic markers for family-based QTL and association analysis in Eucalyptus.
Citation: Dasgupta MG, Dharanishanthi V, Agarwal I, Krutovsky KV (2015) Development of Genetic Markers in Eucalyptus Species by Target Enrichment and Exome Sequencing. PLoS ONE 10(1): e0116528. doi:10.1371/journal.pone.0116528
Academic Editor: Swarup Kumar Parida, National Institute of Plant Genome Research (NIPGR), INDIA
Received: July 31, 2014; Accepted: December 8, 2014; Published: January 20, 2015
Copyright: © 2015 Dasgupta et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Data Availability: All aligned sequence files are deposited in NCBI Short Read Archive with the accession number SRP045253 for E. tereticornis (SRX747331), E. camaldulensis (SRX669390) and E. grandis (SRX747330).
Funding: The funding for the research work was provided to MDG by Department of Biotechnology, Government of India under the DBT-CREST Awardship with grant number BT/IN/CREST-Awards/15/MDG/2010–11 and under the research project with grant number BT/PR10055/PBD/16/772/2007. The funding support as research fellowship was provided to VD by Department of Biotechnology, Government of India under the research project with grant number BT/PR10055/PBD/16/772/2007. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: Ms. Ishangi Agarwal employed in Genotypic Technology Private Limited, Bangalore, India was associated with the analysis of the sequence data and has no potential competing interest and financial disclosure. The authors declare that her authorship does not alter their adherence to all PLOS ONE policies on sharing data and materials as detailed online in the journal homepage.
The genus Eucalyptus belongs to the family Myrtaceae and consists of over 700 species that occupy a broad range of environmental conditions. Most of the species are native to Australia and were introduced to India, France, Chile, Brazil, South Africa and Portugal in the first quarter of the 1800s. Eucalyptus is one of the most widely planted hardwood crops in the world because of its superior growth, adaptability and wood properties, and occupies 20.07 M hectares globally. India ranks second in area under Eucalyptus plantation (3.943 M ha) after Brazil (4.259 M ha). In tropical and subtropical regions, E. grandis, E. urophylla and their hybrids are highly preferred for pulp production and solid wood, while E. globulus is favored in the temperate regions. Six species including E. camaldulensis, E. grandis, E. globulus, E. pellita, E. tereticornis and E. urophylla are reported to be suitable for Indian agro-climatic conditions and are widely planted in the subcontinent [5–6].
Eucalyptus is a potential out-crosser and due to unlimited free natural hybridizations, the populations are highly heterozygous. Hence, extensive studies were conducted to determine genetic diversity at species and population levels using different marker systems [7–16].
Linkage maps in different species of Eucalypts have been widely reported [17–21]. QTL mapping in this genus has tagged important traits such as wood properties, vegetative propagation, response to biotic and abiotic stress, juvenile traits, stem growth, water stress tolerance and frost tolerance [22–27]. QTL studies in Eucalyptus species were recently reviewed in detail by Grattapaglia et al. Population-based association studies were reported for E. nitens and E. globulus targeting wood property traits [29–31]. Recently, the first experimental study of Genomic Selection was reported by Resende and co-workers in two Eucalyptus populations for growth and wood property traits.
The genomic data in Eucalyptus species are well-documented and available in public databases, private collections and consortia as EST resources [33–34] and transcriptome resources [16, 35–42]. Several dedicated databases are available for Eucalyptus genome research, such as EUCANEXT, EucalyptusDB, Eucspresso, EUCATOUL, EUCAWOOD, EucaCold, EucGenIE and Phytozome10.
Subsequently, the Eucalyptus genome sequencing project was initiated independently for E. grandis at the US Department of Energy Joint Genome Institute, USA and E. camaldulensis at Kazusa DNA Research Institute in Japan. Recently, the complete genome sequence of E. grandis (‘BRASUZ1’) was published and the assembled non-redundant chromosome-scale reference (v1.0) was released with 640 Mb (94%) genome coverage organized into 11 pseudomolecules. It was also reported that 34% of the protein-coding genes occur as tandem duplications and 84% share similarity to rosid lineages.
The draft genome sequence of E. camaldulensis sequenced in Japan had a total length of 655,922,307 bp of non-redundant genomic sequences consisting of 81,246 scaffolds and 121,194 singlets. These sequences accounted for approximately 92% of the gene-containing regions. A total of 77,121 complete and partial structures of protein-encoding genes were annotated. The database containing the draft sequence can be accessed at http://www.kazusa.or.jp/eucaly.
In the last decades, several generic DNA markers have been employed for molecular breeding. These markers are usually effective but their development is labor-intensive and time-consuming. With the advent of ‘next generation’ sequencing technologies, a paradigm shift has occurred in the DNA sequencing approach, resulting in high-throughput and cost-effective sequencing methods [46–47]. Nevertheless, sequencing a large number of genomes is still not feasible due to the substantial cost and time and the management and storage of the enormous informatics data. Hence, considerable effort has been directed towards sequencing of genome sub-regions by ‘target enrichment’ methods. Re-sequencing of these enriched genomic regions is time- and cost-effective and the data analysis is less complex.
In the present study, we conducted target enrichment of exomes for 94 genes involved in xylogenesis and re-sequenced them in three Eucalyptus species, which were used in developing mapping pedigrees. The presence of SNVs and InDels was documented across the species, both in pair-wise comparisons and in comparison to the E. grandis reference genome. This study presents an efficient and high-throughput method for developing genetic markers for family-based QTL and association analysis in Eucalyptus.
Materials and Methods
Plant Material and DNA Isolation
Three genotypes from Eucalyptus camaldulensis, E. tereticornis and E. grandis were selected for target enrichment. E. camaldulensis (Ec111) belonging to Kennedy River Provenance from Queensland, Australia is a selection from the Provenance Resource Stand, Pudukkotai, Tamil Nadu, India while E. tereticornis (Et86) is a selection from Seed Production Area, Pudukkotai, Tamil Nadu, India. E. grandis (Eg9) is a selection from the Lorne provenance trial at Hossammund, Ootacamund, Tamil Nadu, India. These genotypes were used as parents for development of mapping populations targeting wood property traits.
The leaf tissues from the three genotypes were harvested and immediately frozen at −80°C. Genomic DNA was isolated from the leaf tissues using the GenElute Plant Genomic DNA isolation kit (Sigma Aldrich, USA) and quantified using NanoDrop ND1000 spectrophotometer (Thermo Scientific, USA).
Selection of Genes and Probe Design for Sequence Capture Array
Genes involved in different steps of secondary xylem formation including cell division, cell expansion, cell wall thickening, cell wall proteins, lignin biosynthesis and programmed cell death in Arabidopsis, Populus, Zinnia and Eucalyptus spp. were short-listed from the literature and 94 genes were selected for target enrichment and re-sequencing. Their respective gene orthologs were downloaded from the E. grandis genome database hosted by the Phytozome portal (http://www.phytozome.net/cgi-bin/gbrowse/Eucalyptus). The sequences were functionally annotated and their chromosomal positions, protein domains, biological pathways and gene ontology were defined based on the recent assembly of E. grandis using Phytozome v10.
Hybridization probes (“baits”) 120 bp long were designed with 1 bp tiling using SureSelect eArray software (Agilent Technologies, Santa Clara, California, USA), targeting exons and UTRs in the 94 genes. A total of 169,700 baits were designed to capture the exons and UTRs in the three species. Using this design, a customized array was synthesized at Agilent Technologies.
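The bait design described above (120 bp probes shifted by 1 bp) can be illustrated with a minimal sketch. This is not the SureSelect eArray algorithm, which is not described in the text; the function name `tile_baits` and the toy target sequence are assumptions made only for illustration.

```python
def tile_baits(target: str, bait_len: int = 120, step: int = 1) -> list[tuple[int, str]]:
    """Return (start, sequence) for every bait of length bait_len,
    sliding along the target by `step` bases (1 bp tiling by default)."""
    if len(target) < bait_len:
        return []  # target too short to hold a single full-length bait
    return [
        (start, target[start:start + bait_len])
        for start in range(0, len(target) - bait_len + 1, step)
    ]


# Hypothetical 125 bp target; not a sequence from the study.
toy_target = "ACGTT" * 25
baits = tile_baits(toy_target)
print(len(baits))  # a target of length L yields L - 120 + 1 baits, here 6
```

With 1 bp tiling, almost every target base is covered by many overlapping baits, which is why a target of a few hundred kilobases requires on the order of 10^5 probes.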
Library Preparation, Target Enrichment and Validation
Ten micrograms of DNA from each sample in 100 μl of nuclease-free water were sonicated to fragment the DNA to a size range of 100 to 500 bp. The size distribution was checked on the Agilent 2100 Bioanalyzer, and the DNA was cleaned using Agencourt AMPure XP SPRI beads (Beckman Coulter, Australia). The libraries for each sample were prepared using the Illumina TruSeq DNA Sample Preparation Kit (Illumina Inc., San Diego, CA, USA). The sheared DNA was subjected to a series of enzymatic reactions that repaired frayed ends, phosphorylated the fragments, added a single nucleotide overhang to code the libraries and ligated adaptors, following the manufacturer’s protocol for the kit. Subsequently, PCR enrichment (10 cycles) was performed to amplify the library. The three barcoded libraries were pooled in equimolar amounts and approximately 20 mg of DNA was hybridized on the Agilent 244K microarray (AMADID: EA560-037734) following the manufacturer’s protocol. The hybridization was carried out at 65°C for 65 hrs as described by Hodges et al. After standard washing procedures, DNA was eluted in nuclease-free water by incubating the array at 95°C for 10 min. The captured library was PCR amplified for 18 cycles and purified using Agencourt AMPure XP SPRI beads (Beckman Coulter, Australia).
The enriched library was quantified using a NanoDrop Spectrophotometer and the quality was checked on the Agilent High Sensitivity Bioanalyzer Chip. RT-qPCR was conducted on the pre- and post-capture libraries using primer pairs designed for the target (EtCesA1, EtCesA2 and EtCesA5) and non-target (EteIF4 and EtH2B) genes (S1 Table) to confirm enrichment of the targeted regions. The RT-qPCR data were analyzed using the ΔΔCT method described by Livak and Schmittgen.
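The ΔΔCT calculation can be sketched as follows. The sketch assumes that a non-target gene serves as the reference and that amplification efficiency is 100% (a doubling per cycle); the text does not state how normalization was done, and the Ct values below are hypothetical, not measurements from the study.

```python
def fold_enrichment(ct_target_pre: float, ct_ref_pre: float,
                    ct_target_post: float, ct_ref_post: float) -> float:
    """Fold change of a target between pre- and post-capture libraries
    by the 2^(-ddCt) method, normalized to a reference gene."""
    delta_pre = ct_target_pre - ct_ref_pre      # dCt before capture
    delta_post = ct_target_post - ct_ref_post   # dCt after capture
    ddct = delta_post - delta_pre               # ddCt
    return 2.0 ** (-ddct)


# Hypothetical Ct values chosen only to illustrate the arithmetic:
# ddCt = (18 - 22) - (24 - 22) = -6, so the fold change is 2^6.
print(fold_enrichment(24.0, 22.0, 18.0, 22.0))  # 64.0
```

A value near 1 indicates no enrichment, which is the pattern expected for the non-target genes.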
Sequencing and Analysis
The three pooled barcoded libraries were subjected to cluster generation and 2 × 100 bp paired-end sequencing was conducted using the Illumina GAII Analyzer. High Quality (HQ) reads were filtered from the raw data using SeqQC_V2.2 (a proprietary QC tool of Genotypic Technologies Ltd., Bangalore, India) with a cutoff Phred quality score (Q) of 20, which corresponds to an expected error rate of 1 in 100 sequenced bases. Further, the quality-passed sequencing reads were trimmed for Adapter, B Block and low-quality end sequences with 50 bp cut off using Raw Data Processing Script. The trimmed reads were aligned (gapped alignment) to the E. grandis reference sequence using bowtie 2-2.0.0-beta5 with affine read gap and affine reference gap penalties of 5 for gap open and 3 for gap extension. The un-gapped alignment was done using bowtie version 0.12.7. Variations were taken from both the gapped and un-gapped alignments to reduce the possibility of false variations induced by allowing gaps; variations reported in both alignments are expected to be of higher confidence. SNV calling and InDel detection were done using SAMtools version 0.1.7a (http://samtools.sourceforge.net) with default parameters. For SNVs, cutoff thresholds of 3 and 10 were set for the minimum number of reads showing the variation and for the minimum RMS mapping quality, respectively. The same tool was used to generate the consensus sequence of the aligned reads, while multiple alignments were done using ClustalW version 2.0.12. Pair-wise comparison of the sequence data for the three species was conducted to identify SNVs and InDels based on their positions using R Bioconductor code. The ambiguous SNVs generated due to genetic divergence of the three species were not considered for analysis.
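The two numerical cutoffs in this paragraph can be made concrete with a short sketch. It shows the standard Phred relation between a quality score and error probability, and a filter applying the stated SNV thresholds (at least 3 supporting reads, RMS mapping quality of at least 10). The function names and the example records are hypothetical; this is not the SeqQC or SAMtools implementation.

```python
def phred_to_error_prob(q: float) -> float:
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def passes_snv_filter(variant_reads: int, rms_mapq: float,
                      min_reads: int = 3, min_mapq: float = 10) -> bool:
    """Keep an SNV only if enough reads support it and they map confidently."""
    return variant_reads >= min_reads and rms_mapq >= min_mapq


print(phred_to_error_prob(20))  # about 0.01, i.e. 1 error in 100 bases

# Hypothetical candidate calls (position, supporting reads, RMS mapping quality).
candidates = [(101, 2, 40.0), (250, 12, 8.0), (377, 5, 35.0)]
kept = [pos for pos, reads, mapq in candidates if passes_snv_filter(reads, mapq)]
print(kept)  # only position 377 meets both thresholds
```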
Results and Discussion
Selection of Candidate Genes
Ninety four xylogenesis-related genes involved in different stages of wood formation including biosynthesis of lignin, cellulose, pectin, monoterpene, xyloglucan, cell wall related genes, genes involved in carbohydrate metabolism, programmed cell death, phyto-hormone signaling, transcription factors and regulatory proteins were selected for the present study (Table 1). The position of the genes in chromosomes and their biological functions in respect to the E. grandis reference genome are presented in S2 Table. As many as 14 genes were localized on chromosome 7, while only 4 genes localized on chromosome 8. Two genes, monoterpene glucosyl transferase and IAA binding domain were not assigned to any chromosome.
The formation of the secondary cell wall is driven by the coordinated expression of numerous genes involved in the biosynthesis of cellulose and hemicellulose, lignin, pectin, cell wall proteins and minor soluble and insoluble compounds [54–59], [33, 38–39]. Expressed wood-formation genes show high functional conservation across plant genera and up to 90% of genes expressed in loblolly pine have homologs in Arabidopsis. Similarly, a high proportion of poplar ESTs appear to have homologs in the Arabidopsis genome [61–62].
The role of transcription factors as master switches in vascular and xylem development has been investigated in detail in poplar, eucalypts, pine and Arabidopsis. Highly expressed transcription factors like MYB and NAC families are implicated as critical regulators of vascular differentiation, phenylpropanoid metabolism, xylem differentiation and secondary wall formation. The other important regulators include the homeodomain superfamily of transcription factors (HD-Zip, WOX, KNOX, and ZF-HD), ethylene responsive elements (AP2/ERF domain), bZIP, WRKY and LIM [63–70].
Hormonal regulation of wood formation is well documented and major phyto-hormones playing pivotal role in cambial activity and wood formation include auxin, cytokinin, gibberellic acid, brassinosteroids and ethylene. The receptors of hormone responsive genes and transcription factors are reported to be expressed during cambial development and wood formation [71–74].
The selection of genes in the present study was based on the literature survey as described above and major functional and regulatory genes presumably involved in cambial development and wood formation were selected.
Validation of Target Enrichment
The array based hybridization enrichment was conducted to capture the 94 xylogenesis-related genes in three species of Eucalyptus. The enrichment of the targeted regions after hybridization was validated using the RT-qPCR on pre- and post-capture libraries for target genes EtCesA1, EtCesA2 and EtCesA5 and non target genes EteIF4 and EtH2B. The comparison of pre and post hybridization data demonstrated 64 fold, 165 fold and 59 fold enrichments of the target genes, EtCesA1, EtCesA2 and EtCesA5 respectively, while no enrichment was observed for the non target genes, EteIF4 and EtH2B.
Read and Alignment Statistics
The 2 × 100 bp paired-end raw reads were subjected to quality checking using SeqQC_V2.2. In E. camaldulensis (Ec111), a total of 15.75 million reads were generated, of which 13.86 million (88.02%) were HQ reads, while in E. tereticornis (Et86), 17.07 million reads were generated, of which 15.14 million (88.69%) were HQ reads. In E. grandis (Eg9), the total number of reads was 11.41 million with 10.22 million HQ reads (89.59%).
The HQ reads from all the three species were aligned with the E. grandis reference sequence using both gapped and un-gapped alignment tools. In E. camaldulensis, 170866bp (98.43% read coverage) were aligned with the reference sequence, which had a total sequence length of 173593bp, while in E. tereticornis, 170825bp sequence length was aligned with reference with 98.41% coverage. Similarly, in E. grandis, 170671bp was aligned with the reference sequence with coverage of 98.32%. The total percent of reference covered with at least 5X depth was 97.71%, 97.86% and 97.12% in E. camaldulensis, E. tereticornis and E. grandis, respectively, while reference covered with at least 10X read depth was 96.99%, 97.36% and 95.67%, respectively. Similarly, the alignment statistics for reference covered with 20X depth was 95.9%, 96.34% and 93.53% in E. camaldulensis, E. tereticornis and E. grandis, respectively. The optimized average read depth in E. camaldulensis was ∼223X, while in E. tereticornis it was calculated as ∼227X. The optimized average read depth in E. grandis was ∼199X. The aligned sequence data was deposited in NCBI Short Read Archive with the accession number SRP045253 for E. tereticornis (SRX747331), E. camaldulensis (SRX669390) and E. grandis (SRX747330).
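The coverage figures above are breadth-of-coverage statistics: the share of reference bases reached by at least a given read depth. The sketch below recomputes the overall coverage percentages from the aligned and reference lengths given in the text, and shows how breadth at 5X, 10X and 20X would be derived from a per-base depth vector. The depth vector is a toy example, since per-base depths are not part of the text.

```python
REFERENCE_LENGTH = 173593  # bp, total reference length stated in the text

# Aligned lengths (bp) reported for each species.
aligned_bp = {
    "E. camaldulensis": 170866,
    "E. tereticornis": 170825,
    "E. grandis": 170671,
}
coverage_pct = {sp: round(100 * bp / REFERENCE_LENGTH, 2) for sp, bp in aligned_bp.items()}
print(coverage_pct)  # matches the reported 98.43, 98.41 and 98.32


def breadth_at_depth(depths: list[int], min_depth: int) -> float:
    """Percentage of positions covered by at least min_depth reads."""
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


# Toy per-base depths for a 10 bp region (hypothetical values).
toy_depths = [0, 4, 5, 12, 25, 30, 8, 19, 20, 250]
for threshold in (5, 10, 20):
    print(threshold, breadth_at_depth(toy_depths, threshold))
```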
Next generation sequencing platforms produce robust sequence output, making high-throughput DNA marker discovery feasible and cost-effective [75–76]. Among the available NGS platforms, Illumina was reported to be preferred for de novo sequencing, re-sequencing and high-throughput SNP discovery, because its high read depth supports reference-based contig assembly with high confidence [75–77]. The efficiency of this platform in SNP discovery has been well documented in E. camaldulensis, Arabidopsis, wheat [80–82], olive, Solanum spp., Douglas-fir, soybean [86–87], apple and pine.
Another important consideration while conducting target enrichment and re-sequencing is the read depth needed to reliably detect SNPs. It was reported that a minimum of 8X coverage and up to 200X was optimal for SNP calling. In the present study, the read depth was high at ∼223X in E. camaldulensis, ∼227X in E. tereticornis and ∼199X in E. grandis. Similar studies in Fragaria vesca documented an average depth of 120X, while in E. camaldulensis, the average read depth for all the bases was 6124X.
Specificity (the number of reads that map to the targeted sequence) is an important aspect of target enrichment experiments. The present study documented high read coverage of the reference sequence, with 98.43% in E. camaldulensis, 98.41% in E. tereticornis and 98.32% in E. grandis, suggesting high specificity of the hybridized probes to the target sequences. Similarly, in an earlier study in E. camaldulensis, 94.2% coverage was reported with the reference genome of E. grandis. In wheat, a NimbleGen array with genomic DNA derived from eight wheat varieties was used for target enrichment and exome sequencing and an average of 38.1% (22%–44.5%) was aligned to the reference sequence, while Saintenac and co-workers reported an increase in specificity of reads on target to 60% and the number of covered target bases reported was 92%. In Populus trichocarpa, an average of 86.8% of base pairs in the bait regions was mapped on the reference sequence. Hence, the high read depth and coverage achieved in the present investigation can be considered optimal for identification of variation with high confidence.
Identification of Variants (SNVs and InDels) in Three Eucalyptus Species across E. grandis Reference Genome
The SNVs and InDels present in the sequences aligned with the reference were individually determined for each species. A total of 5905 SNVs were discovered in all three species, which included 2294 SNVs in E. camaldulensis (604 and 299 SNVs from gapped and un-gapped alignments, respectively and 1391 SNVs common for both gapped and un-gapped alignments), 2383 SNVs in E. tereticornis (636 and 303 SNVs from gapped and un-gapped alignments, respectively and 1444 SNVs common for both alignments), and 1228 SNVs in E. grandis (460 and 122 SNVs from gapped and un-gapped alignments, respectively and 646 SNVs common for both alignments) (Table 2).
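The per-species totals above are the sum of three disjoint groups: calls found only in the gapped alignment, calls found only in the un-gapped alignment, and calls common to both. The sketch below checks the reported arithmetic and shows how such a partition can be obtained with set operations. The variant tuples are hypothetical, and this is not the authors' pipeline.

```python
# (gapped only, un-gapped only, common to both), as reported in the text.
reported = {
    "E. camaldulensis": (604, 299, 1391),
    "E. tereticornis": (636, 303, 1444),
    "E. grandis": (460, 122, 646),
}
totals = {species: sum(counts) for species, counts in reported.items()}
print(totals)                # 2294, 2383 and 1228
print(sum(totals.values()))  # 5905


def partition_calls(gapped: set, ungapped: set) -> dict:
    """Split variant calls into alignment-specific and shared groups."""
    return {
        "gapped_only": gapped - ungapped,
        "ungapped_only": ungapped - gapped,
        "common": gapped & ungapped,  # seen in both, hence higher confidence
    }


# Hypothetical calls keyed by (sequence name, position, reference base, variant base).
gapped_calls = {("gene1", 10, "A", "G"), ("gene1", 20, "C", "T")}
ungapped_calls = {("gene1", 20, "C", "T"), ("gene2", 5, "G", "A")}
groups = partition_calls(gapped_calls, ungapped_calls)
print({name: len(calls) for name, calls in groups.items()})
```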
The SNVs were also classified by their location in UTRs and exons. The maximum number of SNVs was recorded in the exon region (4187), while 1226 SNVs were documented in the 3′UTR and 492 SNVs in the 5′UTR across all three species (Tables 3, 4 and 5). In E. tereticornis, the maximum number of SNVs was recorded in SuSy1 (85), while only one SNV was observed in PTM5 (S3a Table). In E. camaldulensis, a similar trend was observed with a maximum of 72 SNVs identified in SuSy1 and only one SNV recorded in PTM5 (S4a Table). However, when the E. grandis sequences were compared with the reference genome, a maximum of 60 SNVs was observed in C3H while a single SNV was documented in several genes, including AP2L, ARF, ARF2, EXPA, GATA1, LAC2, PTM5 and VND6. No SNVs were detected in CCAAT, FLA1 and LBD (S5a Table).
The SNV frequency was calculated for the exon and UTR regions individually in each species. The SNV frequency in the 5′UTR of E. tereticornis, E. camaldulensis and E. grandis was 1/78.49 bp, 1/101.11 bp and 1/170.42 bp respectively, while the SNV frequency in the exon region was 1/126.78 bp, 1/125.61 bp and 1/306.72 bp for E. tereticornis, E. camaldulensis and E. grandis respectively. In the 3′UTR, the SNV frequency was 1/86.61 bp, 1/100.23 bp and 1/176.08 bp for E. tereticornis, E. camaldulensis and E. grandis respectively (Tables 3, 4 and 5).
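A frequency written as 1/x bp means one SNV per x bases on average, that is, the length of the region divided by the number of SNVs found in it. A minimal sketch follows; the region length and SNV count are hypothetical, because the per-region lengths appear only in Tables 3 to 5 and not in the text.

```python
def bp_per_snv(region_length_bp: int, snv_count: int) -> float:
    """Average spacing between SNVs: region length divided by SNV count."""
    if snv_count == 0:
        raise ValueError("no SNVs in region; spacing is undefined")
    return region_length_bp / snv_count


# Hypothetical region: 12,000 bp containing 96 SNVs.
spacing = bp_per_snv(12_000, 96)
print(f"1/{spacing:.2f} bp")  # 1/125.00 bp
```

A smaller denominator therefore means a higher SNV density.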
Further, SNVs were also identified in pair-wise combinations of the three Eucalyptus species. Positions with ambiguous nucleotides were not considered and only SNVs with no ambiguity were mapped on the candidate genes (S6 Table). When E. camaldulensis and E. tereticornis were compared, a total of 317 SNVs were documented, with a minimum of one SNV in 4CL, bZIP, CCoAOMT1, CesA3, EXPA, GRAS1, NAM1, PIP1, PTM5, SBP1, SND1, STM, SuSy1, TUA1 and VND7 and a maximum of 25 SNVs in LAC. Larger numbers of SNVs were recorded when E. grandis was compared with E. tereticornis and E. camaldulensis, with 875 and 1014 SNVs respectively. In both pair-wise combinations, the maximum number of SNVs was observed in LAC, with 53 SNVs in the comparison with E. camaldulensis and 46 SNVs in the comparison with E. tereticornis.
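The position-based pair-wise comparison can be sketched as below. The authors used R Bioconductor code that is not shown in the text, so this Python version is only an illustration of the idea: compare consensus bases at shared positions and discard any position where either species carries an ambiguous (non-ACGT) IUPAC code. Positions and bases are hypothetical.

```python
UNAMBIGUOUS = set("ACGT")


def pairwise_snvs(consensus_a: dict, consensus_b: dict) -> list:
    """Return (key, base_a, base_b) for shared positions where both
    species have an unambiguous base and the bases differ."""
    differences = []
    for key in sorted(consensus_a.keys() & consensus_b.keys()):
        base_a, base_b = consensus_a[key], consensus_b[key]
        if base_a in UNAMBIGUOUS and base_b in UNAMBIGUOUS and base_a != base_b:
            differences.append((key, base_a, base_b))
    return differences


# Hypothetical consensus calls keyed by (gene, position); "R" is the
# IUPAC code for A or G and is treated as ambiguous.
species_a = {("CesA1", 10): "A", ("CesA1", 25): "R", ("LAC", 7): "C", ("LAC", 9): "G"}
species_b = {("CesA1", 10): "G", ("CesA1", 25): "A", ("LAC", 7): "C"}
print(pairwise_snvs(species_a, species_b))
```

Only the first position is reported: the second is ambiguous in one species, the third is identical in both, and the fourth is missing from one consensus.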
InDels were also detected when the sequences of the 94 genes were compared individually with the reference, and a total of 1406 InDels were discovered with a size range of 1–24 nucleotides (Table 2). The position of InDels in exons and UTRs was also determined and the total number documented was 843, 309 and 254 in exons, 3′UTR and 5′UTR, respectively (Table 6). In E. tereticornis, a total of 518 InDels were detected and a maximum of 20 InDels was recorded in the transcription factor HB1 Class III, while a single InDel was documented in several genes including CCAAT, DUF1, ERF, MUR3, MYB2, PL, PTM5, UXS1 and WUS1. No InDels were recorded in ASP, CAld5H, DOF1, F5H, DIR1 and FLA1 (S3b Table). In E. camaldulensis, a total of 479 InDels were recorded and the maximum number of InDels was discovered in HB1 Class III (18), while only a single InDel was identified in DIR1, DUF1, ERF, GRAS1, GT, IAA, MUR3, PAAPA, PTM5, UGT and WUS1. InDels were not detected in ASP, CAld5H, DOF1, F5H, FLA1, GATA1, PL and UXS1 (S4b Table). In E. grandis, a total of 409 InDels were discovered and a maximum of 17 InDels was documented in HB1 Class III, while only a single InDel was identified in FLA1, DUF1, IAA, MUR3, PTM5, CCAAT, LBD, DHN, MYB2, C4H and HCT. InDels were not found in ASP, CAld5H, DOF1, F5H, GATA1, PL, UXS1, DIR1 and WUS1 (S5b Table). The InDel frequency was calculated for each species (Table 6). Expressed as bp per InDel, the value was highest in the exon region for all three species (411.14, 446.38 and 482.58 in E. tereticornis, E. camaldulensis and E. grandis, respectively), meaning that InDels were least dense in exons. The total InDel frequency was 332.05, 359.08 and 420.54 bp per InDel in E. tereticornis, E. camaldulensis and E. grandis respectively, across all the genes selected (Table 6).
Similarly, InDels were also documented in pair-wise combinations, and a total of 731 and 699 InDels were detected between E. grandis and E. tereticornis and between E. grandis and E. camaldulensis, respectively. A total of 702 InDels were detected between E. camaldulensis and E. tereticornis. The maximum number of InDels across all combinations was observed in the HB1 Class III transcription factor, with 26 InDels between E. grandis and E. tereticornis, 27 InDels between E. grandis and E. camaldulensis and 27 InDels between E. camaldulensis and E. tereticornis. A minimum of one InDel was documented in genes such as FLA1 for E. grandis vs. E. tereticornis; DIR1, EXPB, FLA1 and WUS1 for E. grandis vs. E. camaldulensis; and DIR1, DUF1, PL and UXS1 for E. camaldulensis vs. E. tereticornis (S7a,b,c Table).
The abundance of SNPs/SNVs in plant genomes and the availability of cost-effective technologies for genotyping have made high-throughput SNP genotyping pivotal for genetic mapping, gene discovery, germplasm characterization and population genomics. NGS-based SNP discovery has been reported in several crops such as wheat, Eucalyptus, rice, barley, cotton, soybean, potato, Arabidopsis, maize and several other species. Use of SNP marker panels for genetic analysis has been widely explored in less domesticated crops and trees [103–105]. SNP genotyping in Eucalyptus species is reported from E. grandis, E. globulus, E. nitens, E. camaldulensis and E. loxophleba, inter-specific hybrids of Eucalyptus, E. pilularis, E. globulus and E. camaldulensis [41,78].
The SNP frequency in Eucalyptus species is considered to be one of the highest in woody species due to its recent domestication, large population size and outbred mating system. Kulheim and co-workers reported that the SNP density was 1/33 bp in E. nitens and 1/31 bp in E. globulus, while in E. camaldulensis and E. loxophleba it was higher at 1/16 bp and 1/17 bp respectively. However, a later study showed that the SNP frequency was 1/83.9 bp in E. camaldulensis. In the present study, the SNV frequency ranged from 1/78.49 bp to 1/306.72 bp across different genic regions of E. camaldulensis, E. tereticornis and E. grandis. Recently, the SNP frequency in inter-specific hybrids of Eucalypts was documented as 1/133 bp, suggesting that the SNP frequency depends on the target region. In heterozygous species, the SNP frequency is generally high, as documented in pine with 1/102.6 bp, grapevine with 1/64 bp, maize with 1/60 bp and rye, which registered one SNP per 52 bp.
Insertion and deletion polymorphisms (InDels) are an important source of genomic variation in plant and animal genomes. Mechanisms such as insertion and excision of transposable elements, slippage in simple sequence replication, errors in DNA synthesis and repair, recombination and unequal crossover can result in the formation of InDels [114–115]. However, accurate genotyping from low-coverage sequence data can be challenging. Further, polymorphism in short InDels is increasingly being used as an important marker in humans, Drosophila melanogaster and G. gallus. Reports on InDel genotyping in plants are limited to rice, Arabidopsis thaliana, Citrus clementina and Phaseolus vulgaris. In tree species, InDel discovery is reported from Salix spp. and Populus spp. [125–126]. InDel markers for species discrimination have been reported in E. grandis and E. gunnii and Populus spp. [125,127].
In the present study, a high number of InDels in the size range of 1–24 nucleotides was documented in the three Eucalyptus species, at a frequency of 332.05, 359.08 and 420.54 bp per InDel in E. tereticornis, E. camaldulensis and E. grandis, respectively. This density is higher than the earlier reported InDel frequency of 1.5 InDel/1000 bp in the Eucalyptus genome and 1/2756 bp in an inter-specific hybrid population. Similarly, in Pinus taeda, Kong et al. reported that InDels were infrequent, with only 0.67% frequency in targeted regions. The probable reason for this difference could be the highly divergent genotypes selected in the present study, indicating that InDels could be a useful marker for genetic analysis in Eucalyptus species.
The NGS platforms have brought a paradigm shift in understanding different aspects of plant biology, especially in model species and plants with small genomes. Their downstream usefulness in linkage map construction, genetic diversity analyses, association mapping and marker-assisted selection has been demonstrated in several plants. However, sequencing of complete genomes cannot be regularly employed due to high cost and computational limitations in handling the large volume of informatics data. With the availability of complexity reduction strategies, sequencing of sub-genomic regions by on-array/in-solution target enrichment technology has provided an efficient alternative to amplicon re-sequencing for SNP/SNV discovery. In the present study, this strategy was implemented in re-sequencing ninety-four genes across three Eucalyptus species. This study has also shown that the target enrichment strategy can be successfully used for identification of markers (SNVs and InDels) for future use in QTL and association mapping studies in Eucalyptus species.
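The comparison above mixes two units: bp per InDel (a spacing) and InDels per 1000 bp (a density). Converting everything to a density makes the claim easier to check. The sketch below uses only the values quoted in the text; the function name is an assumption made for illustration.

```python
def indels_per_kb(bp_per_indel: float) -> float:
    """Convert an average spacing (bp per InDel) to InDels per 1000 bp."""
    return 1000.0 / bp_per_indel


# Spacings reported in this study (bp per InDel).
this_study = {
    "E. tereticornis": 332.05,
    "E. camaldulensis": 359.08,
    "E. grandis": 420.54,
}
density = {sp: round(indels_per_kb(x), 2) for sp, x in this_study.items()}
print(density)  # 3.01, 2.78 and 2.38 InDels per kb

EARLIER_GENOME_DENSITY = 1.5                           # InDels per kb, earlier report
hybrid_density = round(indels_per_kb(2756), 2)         # earlier hybrid population, 1/2756 bp
print(hybrid_density)                                  # 0.36
print(all(d > EARLIER_GENOME_DENSITY for d in density.values()))
```

On this scale, all three species show roughly 1.6 to 2 times the earlier genome-wide density.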
S1 Table. Primer pairs used for RT-qPCR to confirm enrichment of targeted genes.
S2 Table. Functional Annotation of selected genes across E. grandis genome sequence using Phytozome v10.
S3 Table. A, Details of SNVs documented in E. tereticornis across reference sequence.
B, Details of InDels documented in E. tereticornis across reference sequence.
S4 Table. A, Details of SNVs documented in E. camaldulensis across reference sequence.
B, Details of InDels documented in E. camaldulensis across reference sequence.
S5 Table. A, Details of SNVs documented in E. grandis across reference sequence.
B, Details of InDels documented in E. grandis across reference sequence.
S6 Table. Presence of SNVs in Pair-wise comparison across three Eucalyptus species.
S7 Table. A, Presence of InDels in Pair-wise comparison across E. grandis and E. tereticornis.
B, Presence of InDels in Pair-wise comparison across E. grandis and E. camaldulensis.
C, Presence of InDels in Pair-wise comparison across E. camaldulensis and E. tereticornis.
The authors acknowledge Dr. V. Sivakumar and Shri D.R.S. Sekar, Scientists, Institute of Forest Genetics and Tree Breeding, Coimbatore, India for providing the plant material for the study. The authors are grateful to Genotypic Technologies Private Limited, Bangalore, India for array design, library construction and analysis of the data. MGD acknowledges the funding support by Department of Biotechnology, Government of India under the DBT-CREST Awardship. VD acknowledges the Department of Biotechnology, Government of India for research fellowship.
Conceived and designed the experiments: MGD KVK. Performed the experiments: MGD VD. Analyzed the data: MGD VD IA. Contributed reagents/materials/analysis tools: MGD IA KVK. Wrote the paper: MGD KVK.
- 1. Brooker MIH (2000) A new classiﬁcation of the genus Eucalyptus L’Her. (Myrtaceae). Aust Syst Bot 13: 79–148.
- 2. Doughty RW (2000) The Eucalyptus. A natural and commercial history of the gum tree. London, UK: Johns Hopkins University Press. 256 p.
- 3. Iglesias I, Wiltermann D (2009) Eucalyptologics Information Resources on Eucalypt Cultivation. GIT Forestry Consulting, Available: http://www.git-forestry.com. Accessed 2009 March 29.
- 4. Potts BM (2004) Genetic improvement of eucalypts. In: Burley J, Evans J, Youngquist JA, editors. Encyclopedia of forest science. Oxford, UK: Elsevier Science. pp. 1480–1490.
- 5. Kallarackal J, Somen CK (1997) An ecophysiological evaluation of the suitability of Eucalyptus grandis for planting in the tropics. For Ecol Manag 95:53–61. doi: 10.1016/S0378-1127(97)00004-2.
- 6. Kallarackal J, Somen CK, Rajesh N (2002) Studies on water use of six tropical eucalypt species in Kerala. In: Bagchi SK, Varghese M, Siddappa , editors. Recent Eucalypt Research in India. Coimbatore: Inst. Forest Genetics and Tree Breeding. pp. 94–115.
- 7. Grattapaglia D, O’ Malley D, Sederoff R (1992) Multiple applications of RAPD markers to genetic analysis in Eucalyptus sp. In: Resolving Tropical Forest Resources Concerns through Tree Improvement, Gene Conservation and Domestication of New Species. Proceedings of the IUFRO meeting, Cali, Colombia: 436–450.
- 8. Gaiotto FA, Grattapaglia D (1997) Estimation of genetic variability in a breeding population of Eucalyptus urophylla using AFLP (Amplified Fragment Length Polymorphism) markers. Silviculture and Improvement of Eucalyptus. Proc. IUFRO Conference on Salvador, Colombo EMBRAPA-CNPF. 2: 46–52.
- 9. Byrne M, Parrish TL, Moran GF (1998) Nuclear RFLP diversity in Eucalyptus nitens. Heredity 81: 225–233. doi: 10.1046/j.1365-2540.1998.00386.x.
- 10. Poltri SNM, Zelener N, Traverso JR, Gelid P, Hopp HE (2003) Selection of a seed orchard of Eucalyptus dunnii based on diversity criteria calculated using molecular markers. Tree Physiol 23: 625–632. doi: 10.1093/treephys/23.9.625. pmid:12750055
- 11. Balasaravanan T, Chezhian P, Kamalakannan R, Ghosh M, Yasodha R, et al. (2005) Determination of inter- and intra-species genetic relationships among six Eucalyptus species based on inter-simple sequence repeats (ISSR). Tree Physiol. 25: 1295–1302. doi: 10.1093/treephys/25.10.1295. pmid:16076778
- 12. Muro-Abad JI, Rocha RB, Cruz CD, Araujo EFD (2005) Obtainment of Eucalyptus spp. hybrids aided by molecular markers-SSR analysis. Scientia Forestalis 67: 53–63.
- 13. Chezhian P, Yasodha R, Ghosh M (2010) Genetic diversity analysis in a seed orchard of Eucalyptus tereticornis. New Forests 40:85–99.
- 14. Sansaloni CP, Petroli CD, Carling J, Hudson CJ, Steane DA, et al. (2010) A high-density Diversity Arrays Technology (DArT) microarray for genome-wide genotyping in Eucalyptus. Plant Methods 6: 16. doi: 10.1186/1746-4811-6-16. pmid:20587069
- 15. Arumugasundaram S, Ghosh M, Veerasamy S, Ramasamy Y (2011) Species discrimination, population structure and linkage disequilibrium in Eucalyptus camaldulensis and Eucalyptus tereticornis using SSR markers. PLoS ONE 6(12): e28252. doi: 10.1371/journal.pone.0028252. pmid:22163287
- 16. Kulheim C, Yeoh SH, Maintz J, Foley WJ, Moran GF (2009) Comparative SNP diversity among four Eucalyptus species for genes from secondary metabolite biosynthetic pathways. BMC Genomics 10: 1–11. doi: 10.1186/1471-2164-10-1. pmid:19121221
- 17. Grattapaglia D, Sederoff R (1994) Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudo-testcross: mapping strategy and RAPD markers. Genetics 137:1121–1137. pmid:7982566
- 18. Verhaegen D, Plomion C (1996) Genetic mapping in Eucalyptus urophylla and Eucalyptus grandis using RAPD markers. Genome 39:1051–1061. doi: 10.1139/g96-132. pmid:18469954
- 19. Marques CM, Araujo JA, Ferreira JG, Whetten R, O’Malley DM, et al. (1998) AFLP genetic maps of Eucalyptus globulus and E. tereticornis. Theor Appl Genet 96:727–737. doi: 10.1007/s001220050795.
- 20. Thamarus K, Groom K, Murrell J, Byrne M, Moran G (2002) A genetic linkage map for Eucalyptus globulus with candidate loci for wood, fibre and floral traits. Theor Appl Genet 104:379–387. doi: 10.1007/s001220100717. pmid:12582710
- 21. Myburg AA, Griffin RA, Sederoff RR, Whetten RW (2003) Comparative genetic linkage maps of Eucalyptus grandis, Eucalyptus globulus and their F1 hybrid based on a double pseudo-backcross mapping approach. Theor Appl Genet 107:1028–1042. doi: 10.1007/s00122-003-1347-4. pmid:12838392
- 22. Verhaegen D, Plomion C, Gion JM, Poitel M, Costa P, et al. (1997) Quantitative trait dissection analysis in Eucalyptus using RAPD markers.1. Detection of QTL in interspecific hybrid progeny, stability of QTL expression across different ages. Theor Appl Genet 95: 597–608. doi: 10.1007/s001220050601. pmid:12838392
- 23. Marques CM, Vasquez JK, Carocha VJ, Ferreira JG, O’Malley DM, et al. (1999) Genetic dissection of vegetative propagation traits in Eucalyptus tereticornis and E. globulus. Theor Appl Genet 99: 936–946. doi: 10.1007/s001220051400.
- 24. Shepherd M, Chaparro JX, Teasdale R (1999) Genetic mapping of monoterpene composition in an interspecific eucalypt hybrid. Theor Appl Genet 99: 1207–1215. doi: 10.1007/s001220051326.
- 25. Junghans D, Alfenas AC, Brommonschenkel SH, Oda S, Mello EJ, et al. (2003) Resistance to rust in Eucalyptus: mode of inheritance and mapping of a major gene with RAPD markers. Theor Appl Genet 108: 175–180. doi: 10.1007/s00122-003-1415-9. pmid:14504745
- 26. Kirst M, Myburg AA, De Leon JPG, Kirst ME, Scott J, et al (2004) Coordinated genetic regulation of growth and lignin revealed by Quantitative Trait Locus analysis of cDNA microarray data in an interspecific backcross of Eucalyptus. Plant Physiol 135: 2368–2378 doi: 10.1104/pp.103.037960. pmid:15299141
- 27. Teixeira J, Missiaggia A, Dias D, Scarpinati E, Viana J, et al. (2011) QTL analyses of drought tolerance in Eucalyptus under two contrasting water regimes. BMC Proc. 5 (Suppl 7): P40. doi: 10.1186/1753-6561-5-s7-p40.
- 28. Grattapaglia D, Vaillancourt RE, Shepherd M, Thumma BR, Foley W, et al. (2012) Progress in Myrtaceae genetics and genomics: Eucalyptus as the pivotal genus. Tree Genetics & Genomes 8: 463–508. doi: 10.1007/s11295-012-0491-x.
- 29. Thumma BR, Nolan MR, Evans R, Moran GF (2005) Polymorphisms in cinnamoyl CoA reductase (CCR) are associated with variation in microfibril angle in Eucalyptus spp. Genetics 171:1257–1265. doi: 10.1534/genetics.105.042028. pmid:16085705
- 30. Southerton SG, MacMillan CP, Bell JC, Bhuiyan N, Dowries G, et al. (2010) Association of allelic variation in xylem genes with wood properties in Eucalyptus nitens. Australian Forestry 73: 259–264. doi: 10.1080/00049158.2010.10676337.
- 31. Kulheim C, Yeoh SH, Wallis IR, Laffan S, Moran GF, et al. (2011) The molecular basis of quantitative variation in foliar secondary metabolites in Eucalyptus globulus. New Phytologist 191: 1041–53. doi: 10.1111/j.1469-8137.2011.03769.x. pmid:21609332
- 32. Resende MDV, Resende MFR, Sansaloni CP, Petroli CD, Missiaggia AA, et al. (2012) Genomic selection for growth and wood quality in Eucalyptus: capturing the missing heritability and accelerating breeding for complex traits in forest trees. New Phytol 194: 116–128. doi: 10.1111/j.1469-8137.2011.04038.x. pmid:22309312
- 33. Rengel D, Clemente HS, Servant F, Ladouce N, Paux E, et al. (2009) A new genomic resource dedicated to wood formation in Eucalyptus. BMC Plant Biology 9: 36. doi: 10.1186/1471-2229-9-36. pmid:19327132
- 34. Keller G, Marchal T, San Clemente H, Navarro M, Ladouce N, et al. (2009) Development and functional annotation of an 11,303-EST collection from Eucalyptus for studies of cold tolerance. Tree Genetics & Genomes 5: 317–327. doi: 10.1186/1471-2229-9-36. pmid:19327132
- 35. Novaes E, Drost DR, Farmerie WG, Pappas GJ, Grattapaglia D, et al. (2008) High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome. BMC Genomics 9:312–326. doi: 10.1186/1471-2164-9-312. pmid:18590545
- 36. Rasmussen-Poblete S, Valdes J, Gamboa MC, Valenzuela PDT, Krauskopf E (2008) Generation and analysis of an Eucalyptus globulus cDNA library constructed from seedlings subjected to low temperature conditions. Electronic Journal of Biotechnology 11: p1
- 37. Paux E, Carocha V, Marques C, Mendes de Sousa A, Borralho N, et al. (2005) Transcript profiling of Eucalyptus xylem genes during tension wood formation. New Phytol 167: 89–100. doi: 10.1111/j.1469-8137.2005.01396.x. pmid:15948833
- 38. Mizrachi E, Hefer C, Ranik M, Joubert F, Myburg AA (2010) De novo assembled expressed gene catalog of a fast-growing Eucalyptus plantation tree produced by Illumina mRNA-Seq. BMC Genomics 11:681. doi: 10.1186/1471-2164-11-681. pmid:21122097
- 39. Paiva JA, Prat E, Vautrin S, Santos MD, San-Clemente H, et al. (2011) Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep- coverage BAC libraries. BMC Genomics 12:137. doi: 10.1186/1471-2164-12-137. pmid:21375742
- 40. Villar E, Klopp C, Noirot C, Novaes E, Kirst M, et al. (2011) RNA-Seq reveals genotype-specific molecular responses to water deficit in Eucalyptus. BMC Genomics 12:538. doi: 10.1186/1471-2164-12-538. pmid:22047139
- 41. Thumma BR, Sharma N, Southerton SG (2012) Transcriptome sequencing of Eucalyptus camaldulensis seedlings subjected to water stress reveals functional single nucleotide polymorphisms and genes under selection. BMC Genomics 13:364. doi: 10.1186/1471-2164-13-364. pmid:22853646
- 42. Klocko AL, Vining K, Amarasinghe V, Romanel E, Alves-Ferreira M, et al. (2013) Floral transcriptome of Eucalyptus grandis. In: Proceedings of Plant and Animal Genome XXI, held from January 11–16, 2013 at San Diego, CA.
- 43. Hefer C, Mizrachi E, Joubert F, Myburg A (2011) The Eucalyptus genome integrative explorer (EucGenIE): a resource for Eucalyptus genomics and transcriptomics. BMC Proc. 5 (Suppl 7): O49. doi: 10.1186/1753-6561-5-s7-o49.
- 44. Myburg AA, Grattapaglia D, Tuskan GA, Hellsten U, Hayes RD, et al. (2014) The genome of Eucalyptus grandis. Nature doi: 10.1038/nature13308. pmid:24919147
- 45. Hirakawa H, Nakamura Y, Kaneko T, Isobe S, Sakai H, et al. (2011) Survey of the genetic information carried in the genome of Eucalyptus camaldulensis. Plant Biotechnology 28: 471–480. doi: 10.1038/nature13308. pmid:24919147
- 46. Mardis ER (2008) The impact of next-generation sequencing technology on genetics. Trends Genet 24: 133–141. doi: 10.1016/j.tig.2007.12.007. pmid:18262675
- 47. Shendure J, Ji H (2008) Next-generation DNA sequencing. Nat Biotechnol 26: 1135–1145. doi: 10.1038/nbt1486. pmid:18846087
- 48. Mamanova L, Coffey AJ, Scott CE, Kozarewa I, Turner EH, et al. (2010) Target-enrichment strategies for next-generation sequencing. Nat Methods 7:111–118. doi: 10.1038/nmeth.1419. pmid:20111037
- 49. Hodges E, Rooks M, Xuan Z, Bhattacharjee A, Benjamin Gordon D, et al. (2009) Hybrid selection of discrete genomic intervals on custom-designed microarrays for massively parallel sequencing. Nat Protocols 4: 960–974. doi: 10.1038/nprot.2009.68. pmid:19478811
- 50. Livak KJ, Schmittgen TD (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2∆∆ C(T) Method. Methods 25: 402–408. doi: 10.1006/meth.2001.1262. pmid:11846609
- 51. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nature Methods 9: 357–359. doi: 10.1038/nmeth.1923. pmid:22388286
- 52. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10:R25. doi: 10.1186/gb-2009-10-3-r25. pmid:19261174
- 53. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079. doi: 10.1093/bioinformatics/btp352. pmid:19505943
- 54. Hertzberg M, Aspeborg H, Schrader J, Andersson A, Erlandsson R, et al. (2001) A transcriptional roadmap to wood formation. Proc Natl Acad Sci USA 98: 14732–14737. doi: 10.1073/pnas.261293398. pmid:11724959
- 55. Harakava R (2005) Genes encoding enzymes of the lignin biosynthesis pathway in Eucalyptus. Genet Mol Biol 28: 601–607. doi: 10.1590/S1415-47572005000400015.
- 56. Shi R, Sun YH, Li Q, Heber S, Sederoff R, et al. (2010) Towards a systems approach for lignin biosynthesis in Populus trichocarpa: transcript abundance and specificity of the monolignol biosynthetic genes. Plant Cell Physiol 51:144–163. doi: 10.1093/pcp/pcp175. pmid:19996151
- 57. Yang X, Ye CY, Bisaria A, Tuskan GA, Kalluri UC (2011) Identification of candidate genes in Arabidopsis and Populus cell wall biosynthesis using text-mining, co-expression network analysis and comparative genomics. Plant Sci 181: 675–687. doi: 10.1016/j.plantsci.2011.01.020. pmid:21958710
- 58. Ruprecht C, Mutwil M, Saxe F, Eder M, Nikoloski Z, et al. (2011) Large-scale co-expression approach to dissect secondary cell wall formation across plant species. Front Plant Sci 2: 23. doi: 10.3389/fpls.2011.00023. pmid:22639584
- 59. Wong MM, Cannon CH, Wickneswari R (2011) Identification of lignin genes and regulatory sequences involved in secondary cell wall formation in Acacia auriculiformis and Acacia mangium via de novo transcriptome sequencing. BMC Genomics 12:342. doi: 10.1186/1471-2164-12-342. pmid:21729267
- 60. Kirst M, Johnson AF, Baucom C, Ulrich E, Hubbard K, et al. (2003) Apparent homology of expressed genes from wood-forming tissues of loblolly pine (Pinus taeda L.) with Arabidopsis thaliana. Proc Natl Acad Sci USA 100: 7383–7388. doi: 10.1073/pnas.1132171100. pmid:12771380
- 61. Park S, Oh S, Han KH (2004) Large-scale computational analysis of poplar ESTs reveals the repertoire and unique features of expressed genes in the poplar genome. Mol Breed 14: 429–440. doi: 10.1007/s11032-005-0603-5.
- 62. Sterky F, Regan S, Karlsson J, Hertzberg M, Rohde A, et al. (1998) Gene discovery in the wood-forming tissues of poplar: Analysis of 5,692 expressed sequence tags. Proc Natl Acad Sci USA 95:13,330–13,335. doi: 10.1073/pnas.95.22.13330. pmid:9789088
- 63. Lenhard M, Jürgens G, Laux T (2002) The WUSCHEL and SHOOTMERISTEMLESS genes fulfill complementary roles in Arabidopsis shoot meristem regulation. Development 129: 3195–3206. pmid:12070094
- 64. Cseke LJ, Zheng J, Podila GK (2003) Characterization of PTM5 in aspen trees: a MADS-box gene expressed during woody vascular development. Gene 318: 55–67. doi: 10.1016/s0378-1119(03)00765-0. pmid:14585498
- 65. Kubo M, Udagawa M, Nishikubo N, Horiguchi G, Yamaguchi M, et al. (2005) Transcription switches for protoxylem and metaxylem vessel formation. Genes Dev 19: 1855–1860. doi: 10.1101/gad.1331305. pmid:16103214
- 66. Zhong R, Demura T, Ye ZH (2006) SND1, a NAC domain transcription factor, is a key regulator of secondary wall synthesis in fibers of Arabidopsis. Plant Cell 18: 3158–3170. doi: 10.1105/tpc.106.047399. pmid:17114348
- 67. Hu R, Qi G, Kong Y, Kong D, Gao Q, et al. (2010) Comprehensive analysis of NAC domain transcription factor gene family in Populus trichocarpa. BMC Plant Biology 10:145 doi: 10.1186/1471-2229-10-145. pmid:20630103
- 68. Ohashi-Ito K, Fukuda H (2010) Transcriptional regulation of vascular cell fates. Curr Opin Plant Biol 13: 670–676. doi: 10.1016/j.pbi.2010.08.011. pmid:20869293
- 69. Jensen JK, Kim H, Cocuron JC, Orler R, Ralph J, et al. (2011) The DUF579 domain containing proteins IRX15 and IRX15-L affect xylan synthesis in Arabidopsis. Plant J 66: 387–400. doi: 10.1111/j.1365-313X.2010.04475.x. pmid:21288268
- 70. Li E, Bhargava A, Qiang W, Friedmann MC, Forneris N, et al. (2012) The Class II KNOX gene KNAT7 negatively regulates secondary wall formation in Arabidopsis and is functionally conserved in Populus. New Phytol 194: 102–115. doi: 10.1111/j.1469-8137.2011.04016.x. pmid:22236040
- 71. Andersson-Gunnerås S, Hellgren JM, Björklund S, Regan S, Moritz T, et al. (2003) Asymmetric expression of a poplar ACC oxidase controls ethylene production during gravitational induction of tension wood. Plant J 34: 339–349. doi: 10.1046/j.1365-313x.2003.01727.x. pmid:12713540
- 72. Kalluri UC, DiFazio SP, Brunner AM, Tuskan GA (2007) Genome-wide analysis of Aux/IAA and ARF gene families in Populus trichocarpa. BMC Plant Biology 7:59 doi: 10.1186/1471-2229-7-59. pmid:17986329
- 73. Nieminen K, Immanen J, Laxell M, Kauppinen L, Tarkowski P, et al. (2008) Cytokinin signaling regulates cambial development in poplar. Proc Natl Acad Sci USA 105: 20032–20037. doi: 10.1073/pnas.0805617106. pmid:19064928
- 74. Mauriat M, Moritz T (2009) Analyses of GA20ox- and GID1-over-expressing aspen suggest that gibberellins play two distinct roles in wood formation. Plant J 58: 989–1003. doi: 10.1111/j.1365-313X.2009.03836.x. pmid:19228336
- 75. Harismendy O, Ng PC, Strausberg RL, Wang X, Stockwell TB, et al. (2009) Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol 10: R32 doi: 10.1186/gb-2009-10-3-r32. pmid:19327155
- 76. Paszkiewicz K, Studholme DJ (2012) High-throughput sequencing data analysis software: current state and future developments. In:Rodriguez-Ezpeleta N, Hackenberg M, Aransay AM, editors. Bioinformatics for high throughput sequencing. New York: Springer Science. pp. 231–248.
- 77. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, et al. (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–59. doi: 10.1038/nature07517. pmid:18987734
- 78. Hendre PS, Kamalakannan R, Varghese M (2012) High-throughput and parallel SNP discovery in selected candidate genes in Eucalyptus camaldulensis using Illumina NGS platform. Plant Biotechnol J 10: 646–656. doi: 10.1111/j.1467-7652.2012.00699.x. pmid:22607345
- 79. Schneeberger K, Weigel D (2011) Fast-forward genetics enabled by new sequencing technologies. Trends Plant Sci. 16: 282–288. doi: 10.1016/j.tplants.2011.02.006. pmid:21439889
- 80. Winfield MO, Wilkinson PA, Allen AM, Barker GL, Coghill JA, et al. (2012) Targeted re-sequencing of the allohexaploid wheat exome. Plant Biotechnol J 10:733–742. doi: 10.1111/j.1467-7652.2012.00713.x. pmid:22703335
- 81. Trick M, Adamski NM, Mugford SG, Jiang CC, Febrer M, et al. (2012) Combining SNP discovery from next-generation sequencing data with bulked segregant analysis (BSA) to fine-map genes in polyploid wheat. BMC Plant Biology 12:14. doi: 10.1186/1471-2229-12-14. pmid:22280551
- 82. Saintenac C, Jiang D, Akhunov ED (2011) Targeted analysis of nucleotide and copy number variation by exon capture in allotetraploid wheat genome. Genome Biology 12: R88. doi: 10.1186/gb-2011-12-9-r88. pmid:21917144
- 83. Kaya HB, Cetin O, Kaya H, Sahin M, Sefer F, et al. (2013) SNP Discovery by Illumina-based transcriptome sequencing of the olive and the genetic characterization of Turkish olive genotypes revealed by AFLP, SSR and SNP markers. PLoS ONE 8: e73674. doi: 10.1371/journal.pone.0073674. pmid:24058483
- 84. Jupe F, Witek K, Verweij W, Śliwka J, Pritchard L, et al. (2013) Resistance gene enrichment sequencing (RenSeq) enables reannotation of the NB-LRR gene family from sequenced plant genomes and rapid mapping of resistance loci in segregating populations. Plant J 76: 530–544. doi: 10.1111/tpj.12307. pmid:23937694
- 85. Howe GT, Yu J, Knaus B, Cronn R, Kolpak S, et al. (2013) A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation. BMC Genomics 14:137 doi: 10.1186/1471-2164-14-137. pmid:23445355
- 86. Wu X, Ren C, Joshi T, Vuong T, Xu D, et al. (2010) SNP discovery by high-throughput sequencing in soybean. BMC Genomics 11:469 doi: 10.1186/1471-2164-11-469. pmid:20701770
- 87. Hyten DL, Cannon SB, Song Q, Weeks N, Fickus EW, et al. (2010) High-throughput SNP discovery through deep resequencing of a reduced representation library to anchor and orient scaffolds in the soybean whole genome sequence. BMC Genomics 11:38 doi: 10.1186/1471-2164-11-38. pmid:20078886
- 88. Chagné D, Crowhurst RN, Troggio M, Davey MW, Gilmore B, et al. (2012) Genome-Wide SNP detection, validation, and development of an 8K SNP array for Apple. PLoS ONE 7(2): e31745. doi: 10.1371/journal.pone.0031745. pmid:22363718
- 89. Neves LG, Davis JM, Barbazuk WB, Kirst M (2014) A High-Density gene map of Loblolly Pine (Pinus taeda L.) based on exome sequence capture genotyping. G3 (Bethesda) 4: 29–37. doi: 10.1534/g3.113.008714. pmid:24192835
- 90. Kenny EM, Cormican P, Gilks WP, Gates AS, O’Dushlaine CT, et al. (2011) Multiplex target enrichment using DNA indexing for ultra-high throughput SNP detection. DNA Res 18: 31–38. doi: 10.1093/dnares/dsq029. pmid:21163834
- 91. Mokry M, Feitsma H, Nijman IJ, de Bruijn E, van der Zaag PJ, et al. (2010) Accurate SNP and mutation detection by targeted custom microarray-based genomic enrichment of short-fragment sequencing libraries. Nucleic Acids Res 38:e116. doi: 10.1093/nar/gkq072. pmid:20164091
- 92. Tennessen JA, Govindarajulu R, Liston A, Ashman TL (2013) Targeted sequence capture provides insight into genome structure and genetics of male sterility in a gynodioecious diploid strawberry, Fragaria vesca ssp. bracteata (Rosaceae). G3 (Bethesda) 3:1341–1351. doi: 10.1534/g3.113.006288. pmid:23749450
- 93. Zhou L, Holliday JA (2012) Targeted enrichment of the black cottonwood (Populus trichocarpa) gene space using sequence capture. BMC Genomics 13:703. doi: 10.1186/1471-2164-13-703. pmid:23241106
- 94. Grattapaglia D, Silva-Junior OB, Kirst M, de Lima BM, Faria DA, et al. (2011) High-throughput SNP genotyping in the highly heterozygous genome of Eucalyptus: assay success, polymorphism and transferability across species. BMC Plant Biol 11:65. doi: 10.1186/1471-2229-11-65. pmid:21492434
- 95. Allen AM, Barker GL, Berry ST, Coghill JA, Gwilliam R, et al. (2011) Transcript-specific, single-nucleotide polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.). Plant Biotechnol J 9:1086–1099. doi: 10.1111/j.1467-7652.2011.00628.x. pmid:21627760
- 96. Feltus AF, Wan J, Schulze SR, Estill JC, Jiang N, et al. (2004) An SNP resource for rice genetics and breeding based on subspecies Indica and Japonica genome alignments. Genome Res 14: 1812–1819. doi: 10.1101/gr.2479404. pmid:15342564
- 97. Waugh R, Jannink JL, Muehlbauer GJ, Ramsay L (2009) The emergence of whole genome association scans in barley. Curr Opin Plant Biol 12:218–222. doi: 10.1016/j.pbi.2008.12.007. pmid:19185530
- 98. Byers RL, Harker DB, Yourstone SM, Maughan PJ, Udall JA (2012) Development and mapping of SNP assays in allotetraploid cotton. Theor Appl Genet 124: 1201–1214. doi: 10.1007/s00122-011-1780-8. pmid:22252442
- 99. Hamilton JP, Hansey CN, Whitty BR, Stoffel K, Massa AN, et al. (2011) Single nucleotide polymorphism discovery in elite north American potato germplasm. BMC Genomics 12:302. doi: 10.1186/1471-2164-12-302. pmid:21658273
- 100. Zhang X, Borevitz JO (2009) Global analysis of allele-speciﬁc expression in Arabidopsis thaliana. Genetics 182: 943–954. doi: 10.1534/genetics.109.103499. pmid:19474198
- 101. Jones E, Chu WC, Ayele M, Ho J, Bruggeman E, et al. (2009) Development of single nucleotide polymorphism (SNP) markers for use in commercial maize (Zea mays L.) germplasm. Molecular Breeding 24: 165–176. doi: 10.1007/s11032-009-9281-z.
- 102. Muchero W, Diop N, Bhat P, Fenton R, Wanamaker S, et al. (2009) A consensus genetic map of cowpea [Vigna unguiculata (L) Walp.] and synteny based on EST-derived SNPs. Proc Natl Acad Sci USA 106:18159–18164. doi: 10.1073/pnas.0905886106. pmid:19826088
- 103. Pavy N, Pelgas B, Beauseigle SP, Blais S, Gagnon F, et al. (2008) Enhancing genetic mapping of complex genomes through the design of highly-multiplexed SNP arrays: application to the large and unsequenced genomes of white spruce and black spruce. BMC genomics 9(1):21. doi: 10.1186/1471-2164-9-21. pmid:18205909
- 104. Eckert A, Pande B, Ersoz E, Wright M, Rashbrook V, et al. (2009) High throughput genotyping and mapping of single nucleotide polymorphisms in loblolly pine (Pinus taeda L.). Tree Genetics & Genomes 5: 225–234. doi: 10.1007/s11295-008-0183-8.
- 105. Wegrzyn JL, Eckert AJ, Choi M, Lee JM, Stanton BJ, et al. (2010) Association genetics of traits controlling lignin and cellulose biosynthesis in black cottonwood (Populus trichocarpa, Salicaceae) secondary xylem. New Phytol 188: 515–532. doi: 10.1111/j.1469-8137.2010.03415.x. pmid:20831625
- 106. Lima BM, Silva-Junior OB, Faria DA, Mamani EMC, Pappas GJ, et al. (2011) Assessment of SNPs for linkage mapping in Eucalyptus: construction of a consensus SNP/microsatellite map from two unrelated pedigrees. BMC Proc. 5 (Suppl 7): P31. doi: 10.1186/1753-6561-5-s7-p31.
- 107. Sexton T (2011) Candidate gene SNP discovery, genotyping and association with wood quality traits in Eucalyptus pilularis (blackbutt). PhD thesis, Southern Cross University, Lismore, NSW. Available: http://epubs.scu.edu.au/theses/285. Accessed 2014 June 30.
- 108. Thavamanikumar S, McManus , L J, Tibbits JFG, Bossinger G (2011) The significance of Single Nucleotide Polymorphisms (SNPS) in ‘Eucalyptus globulus’ Breeding Programs. Australian Forestry 74: 23–29. doi: 10.1080/00049158.2011.10676342.
- 109. Singh P, Mizrachi E, Myburg Z (2014) Genetic load and allelic imbalance estimated in Eucalyptus hybrids using RNAseq. Proc Plant & Animal Genome XXII, held on January 10–15, 2014 at San Diego, CA.
- 110. Dantec LL, Chagne D, Pot D, Cantin O, Garnier-Gere P, et al. (2004) Automated SNP detection in expressed sequence tags: statistical considerations and application to maritime pine sequences. Plant Mol Biol 54: 461–470. doi: 10.1023/B:PLAN.0000036376.11710.6f. pmid:15284499
- 111. Lijavetzky D, Cabezas JA, Iba´n˜ez A, Rodrı´guez V, Martı´nez-Zapater JM (2007) High throughput SNP discovery and genotyping in grapevine (Vitis vinifera L.) by combining a re-sequencing approach and SNPlex technology. BMC Genomics 8: 424. doi: 10.1186/1471-2164-8-424. pmid:18021442
- 112. Ching A, Caldwell KS, Jung M, Dolan M, Smith OS, et al. (2002) SNP frequency, haplotype structure and linkage disequilibrium in elite maize inbred lines. BMC Genet 3: 19. doi: 10.1186/1471-2156-3-19. pmid:12366868
- 113. Li Y, Haseneyer G, Schon C-C, Ankerst D, Korzun V, et al. (2011) High levels of nucleotide diversity and fast decline of linkage disequilibrium in rye (Secale cereale L.) genes involved in frost response. BMC Plant Biol 11: 6. doi: 10.1186/1471-2229-11-6. pmid:21219606
- 114. Britten RJ, Rowen L, Williams J, Cameron RA (2003) Majority of divergence between closely related DNA samples is due to indels. Proc Natl Acad Sci USA 100: 4661–4665. doi: 10.1073/pnas.0330964100. pmid:12672966
- 115. Singh TR, Gupta A, Riju A, Mahalaxmi M, Seal A, et al. (2011) Computational identification and analysis of single nucleotide polymorphisms and insertions/deletions in expressed sequence tag data of Eucalyptus. J Genet 90: e34–e38. pmid:21873771
- 116. Shao H, Bellos E, Yin H, Liu X, Zou J, et al. (2013) A population model for genotyping indels from next-generation sequence data. Nucleic Acids Res 41(3):e46. doi: 10.1093/nar/gks1143. pmid:23221639
- 117. Mills RE, Luttig CT, Larkins CE, Beauchamp A, Tsui C, et al. (2006) An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Res 16(9):1182–1190. doi: 10.1101/gr.4565806. pmid:16902084
- 118. Ometto L, Stephan W, De Lorenzo D (2005) Insertion/deletion and nucleotide polymorphism data reveal constraints in Drosophila melanogaster introns and intergenic regions. Genetics 169:1521–1527. pmid:15654088
- 119. Brandström M, Ellegren H (2007) The genomic landscape of short insertion and deletion polymorphisms in the chicken (Gallus gallus) genome: a high frequency of deletions in tandem duplicates. Genetics 176:1691–1701. doi: 10.1534/genetics.107.070805. pmid:17507681
- 120. Hayashi K, Yoshida H, Ashikawa I (2006) Development of PCR-based allele-specific and InDel marker sets for nine rice blast resistance genes. Theor Appl Genet 113: 251–260. doi: 10.1007/s00122-006-0290-6. pmid:16791691
- 121. Salathia N, Lee HN, Sangster TA, Morneau K, Landry CR, et al. (2007) Indel arrays: an affordable alternative for genotyping. Plant J 51: 727–737. doi: 10.1111/j.1365-313x.2007.03194.x. pmid:17645438
- 122. Ollitrault F, Terol J, Martin AA, Pina JA, Navarro L, et al. (2012) Development of indel markers from Citrus clementina (Rutaceae) BAC-end sequences and interspecific transferability in Citrus. Am J Bot 99: E268–E273. doi: 10.3732/ajb.1100569. pmid:22733984
- 123. Moghaddam SM, Song Q, Mamidi S, Schmutz J, Lee R, et al. (2014) Developing market class specific InDel markers from next generation sequence data in Phaseolus vulgaris L. Front Plant Sci 5:185. doi: 10.3389/fpls.2014.00185. pmid:24860578
- 124. Perdereau AC, Douglas GC, Hodkinson TR, Kelleher CT (2013) High levels of variation in Salix lignocellulose genes revealed using poplar genomic resources. Biotechnol Biofuels 6(1):114. doi: 10.1186/1754-6834-6-114. pmid:23924375
- 125. Meirmans PG, Lamothe M, Pierre P, Nathalie I (2007) Species-specific single nucleotide polymorphism markers for detecting hybridization and introgression in poplar. Canadian Journal of Botany 85: 1082–1091. doi: 10.1139/b07-069.
- 126. Chu Y, Huang Q, Zhang B, Ding C, Su X (2014) Expression and molecular evolution of two DREB1 genes in black poplar (Populus nigra). PLoS One 9(6):e98334. doi: 10.1371/journal.pone.0098334. pmid:24887081
- 127. Schroeder H, Höltken A, Fladung M (2011) Chloroplast SNP-marker as powerful tool for differentiation of Populus species in reliable poplar breeding and barcoding approaches. BMC Proc. 5(Suppl 7): P56. doi: 10.1186/1753-6561-5-s7-p56.
- 128. Kong F, Wang X, Chen Y, Bian A, Xu J, et al. (2013) Analyzing the nucleotide variations within the Expressed Sequence Tags of loblolly Pine (Pinus taeda). J Plant Biochem Physiol 1:2.
- 129. Kumar S, Banks TW, Cloutier S (2012) SNP Discovery through Next-Generation Sequencing and its applications. International Journal of Plant Genomics. doi: 10.1155/2012/831460. pmid:23227038
- 130. Nordborg M, Weigel D (2008) Next-generation genetics in plants. Nature 456: 720–723. doi: 10.1038/nature07629. pmid:19079047