Towards Plant Species Identification in Complex Samples: A Bioinformatics Pipeline for the Identification of Novel Nuclear Barcode Candidates

doi:10.1371/journal.pone.0147692

Fig 1.

Bioinformatics pipeline for the identification of DNA barcodes in the nuclear genome.

Scheme of the successive steps of the bioinformatics pipeline, which processes input sequences (top left) to output candidate primer pairs amplifying novel DNA barcoding targets (bottom right). See text and Supporting Information for details.

Table 1.

Primer pairs producing a unique amplicon in the highest number of species, based on the analysis of their genome sequences, using the input set from Reneker et al. (2012).

The primer sequences can be found in Table C in S1 Text.
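
The ranking criterion behind Table 1 (for each primer pair, the number of species in which it yields exactly one amplicon) can be sketched as follows. This is a minimal illustration, not the pipeline's own code: the input layout of `(primer_pair, species, amplicon_length)` tuples, the function name `rank_primer_pairs` and the toy primer-pair and species IDs are all hypothetical.

```python
from collections import defaultdict


def rank_primer_pairs(products):
    """Rank primer pairs by the number of species in which they give exactly one amplicon.

    `products` is a hypothetical list of (primer_pair, species, amplicon_length)
    tuples, one per in silico PCR product.
    """
    per_pair = defaultdict(lambda: defaultdict(int))
    for pair, species, _length in products:
        per_pair[pair][species] += 1
    scores = {
        pair: sum(1 for n in counts.values() if n == 1)
        for pair, counts in per_pair.items()
    }
    # Highest score first; ties broken by pair ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data: pairA gives two products in species_2, so only species_1 counts for it.
products = [
    ("pairA", "species_1", 194),
    ("pairA", "species_2", 190),
    ("pairA", "species_2", 420),
    ("pairB", "species_1", 300),
    ("pairB", "species_2", 305),
    ("pairB", "species_3", 310),
]
print(rank_primer_pairs(products))  # [('pairB', 3), ('pairA', 1)]
```

Counting only species with a single product reflects the "unique amplicon" requirement: a pair producing several products in one genome would give an ambiguous barcode for that species.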

Fig 2.

Analysis of the output produced by the pipeline using LIME sequences as an input.

(A) Distribution of the amplicon lengths produced by in silico PCR on plant genomes. The inset shows a sample agarose gel obtained with the primers on various samples (NT: no template, Ma: Maize, Mu: Muesli, S: Strawberry), with the main band at the expected size (* marks the 200 bp band of the DNA ladder). (B) Consensus of all the 194 bp amplicon sequences, showing higher conservation in the primer binding sites (purple) than in the intervening region (blue). The primer consensuses, produced with WebLogo, are also shown. (C) Distribution of the amplicon lengths produced by in silico PCR on the whole ENA nucleotide database. (D) Number of species for which the 23579-aaa primer pair can produce at least one unique amplicon, drawn to relative scale.
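
The in silico PCR underlying panels A and C can be illustrated with a minimal exact-match sketch. It assumes a product forms wherever the forward primer and the reverse complement of the reverse primer both occur on the same strand, in the correct orientation, at most `max_len` bases apart. IUPAC ambiguity codes in the primers are handled, but mismatches are not tolerated, unlike dedicated in silico PCR tools. The function names and the `max_len` cutoff are hypothetical, not taken from the pipeline.

```python
import re

# IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, keeping IUPAC ambiguity codes valid."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pattern(primer):
    """Zero-width lookahead so that overlapping binding sites are all found."""
    return re.compile("(?=" + "".join(IUPAC[b] for b in primer.upper()) + ")")


def products_on_strand(strand, fwd, rev, max_len):
    """Lengths of products whose fwd site lies upstream of a rev site on `strand`."""
    fwd_starts = [m.start() for m in primer_pattern(fwd).finditer(strand)]
    # The reverse primer's own sequence lies on the opposite strand,
    # so on this strand we search for its reverse complement.
    rev_ends = [m.start() + len(rev) for m in primer_pattern(revcomp(rev)).finditer(strand)]
    return [
        end - start
        for start in fwd_starts
        for end in rev_ends
        if len(fwd) + len(rev) <= end - start <= max_len
    ]


def in_silico_pcr(template, fwd, rev, max_len=1000):
    """Exact-match in silico PCR on both strands; returns sorted amplicon lengths."""
    template = template.upper()
    return sorted(
        products_on_strand(template, fwd, rev, max_len)
        + products_on_strand(revcomp(template), fwd, rev, max_len)
    )


template = "AAAA" + "ACGTAC" + "GGGGGGGG" + "TGGCAA" + "AAAA"
print(in_silico_pcr(template, "ACGYAC", "TTGCCA"))  # [20]
```

Pooling the returned lengths over many genomes (for example with `collections.Counter`) gives a length distribution of the kind plotted in panels A and C.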

Table 2.

Barcode analysis of a Zea mays sample following clustering of the NGS reads with the reference sequences at 100% identity.

The 21 clusters containing at least one NGS read and at least one reference sequence are shown, ordered by the number of NGS reads. When a cluster contained reference sequences from more than one species, the assignment is reported as the most specific taxon common to all those species.
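
The assignment rule for mixed clusters amounts to taking the lowest common taxon of the reference lineages. A minimal sketch follows; the cluster record layout (`"ngs_reads"`, `"ref_lineages"`), the function names and the descending sort order are assumptions, and the simplified lineages are illustrative, not the study's reference set.

```python
def common_taxon(lineages):
    """Return the most specific taxon shared by all lineages.

    Each lineage is a list of taxon names ordered from root to species.
    Returns None if the lineages share no taxon at all.
    """
    if not lineages:
        raise ValueError("at least one lineage is required")
    shared = None
    for names in zip(*lineages):
        if any(name != names[0] for name in names):
            break
        shared = names[0]
    return shared


def summarize_clusters(clusters):
    """Keep clusters with >= 1 NGS read and >= 1 reference; sort by read count (descending).

    Each cluster is a hypothetical dict with keys "ngs_reads" (int) and
    "ref_lineages" (lineages of the reference sequences it contains).
    """
    rows = [
        (c["ngs_reads"], common_taxon(c["ref_lineages"]))
        for c in clusters
        if c["ngs_reads"] > 0 and c["ref_lineages"]
    ]
    return sorted(rows, key=lambda row: row[0], reverse=True)


# Simplified, illustrative lineages.
zea_mays = ["Viridiplantae", "Poaceae", "Zea", "Zea mays"]
sorghum = ["Viridiplantae", "Poaceae", "Sorghum", "Sorghum bicolor"]
clusters = [
    {"ngs_reads": 5, "ref_lineages": [zea_mays, sorghum]},
    {"ngs_reads": 12, "ref_lineages": [zea_mays]},
    {"ngs_reads": 3, "ref_lineages": []},  # no reference sequence: dropped
]
print(summarize_clusters(clusters))  # [(12, 'Zea mays'), (5, 'Poaceae')]
```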

Fig 3.

Species detection in various samples using the 23579-aaa barcoding primers.

The tested samples were (A) purchased maize (Zea mays) DNA, (B) purchased soya (Glycine max) DNA, (C) a leaf from a rice (Oryza sativa) plant, (D) fresh strawberries from the supermarket (Fragaria × ananassa), and (E) a commercial pack of fruit and cereal muesli. Results are shown on a simplified taxonomy line, with the number of NGS reads assigned to each position drawn as a blue circle whose area is proportional to that number. Values in parentheses give (number of clusters:total number of reads).
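
The per-taxon labels and circle sizes described in this caption can be sketched as below. The input format (one `(taxon, n_reads)` pair per cluster) and all function names are hypothetical. Because the area, not the radius, is proportional to the read count, the radius scales with the square root of the number of reads.

```python
import math
from collections import defaultdict


def taxon_summary(assignments):
    """Aggregate per-cluster assignments into (clusters, reads) per taxon.

    `assignments` is a hypothetical list of (taxon, n_reads) pairs, one per cluster.
    """
    totals = defaultdict(lambda: [0, 0])
    for taxon, n_reads in assignments:
        totals[taxon][0] += 1
        totals[taxon][1] += n_reads
    return {taxon: (n_clusters, n_reads) for taxon, (n_clusters, n_reads) in totals.items()}


def label(taxon, n_clusters, n_reads):
    """Format a node label in the '(clusters:reads)' style of the figure."""
    return f"{taxon} ({n_clusters}:{n_reads})"


def circle_radius(n_reads, scale=1.0):
    """Radius giving a circle whose area equals scale**2 * n_reads."""
    return scale * math.sqrt(n_reads / math.pi)


summary = taxon_summary([("Zea mays", 12), ("Poaceae", 5), ("Zea mays", 3)])
for taxon, (n_clusters, n_reads) in summary.items():
    print(label(taxon, n_clusters, n_reads), round(circle_radius(n_reads), 2))
```

Scaling the radius linearly with the read count instead would exaggerate abundant taxa, since the visible area would then grow with the square of the count.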

Table 3.

Relative quantification in mixed maize:soy samples.

Purified DNA from the two species was mixed in the indicated weight:weight ratios prior to barcode PCR and sequencing. The reported number of reads corresponds to the reference read selected on the basis of the results for the individual samples (see text for details), corrected for the C values of both species.
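
The caption does not spell out how the C-value correction is applied. One plausible form, sketched here under two assumptions (the barcode has the same copy number per haploid genome in both species, and both templates amplify with equal efficiency), weights each species' read count by its C value: a given mass of DNA from a species with a larger genome contains proportionally fewer genome copies. The function name and the numbers are hypothetical; the exact procedure is described in the main text.

```python
def mass_fractions(reads, c_values):
    """Estimate each species' share of the input DNA mass from read counts.

    Assumes reads are proportional to genome copies, and genome copies are
    proportional to DNA mass divided by the C value, so mass ~ reads * C value.
    """
    weighted = {sp: reads[sp] * c_values[sp] for sp in reads}
    total = sum(weighted.values())
    return {sp: w / total for sp, w in weighted.items()}


# Hypothetical numbers, for illustration only.
reads = {"maize": 100, "soy": 200}
c_values = {"maize": 2.0, "soy": 1.0}
print(mass_fractions(reads, c_values))  # {'maize': 0.5, 'soy': 0.5}
```

In this toy case the species with half the reads but twice the C value ends up with the same estimated mass share.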