A technique for setting analytical thresholds in massively parallel sequencing-based forensic DNA analysis

doi:10.1371/journal.pone.0178005

Fig 1.

Illustration of locus definitions based on flanking sequence landmarks (FSLs).

The genomic sub-sequence between GRCh38/hg38 positions chr2:68,011,892 and chr2:68,012,004 includes the forensic STR locus D2S441 and the SNP locus rs74640515. Flanking sub-sequences can be used as landmarks to delimit and define sub-segments of PCR amplicons for forensic analysis. FSLs of 10 nucleotides are underlined. FSLs need not be unique in the genome; they need only be unique within the portion of the genome sequenced in an assay. Three separate locus definitions are illustrated: (a) the polymorphic repeat region of the STR locus D2S441, (b) the polymorphic SNP locus rs74640515, and (c) the haplotype sub-segment including both D2S441 and rs74640515. FSLs may be set equal to the PCR primer binding sites, making the delimited locus the entire PCR amplicon.
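
The landmark idea can be sketched in a few lines of Python. This is a minimal illustration only, assuming exact string matching and assuming the landmarks themselves are excluded from the reported locus; the function name `extract_locus` and the toy sequences are hypothetical and are not the real D2S441 flanks.

```python
def extract_locus(read, left_fsl, right_fsl):
    """Return the sub-sequence between two flanking landmarks.

    Returns None when either landmark is absent from the read.
    Assumption: landmarks are matched exactly and are not part of the locus.
    """
    start = read.find(left_fsl)
    if start == -1:
        return None
    start += len(left_fsl)
    # Search for the right landmark only downstream of the left one.
    end = read.find(right_fsl, start)
    if end == -1:
        return None
    return read[start:end]


# Toy 10-nucleotide landmarks around a three-repeat region (hypothetical data).
LEFT = "GGACTTCAGA"
RIGHT = "TTGACCGTAA"
toy_read = LEFT + "TCTA" * 3 + RIGHT
locus = extract_locus(toy_read, LEFT, RIGHT)
```

Choosing wider landmarks, up to the primer binding sites, would make the same function return the whole amplicon.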

Table 1.

Frequency spectrum of the exhaustive and mutually exclusive set of sequence types generated for simple and compound repeat loci D5S818 and D12S391, respectively, observed in sample T36814.

The number of individual reads (tokens) comprising each type is indicated by N, and counts of types one repeat motif shorter than either allele are highlighted in bold font. For ease of reading, selected repeat motifs are bracketed, with the number following each bracket indicating the number of tandem repeats.
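
A frequency spectrum of this kind can be sketched as a count of identical locus sequences. The example below is a hedged illustration with made-up reads for a simple repeat locus; `frequency_spectrum` and `is_n_minus_1` are hypothetical helper names, and the N-1 test assumes a single known repeat motif, which would not be sufficient for compound loci such as D12S391.

```python
from collections import Counter


def frequency_spectrum(sequences):
    """Group identical sequences into types; return (type, N) sorted by N."""
    return Counter(sequences).most_common()


def is_n_minus_1(candidate, allele, motif):
    """True if deleting one copy of motif from allele yields candidate."""
    k = len(motif)
    if len(allele) - len(candidate) != k:
        return False
    return any(
        allele[i:i + k] == motif and allele[:i] + allele[i + k:] == candidate
        for i in range(len(allele) - k + 1)
    )


# Made-up reads: two alleles, one shorter type, one sequence-error type.
MOTIF = "AGAT"
reads = (
    [MOTIF * 11] * 50
    + [MOTIF * 12] * 45
    + [MOTIF * 10] * 5
    + [MOTIF * 10 + "AGAC"] * 1
)
spectrum = frequency_spectrum(reads)
for seq, n in spectrum:
    flag = "N-1" if is_n_minus_1(seq, MOTIF * 11, MOTIF) else ""
    print(len(seq), n, flag)
```

Because every read falls into exactly one type, the counts sum to the total number of reads, which is what makes the set exhaustive and mutually exclusive.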

Fig 2.

Per-locus analytical threshold (AT) values calculated from the range of observed noise by each of two methods.

AT values by method A (blue boxes) are set at twice the range of observed per-locus noise intensities, where per-locus noise is defined as responses not attributable to alleles or molecular artifacts (stutter). AT values by method B (orange boxes) are set using the same formula, but where per-locus noise is defined as responses not attributable to authentic alleles or N-1 molecular artifacts. Boxes summarize data across 4 DNA samples, and the plot Y-axis is truncated at 100 read counts for purposes of readability.
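
The difference between the two noise definitions can be illustrated as follows. This is a sketch under stated assumptions: "twice the range" is read here as 2 × (max − min) of the noise read counts, the type labels and counts are invented, and `analytical_threshold` is a hypothetical helper rather than the authors' implementation.

```python
def analytical_threshold(counts, excluded):
    """AT = 2 * (max - min) over read counts of types treated as noise.

    counts:   mapping of sequence type -> read count for one locus
    excluded: types NOT treated as noise (alleles, stutter, ...)
    Assumption: returns 0 when no noise types remain.
    """
    noise = [n for seq, n in counts.items() if seq not in excluded]
    if not noise:
        return 0
    return 2 * (max(noise) - min(noise))


# Invented per-locus counts for one sample.
counts = {
    "allele_1": 1200, "allele_2": 1100,
    "stutter_n-1": 90, "stutter_n-2": 12, "stutter_n+1": 8,
    "error_1": 3, "error_2": 1,
}
alleles = {"allele_1", "allele_2"}
n_minus_1 = {"stutter_n-1"}
all_stutter = {"stutter_n-1", "stutter_n-2", "stutter_n+1"}

at_method_a = analytical_threshold(counts, alleles | all_stutter)
at_method_b = analytical_threshold(counts, alleles | n_minus_1)
```

Method B treats non-N-1 stutter as noise, so its noise set always contains the method A noise set and its AT can never be lower.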

Fig 3.

Comparison of analytical threshold (AT) values calculated from noise with ATs calculated as a percentage of allele coverage (signal).

AT values by method A (blue boxes) are set at twice the range of observed per-locus noise intensities, where per-locus noise is defined as responses not attributable to alleles or molecular artifacts (stutter). AT values by method B (orange boxes) are set using the same formula, but where per-locus noise is defined as responses not attributable to alleles or N-1 molecular artifacts. The black line represents AT values set as a constant 1.5% of allele coverage. In all cases, AT values are converted to percentages of the average allele coverage on a per-locus basis.
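
The conversion to a percentage of allele coverage is simple arithmetic and can be sketched as below. The numbers are invented, and `at_as_percent_of_coverage` is a hypothetical helper name.

```python
from statistics import mean

FIXED_PERCENT = 1.5  # the constant signal-based threshold from the caption


def at_as_percent_of_coverage(at_reads, allele_coverages):
    """Express an AT (in reads) as a percentage of mean allele coverage."""
    return 100.0 * at_reads / mean(allele_coverages)


# Invented example: a noise-based AT of 22 reads at a heterozygous locus.
pct = at_as_percent_of_coverage(22, [1200, 1100])
exceeds_fixed = pct > FIXED_PERCENT
```

Expressing all three thresholds in the same unit is what allows the noise-based values to be compared against the flat 1.5% line on a per-locus basis.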

Fig 4.

Effect of read coverage on analytical thresholds (AT) calculated by three different methods.

Each data point represents a single instance of a sample-locus combination. AT values calculated as a fixed 1.5% of total read coverage increase linearly with increasing read coverage (black diamonds) and are calculated for instances with a minimum of 650 total reads, per the Illumina ForenSeq protocol. This minimum corresponds to a noise level of 10 reads, which, for purposes of comparison, has been used as the minimum AT value for all three methods. AT values based on background noise defined as the residual after removal of alleles and all stutter artifacts (Method A) are less sensitive to locus coverage (blue discs). An AT of 10 reads is generally sufficient for instances with coverages below 5,000 reads. AT values based on background noise defined as the residual after removal of alleles and N-1 stutter artifacts (Method B) trend upward with increasing locus coverage. Both Methods A and B are also applied to instances with fewer than 650 reads.
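
The relationship between the 650-read minimum and the 10-read floor can be checked with a short sketch: 1.5% of 650 reads is 9.75, which rounds to 10. The function names and the choice to return `None` below the minimum are assumptions made for illustration.

```python
def fixed_percentage_at(total_reads, fraction=0.015, min_reads=650, floor=10):
    """Coverage-proportional AT; undefined (None) below the read minimum."""
    if total_reads < min_reads:
        return None
    return max(floor, fraction * total_reads)


def floored_noise_at(noise_based_at, floor=10):
    """Apply the common 10-read minimum to a noise-based AT (Method A or B)."""
    return max(floor, noise_based_at)


at_at_minimum = fixed_percentage_at(650)      # 9.75 is raised to the floor
at_high_cov = fixed_percentage_at(10_000)     # grows linearly with coverage
at_low_cov = fixed_percentage_at(400)         # below the 650-read minimum
at_noise = floored_noise_at(4)                # noise-based AT still usable
```

The noise-based functions take no coverage argument, which reflects why they remain usable for instances below 650 reads.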
