A crowdsourced set of curated structural variants for the human genome

doi:10.1371/journal.pcbi.1007933

Fig 1.

SVCurator web application interface.

A) In addition to the IGV image shown here, curators could view 13 images of reads aligned to the reference or alternate alleles, as well as dotplot images from svviz2 for several technologies. B) Curators were asked whether a variant exists within ±20% of the predicted size, what its genotype is, and how confident they are in the curated genotype. C) Example svviz2 haplotype-partitioned image with haplotype 1 reads aligned to the reference and haplotype 2 reads aligned to the alternate allele for a heterozygous 307 bp insertion.
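
The ±20% size criterion in panel B can be illustrated with a minimal sketch. The function name, the parameter names, and the choice to measure the tolerance relative to the predicted size are assumptions made for illustration; they are not taken from SVCurator itself.

```python
def size_within_tolerance(predicted_bp: int, observed_bp: int,
                          tolerance: float = 0.20) -> bool:
    """Return True if the observed size is within ±tolerance of the predicted size.

    Assumption: the tolerance is a fraction of the predicted size.
    """
    if predicted_bp <= 0:
        raise ValueError("predicted_bp must be positive")
    return abs(observed_bp - predicted_bp) <= tolerance * predicted_bp


# Hypothetical observed sizes for a predicted 307 bp insertion.
print(size_within_tolerance(307, 330))  # 23 bp off, inside the 61.4 bp window
print(size_within_tolerance(307, 400))  # 93 bp off, outside the window
```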

Fig 2.

Events displayed in SVCurator, randomly sampled from the GIAB union callset in 7 different size ranges.

A) 579 deletions and B) 716 insertions.

Fig 3.

Responses from curators who evaluated over 648 events.

A) Distribution of the number of events evaluated after filtering survey responses. B) Concordance of responses from curators who evaluated over 648 events with expert consensus genotype labels. Plot key: the density plot (blue) represents the overall distribution of the data. Box plot (center of density plot): the thick bar represents the interquartile range (IQR) of the data; the thin lines extending from the center bar represent the lower and upper adjacent values (lower limit: Q1 − 1.5 × IQR; upper limit: Q3 + 1.5 × IQR). The white dot represents the median.
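
The box plot quantities in the plot key can be computed as in the following sketch. It assumes the common convention that the adjacent values are the most extreme data points lying inside the limits Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, and it uses the "inclusive" quartile method from Python's standard library; the plotting software behind the figure may compute quartiles differently. The input numbers are made up and are not values from the study.

```python
import statistics


def boxplot_summary(values):
    """Return median, quartiles, IQR limits, and adjacent values for a sample."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        # Adjacent values: most extreme observations still inside the limits.
        "lower_adjacent": min(v for v in data if v >= lower_limit),
        "upper_adjacent": max(v for v in data if v <= upper_limit),
    }


# Made-up example with one obvious outlier (100).
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```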

Fig 4.

Schematic summarizing how SVCurator responses were processed to determine the final label for each event.

A) Data Collection and Data Cleaning: Curators evaluated the 1295 events within SVCurator. After removing events that received a low confidence score for the assigned genotype and an ‘unsure’ response for whether an event exists at a particular site, 1273 events remained for analysis. B) Screen Curator Responses: To determine which curator responses would be used to find final labels for the SVCurator events, consensus labels assigned by 7 ‘expert’ curators were determined first. These 7 ‘expert’ curators were members of the Genome in a Bottle (GIAB) analysis team. Of the 1273 events, 541 were assigned a consensus label by the ‘expert’ curators, where each event had 68% or greater concordance on the assigned label and 4 or more experts who agreed on it. Using a leave-one-out strategy, a percent concordance score was found for each ‘expert’ curator, and the two lowest percent concordance scores (90.9% and 77.7%) were used as thresholds for screening top curators. To find the top curators, the labels assigned by each curator were compared with the expert consensus labels for the 541 events, and a percent concordance with the experts was found for each curator. Curators with 90.9% or greater concordance, or with 77.7% or greater concordance, were considered top curators, and their responses were placed in two corresponding threshold groups. The responses of these curators were used to find final labels for the SVCurator events. C) Determine crowdsourced labeled data: 935 events were assigned final labels by top curators. These events had at least 60% concordance amongst top curators and at least 3 top curators who agreed on the final label assigned.
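
The screening steps in panel B can be sketched roughly as follows. This is one plausible reading of the caption, not the authors' code: the data layout (`{event_id: {curator_id: label}}`), the function names, the genotype label strings, and the exact form of the leave-one-out step (recompute the consensus without one expert, then score that expert against it) are all assumptions.

```python
from collections import Counter


def consensus_label(labels, min_fraction, min_count):
    """Return the most common label if it meets both thresholds, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    if count >= min_count and count / len(labels) >= min_fraction:
        return label
    return None


def consensus_by_event(responses, min_fraction, min_count, exclude=None):
    """Map event_id -> consensus label, optionally ignoring one curator."""
    result = {}
    for event, by_curator in responses.items():
        labels = [lab for cur, lab in by_curator.items() if cur != exclude]
        label = consensus_label(labels, min_fraction, min_count)
        if label is not None:
            result[event] = label
    return result


def percent_concordance(curator, responses, consensus):
    """Percent of consensus events labelled by `curator` that match the consensus."""
    compared = matched = 0
    for event, label in consensus.items():
        own = responses[event].get(curator)
        if own is None:
            continue  # curator did not evaluate this event
        compared += 1
        matched += own == label
    return 100.0 * matched / compared if compared else float("nan")


def leave_one_out_concordance(expert, responses, min_fraction=0.68, min_count=4):
    """Score one expert against the consensus of the remaining experts."""
    consensus = consensus_by_event(responses, min_fraction, min_count, exclude=expert)
    return percent_concordance(expert, responses, consensus)


# Toy responses from five hypothetical experts (labels are placeholders).
responses = {
    "e1": {"x1": "HET", "x2": "HET", "x3": "HET", "x4": "HET", "x5": "HET"},
    "e2": {"x1": "HOMVAR", "x2": "HOMVAR", "x3": "HOMVAR", "x4": "HOMVAR", "x5": "HET"},
    "e3": {"x1": "HET", "x2": "HOMREF", "x3": "HET", "x4": "HOMREF", "x5": "HET"},
}
print(consensus_by_event(responses, 0.68, 4))
print(leave_one_out_concordance("x5", responses))
```

The same `consensus_by_event` helper could be reused for panel C by passing the top curators' responses with `min_fraction=0.60` and `min_count=3`. Because both fraction thresholds exceed 50%, a tie between two labels can never pass the filter.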

Fig 5.

Concordance evaluation of labels assigned to SVCurator calls by top curators.

A) Percent concordance amongst Threshold 1 top curators on assigned label. B) Fraction of top curators within Threshold 1 that agreed on the assigned label. C) Percent concordance amongst Threshold 2 top curators on assigned label. D) Fraction of Threshold 2 curators that agreed on the assigned label. Concordance_Percent: High (80% or more concordance); Medium (60–80% concordance); Low (60% or less concordance). Concordance_Count: High (5 or more curators agreed on the final label); Medium (3–4 curators agreed on the final label); Low (3 or fewer curators agreed on the final label assigned).
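
The Concordance_Percent and Concordance_Count bins can be expressed as a small sketch. The caption's ranges overlap at their boundaries (60% and 3 curators), so this sketch assumes that exactly 60% and exactly 3 curators fall into Medium, which is consistent with the minimum requirements for a final label described for Fig 4. That tie-breaking choice and the function names are assumptions.

```python
def concordance_percent_category(percent: float) -> str:
    """Bin percent concordance; assumes 60% counts as Medium and 80% as High."""
    if percent >= 80:
        return "High"
    if percent >= 60:
        return "Medium"
    return "Low"


def concordance_count_category(n_agreeing: int) -> str:
    """Bin the number of agreeing curators; assumes exactly 3 counts as Medium."""
    if n_agreeing >= 5:
        return "High"
    if n_agreeing >= 3:
        return "Medium"
    return "Low"


# Hypothetical event: 4 of 6 top curators agree on the label.
percent = 100 * 4 / 6
print(concordance_percent_category(percent), concordance_count_category(4))
```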

Fig 6.

Comparison of deletions and insertions where SVCurator labels assigned by top curators were either concordant or discordant with v0.6 GIAB benchmark genotypes.

A) The fraction of calls with high, medium, and low percent concordance among top curators. B) The fraction of calls with high, medium, and low counts of top curators agreeing on the assigned label.

Fig 7.

A summary of the final crowdsourced SVCurator labels.

Fig 8.

svviz2 genotypes support the 879 SVCurator crowdsourced labels.

A) A summary of the number of technologies whose svviz2 genotypes support the SVCurator genotype label. 92.2% of the events were supported by at least 2 technologies. B) A count of the number of genotypes from each technology that match the SVCurator crowdsourced labels. C) The number of technologies whose genotype scores support the crowdsourced label, summarized by label and variant type, and D) by event size.
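
The tally behind panel A can be sketched as below, assuming that a technology "supports" a label when its svviz2 genotype equals the crowdsourced genotype. The data layout, the technology names, the label strings, and the toy events are hypothetical and are not data from the study.

```python
def supporting_technologies(crowd_label, svviz2_genotypes):
    """Return the technologies whose svviz2 genotype equals the crowdsourced label."""
    return [tech for tech, gt in svviz2_genotypes.items() if gt == crowd_label]


def fraction_supported(events, min_technologies=2):
    """Fraction of events supported by at least `min_technologies` technologies.

    `events` is a list of (crowd_label, {technology: svviz2_genotype}) pairs.
    """
    if not events:
        return float("nan")
    supported = sum(
        len(supporting_technologies(label, genotypes)) >= min_technologies
        for label, genotypes in events
    )
    return supported / len(events)


# Toy events with placeholder technology names.
events = [
    ("HET", {"tech_a": "HET", "tech_b": "HET", "tech_c": "HOMVAR"}),
    ("HOMVAR", {"tech_a": "HOMVAR", "tech_b": "HET", "tech_c": "HET"}),
]
print(fraction_supported(events))
```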

Table 1.

Summary of data used to generate images within SVCurator.

Table 2.

Description of genotype label assignment based on responses to survey questions.

Table 3.

Heuristics used to determine HG002 genotypes.
