A crowdsourced set of curated structural variants for the human genome

Fig 4

Schematic summarizing how SVCurator responses were processed to determine the final label for each event.

A) Data Collection and Data Cleaning: Curators evaluated the 1295 events within SVCurator. After removing events that received a low confidence score for the assigned genotype and an ‘unsure’ response for whether an event exists at a particular site, 1273 events remained for analysis. B) Screen Curator Responses: To determine which curator responses were used to find final labels for the SVCurator events, consensus labels assigned by 7 ‘expert’ curators were determined first. These 7 ‘expert’ curators were members of the Genome in a Bottle (GIAB) analysis team. Of the 1273 events, 541 were assigned a consensus label by the ‘expert’ curators; each of these events had 68% or greater concordance on the assigned label and 4 or more experts who agreed on it. Using a leave-one-out strategy, a percent concordance score was found for each ‘expert’ curator, and the two lowest percent concordance scores (90.9% and 77.7%) were used as thresholds for screening top curators. To find the top curators, the labels assigned by each curator were compared to the consensus labels for the 541 events, and a percent concordance with the experts was found for each curator. Curators with 90.9% or greater concordance and curators with 77.7% or greater concordance were considered top curators, and their responses were placed in two threshold groups, one per threshold. The responses of these curators were used to find final labels for the SVCurator events. C) Determine crowdsourced labeled data: 935 events were assigned final labels by top curators. These events had at least 60% concordance amongst top curators and at least 3 top curators who agreed on the final label assigned.
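The two threshold rules in the caption (a minimum percent concordance plus a minimum number of agreeing curators) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the label strings, and the choice to compute concordance over the curators who responded to an event are all assumptions made for illustration.

```python
from collections import Counter


def consensus_label(labels, min_fraction, min_votes):
    """Return the most common label if it meets both thresholds, else None.

    Assumption: concordance is the share of responding curators who chose
    the most common label. The caption does not spell out the denominator.
    """
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    if votes >= min_votes and votes / len(labels) >= min_fraction:
        return label
    return None


def percent_concordance(curator_labels, consensus):
    """Percent of consensus-labeled events on which one curator agrees.

    curator_labels: dict mapping event id -> label given by this curator
    consensus: dict mapping event id -> consensus label, or None
    Only events that the curator labeled and that have a consensus count.
    """
    shared = [e for e in curator_labels if consensus.get(e) is not None]
    if not shared:
        return None
    hits = sum(curator_labels[e] == consensus[e] for e in shared)
    return 100.0 * hits / len(shared)


# Hypothetical label names; thresholds taken from the caption.
# Expert rule: 68% or greater concordance and 4 or more agreeing experts.
expert_pass = consensus_label(["HET"] * 5 + ["HOMVAR"] * 2, 0.68, 4)
expert_fail = consensus_label(["HET"] * 4 + ["HOMVAR"] * 3, 0.68, 4)
# Top-curator rule: at least 60% concordance and at least 3 agreeing curators.
crowd_pass = consensus_label(["HET"] * 3, 0.60, 3)
```

In this sketch an event with 5 of 7 matching expert labels passes the expert rule, while one with 4 of 7 fails on concordance even though 4 experts agree. A curator's `percent_concordance` against the expert consensus would then be compared with the 90.9% and 77.7% thresholds to assign a threshold group.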

doi: https://doi.org/10.1371/journal.pcbi.1007933.g004