Swarm: A federated cloud framework for large-scale variant analysis
Fig 2
Runtime and amount of data processed for computing allele frequency for an input set of rsIDs in BigQuery and Athena.
Mean values and standard deviations are plotted. (A) Average execution time in seconds. The light blue and light green bars represent configurations without any optimizations (i.e., the entire input data was used as is), and the dark blue and dark green bars represent configurations with optimizations (i.e., the input data was divided by partitioning or clustering). (B) Amount of data processed in megabytes, shown on a logarithmic y-axis. Significant differences between groups are indicated above the bars (two-sample t-test). Note that for each rsID experiment, differences in runtime between any BigQuery run and any Athena run in (A) were highly significant (P < 1e-5), and in (B), differences within the BigQuery runs or within the Athena runs were also highly significant (P < 1e-5).
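As a rough illustration of the comparison described in the caption, the sketch below computes the mean, the sample standard deviation, and a two-sample t statistic for two groups of runtimes. It rests on several assumptions: the caption does not say whether the test assumed equal variances, so Welch's unequal-variance form is used here; the function names `summarize` and `welch_t` are hypothetical; and the runtimes are made-up placeholders, not measurements from the study.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(runs):
    """Return (mean, sample standard deviation) for a list of measurements."""
    return mean(runs), stdev(runs)


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom.

    Assumption: unequal variances (Welch's test); the caption only says
    "two-sample t-test" and does not specify the variant.
    """
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    # Squared standard error of each group's mean
    se2_a = sd_a ** 2 / len(a)
    se2_b = sd_b ** 2 / len(b)
    t = (mean_a - mean_b) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical runtimes in seconds; NOT values from the paper.
unoptimized = [12.0, 14.0, 13.0, 15.0, 11.0]
optimized = [4.0, 5.0, 3.0, 4.0, 4.0]

t_stat, dof = welch_t(unoptimized, optimized)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A P value would then be read from the t distribution with `dof` degrees of freedom, for example with a statistics package; that step is left out to keep the sketch within the standard library.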