Cross-species regulatory sequence activity prediction

doi:10.1371/journal.pcbi.1008050

Fig 2

Training on human and mouse data improves generalization accuracy.

We trained three separate models with the same architecture: one on human data alone, one on mouse data alone, and one on human and mouse data jointly. For each model, we computed the Pearson correlation between test set predictions and observed experimental data for thousands of datasets from various experiment types. Points in the scatter plots represent individual datasets, with single-genome training accuracy on the x-axis and joint training accuracy on the y-axis. (a,c) For CAGE, training on multiple genomes increases test set accuracy on nearly all datasets for both (a) human and (c) mouse. (b,d) For DNase/ATAC/ChIP-seq, test set accuracy improves by a smaller average margin. See S3 Fig for additional splits by assay and ChIP immunoprecipitation target.

doi: https://doi.org/10.1371/journal.pcbi.1008050.g002
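The per-dataset comparison described in the caption can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names `per_dataset_pearson` and `fraction_improved`, the array layout (test positions in rows, datasets in columns), and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def per_dataset_pearson(pred, obs):
    """Pearson correlation for each column (dataset).

    Assumed layout: both arrays have shape (n_test_positions, n_datasets).
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Center each dataset's predictions and observations.
    pred_c = pred - pred.mean(axis=0)
    obs_c = obs - obs.mean(axis=0)
    # Covariance term divided by the product of the standard deviation terms.
    num = (pred_c * obs_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (obs_c ** 2).sum(axis=0))
    return num / den


def fraction_improved(r_single, r_joint):
    """Fraction of datasets whose joint-training r exceeds the single-genome r."""
    return float(np.mean(np.asarray(r_joint) > np.asarray(r_single)))


# Synthetic stand-in data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
obs = rng.normal(size=(500, 4))
pred_single = obs + rng.normal(scale=1.0, size=obs.shape)  # noisier predictions
pred_joint = obs + rng.normal(scale=0.5, size=obs.shape)   # less noisy predictions

# One (x, y) point per dataset, as in the scatter plots.
r_single = per_dataset_pearson(pred_single, obs)  # x-axis values
r_joint = per_dataset_pearson(pred_joint, obs)    # y-axis values
print(fraction_improved(r_single, r_joint))
```

Each pair `(r_single[i], r_joint[i])` corresponds to one point in a scatter panel; points above the diagonal are datasets where joint training gave the higher test set correlation.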