Bacteria-Human Somatic Cell Lateral Gene Transfer Is Enriched in Cancer Samples

Figure 2

Cloud-based method for identifying putative LGT reads.

Sequencing files containing paired-end sequences were uploaded to a CloVR virtual machine on the DIAG. Complete bacterial genomes from RefSeq and the human genome reference hg19 were downloaded from a persistent data node in the DIAG. The sequencing queries were mapped to the two references using BWA. The mappings were processed using LGTSeek, which classifies reads based on their mapping profiles. All mappings except for human/human were downloaded to local storage at the completion of the analysis. Next, putative LGT reads were run through automated curation steps (Figure S1): a BLAST search against NT, PrinSEQ dereplication to remove PCR duplicates, and low-complexity filtering. The filtered reads were then loaded into a database and inspected manually through a custom graphical interface.
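
The classification step can be illustrated with a minimal sketch. This is not the LGTSeek implementation; it only assumes that each mate of a pair is reduced to two flags (mapped to human, mapped to bacteria) and that a pair is labeled by the combination of its mates. The names `MateHits`, `classify_pair`, and `keep_for_download`, and all labels other than "human/human", are hypothetical.

```python
from typing import NamedTuple

# Hypothetical per-mate summary: did this mate map to each reference?
class MateHits(NamedTuple):
    human: bool
    bacteria: bool

# Fixed label order so that a pair gets the same class regardless of mate order.
ORDER = ["human", "bacteria", "both", "unmapped"]

def mate_label(hits: MateHits) -> str:
    """Reduce one mate's mapping flags to a single label."""
    if hits.human and hits.bacteria:
        return "both"
    if hits.human:
        return "human"
    if hits.bacteria:
        return "bacteria"
    return "unmapped"

def classify_pair(mate1: MateHits, mate2: MateHits) -> str:
    """Label a read pair by the mapping profile of its two mates."""
    labels = sorted([mate_label(mate1), mate_label(mate2)], key=ORDER.index)
    return "/".join(labels)

def keep_for_download(pair_class: str) -> bool:
    """Mirror the caption: everything except human/human is retained."""
    return pair_class != "human/human"

# A pair with one mate on each reference is the kind of profile
# that would be carried forward as a putative LGT read pair.
pair = classify_pair(MateHits(human=False, bacteria=True),
                     MateHits(human=True, bacteria=False))
print(pair, keep_for_download(pair))
```

In this sketch a pair with one human-only mate and one bacteria-only mate is the candidate of interest, while pairs in which both mates map only to human are discarded before download, as described above.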

