Transposon-insertion sequencing screens unveil requirements for EHEC growth and intestinal colonization

Fig 3

Schematic of analytic scheme to identify conditionally depleted genes.

The TIS data were analyzed with Con-ARTIST and CompTIS. The first step in both pipelines is correction for bottleneck effects to facilitate identification of attenuated mutants. To do this, simulation-based normalization is performed on the inoculum dataset to model the stochastic loss observed in the colon datasets. Next, the relative abundance of mutants (as represented by the abundance of reads at each TA site) in the normalized inoculum and colon datasets was compared to determine a mean fold-change across each gene. Genes with mean fold-change values consistent with a signature of attenuation in vivo were then identified. With Con-ARTIST, genes are first categorized based on their phenotype in each animal replicate, and then a consensus phenotype is determined across all replicates. To achieve this, in each animal, genes are first filtered by the number of informative sites (the number of unique TA sites between the input and output that have transposon insertions). Genes with fewer than 5 informative sites are classified as insufficient data (ID, black). Genes with sufficient data (≥ 5 TA sites disrupted) are classified based on a dual standard of attenuation (mean log2 fold change ≤ -2) and consistency (Mann-Whitney U p-value < 0.01) as either queried (Q, blue) or conditionally depleted (CD, red). To choose genes with consistent phenotypes across the replicates, genes were given a consensus classification of CD if they were classified as CD in 5 or more replicates. CompTIS synthesizes data across all animal replicates to identify genes important for colonization. Genes are first filtered to remove those without fold-change information across all replicates (ID, black). Gene-level PCA is performed on genes with sufficient data, and genes are classified as Q or CD based on their glPC1 score. Genes with a score in the bottom 10% of the glPC1 score distribution are considered to be attenuated (CD). The output of this workflow is 4 groups of genes, summarized beneath the flowchart.
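
The classification rules described in the caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the Con-ARTIST or CompTIS code: the function and constant names are hypothetical, the Mann-Whitney U p-value and the glPC1 scores are assumed to be computed upstream and passed in, and the rank-based definition of "bottom 10%" (with arbitrary tie-breaking) is an assumption.

```python
from typing import Dict, List

# Thresholds taken from the figure caption.
MIN_INFORMATIVE_SITES = 5   # fewer than this -> insufficient data (ID)
LOG2_FC_CUTOFF = -2.0       # attenuation: mean log2 fold change <= -2
P_VALUE_CUTOFF = 0.01       # consistency: Mann-Whitney U p-value < 0.01
MIN_CD_REPLICATES = 5       # consensus CD needs CD calls in >= 5 replicates


def classify_in_replicate(informative_sites: int,
                          mean_log2_fc: float,
                          mwu_p: float) -> str:
    """Con-ARTIST-style call for one gene in one animal (hypothetical helper).

    mwu_p is a precomputed Mann-Whitney U p-value; it is not calculated here.
    """
    if informative_sites < MIN_INFORMATIVE_SITES:
        return "ID"
    if mean_log2_fc <= LOG2_FC_CUTOFF and mwu_p < P_VALUE_CUTOFF:
        return "CD"
    return "Q"


def consensus_is_cd(calls: List[str]) -> bool:
    """True if the gene was called CD in at least MIN_CD_REPLICATES animals."""
    return calls.count("CD") >= MIN_CD_REPLICATES


def bottom_fraction_cd(glpc1: Dict[str, float],
                       fraction: float = 0.10) -> Dict[str, str]:
    """CompTIS-style call: genes in the lowest `fraction` of glPC1 scores are CD.

    Assumption: the cutoff is applied by rank; ties are broken arbitrarily.
    Genes lacking fold-change data (ID) are assumed to be removed beforehand.
    """
    ranked = sorted(glpc1, key=glpc1.get)        # lowest score first
    n_cd = int(len(ranked) * fraction)           # number of genes called CD
    cd_genes = set(ranked[:n_cd])
    return {gene: ("CD" if gene in cd_genes else "Q") for gene in glpc1}
```

For example, a gene with 8 informative sites, a mean log2 fold change of -3.1, and a p-value of 0.002 would be called CD in that animal; it receives a consensus CD call only if the same holds in at least 5 replicates.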

doi: https://doi.org/10.1371/journal.ppat.1007652.g003