TSSpredator-Web: A web-application for transcription start site prediction and exploration

Mathias Witte Paz; Alexander Herbig; Kay Nieselt

doi:10.1371/journal.pone.0326299

Abstract

Background

With the rapid development of high-throughput RNA-seq technologies, the transcriptome of prokaryotes can now be studied in unprecedented detail. Transcription start site (TSS) identification provides critical insights into transcriptional regulation. Still, current command-line tools for the prediction of TSS remain challenging with respect to their usability and lack of integrated exploration features.

Results

We introduce TSSpredator-Web, an interactive web application that enhances the usability of the established yet unpublished tool, TSSpredator. TSSpredator-Web facilitates TSS prediction from non-enriched and enriched RNA-seq data, classifies TSS relative to annotated genes, and allows users to explore results through dynamic visualizations and interactive tables. For the visualizations, we provide an UpSet plot summarizing the TSS distribution across experiments or classes, and a genome viewer that integrates transcriptomic and genomic data, which contextualizes the insights of the TSS predictions. To illustrate the usage of TSSpredator-Web, we provide a use case with Cappable-seq data from Escherichia coli. TSSpredator-Web is available on the TueVis visualization web-server at https://tsspredator-tuevis.cs.uni-tuebingen.de/.

Conclusions

By combining user-friendly accessibility with interactive data exploration, TSSpredator-Web significantly facilitates genome-wide TSS analysis and interpretation in prokaryotes, empowering a broader range of researchers to generate biological insights from transcriptomic data.

Citation: Witte Paz M, Herbig A, Nieselt K (2026) TSSpredator-Web: A web-application for transcription start site prediction and exploration. PLoS One 21(3): e0326299. https://doi.org/10.1371/journal.pone.0326299

Editor: António Machado, Universidade dos Açores, Departamento de Biologia, PORTUGAL

Received: May 28, 2025; Accepted: February 15, 2026; Published: March 13, 2026

Copyright: © 2026 Witte Paz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The raw data used for the provided use case can be found under the GEO accession number GSE215300. The processed data to reproduce the provided use case can be accessed from the main page of TSSpredator-Web https://tsspredator-tuevis.cs.uni-tuebingen.de/. If needed, the processed data can additionally be uploaded to a public repository.

Funding: MWP and KN are supported by infrastructural funding from the Cluster of Excellence EXC 2124 ‘Controlling Microbes to Fight Infections’ [project ID 390838134] from the DFG (Deutsche Forschungsgemeinschaft, German Research Foundation). We acknowledge support from the Open Access Publication Fund of the University of Tübingen. There was no additional external funding received for this study.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

Due to the development of high-throughput RNA-seq technologies, the transcriptome of a prokaryotic organism can now be studied in unprecedented detail. Far beyond the quantification of known transcripts, the precision of the data allows for the detection of novel transcripts or the definition of transcript starting boundaries [1,2].

To clearly define the base-specific starting boundary of a transcript, researchers require the identification of transcription start sites (TSS). Their identification plays a crucial role in understanding prokaryotic transcriptional regulation, as they define the exact positions where the RNA polymerase initiates transcription as well as the location of the promoter region. Promoter regions contain essential regulatory elements such as binding sites for various sigma and transcription factors [3,4]. This facilitates the generation of insights on how gene expression is regulated under varying environmental conditions [5,6]. Beyond this, the base-specific TSS identification also defines untranslated regions (UTRs), which contain riboswitches and RNA thermometers [7] and helps to identify other regulatory instances in bacteria, such as non-coding RNAs (ncRNAs) or antisense RNAs [8]. Despite the critical role of TSS identification, accurately predicting these sites remains challenging. Most genome annotations only provide predicted translation start sites and coding regions. Moreover, relying solely on standard RNA-seq data for a base-exact genome-wide TSS prediction does not produce comprehensive results, as this protocol does not distinguish unprocessed, so-called primary transcripts from processed transcripts (i.e., transcripts with a degraded 5’-end). Reads originating from processed RNA transcripts will shift the coverage profile towards the 3’-direction and thus bias the exact TSS prediction [1,9]. To overcome these challenges, special library preparation protocols have been developed for the determination of genome-wide TSS maps.

In 2010, Sharma et al. presented differential RNA-seq (dRNA-seq), an experimental approach to enrich for reads originating from the 5’ ends of primary transcripts (i.e., transcripts containing a 5’-triphosphate instead of a 5’-monophosphate) in prokaryotes [1]. This is achieved by treating the cDNA with a terminator exonuclease (TEX), which specifically degrades processed RNAs with a 5’-monophosphate. A more recent method for enrichment is Cappable-seq [2]. Instead of degrading processed RNAs, a vaccinia capping enzyme (VCE) is used to positively enrich primary RNAs with a 5’-triphosphate end. A more general approach is tagRNA-seq [10], which attaches specific tags to primary and processed RNA molecules. Despite the differences in methodology, all enriched libraries can be sequenced to produce enriched expression profiles, which increases the sensitivity and specificity of TSS annotation strategies and makes genome-wide TSS identification possible. Such a comprehensive view of transcriptional activity enables the identification of not only local regulatory elements but also global regulatory patterns throughout the organism, such as unannotated ncRNAs throughout a genome [11]. Genome-wide TSS mapping thus provides a high-resolution view of gene regulation by revealing differences in promoter usage, the length of 5’ untranslated regions (5’-UTRs), and the presence of novel transcripts across species, going beyond what typical RNA-seq experiments offer in terms of comparative transcriptomics [12]. However, as the scale of the data generated by these approaches grows, manual curation becomes impractical, necessitating the development of fully automated computational methods to process and analyze TSS data efficiently.

Various tools have been developed to automate the prediction of TSS from enriched RNA sequencing libraries. Here, we focus on one of the first methods developed for TSS prediction, TSSpredator [12]. Other tools, such as TSSAR [13] and TSSer [14], have been developed to identify TSS from standard and dRNA-seq data using statistical models based on the Poisson and Skellam distributions or Bayesian approximation based on the binomial distribution, respectively, to identify significantly enriched primary transcripts. A newer approach, ToNER [15], has been developed for the analysis of Cappable-seq data. Similarly to the previous methods, it relies on the computation of an enrichment score for each position and the identification of those that are statistically significant. Lastly, a support vector machine learning-based approach has also been explored to improve predictive accuracy [16]. Differently from all previous methods, TSSpredator follows a heuristic approach by mimicking the manual TSS annotation process originally described by Sharma et al. [1]. TSSpredator identifies peaks in expression across the genome by comparing the values to thresholds set by the user. Afterwards, enrichment scores are computed by comparing both RNA-seq libraries (enriched and standard) to identify transcription start sites, without relying on a statistical approach. The latest version of TSSpredator now includes support for three prominent experimental protocols (dRNA-seq [1], Cappable-seq [2], and tagRNA-seq [10]), alongside new features, such as the analysis of multi-contig assemblies. While neither TSSpredator nor its further developments have been released as a stand-alone publication, the tool has proven effective in various studies. For instance, it has been used to compare the transcriptome of one organism under different conditions (e.g., [6,17]), as well as cross-strain analyses to identify conserved and divergent transcriptional characteristics (e.g., [12,18]).

Still, given the complexity and volume of genome-wide TSS data, TSSpredator and the other command-line tools mentioned above often fail to provide the efficient data exploration needed for meaningful interpretation and insight generation. Interactive web applications address this challenge by simplifying both the use of the tool and the exploration of results through user-friendly interfaces. By including dynamic and interactive representations of TSS distributions, promoter sequences, and expression patterns, such platforms enable users to quickly inspect genomic regions, compare transcriptional landscapes, and identify patterns that might otherwise remain hidden. Features such as summary visualizations, interactive genome browsers, expression data overlays, and customizable filtering options improve usability and facilitate hypothesis generation, making genome-wide TSS analysis more accessible, efficient, and informative.

To achieve this, we introduce TSSpredator-Web, a web application designed to predict and explore TSS identified by TSSpredator. The web interface facilitates the interaction with TSSpredator, for example, by reducing the hurdles of dependencies installation and by providing enhanced interactions for data upload and data sharing. Moreover, TSSpredator-Web enhances the exploration of results by offering interactive visualizations, providing an overview of genome-wide TSS maps, as well as detailed views on the TSS predictions, thus facilitating deeper insights into transcriptional regulation.

2 Methods

2.1 Comparative detection of TSS

Before introducing TSSpredator-Web, we describe the underlying algorithm of TSSpredator for TSS detection. TSSpredator can analyze data from different sequencing protocols for TSS detection in prokaryotes, as long as they produce a control and an enriched library. So far, it has been tested with data from all established protocols mentioned above: dRNA-seq [1], Cappable-seq [2], and tagRNA-seq [10]. The sequencing results need to be processed to produce coverage profiles of the complete reads for both libraries, either in wiggle or in bedGraph format, using mapping workflows such as READemption [19]. TSSpredator then identifies TSS by comparing both profiles, expecting an increase in expression at a TSS in the enriched library relative to its non-enriched counterpart, independently for each genomic strand (forward and reverse). For this, TSSpredator expects the expression data pairs (enriched and non-enriched) to be normalized beforehand. For example, one possibility provided by READemption is to normalize the coverage by the total number of aligned reads and then multiply each position by the lowest number of aligned reads of all considered libraries (coverage-tnoar_min_normalized) or by a million (coverage-tnoar_mil_normalized). TSSpredator then conducts a further normalization between libraries (see S1 Algorithm). For each enriched library, by default, the 90th percentile is computed and used as a factor for percentile normalization of both libraries (the enriched library and its corresponding non-enriched counterpart). To recover the original data range, the normalized values are multiplied by the minimal percentile value across samples. Moreover, to normalize for different enrichment factors across replicates and sets, the median enrichment factor (i.e., the enriched value divided by the non-enriched value) is computed by default for all library pairs (enriched and non-enriched). The largest enrichment factor is then used to normalize all non-enriched libraries. While not activated by default, TSSpredator can export the normalized libraries for further downstream analyses after TSS prediction.
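The percentile step described above can be illustrated with a minimal Python sketch. It is not the S1 Algorithm itself: the function names, the nearest-rank percentile definition, and the decision to ignore zero-coverage positions are assumptions made for illustration, and the subsequent enrichment-factor normalization is omitted.

```python
def nearest_rank_percentile(values, percent=90):
    """Nearest-rank percentile of the positive values.

    Assumption: positions without coverage are ignored.
    """
    positive = sorted(v for v in values if v > 0)
    if not positive:
        raise ValueError("library has no positive coverage")
    rank = -(-percent * len(positive) // 100)  # integer ceiling of percent * n / 100
    return positive[max(rank, 1) - 1]


def percentile_normalize(pairs, percent=90):
    """Normalize (enriched, control) coverage pairs by the enriched percentile.

    Each pair is divided by the percentile of its enriched library and then
    multiplied by the smallest percentile across all pairs, which brings the
    values back to the original data range.
    """
    factors = [nearest_rank_percentile(enriched, percent) for enriched, _ in pairs]
    rescale = min(factors)
    normalized = []
    for (enriched, control), factor in zip(pairs, factors):
        scale = rescale / factor
        normalized.append(([v * scale for v in enriched],
                           [v * scale for v in control]))
    return normalized


# Hypothetical toy libraries: the second pair has twice the depth of the first.
pairs = [
    (list(range(0, 11)), [0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]),
    (list(range(0, 21, 2)), [0, 2, 2, 4, 4, 6, 6, 8, 8, 10, 10]),
]
normalized_pairs = percentile_normalize(pairs)
```

In this toy input, both pairs end up on the same scale after normalization, which is the purpose of the step.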

After normalization, for each position in each enriched expression graph, TSSpredator extracts the expression height and computes, with respect to the previous position, the height change and the factor of height change (Fig 1, S2 and S3 Algorithms). These values are compared to predefined thresholds. If a position exceeds the thresholds for step height and step factor, it is considered a TSS candidate and classified as detected. Here, it is important to note that the step height threshold is relative to the 90th percentile value used for the inter-library normalization. If multiple detected TSS are found close to each other within a given window size, they are reduced by selecting either only the first TSS or the one with the highest expression. This produces a set of putative TSS per strand and per replicate.
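The candidate detection can be sketched as follows. This is a simplified illustration rather than the S2 and S3 Algorithms: the function name, the small constant guarding against division by zero, the use of an absolute step height threshold (instead of one relative to the 90th percentile), and the choice to keep the highest candidate in a window are assumptions.

```python
def detect_candidates(coverage, min_height, min_factor, window=3, eps=1e-9):
    """Return 0-based positions that pass both step thresholds.

    For each position, the height change is the difference to the previous
    position and the step factor is the ratio to the previous position.
    Candidates closer than `window` to the last kept candidate are merged,
    keeping the one with the higher expression.
    """
    candidates = []
    for i in range(1, len(coverage)):
        previous, current = coverage[i - 1], coverage[i]
        height = current - previous
        factor = current / max(previous, eps)  # eps avoids division by zero
        if height >= min_height and factor >= min_factor:
            candidates.append(i)

    kept = []
    for position in candidates:
        if kept and position - kept[-1] <= window:
            if coverage[position] > coverage[kept[-1]]:
                kept[-1] = position
        else:
            kept.append(position)
    return kept


# Hypothetical enriched coverage profile for one strand.
profile = [1, 1, 10, 12, 30, 30, 2, 2, 2, 20]
putative_tss = detect_candidates(profile, min_height=5, min_factor=2, window=3)
```

Selecting the first candidate in a window instead of the highest one would only require dropping the replacement branch in the second loop.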

Fig 1. Sketch of the computation of the three parameters step height, step factor and enrichment factor required for TSS prediction.

The yellow line shows the enriched profile, with an increased expression value upstream of the annotated gene.

https://doi.org/10.1371/journal.pone.0326299.g001

In case of more than one replicate per experiment (that is, an organism’s genome or a tested condition), predicted TSS that are within a distance of 1 bp when comparing replicates are considered equal to allow for a cross-replicate shift. This default value can be changed by the user. If a TSS is labeled as detected in one replicate, the corresponding positions in the other replicates are re-evaluated by reducing the thresholds for the step height and step factor by predefined reduction values. This increases the number of detected TSS across replicates. A TSS needs to be detected in a minimal number of replicates (default: 1) to be included in the next steps. However, this lower threshold can be increased for higher specificity. For all detected TSS, the enrichment factor (Fig 1, S2 Algorithm) is computed at the TSS position for all replicates with respect to the same position in the non-enriched library, and the maximal value across replicates is compared to the enrichment factor threshold. For example, if the selected threshold is 2, the expression value in the enriched library is expected to be at least twice that of the control library for the TSS to be labeled as enriched. Analogously to the cross-replicate shift, all TSS found within the cross-condition shift value (default: 1 bp) are considered equal. This returns one set of detected and enriched TSS per strand and experiment.
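A minimal sketch of the replicate comparison and the enrichment test is given below. It is an illustration under assumptions, not the implementation of TSSpredator: the function names are hypothetical, the `relaxed` sets are assumed to have been computed beforehand with the reduced thresholds (and to contain the strictly detected positions), and positions within the allowed shift are merged by keeping the smallest coordinate.

```python
def consensus_tss(strict, relaxed, shift=1, min_replicates=1):
    """Combine per-replicate candidate positions into one set of detected TSS.

    strict:  list of sets, positions passing the full thresholds per replicate
    relaxed: list of sets, positions passing the reduced thresholds per replicate
    """
    detected = []
    for position in sorted(set().union(*strict)):
        if detected and position - detected[-1] <= shift:
            continue  # within the cross-replicate shift of an accepted TSS
        support = sum(
            any(abs(position - other) <= shift for other in replicate)
            for replicate in relaxed
        )
        if support >= min_replicates:
            detected.append(position)
    return detected


def is_enriched(position, enriched_libs, control_libs, threshold=2.0, eps=1e-9):
    """Compare the maximal enrichment factor across replicates to the threshold."""
    factors = [
        enriched[position] / max(control[position], eps)  # eps avoids division by zero
        for enriched, control in zip(enriched_libs, control_libs)
    ]
    return max(factors) >= threshold


# Hypothetical candidates for three replicates of one experiment.
strict = [{100, 250}, {101}, set()]
relaxed = [{100, 250}, {101, 251}, {99}]
detected = consensus_tss(strict, relaxed, shift=1, min_replicates=2)
```

Raising `min_replicates` in this sketch removes positions that are supported by fewer replicates, which mirrors the gain in specificity described above.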

Lastly, the sets of enriched TSS per experiment are compared to each other. The comparison of TSS in a multiple-condition experiment is straightforward since the TSS are already in the same coordinate system. For a multiple-strain experiment, a common coordinate system is computed using the concept of the SuperGenome based on a whole-genome alignment [12], such as the XMFA file provided by Mauve [20]. With the alignment, TSS are compared to each other and clustered together if they are found in close chromosomal distance. This returns, per experiment, a set of detected and enriched TSS that can be further analyzed. All mentioned thresholds can be modified by the user to lower or increase the specificity and/or sensitivity of the prediction. As the prediction step is influenced by the chosen thresholds and parameters, five predefined parameter sets have been introduced to span a range of sensitivities and specificities, from very sensitive to very specific (S1 Fig). An overview of how the parameter sets influence the TSS-calling process for an experiment with one condition and one replicate is provided in S2 Fig.

Finally, all detected TSS are classified according to their locations relative to annotated genes. While other classification methods exist [21], here we classify TSS as defined in [1] (Fig 2). Five different classes are defined: Primary and secondary TSS are found up to 300 nt upstream of a gene’s annotated translation start, where the strongest or the first signal is classified as the primary TSS and the remaining signals as secondary TSS. Internal TSS are found within an annotated gene, and antisense TSS are found on the antisense strand within a distance of less than 150 nt to an annotated gene. Lastly, TSS that are not associated with any of the other four classes are called orphan TSS. Note that a TSS can also get more than one class assignment (e.g., it can be a primary TSS of a gene and at the same time an antisense TSS for a gene on the opposite strand). The resulting predictions are stored in a TSV file called MasterTable, which summarizes all relevant information per TSS. This table describes all predicted TSS positions in detail, including their enrichment factor, their class, and the gene to which they might be associated, among others. This is reported for every experiment in the analysis, resulting in multiple lines per TSS and classification. Moreover, TSSpredator provides GFF files for each experiment aligned into one coordinate system. This is especially useful when analyzing multiple strains of a bacterium.
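The classification rules can be sketched in a few lines of Python. The data layout (tuples for TSS and genes), the function name, the inclusive handling of the distance boundaries, and the use of the strongest signal to select the primary TSS are assumptions for illustration; TSSpredator itself may resolve these details differently.

```python
from collections import defaultdict

UPSTREAM_LENGTH = 300   # nt upstream of the annotated start (primary/secondary)
ANTISENSE_LENGTH = 150  # nt around a gene on the opposite strand (antisense)


def classify(tss_list, genes):
    """Assign classes to TSS relative to annotated genes.

    tss_list: list of (position, strand, height)
    genes:    list of (name, start, end, strand) with start <= end
    Returns a dict mapping (position, strand) to a set of (class, gene) labels.
    """
    labels = defaultdict(set)
    upstream = defaultdict(list)
    for position, strand, height in tss_list:
        for name, start, end, gene_strand in genes:
            if strand == gene_strand:
                if start <= position <= end:
                    labels[(position, strand)].add(("internal", name))
                else:
                    # Distance to the gene start, measured in upstream direction.
                    distance = start - position if strand == "+" else position - end
                    if 0 < distance <= UPSTREAM_LENGTH:
                        upstream[name].append((height, position, strand))
            elif start - ANTISENSE_LENGTH <= position <= end + ANTISENSE_LENGTH:
                labels[(position, strand)].add(("antisense", name))

    # Per gene, the strongest upstream signal is primary, the rest secondary.
    for name, hits in upstream.items():
        hits.sort(reverse=True)
        for rank, (_, position, strand) in enumerate(hits):
            tss_class = "primary" if rank == 0 else "secondary"
            labels[(position, strand)].add((tss_class, name))

    for position, strand, _ in tss_list:
        if not labels[(position, strand)]:
            labels[(position, strand)].add(("orphan", None))
    return dict(labels)


# Hypothetical annotation and TSS calls.
genes = [("geneA", 1000, 2000, "+"), ("geneB", 2100, 2500, "-")]
tss_list = [(900, "+", 50.0), (950, "+", 20.0), (1500, "+", 5.0),
            (1990, "+", 3.0), (2550, "-", 30.0), (5000, "+", 8.0)]
classes = classify(tss_list, genes)
```

In this toy example, the TSS at position 1990 receives two labels (internal to geneA and antisense to geneB), which corresponds to the multiple rows per TSS in the MasterTable.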

Fig 2. Classification of TSS based on the distance to annotated genes as defined by [1].

Primary and secondary TSS are located upstream of annotated genes, where secondary TSS show a lower enriched expression signal compared to the respective primary TSS. Internal TSS are located within the genes themselves, while antisense TSS are located on the antisense strand close to a gene (within 150 bp) or within the gene itself. Lastly, all other identified TSS are called orphan TSS.

https://doi.org/10.1371/journal.pone.0326299.g002

2.2 Design process of TSSpredator-Web

The workflow for TSS prediction and their association with a gene is run independently for each replicate, condition, and strand, as described above. For each replicate, four input files are required—one for each strand (forward and reverse) and experimental type (non-enriched control and enriched). Since TSSpredator expects each file to be correctly categorized, every file must be uploaded separately.

To facilitate this interaction, the latest version of TSSpredator offers a JAVA-based GUI to assist users in allocating files and setting the required parameters for the TSS prediction (https://it.inf.uni-tuebingen.de/tsspredator, accessed in May 2025). From this GUI, the data processing can be initiated, and the predicted results are generated and saved in external files for downstream exploration.

While effective for TSS prediction, this process presents several usability challenges. First, since TSSpredator is implemented in JAVA, it requires users to install additional dependencies, limiting platform independence and posing difficulties for users with limited technical expertise. Second, the manual file allocation process becomes increasingly cumbersome, and possibly error-prone, as the number of replicates and conditions grows. Finally, downstream analysis is not integrated into the tool, so that users must export the results to other platforms to gain a comprehensive overview or biological insight.

By reviewing several genome-wide TSS studies, we identified common strategies used to explore such data sets. For example, many studies provide overview visualizations with respect to the TSS distribution across their classes or analyzed conditions [6,16,22]. Additionally, these studies often integrate transcriptomic and genomic data, especially gene annotations and upstream regulatory regions, within one view for enhanced data exploration [1].

Based on the limitations of TSSpredator, and the evaluation of published TSS studies, we identified eight requirements to improve the TSS analysis workflow (see Fig 3). These requirements can be classified with respect to the step of the TSS prediction workflow they can improve: the input allocation (R1, R2), data processing (R3, R4) and result exploration (R5, R6, R7, R8).

Fig 3. Requirements identified for the improvement of the TSS prediction workflow.

Each of them will be tackled with the implementation of TSSpredator-Web.

https://doi.org/10.1371/journal.pone.0326299.g003

Building on these requirements, we have defined a set of goals to be addressed by TSSpredator-Web:

G1 Accessibility (R3): Ensure platform independence by eliminating installation requirements, allowing users to easily access the tool.

G2 Usability (R1, R4–R7): Simplify the user experience by providing easy ways of uploading and exploring the data, as well as making long waiting times bearable.

G3 Exploration (R5–R7): Support data exploration through interactive visualizations that present TSS predictions across levels of detail.

G4 Data Integration (R6, R7): Facilitate exploratory analyses by combining genomic and transcriptomic information into an interactive view, providing genomic context of the results.

G5 Reproducibility and Data Sharing (R2, R8): Provide mechanisms to reproduce and share both the prediction results and exploration processes with others.

2.3 Back-end of TSSpredator-Web

To facilitate the accessibility for the TSS prediction workflow, TSSpredator-Web has been designed as a web-application. It is freely accessible via the TueVis visualization platform (https://tsspredator-tuevis.cs.uni-tuebingen.de). However, we also provide Docker images that facilitate deployment on any server, as well as allow local usage of the tool (https://github.com/Integrative-Transcriptomics/TSSpredator-Web/releases). The back-end was designed with Flask (v2.3.3) and runs a JAVA-compiled version of TSSpredator for the TSS prediction. For multiple simultaneous asynchronous requests, a worker manager based on Celery (v5.3.4) and redis (v6.2.0) has been integrated in the back-end. The results of the TSS prediction are preprocessed using Python scripts and bedGraphToBigWig [23] for the subsequent interactive exploration in the web interface.

2.4 Web-interface

The front end of TSSpredator-Web has been developed using React and is structured into three distinct pages, each corresponding to one step in the TSS prediction workflow. The first step, input allocation, offers an improved file upload experience. Users can upload all necessary files at once and conveniently organize them using a drag & drop interface. The predefined sets of parameters described previously can be selected from this interface or adapted by the user. Once all parameters have been set, a configuration file that encompasses all the chosen parameters can be saved for future use, ensuring future reproducibility. If any user is provided with such a configuration file, they only need to upload it along with a folder containing all needed files, for TSSpredator-Web to allocate the inputs automatically, enabling a faster start of the analysis.

Upon starting the TSS prediction, users are redirected to the status page, where information on the data processing is regularly updated. This allows effortless monitoring of the analysis without needing to keep the browser window open, and facilitates the sharing of ongoing predictions. Additionally, if results have already been computed with TSSpredator-Web, users can upload a ZIP file of these results. This avoids rerunning the prediction while still enabling access to the downstream data exploration features of TSSpredator-Web.

2.5 Data exploration & integration

To support the goals of data exploration & integration, as well as the usability improvement, TSSpredator-Web completes the TSS prediction workflow by providing an interactive result analysis. For this, the MasterTable as computed by TSSpredator is presented as an interactive table, complemented by two visualization approaches: an UpSet plot [24] offering a high-level overview of the dataset, and a genomic viewer for contextualization of the TSS predictions.

The interactive MasterTable includes common features of exploration, such as searching, filtering, and sorting. Since TSS can be associated with more than one gene and therefore can be classified into more than one of the classes mentioned above, they correspond to one row in the MasterTable per classification. The TSS distribution among the different classes or between experiments is visually summarized by an UpSet plot [24]. By setting which variable should be used for the plot (either TSS classes or different experiments), the users can identify how many TSS positions have multiple TSS classes or how many of them occur in specific combinations of experiments. To get more information on these subsets of TSS, these subsets in the UpSet plot are interactive, such that users can select them, and only their corresponding rows are shown in the MasterTable.
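The counts summarized by such an UpSet plot can be derived from the table rows as sketched below. The row layout and the function name are assumptions for illustration and do not reflect the actual column names of the MasterTable; the variable used for grouping can be either the TSS class or the experiment.

```python
from collections import Counter, defaultdict


def upset_counts(rows):
    """Count TSS positions per combination of categories.

    rows: iterable of (position, strand, category), one row per classification
          (the category is either a TSS class or an experiment name).
    Returns a Counter mapping each frozenset of categories to the number of
    TSS positions that carry exactly this combination.
    """
    membership = defaultdict(set)
    for position, strand, category in rows:
        membership[(position, strand)].add(category)
    return Counter(frozenset(categories) for categories in membership.values())


# Hypothetical rows: the TSS at position 1990 has two classes.
rows = [
    (900, "+", "primary"),
    (950, "+", "secondary"),
    (1990, "+", "internal"),
    (1990, "+", "antisense"),
    (2550, "-", "primary"),
]
counts = upset_counts(rows)
```

Each key of the returned counter corresponds to one intersection (one column) of the UpSet plot, and selecting it in the interface corresponds to filtering the table to the rows of the contributing positions.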

For the exploration of TSS in a genomic context, a genome browser has been implemented. Based on the visualization grammar Gosling [25], the genome browser provides aggregated and detailed views of the data for each strand independently, following Shneiderman’s mantra: Overview first, zoom and filter, details on demand [26]. The aggregated view (Fig 4, top) consists of multiple visualization components. The main component shows a stacked bar chart that bins the TSS according to their position in the genome and their assigned class. Depending on the zoom level, the view is aggregated with bins of 50 kbp, 10 kbp, or 5 kbp. A further track below the stacked bar chart indicates annotated genes by gray rectangles at the corresponding positions. This aggregated view is shown until a window size of 50,000 bp is reached. From this point on, the detailed view (Fig 4, bottom) is displayed, showing each individual TSS and the surrounding annotated genes. Within this view, the main track also includes the normalized expression values for the enriched and control libraries at each genomic position. To provide information on gene regulation, the 50 bp-long sequence upstream of the TSS is visualized on a third track. A tooltip provides more detailed information on each TSS, gene, and expression value.
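The aggregation behind the stacked bar chart can be sketched as follows. In TSSpredator-Web this is handled within the Gosling-based browser, so the Python functions below only illustrate the logic; the function names and the handling of a window of exactly 50,000 bp are assumptions.

```python
from collections import Counter

DETAIL_THRESHOLD = 50_000  # bp; below this window size the detailed view is shown


def view_mode(window_size):
    """Choose between the aggregated and the detailed view for a window size."""
    return "detailed" if window_size < DETAIL_THRESHOLD else "aggregated"


def bin_tss(tss, bin_size):
    """Count TSS per genomic bin and class.

    tss: iterable of (position, tss_class)
    Returns a Counter mapping (bin_start, tss_class) to the number of TSS.
    """
    counts = Counter()
    for position, tss_class in tss:
        bin_start = position // bin_size * bin_size
        counts[(bin_start, tss_class)] += 1
    return counts


# Hypothetical TSS calls, binned at the finest aggregation level (5 kbp).
tss = [(1200, "primary"), (4800, "primary"), (5100, "internal"), (12000, "primary")]
bars = bin_tss(tss, bin_size=5_000)
```

Each entry of the resulting counter corresponds to one segment of a stacked bar, with the class determining the color.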

Fig 4. Sketch of different views of the genome browser of TSSpredator-Web.

The top section (aggregated view) displays the aggregated stacked bar chart. Each bar shows the count of TSS of a specific class in the respective genomic region bin. Below, the gene track provides hints for the location of annotated genes. The bottom section illustrates how the plot changes upon falling below the 50,000 bp threshold (detailed view). Individual TSS locations are represented by colored glyphs, and genes are shown in their entirety with their corresponding name or locus_tag. The visualization also includes expression data from both control (in gray) and enriched libraries (in yellow), along with a track that illustrates the upstream region of a TSS. For simplicity, only 10 bp of the upstream region are shown in this representation instead of the actual 50 bp.

https://doi.org/10.1371/journal.pone.0326299.g004

The described tracks visualize the data separately for each strand, with a difference in the orientation of the plots and glyphs (Fig 5). This mimics the visualization method used in other genomic browsers, such as the Integrated Genome Browser (IGB) [27] or the Integrative Genomics Viewer (IGV) [28]. The genome browser of TSSpredator-Web provides two view arrangements to facilitate the exploration of the data within one experiment, and also the comparison across experiments (Fig 5). The single view mode groups the tracks of each experiment together (see Fig 5, top row), thus facilitating the exploration of single experiments, under different conditions, or single strains of one organism. Furthermore, the aligned view mode (see Fig 5, bottom row) groups the components vertically with respect to their strand, hence facilitating the comparison between conditions or strains. Regardless of the chosen arrangement, all views are synchronized, so that the same zoom level is shown in all instances. Moreover, a synced cross-line appears on hover to identify the position over all views. The genome browser can be freely used for exploration at any level of detail of the genomic coordinates. If users require specific TSS positions, these can be searched in the MasterTable and then directly visualized in the genome browser.

Fig 5. Visual representation of the two modes of the genome browser to show data of multiple experiments.

Each experiment consists of one component per strand, differing in the orientation of the data (for example, the reverse strand is flipped vertically). The single view mode (top section) groups the components based on the experiments for a direct exploration within each experiment. In contrast, the aligned view (bottom section) groups the visualization components vertically with respect to the two strands. This allows an easier comparison across experiments. Regardless of the chosen view, a synced crossline is shown on hover. For simplicity, in this figure this line is only shown in the single view.

https://doi.org/10.1371/journal.pone.0326299.g005

The complete predicted dataset as well as each single visualization can be downloaded from the interface. Moreover, as each prediction of TSSpredator receives a unique URL, the TSS predictions can be easily shared and accessed up to seven days after the corresponding TSS prediction was run.

3 Use case

To provide an example of how TSSpredator-Web can be used to generate and analyze genome-wide TSS data, a dataset for Escherichia coli K-12 MG1655 published by Balkin et al. [29] [GEO Accession No. GSE215300] is used in the following section. In this study, the authors treated the E. coli strain with three different antibiotics (novobiocin, rifampicin, and tetracycline), each with a different mode of action. Novobiocin inhibits the DNA gyrase and hence reduces DNA replication [30]. Rifampicin blocks the bacterial RNA polymerase, reducing RNA synthesis [31]. Lastly, tetracycline binds the 30S ribosomal subunit and inhibits protein synthesis [32]. For all three treatments and a control condition, the authors measured gene expression using three replicates per condition, following the Cappable-seq protocol [2] to generate enriched reads. Non-enriched VCE-capped reads generated using the NEBNext Ultra™ II Directional RNA prep kit are also available under the same GEO accession number. READemption [19] was used to align the reads and to compute the coverage plots in wiggle format. Enriched and non-enriched reads were aligned independently to the E. coli reference genome [NCBI Accession No. GCF_000005845.2], taking into account their different protocols for library preparation. The READemption coverage command provides different normalized wiggle files. To make both independent runs comparable, the tnoar_mil_normalized coverage plots were used for further analysis, consisting of 24 wiggle files (that is, 4 conditions × 3 replicates × 2 strands) per library protocol (non-enriched and enriched). These 48 wiggle files, together with the genome and annotation file obtained from NCBI, are the basis for the prediction of TSSpredator-Web.
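The number of input files follows directly from the experimental design and can be checked with a short enumeration. The file naming scheme and the names of the genome and annotation files below are hypothetical and only serve to make the combinations explicit.

```python
from itertools import product

conditions = ["control", "novobiocin", "rifampicin", "tetracycline"]
replicates = [1, 2, 3]
strands = ["forward", "reverse"]
protocols = ["enriched", "non-enriched"]


def expected_wiggle_files():
    """List one (hypothetical) wiggle file name per protocol, condition, replicate, and strand."""
    return [
        f"{condition}_rep{replicate}_{protocol}_{strand}.wig"
        for protocol, condition, replicate, strand
        in product(protocols, conditions, replicates, strands)
    ]


wiggle_files = expected_wiggle_files()
# Hypothetical names for the genome sequence and the annotation file.
all_inputs = wiggle_files + ["genome.fasta", "annotation.gff"]
```

With four conditions, three replicates, and two strands, each protocol contributes 24 wiggle files, giving 48 wiggle files and 50 input files in total once the genome and annotation files are added.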

One of the goals of TSSpredator-Web is to provide a user-friendly way to predict and explore genome-wide TSS maps, and this begins with the upload of the data. The 50 required files can be uploaded to TSSpredator-Web using the drag & drop functionality (S3 Fig). Here, the files were distributed among the four conditions and three replicates. To increase the confidence of the results, the predefined very specific parameter set was chosen for the TSS prediction step. For clustering after prediction, a cross-condition shift of 3 bp and a cross-replicate shift of 2 bp were allowed. Based on these parameters, TSSpredator identified a total of genomic positions as enriched TSS.
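
To make the role of the shift parameters more concrete, the following sketch groups nearby TSS calls greedily. It is a simplified illustration and not TSSpredator's actual clustering procedure, which is not described in this section; the function name and the example positions are hypothetical.

```python
def cluster_positions(positions, max_shift):
    """Greedily group sorted positions whose neighbours are at most max_shift bp apart."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_shift:
            clusters[-1].append(pos)  # close enough to the previous call: same cluster
        else:
            clusters.append([pos])    # too far away: start a new cluster
    return clusters


# Hypothetical calls from three replicates, merged with a cross-replicate shift of 2 bp.
print(cluster_positions([1000, 1002, 1010], max_shift=2))  # [[1000, 1002], [1010]]

# Hypothetical calls from three conditions, merged with a cross-condition shift of 3 bp.
print(cluster_positions([1000, 1003, 1007], max_shift=3))  # [[1000, 1003], [1007]]
```

Note that this greedy variant compares each call only with its immediate neighbour, so long chains of closely spaced calls would be merged into one cluster; a real implementation may handle this differently.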

The genome-wide TSS exploration process starts with an overview of the TSS across classes and analyzed conditions. Taking into account only the genomic position of a TSS, the UpSet plot shows that most TSS are classified as internal, directly followed by primary TSS (Fig 6A). However, this distribution does not account for how often a TSS occurs across conditions: a TSS that is enriched in only one condition counts as much as one enriched in all four. This can be examined by considering both the genomic position and the condition in which each TSS occurs for the UpSet plot (Fig 6B). The results show that primary TSS are, in fact, the predominant class. A similar result can be seen in the genome browser, where all TSS positions are aggregated by class and genomic position per condition (Fig 6C) and a predominance of primary TSS can also be observed.
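
The difference between the two counting modes can be illustrated with a toy example. The records below are invented for illustration, and each TSS is given a single class, whereas a real TSS may belong to several classes at once.

```python
from collections import Counter

# Hypothetical records: (position, strand, condition, class)
records = [
    (100, "+", "control", "primary"),
    (100, "+", "novobiocin", "primary"),
    (100, "+", "rifampicin", "primary"),
    (250, "-", "control", "internal"),
    (400, "+", "tetracycline", "internal"),
]


def count_by_location(records):
    """Count each (position, strand, class) once, regardless of condition."""
    unique = {(pos, strand, cls) for pos, strand, _cond, cls in records}
    return Counter(cls for _pos, _strand, cls in unique)


def count_by_location_and_condition(records):
    """Count each (position, strand, class) once per condition in which it occurs."""
    unique = set(records)
    return Counter(cls for _pos, _strand, _cond, cls in unique)


print(dict(count_by_location(records)))
print(dict(count_by_location_and_condition(records)))
```

In this toy data, internal TSS are the larger class when counting by location only (2 versus 1), whereas primary TSS dominate once each condition is counted separately (3 versus 2), mirroring the shift between the two UpSet plots.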

Fig 6. Analysis of the overall distribution of TSS across conditions, classes and location in the genome for data collected for E. coli under four different conditions (one control and three antibiotic treatments).

(a) UpSet plot showing the distribution of enriched TSS aggregated only by their location (i.e., position and strand). (b) UpSet plot showing the distribution of enriched TSS aggregated by their location and the condition in which they occur. For this UpSet plot, each TSS is counted for each condition separately. (c) Aggregated view of the genome browser showing the distribution of TSS colored by class and binned by their position in the genome for the control condition.

https://doi.org/10.1371/journal.pone.0326299.g006

To analyze the distribution of primary TSS even further, one can visualize the occurrence of this class across conditions. This can also be inspected via the UpSet plot (Fig 7A) and provides a glimpse into an interesting subset of TSS: 43 TSS surpass the threshold for detection only in the samples treated with any of the three antibiotics, but not in the control samples (highlighted in Fig 7A). These positions can be analyzed in more detail by interacting with the UpSet plot to filter the MasterTable for this specific subset. From here, the MasterTable can be sorted by step height (that is, the increase in the enriched library at a position compared to the previous position) to show the most prominent positions (an excerpt of the MasterTable is shown in Table 1). Based on these results, users can look up further information about the associated genes in established databases, such as the EcoCyc database [33] for E. coli. Some of the genes found in Table 1 were manually searched in EcoCyc to exemplify this exploration workflow. For example, the genes ugd, arnB and ais are described as responsible for changes in membrane lipopolysaccharides (LPS), indicating a reaction against hostile environments [34–36], as already shown for another antibiotic, polymyxin [35]. In addition, the gene dinI indicates DNA damage, which can be caused by antibiotics even though it is not part of their mode of action [37]. In summary, some of the TSS with the highest step height and their associated genes reflect how E. coli reacts to the high stress caused by antibiotics.
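
For users who prefer to reproduce this filtering on a downloaded MasterTable, a minimal sketch could look as follows. The column names (Pos, Strand, Condition, enriched, stepHeight, Primary), the comma delimiter and the sample rows are assumptions made for illustration and should be checked against the actual export.

```python
import csv
import io
from collections import defaultdict

ANTIBIOTICS = {"novobiocin", "rifampicin", "tetracycline"}

# Hypothetical excerpt; a real MasterTable has more columns and may be tab-delimited.
SAMPLE = """
Pos,Strand,Condition,enriched,stepHeight,Primary
100,-,novobiocin,1,50.0,1
100,-,rifampicin,1,20.0,1
100,-,tetracycline,1,30.0,1
200,+,control,1,80.0,1
200,+,novobiocin,1,90.0,1
200,+,rifampicin,1,70.0,1
200,+,tetracycline,1,60.0,1
300,+,control,0,1.0,1
300,+,novobiocin,1,10.0,1
300,+,rifampicin,1,12.0,1
300,+,tetracycline,1,11.0,1
""".strip()


def load_rows(text, delimiter=","):
    """Parse delimited text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def antibiotic_only_primary_tss(rows, treatments=ANTIBIOTICS):
    """Return (pos, strand, max step height) for primary TSS enriched in all
    treatments but not in any other condition, sorted by decreasing step height."""
    conditions = defaultdict(set)
    best_height = defaultdict(float)
    for row in rows:
        if row["Primary"] != "1" or row["enriched"] != "1":
            continue  # keep only enriched primary TSS
        key = (int(row["Pos"]), row["Strand"])
        conditions[key].add(row["Condition"])
        best_height[key] = max(best_height[key], float(row["stepHeight"]))
    hits = [
        (pos, strand, best_height[(pos, strand)])
        for (pos, strand), conds in conditions.items()
        if conds == set(treatments)
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


print(antibiotic_only_primary_tss(load_rows(SAMPLE)))
# [(100, '-', 50.0), (300, '+', 12.0)]
```

Here the subset is defined as TSS enriched in all three treatments and not in the control, which matches the description of Table 1; other definitions of the subset would require a different set comparison.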

Table 1. Excerpt of the MasterTable showing the top 10 enriched primary TSS based on step height and their associated genes. These TSS are enriched in all antibiotic treatment conditions for E. coli.

https://doi.org/10.1371/journal.pone.0326299.t001

Fig 7. Analysis of primary TSS for E. coli across conditions, especially those TSS occurring only under the treatment with each antibiotic.

(a) UpSet plot showing the distribution of primary enriched TSS across conditions, aggregated only by their location. The highlighted set refers to those TSS positions enriched only under the treatment with each antibiotic. (b) Aligned mode of the genome browser showing a primary TSS position on the reverse strand (shown via the direction of the glyphs and the bars of the expression profiles) occurring in all conditions with antibiotic treatment, but not in the control condition. The TSS is located upstream of the gene ugd (locus_tag: b2028). The expression in the enriched libraries (orange bars) is increased in all conditions; however, the treatment with novobiocin shows the highest expression value. For simplicity, the empty genome track has been removed from the figure.

https://doi.org/10.1371/journal.pone.0326299.g007

Although providing an overview of the most prominent TSS can be helpful, combining this information with the transcriptomic layer provides even more insight. This can be achieved through the genome browser. Here, we inspect the most prominent TSS with respect to the step height: position , the primary TSS of gene ugd (locus_tag: b2028, Fig 7B). Here, it can be seen that enrichment libraries show the highest expression for the TSS under treatment with novobiocin, in comparison to the other two antibiotics. A recent study identified that among these three antibiotics, the membrane LPS modification triggered by ugd, among other genes, is most effective against novobiocin [38]. A further step to analyze this region beyond TSSpredator-Web would be to extract the promoter and/or the UTR region of this gene to analyze putative regulatory elements in detail. Here, the UTR region of ugd was manually extracted using the coordinates provided by TSSpredator-Web, and compared to the RFAM [39] database outside of the presented interface. Though no hit was identified, the secondary structure of the sequence was computed using RNAfold [40] and returned a stable secondary structure (see S4 Fig).

TSSpredator-Web also enables the analysis of another particularly interesting class of TSS: orphan TSS. These positions correspond to transcription start sites that cannot be associated with any annotated gene from the provided file, suggesting the presence of transcriptional activity outside known gene boundaries. Such signals may represent transcriptional units that were overlooked by standard gene annotation pipelines and become detectable only through genome-wide TSS mapping. To investigate such putatively overlooked genes, we analyzed the top 10 orphan TSS present under all conditions, ranked by their step height. A prominent region is close to the orphan TSS at position on the reverse strand (Fig 8A). Upon zooming in on this region in the genome browser, a noticeable increase in the enriched library can be observed, with expression levels increasing up to 350. Moreover, the upstream sequence contains a subsequence similar to the Pribnow box (TATAAA) at −9 nt upstream of the TSS (Fig 8B), suggesting a binding site for the RNA polymerase. When exploring the nearby regions, a second orphan TSS is identified on the forward strand at position (Fig 8C). This TSS also shows a clear Pribnow box at position −13 nt upstream of the TSS. Together, these two orphan TSS may represent previously unannotated transcriptional units. Their pronounced step heights and well-defined upstream promoter motifs make them strong candidates for further computational and experimental validation, with the potential to improve the genomic annotation of the organism. For example, one could analyze the downstream regions of the TSS to identify possibly overlooked open reading frames.
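
The visual search for a Pribnow-like motif can be complemented by a simple scan of the upstream sequence. The sketch below reports hexamers that differ from the consensus TATAAT in at most one position; the example sequence is invented, and the mismatch threshold is an assumption rather than a setting of TSSpredator-Web. For a TSS on the reverse strand, the upstream sequence would first have to be reverse-complemented.

```python
CONSENSUS = "TATAAT"  # canonical Pribnow box (-10 element) consensus


def hamming(a, b):
    """Number of mismatching positions between two equally long strings."""
    return sum(x != y for x, y in zip(a, b))


def find_pribnow_like(upstream, consensus=CONSENSUS, max_mismatches=1):
    """Scan the sequence directly upstream of a TSS for consensus-like hexamers.

    `upstream` is read 5' to 3' and ends at position -1 relative to the TSS.
    Returns (start offset relative to the TSS, hexamer, mismatches) tuples.
    """
    upstream = upstream.upper()
    width = len(consensus)
    hits = []
    for start in range(len(upstream) - width + 1):
        window = upstream[start:start + width]
        mismatches = hamming(window, consensus)
        if mismatches <= max_mismatches:
            hits.append((start - len(upstream), window, mismatches))
    return hits


# Hypothetical 20 nt upstream region containing TATAAA starting at -9.
print(find_pribnow_like("GCGCCGGCGCCTATAAAGCC"))  # [(-9, 'TATAAA', 1)]
```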

Fig 8. Usage of the genome browser for the exploration and characterization of orphan TSS in E. coli.

For simplicity, empty visualization tracks have been removed from the figures. (a) Genome viewer on Single view mode visualizing a region with two orphan TSS that occur in all conditions, shown here only for the rifampicin treatment. On the reverse strand (bottom), the TSS position shows a prominent step height rising up to an expression of around 350. Interestingly, on the forward strand at position a further orphan TSS position was identified. (b) Zoomed view of TSS at position of the reverse strand with an expression value of the enriched library (orange bars) around 350. The upstream region shows a putative Pribnow box (TATAAA) starting at −9 nt upstream of the TSS. (c) Zoomed view of TSS at position with an expression value of the enriched library (orange bars) around 25. The upstream region shows again a clear putative Pribnow box (TAATATAA) starting at −13 nt upstream of the TSS.

https://doi.org/10.1371/journal.pone.0326299.g008

4 Discussion and conclusion

Genome-wide TSS maps provide important information for the analysis of the architecture of the prokaryotic transcriptome [6,29] and have even been used in recent studies on bacteriophages [41,42]. These studies facilitate the definition of the regulatory promoter region of genes and provide clear signals for the identification of unannotated genes [3,4,11]. Due to the large complexity of the underlying data, computational methods are required to analyze the data and produce insights. The currently available methods tackle the prediction of TSS with different underlying methodologies [13–16] and most commonly provide only a command-line tool. Yet, usage via the command line demands technical expertise, which should not be expected from researchers with a biological background, especially since they are the ones who generate insights from the data. Moreover, data exploration and integration are key steps in the generation of insights, yet they have been neglected by many of the existing tools. Therefore, we defined requirements and goals to close the gap in insight generation for TSS prediction workflows at every step through the implementation of TSSpredator-Web.

As a web-application, the platform-independent usage and the user-friendly GUI of TSSpredator-Web enhance the accessibility of the prediction workflow (our defined goal G1). In addition, the web-based application allows for the integration of modern, user-centered approaches that address the remaining goals defined for TSSpredator-Web. For example, it facilitates the reproducibility and data sharing of the workflow (G5) via the upload of a configuration file containing all the input files.

Besides reproducibility, TSSpredator-Web was developed to increase the usability of the TSS prediction workflow (G2). This enhanced usability begins with the improved and intuitive file upload process and continues with the asynchronous TSS prediction executed in the backend. A further major usability improvement is the ability to explore genomic and transcriptomic data directly within the interface. Users can now rely on TSSpredator-Web not only for TSS prediction but also for data exploration (G3), as illustrated in our use case. The exploration can be pursued from different angles. The MasterTable provides access to all predicted results of TSSpredator, while the visualizations provide a more comprehensive view of the data. For example, a quick overview of the TSS distribution among classes, experiments, and locations in the genome can be achieved either by using the UpSet plot or by the aggregated view of the genome browser. As shown in the use case, users can identify specific TSS sets of interest, for example those that occur only under specific conditions, such as those enriched only when E. coli was treated with antibiotics.

The aim of the genome browser is to provide a single view that integrates genomic and transcriptomic data (G4). Prior studies had to rely entirely on external tools, such as MEME, to provide a genomic context for a TSS. Though our implementation of TSSpredator-Web does not provide any statistical information on the occurrence of the sequences upstream of a TSS, it allows a general exploration and contextualization of the data. With this, users are not only able to identify TSS with high confidence values but also to dig deeper into their potential regulation, such as sequences present in their promoter regions.

Moreover, the ability to characterize orphan TSS, those TSS not linked to annotated transcriptional units, opens the door to discovering previously overlooked genomic elements, such as ncRNAs [43,44]. Using the genome browser of TSSpredator-Web, the confidence in such sites can be evaluated based on expression, high step height values, or the presence of a Pribnow box. Other tools, such as the recently developed pipeline TSS-Captur [11], focus on further characterizing these TSS sites to provide a hint about the functionality of the transcript and to close the gap in missing gene annotations. Both TSSpredator-Web and TSS-Captur are part of the Tübingen Visualization Server (TueVis) initiative, a visualization server for user-friendly tools. In the future, it is planned to directly link TSSpredator-Web with TSS-Captur to allow a seamless transition from exploratory TSS prediction to detailed characterization of uncharacterized TSS.

While the current exploration within TSSpredator-Web is primarily visual, users may also be interested in identifying statistically different TSS between conditions. This can be achieved by exporting the normalized expression profiles generated by TSSpredator and extracting the enriched coverage profile surrounding each enriched TSS position based on the MasterTable. These profiles are treated as distributions. Our statistical approach uses the Kolmogorov-Smirnov (KS) test [45,46] to compare the coverage distributions around the called TSS between the conditions. For this, we conduct pairwise KS tests between all replicates of the conditions to be compared and compute a common p-value using the Cauchy combination test [47]. Finally, the combined p-values of all tested TSS are subject to a Bonferroni p-value correction [48].

This approach has not yet been integrated into TSSpredator-Web, but we provide a preliminary implementation in the GitHub repository.
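
As a rough illustration of the last two steps, and explicitly not the implementation from the repository, the sketch below combines pairwise p-values with the Cauchy combination test and applies a Bonferroni correction. It assumes that the pairwise KS p-values have already been computed elsewhere (for instance with scipy.stats.ks_2samp) and that equal weights are used; the input values are invented.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values via the Cauchy combination test (equal weights by default)."""
    if not p_values:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)
    # Each p-value is mapped to a standard Cauchy variate and the weighted sum is formed.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    )
    # The combined p-value is the upper tail of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical pairwise KS p-values for two TSS (replicate-vs-replicate comparisons).
pairwise = {
    "TSS_a": [0.001, 0.004, 0.002],
    "TSS_b": [0.40, 0.75, 0.55],
}
combined = {tss: cauchy_combination(ps) for tss, ps in pairwise.items()}
adjusted = dict(zip(combined, bonferroni(list(combined.values()))))
print(adjusted)
```

P-values of exactly 0 or 1 lead to extreme tangent values, so clamping the inputs to a small open interval may be advisable in practice.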

Moreover, TSSpredator-Web has so far been thoroughly tested using coverage profiles that account for the complete read length, particularly for defining the predefined parameter sets (see S1 Fig). However, READemption [19] and other coverage profile generating tools, such as bedtools genomecov [49] or deeptools [50], also allow for the possibility of accounting for only the first base of the read during profile generation. While TSSpredator-Web has not been fully tested on such coverage profiles, we expect it to correctly identify TSS in this data as well. Still, as first-base profiles tend to be less smooth, especially in intergenic regions, we recommend running the prediction with more specific thresholds for the step height and step factor parameters than the ones used for full-read coverage profiles.
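
The difference between the two profile types can be seen in a toy computation. The sketch below is not how READemption, bedtools or deeptools compute coverage; the read coordinates are invented, use 0-based half-open intervals, and are assumed to be oriented like the original transcript.

```python
def coverage(reads, genome_length, strand, first_base_only=False):
    """Per-base read counts on one strand, for full reads or only their 5' ends."""
    profile = [0] * genome_length
    for start, end, read_strand in reads:
        if read_strand != strand:
            continue
        if first_base_only:
            # The 5' end is the leftmost base on "+" and the rightmost base on "-".
            five_prime = start if strand == "+" else end - 1
            profile[five_prime] += 1
        else:
            for pos in range(start, end):
                profile[pos] += 1
    return profile


# Hypothetical reads as (start, end, strand).
reads = [(2, 6, "+"), (2, 5, "+"), (3, 7, "+")]
print(coverage(reads, 8, "+"))                        # [0, 0, 2, 3, 3, 2, 1, 0]
print(coverage(reads, 8, "+", first_base_only=True))  # [0, 0, 2, 1, 0, 0, 0, 0]
```

The full-read profile rises and falls gradually, whereas the first-base profile consists of isolated spikes, which is why stricter step height and step factor thresholds may be appropriate for such data.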

Additionally, future developments of TSSpredator-Web may include support for newer sequencing protocols such as Cappable-ONT [42] for long-read-based TSS prediction and Term-seq [51] for the characterization of transcription termination sites (TTS). This would provide deeper insights into RNA processing, regulation and boundary detection, further extending the depth and applicability of TSS analyses in prokaryotes.

In conclusion, TSSpredator-Web provides an accessible and interactive platform for genome-wide TSS exploration, improving the discovery and interpretation of bacterial transcriptomics. Its user-friendly focus, combined with the capability of visual analysis of the data, provides users with a good basis for the analysis of the prokaryotic transcriptome architecture.

Supporting information

S1 Fig. Overview of the five predefined sets of thresholds for Step height, Step factor and Enrichment factor for the TSS prediction with TSSpredator.

https://doi.org/10.1371/journal.pone.0326299.s001

(TIF)

S2 Fig. Overview of how different parameter sets influence TSS calling in TSSpredator.

Three discontinuous genomic positions (i, j, k) show loci with putative TSS. The top plot shows the normalized expression profiles of the enriched and non-enriched RNA-seq libraries for one condition and one replicate. The middle plots display the derived metrics Step Height, Step Factor, and Enrichment Factor, together with the corresponding thresholds for the very specific, default, and very sensitive parameter sets. Positions exceeding a given threshold are marked with a colored point according to that parameter set. Note that for Step Factor, the thresholds for the very specific and default sets coincide. In the bottom plot, a TSS is called at positions where all three metrics surpass the thresholds of a given parameter set. The color of each TSS indicates the parameter set that would identify it.

https://doi.org/10.1371/journal.pone.0326299.s002

(TIF)

S3 Fig. Screenshot of drag & drop process for file distribution.

Instead of selecting files individually, they can be uploaded at once and distributed across the different conditions and replicates.

https://doi.org/10.1371/journal.pone.0326299.s003

(TIF)

S4 Fig. Secondary structure of the UTR region of the gene ugd as predicted by RNAfold.

https://doi.org/10.1371/journal.pone.0326299.s004

(TIF)

Supplementary Material. File containing the S1, S2 & S3 Algorithms.

https://doi.org/10.1371/journal.pone.0326299.s005

(PDF)

Acknowledgments

We thank Valerie Bouillon for her contributions to the prototype of the user-interface for TSSpredator-Web. We also thank Dilek Tuncbilek-Dere and Sven Fillinger for further developments of TSSpredator. Lastly, we thank Natalia Gogoleva and Yuri Gogolev for answering questions regarding the library preparation of the analyzed dataset.

References

1. Sharma CM, Hoffmann S, Darfeuille F, Reignier J, Findeiss S, Sittka A, et al. The primary transcriptome of the major human pathogen Helicobacter pylori. Nature. 2010;464(7286):250–5. pmid:20164839
2. Ettwiller L, Buswell J, Yigit E, Schildkraut I. A novel enrichment strategy reveals unprecedented number of novel transcription start sites at single base resolution in a model prokaryote and the gut microbiome. BMC Genomics. 2016;17:199. pmid:26951544
3. Backofen R, Amman F, Costa F, Findeiß S, Richter AS, Stadler PF. Bioinformatics of prokaryotic RNAs. RNA Biol. 2014;11(5):470–83. pmid:24755880
4. Kazmierczak MJ, Wiedmann M, Boor KJ. Alternative sigma factors and their roles in bacterial virulence. Microbiol Mol Biol Rev. 2005;69(4):527–43. pmid:16339734
5. Paget MS. Bacterial Sigma Factors and Anti-Sigma Factors: Structure, Function and Distribution. Biomolecules. 2015;5(3):1245–65. pmid:26131973
6. Ryan D, Jenniches L, Reichardt S, Barquist L, Westermann AJ. A high-resolution transcriptome map identifies small RNA regulation of metabolism in the gut microbe Bacteroides thetaiotaomicron. Nat Commun. 2020;11(1):3557. pmid:32678091
7. Sorek R, Cossart P. Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity. Nat Rev Genet. 2010;11(1):9–16. pmid:19935729
8. Storz G, Vogel J, Wassarman KM. Regulation by small RNAs in bacteria: expanding frontiers. Mol Cell. 2011;43(6):880–91. pmid:21925377
9. Westermann AJ, Vogel J. Cross-species RNA-seq for deciphering host-microbe interactions. Nat Rev Genet. 2021;22(6):361–78. pmid:33597744
10. Innocenti N, Golumbeanu M, Fouquier d’Hérouël A, Lacoux C, Bonnin RA, Kennedy SP, et al. Whole-genome mapping of 5’ RNA ends in bacteria by tagged sequencing: a comprehensive view in Enterococcus faecalis. RNA. 2015;21(5):1018–30. pmid:25737579
11. Witte Paz M, Vogel T, Nieselt K. TSS-Captur: a user-friendly pipeline for characterizing unclassified RNA transcripts. NAR Genom Bioinform. 2024;6(4):lqae168. pmid:39703424
12. Dugar G, Herbig A, Förstner KU, Heidrich N, Reinhardt R, Nieselt K, et al. High-resolution transcriptome maps reveal strain-specific regulatory features of multiple Campylobacter jejuni isolates. PLoS Genet. 2013;9(5):e1003495. pmid:23696746
13. Amman F, Wolfinger MT, Lorenz R, Hofacker IL, Stadler PF, Findeiß S. TSSAR: TSS annotation regime for dRNA-seq data. BMC Bioinformatics. 2014;15:89. pmid:24674136
14. Jorjani H, Zavolan M. TSSer: an automated method to identify transcription start sites in prokaryotic genomes from differential RNA sequencing data. Bioinformatics. 2014;30(7):971–4. pmid:24371151
15. Promworn Y, Kaewprommal P, Shaw PJ, Intarapanich A, Tongsima S, Piriyapongsa J. ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data. PLoS One. 2017;12(5):e0178483. pmid:28542466
16. Čuklina J, Hahn J, Imakaev M, Omasits U, Förstner KU, Ljubimov N, et al. Genome-wide transcription start site mapping of Bradyrhizobium japonicum grown free-living or in symbiosis - a rich resource to identify new transcripts, proteins and to study gene regulation. BMC Genomics. 2016;17:302. pmid:27107716
17. Bischler T, Tan HS, Nieselt K, Sharma CM. Differential RNA-seq (dRNA-seq) for annotation of transcriptional start sites and small RNAs in Helicobacter pylori. Methods. 2015;86:89–101. pmid:26091613
18. Rohmer C, Dobritz R, Tuncbilek-Dere D, Lehmann E, Gerlach D, George SE, et al. Influence of Staphylococcus aureus Strain Background on Sa3int Phage Life Cycle Switches. Viruses. 2022;14(11):2471. pmid:36366569
19. Förstner KU, Vogel J, Sharma CM. READemption-a tool for the computational analysis of deep-sequencing-based transcriptome data. Bioinformatics. 2014;30(23):3421–3. pmid:25123900
20. Darling AE, Mau B, Perna NT. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One. 2010;5(6):e11147. pmid:20593022
21. Boutard M, Ettwiller L, Cerisy T, Alberti A, Labadie K, Salanoubat M, et al. Global repositioning of transcription start sites in a plant-fermenting bacterium. Nat Commun. 2016;7:13783. pmid:27982035
22. Hocq R, Jagtap S, Boutard M, Tolonen AC, Duval L, Pirayre A, et al. Genome-Wide TSS Distribution in Three Related Clostridia with Normalized Capp-Switch Sequencing. Microbiol Spectr. 2022;10(2):e0228821. pmid:35412381
23. Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010;26(17):2204–7. pmid:20639541
24. Lex A, Gehlenborg N, Strobelt H, Vuillemot R, Pfister H. UpSet: Visualization of Intersecting Sets. IEEE Trans Vis Comput Graph. 2014;20(12):1983–92. pmid:26356912
25. L’Yi S, Wang Q, Lekschas F, Gehlenborg N. Gosling: A Grammar-based Toolkit for Scalable and Interactive Genomics Data Visualization. IEEE Trans Vis Comput Graph. 2022;28(1):140–50. pmid:34596551
26. Shneiderman B. The eyes have it: a task by data type taxonomy for information visualizations. In: Proceedings 1996 IEEE Symposium on Visual Languages. p. 336–43. https://doi.org/10.1109/vl.1996.545307
27. Freese NH, Norris DC, Loraine AE. Integrated genome browser: visual analytics platform for genomics. Bioinformatics. 2016;32(14):2089–95. pmid:27153568
28. Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14(2):178–92. pmid:22517427
29. Balkin A, Plotnikov A, Konnova T, Shagimardanova E, Hamo H, Gogolev Y, et al. Cappable-seq RNA-sequencing data sets of Escherichia coli K-12 MG1655 treated with novobiocin, tetracycline, and rifampicin. Microbiol Resour Announc. 2025;14(2):e0119424. pmid:39727393
30. Smith DH, Davis BD. Mode of action of novobiocin in Escherichia coli. J Bacteriol. 1967;93(1):71–9. pmid:5335903
31. Floss HG, Yu T-W. Rifamycin-mode of action, resistance, and biosynthesis. Chem Rev. 2005;105(2):621–32. pmid:15700959
32. Chopra I, Roberts M. Tetracycline antibiotics: mode of action, applications, molecular biology, and epidemiology of bacterial resistance. Microbiol Mol Biol Rev. 2001;65(2):232-60; second page, table of contents. pmid:11381101
33. Keseler IM, Collado-Vides J, Gama-Castro S, Ingraham J, Paley S, Paulsen IT, et al. EcoCyc: a comprehensive database resource for Escherichia coli. Nucleic Acids Res. 2005;33(Database issue):D334-7. pmid:15608210
34. Mainprize IL, Bean JD, Bouwman C, Kimber MS, Whitfield C. The UDP-glucose dehydrogenase of Escherichia coli K-12 displays substrate inhibition by NAD that is relieved by nucleotide triphosphates. J Biol Chem. 2013;288(32):23064–74. pmid:23792965
35. Breazeale SD, Ribeiro AA, Raetz CRH. Oxidative decarboxylation of UDP-glucuronic acid in extracts of polymyxin-resistant Escherichia coli. Origin of lipid a species modified with 4-amino-4-deoxy-L-arabinose. J Biol Chem. 2002;277(4):2886–96. pmid:11706007
36. Nishino K, Hsu F-F, Turk J, Cromie MJ, Wösten MMSM, Groisman EA. Identification of the lipopolysaccharide modifications controlled by the Salmonella PmrA/PmrB system mediating resistance to Fe(III) and Al(III). Mol Microbiol. 2006;61(3):645–54. pmid:16803591
37. Revitt-Mills SA, Robinson A. Antibiotic-Induced Mutagenesis: Under the Microscope. Front Microbiol. 2020;11:585175. pmid:33193230
38. Deylami J, Chng SS, Yong EH. Elucidating Antibiotic Permeation through the Escherichia coli Outer Membrane: Insights from Molecular Dynamics. J Chem Inf Model. 2024;64(21):8310–21. pmid:39480067
39. Ontiveros-Palacios N, Cooke E, Nawrocki EP, Triebel S, Marz M, Rivas E, et al. Rfam 15: RNA families database in 2025. Nucleic Acids Res. 2025;53(D1):D258–67. pmid:39526405
40. Gruber AR, Bernhart SH, Lorenz R. The ViennaRNA web services. Methods Mol Biol. 2015;1269:307–26. pmid:25577387
41. Wolfram-Schauerte M, Moskalchuk A, Pozhydaieva N, Rojas AAR, Schindler D, Kaiser S, et al. T4 phage RNA is NAD-capped and alters the NAD-cap epitranscriptome of Escherichia coli during infection through a phage-encoded decapping enzyme. openRxiv. 2024. http://dx.doi.org/10.1101/2024.04.04.588121
42. Putzeys L, Boon M, Lammens E-M, Kuznedelov K, Severinov K, Lavigne R. Development of ONT-cappable-seq to unravel the transcriptional landscape of Pseudomonas phages. Comput Struct Biotechnol J. 2022;20:2624–38. pmid:35685363
43. Stazic D, Voß B. The complexity of bacterial transcriptomes. J Biotechnol. 2016;232:69–78. pmid:26450562
44. Barquist L, Burge SW, Gardner PP. Studying RNA homology and conservation with infernal: from single sequences to RNA families. CP in Bioinformatics. 2016;54(1).
45. Kolmogorov A. Sulla determinazione empirica di una legge di distribuzione. Giorn Dell’inst Ital Degli Att. 1933;4:89–91.
46. Smirnov N. Table for estimating the goodness of fit of empirical distributions. Ann Math Statist. 1948;19(2):279–81.
47. Liu Y, Xie J. Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. J Am Stat Assoc. 2020;115(529):393–402. pmid:33012899
48. Teoria Statistica Delle Classi e Calcolo Delle Probabilità. Encyclopedia of Research Design. SAGE Publications, Inc.; 2010. http://dx.doi.org/10.4135/9781412961288.n455
49. Quinlan AR. BEDTools: the Swiss-Army tool for genome feature analysis. Curr Protoc Bioinformatics. 2014;47:11.12.1-34. pmid:25199790
50. Ramírez F, Dündar F, Diehl S, Grüning BA, Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 2014;42(Web Server issue):W187-91. pmid:24799436
51. Dar D, Shamir M, Mellin JR, Koutero M, Stern-Ginossar N, Cossart P, et al. Term-seq reveals abundant ribo-regulation of antibiotics resistance in bacteria. Science. 2016;352(6282):aad9822. pmid:27120414

[ref1] 1. Sharma CM, Hoffmann S, Darfeuille F, Reignier J, Findeiss S, Sittka A, et al. The primary transcriptome of the major human pathogen Helicobacter pylori. Nature. 2010;464(7286):250–5. pmid:20164839
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. Ettwiller L, Buswell J, Yigit E, Schildkraut I. A novel enrichment strategy reveals unprecedented number of novel transcription start sites at single base resolution in a model prokaryote and the gut microbiome. BMC Genomics. 2016;17:199. pmid:26951544
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. Backofen R, Amman F, Costa F, Findeiß S, Richter AS, Stadler PF. Bioinformatics of prokaryotic RNAs. RNA Biol. 2014;11(5):470–83. pmid:24755880
View Article
PubMed/NCBI
Google Scholar

[10] View Article

[11] PubMed/NCBI

[12] Google Scholar

[ref4] 4. Kazmierczak MJ, Wiedmann M, Boor KJ. Alternative sigma factors and their roles in bacterial virulence. Microbiol Mol Biol Rev. 2005;69(4):527–43. pmid:16339734
View Article
PubMed/NCBI
Google Scholar

[14] View Article

[15] PubMed/NCBI

[16] Google Scholar

[ref5] 5. Paget MS. Bacterial Sigma Factors and Anti-Sigma Factors: Structure, Function and Distribution. Biomolecules. 2015;5(3):1245–65. pmid:26131973
View Article
PubMed/NCBI
Google Scholar

[18] View Article

[19] PubMed/NCBI

[20] Google Scholar

[ref6] 6. Ryan D, Jenniches L, Reichardt S, Barquist L, Westermann AJ. A high-resolution transcriptome map identifies small RNA regulation of metabolism in the gut microbe Bacteroides thetaiotaomicron. Nat Commun. 2020;11(1):3557. pmid:32678091
View Article
PubMed/NCBI
Google Scholar

[22] View Article

[23] PubMed/NCBI

[24] Google Scholar

[ref7] 7. Sorek R, Cossart P. Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity. Nat Rev Genet. 2010;11(1):9–16. pmid:19935729

[ref8] 8. Storz G, Vogel J, Wassarman KM. Regulation by small RNAs in bacteria: expanding frontiers. Mol Cell. 2011;43(6):880–91. pmid:21925377

[ref9] 9. Westermann AJ, Vogel J. Cross-species RNA-seq for deciphering host-microbe interactions. Nat Rev Genet. 2021;22(6):361–78. pmid:33597744

[ref10] 10. Innocenti N, Golumbeanu M, Fouquier d’Hérouël A, Lacoux C, Bonnin RA, Kennedy SP, et al. Whole-genome mapping of 5’ RNA ends in bacteria by tagged sequencing: a comprehensive view in Enterococcus faecalis. RNA. 2015;21(5):1018–30. pmid:25737579

[ref11] 11. Witte Paz M, Vogel T, Nieselt K. TSS-Captur: a user-friendly pipeline for characterizing unclassified RNA transcripts. NAR Genom Bioinform. 2024;6(4):lqae168. pmid:39703424

[ref12] 12. Dugar G, Herbig A, Förstner KU, Heidrich N, Reinhardt R, Nieselt K, et al. High-resolution transcriptome maps reveal strain-specific regulatory features of multiple Campylobacter jejuni isolates. PLoS Genet. 2013;9(5):e1003495. pmid:23696746

[ref13] 13. Amman F, Wolfinger MT, Lorenz R, Hofacker IL, Stadler PF, Findeiß S. TSSAR: TSS annotation regime for dRNA-seq data. BMC Bioinformatics. 2014;15:89. pmid:24674136

[ref14] 14. Jorjani H, Zavolan M. TSSer: an automated method to identify transcription start sites in prokaryotic genomes from differential RNA sequencing data. Bioinformatics. 2014;30(7):971–4. pmid:24371151

[ref15] 15. Promworn Y, Kaewprommal P, Shaw PJ, Intarapanich A, Tongsima S, Piriyapongsa J. ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data. PLoS One. 2017;12(5):e0178483. pmid:28542466

[ref16] 16. Čuklina J, Hahn J, Imakaev M, Omasits U, Förstner KU, Ljubimov N, et al. Genome-wide transcription start site mapping of Bradyrhizobium japonicum grown free-living or in symbiosis - a rich resource to identify new transcripts, proteins and to study gene regulation. BMC Genomics. 2016;17:302. pmid:27107716

[ref17] 17. Bischler T, Tan HS, Nieselt K, Sharma CM. Differential RNA-seq (dRNA-seq) for annotation of transcriptional start sites and small RNAs in Helicobacter pylori. Methods. 2015;86:89–101. pmid:26091613

[ref18] 18. Rohmer C, Dobritz R, Tuncbilek-Dere D, Lehmann E, Gerlach D, George SE, et al. Influence of Staphylococcus aureus Strain Background on Sa3int Phage Life Cycle Switches. Viruses. 2022;14(11):2471. pmid:36366569

[ref19] 19. Förstner KU, Vogel J, Sharma CM. READemption-a tool for the computational analysis of deep-sequencing-based transcriptome data. Bioinformatics. 2014;30(23):3421–3. pmid:25123900

[ref20] 20. Darling AE, Mau B, Perna NT. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One. 2010;5(6):e11147. pmid:20593022

[ref21] 21. Boutard M, Ettwiller L, Cerisy T, Alberti A, Labadie K, Salanoubat M, et al. Global repositioning of transcription start sites in a plant-fermenting bacterium. Nat Commun. 2016;7:13783. pmid:27982035

[ref22] 22. Hocq R, Jagtap S, Boutard M, Tolonen AC, Duval L, Pirayre A, et al. Genome-Wide TSS Distribution in Three Related Clostridia with Normalized Capp-Switch Sequencing. Microbiol Spectr. 2022;10(2):e0228821. pmid:35412381

[ref23] 23. Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010;26(17):2204–7. pmid:20639541

[ref24] 24. Lex A, Gehlenborg N, Strobelt H, Vuillemot R, Pfister H. UpSet: Visualization of Intersecting Sets. IEEE Trans Vis Comput Graph. 2014;20(12):1983–92. pmid:26356912

[ref25] 25. L’Yi S, Wang Q, Lekschas F, Gehlenborg N. Gosling: A Grammar-based Toolkit for Scalable and Interactive Genomics Data Visualization. IEEE Trans Vis Comput Graph. 2022;28(1):140–50. pmid:34596551

[ref26] 26. Shneiderman B. The eyes have it: a task by data type taxonomy for information visualizations. In: Proceedings 1996 IEEE Symposium on Visual Languages. p. 336–43. https://doi.org/10.1109/vl.1996.545307

[ref27] 27. Freese NH, Norris DC, Loraine AE. Integrated genome browser: visual analytics platform for genomics. Bioinformatics. 2016;32(14):2089–95. pmid:27153568

[ref28] 28. Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14(2):178–92. pmid:22517427

[ref29] 29. Balkin A, Plotnikov A, Konnova T, Shagimardanova E, Hamo H, Gogolev Y, et al. Cappable-seq RNA-sequencing data sets of Escherichia coli K-12 MG1655 treated with novobiocin, tetracycline, and rifampicin. Microbiol Resour Announc. 2025;14(2):e0119424. pmid:39727393

[ref30] 30. Smith DH, Davis BD. Mode of action of novobiocin in Escherichia coli. J Bacteriol. 1967;93(1):71–9. pmid:5335903

[ref31] 31. Floss HG, Yu T-W. Rifamycin-mode of action, resistance, and biosynthesis. Chem Rev. 2005;105(2):621–32. pmid:15700959

[ref32] 32. Chopra I, Roberts M. Tetracycline antibiotics: mode of action, applications, molecular biology, and epidemiology of bacterial resistance. Microbiol Mol Biol Rev. 2001;65(2):232–60. pmid:11381101

[ref33] 33. Keseler IM, Collado-Vides J, Gama-Castro S, Ingraham J, Paley S, Paulsen IT, et al. EcoCyc: a comprehensive database resource for Escherichia coli. Nucleic Acids Res. 2005;33(Database issue):D334–7. pmid:15608210

[ref34] 34. Mainprize IL, Bean JD, Bouwman C, Kimber MS, Whitfield C. The UDP-glucose dehydrogenase of Escherichia coli K-12 displays substrate inhibition by NAD that is relieved by nucleotide triphosphates. J Biol Chem. 2013;288(32):23064–74. pmid:23792965

[ref35] 35. Breazeale SD, Ribeiro AA, Raetz CRH. Oxidative decarboxylation of UDP-glucuronic acid in extracts of polymyxin-resistant Escherichia coli. Origin of lipid A species modified with 4-amino-4-deoxy-L-arabinose. J Biol Chem. 2002;277(4):2886–96. pmid:11706007

[ref36] 36. Nishino K, Hsu F-F, Turk J, Cromie MJ, Wösten MMSM, Groisman EA. Identification of the lipopolysaccharide modifications controlled by the Salmonella PmrA/PmrB system mediating resistance to Fe(III) and Al(III). Mol Microbiol. 2006;61(3):645–54. pmid:16803591

[ref37] 37. Revitt-Mills SA, Robinson A. Antibiotic-Induced Mutagenesis: Under the Microscope. Front Microbiol. 2020;11:585175. pmid:33193230

[ref38] 38. Deylami J, Chng SS, Yong EH. Elucidating Antibiotic Permeation through the Escherichia coli Outer Membrane: Insights from Molecular Dynamics. J Chem Inf Model. 2024;64(21):8310–21. pmid:39480067

[ref39] 39. Ontiveros-Palacios N, Cooke E, Nawrocki EP, Triebel S, Marz M, Rivas E, et al. Rfam 15: RNA families database in 2025. Nucleic Acids Res. 2025;53(D1):D258–67. pmid:39526405

[ref40] 40. Gruber AR, Bernhart SH, Lorenz R. The ViennaRNA web services. Methods Mol Biol. 2015;1269:307–26. pmid:25577387

[ref41] 41. Wolfram-Schauerte M, Moskalchuk A, Pozhydaieva N, Rojas AAR, Schindler D, Kaiser S, et al. T4 phage RNA is NAD-capped and alters the NAD-cap epitranscriptome of Escherichia coli during infection through a phage-encoded decapping enzyme. openRxiv. 2024. http://dx.doi.org/10.1101/2024.04.04.588121

[ref42] 42. Putzeys L, Boon M, Lammens E-M, Kuznedelov K, Severinov K, Lavigne R. Development of ONT-cappable-seq to unravel the transcriptional landscape of Pseudomonas phages. Comput Struct Biotechnol J. 2022;20:2624–38. pmid:35685363

[ref43] 43. Stazic D, Voß B. The complexity of bacterial transcriptomes. J Biotechnol. 2016;232:69–78. pmid:26450562

[ref44] 44. Barquist L, Burge SW, Gardner PP. Studying RNA homology and conservation with Infernal: from single sequences to RNA families. Curr Protoc Bioinformatics. 2016;54(1).

[ref45] 45. Kolmogorov A. Sulla determinazione empirica di una legge di distribuzione. Giorn Dell’inst Ital Degli Att. 1933;4:89–91.

[ref46] 46. Smirnov N. Table for estimating the goodness of fit of empirical distributions. Ann Math Statist. 1948;19(2):279–81.

[ref47] 47. Liu Y, Xie J. Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. J Am Stat Assoc. 2020;115(529):393–402. pmid:33012899

[ref48] 48. Teoria Statistica Delle Classi e Calcolo Delle Probabilità. Encyclopedia of Research Design. SAGE Publications, Inc.; 2010. http://dx.doi.org/10.4135/9781412961288.n455

[ref49] 49. Quinlan AR. BEDTools: the Swiss-Army tool for genome feature analysis. Curr Protoc Bioinformatics. 2014;47:11.12.1–34. pmid:25199790

[ref50] 50. Ramírez F, Dündar F, Diehl S, Grüning BA, Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 2014;42(Web Server issue):W187–91. pmid:24799436

[ref51] 51. Dar D, Shamir M, Mellin JR, Koutero M, Stern-Ginossar N, Cossart P, et al. Term-seq reveals abundant ribo-regulation of antibiotics resistance in bacteria. Science. 2016;352(6282):aad9822. pmid:27120414

Supporting information

S1 Fig. Overview of the five predefined sets of thresholds for Step height, Step factor and Enrichment factor for the TSS prediction with TSSpredator.

S2 Fig. Overview of how different parameter sets influence TSS calling in TSSpredator.

S3 Fig. Screenshot of drag & drop process for file distribution.

S4 Fig. Secondary structure of the UTR region of the gene ugd as predicted by RNAfold.

Supplementary Material. File containing the S1, S2 & S3 Algorithms.
