Prometheus: omics portals for interkingdom comparative genomic analyses

Functional analyses of genes are crucial for unveiling biological responses, for genetic engineering, and for developing new medicines. However, functional analyses have largely been restricted to model organisms, which represents a major hurdle for functional studies and industrial applications. To resolve this, comparative genome analyses can be used to provide clues to gene functions as well as their evolutionary history. To this end, we present Prometheus (http://prometheus.kobic.re.kr), a web-based omics portal that contains more than 17,215 sequences from prokaryotic and eukaryotic genomes. This portal supports interkingdom comparative analyses via a domain architecture-based gene identification system, Gene Search, with which users can easily and rapidly identify single genes or entire gene sets in specific pathways. Bioinformatics tools for further analyses are provided in Prometheus or through BioExpress, a cloud-based bioinformatics analysis platform. Prometheus suggests a new paradigm for comparative analyses with large amounts of genomic information.

in humans 2,3. Thus, the trend of functional analyses has shifted from candidate gene research to genome-wide research. However, this flood of information has largely been restricted to model organisms, and it has been challenging for researchers to apply these data to newly sequenced genomes.

Since next-generation sequencing (NGS) technology was developed in the mid-2000s, an enormous amount of genomic information has been analyzed and amassed in public databases. As the numbers of sequenced genomes increased, many tools and pipelines were developed to investigate gene functions, identify gene families, and perform comparative genomic analyses. However, the application of comparative analyses is restricted to functional gene annotations and newly sequenced genome analyses.

Newly sequenced genomes are initially compared to those that have previously been analyzed, including genomes of closely related species, to provide information on genome structure changes and gene repertoires. Such comparisons can also predict gene paralogues, which are genes related by duplication events, or orthologues, which are those related by speciation events 4-6. As orthologues tend to be more similar in function than paralogues 7, they are widely used for functional gene annotations 8. Moreover, recent gene-of-interest studies that include multigenome orthologues offer insight into their mechanisms for adapting to the environment 9,10. However, these comparative genomic analyses were performed at genome-, genus-, or kingdom-wide levels, thereby restricting comparisons to the species, family, or order level 11-13. To understand the evolution of genes of interest more precisely, interkingdom analyses are needed, particularly because many genes in eukaryotic genomes have universal common ancestries in Bacteria and Archaea 14.

Here, we report Prometheus (http://prometheus.kobic.re.kr), an omics portal for interkingdom comparative genomic analyses. We collected 17,215 sequences from 16,730 species and constructed four primary databases to provide basic genome information, with more detailed information on individual genes provided in secondary databases. Researchers can then access detailed information on genes of interest, such as gene structure, domain architecture, subcellular localization, orthologues, and paralogues, as well as their sequences. In particular, Prometheus provides Gene Search to identify genes of interest based on their domain architectures from prokaryotes to eukaryotes, and it performs various comparative analyses, such as comparison of chromosome sequences, sequence alignment, and phylogenetic analyses. Furthermore, researchers can perform various bioinformatics analyses with these data and their own sequencing data in a cloud-based platform, BioExpress. Prometheus suggests a new paradigm for genome research, from single genes of interest to entire gene pathways.

Prometheus furnishes data search, configuration of data analyses, data visualization, and storage of users' own data. The interface is implemented using Hypertext Markup Language (HTML) and Cascading Style Sheets (CSS), and it uses the jQuery JavaScript library to modify web page contents.
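The idea behind a domain architecture-based search such as Gene Search can be illustrated with a minimal sketch. This is not the Prometheus implementation: the `Gene` record, the gene identifiers, the species names, and the matching rule (query domains must appear in the same order, with other domains allowed in between) are all assumptions made for illustration. The domain labels are only examples of the kind of names a domain database provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str               # hypothetical identifier
    species: str               # hypothetical species label
    kingdom: str
    domains: tuple[str, ...]   # domain names ordered from N- to C-terminus


def has_architecture(gene: Gene, query: tuple[str, ...]) -> bool:
    """Return True if the query domains occur in the gene in the same order.

    Assumed rule: other domains may sit between the query domains.
    """
    remaining = iter(gene.domains)
    # `d in remaining` advances the iterator, so order is enforced.
    return all(d in remaining for d in query)


def gene_search(genes: list[Gene], query: tuple[str, ...]) -> dict[str, list[str]]:
    """Group the IDs of matching genes by kingdom."""
    hits: dict[str, list[str]] = {}
    for gene in genes:
        if has_architecture(gene, query):
            hits.setdefault(gene.kingdom, []).append(gene.gene_id)
    return hits


# Hypothetical records spanning three kingdoms.
GENES = [
    Gene("hypo_euk_1", "Species A", "Eukaryota", ("PAS", "HisKA", "HATPase_c")),
    Gene("hypo_bac_1", "Species B", "Bacteria", ("HisKA", "HATPase_c")),
    Gene("hypo_arc_1", "Species C", "Archaea", ("HATPase_c", "HisKA")),
    Gene("hypo_bac_2", "Species D", "Bacteria", ("Response_reg",)),
]

result = gene_search(GENES, ("HisKA", "HATPase_c"))
print(result)
```

Grouping hits by kingdom mirrors the interkingdom focus of the portal. A real system would presumably also have to decide how to treat repeated domains and partial architectures, which this sketch ignores.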

To visualize data, a dynamic web interface is constructed with Asynchronous JavaScript and XML (Ajax) using the JavaScript Object Notation (JSON) data format. Furthermore, the genome browser is constructed using Scalable Vector Graphics (SVG), and the phylogenetic viewer is constructed using JavaScript. Web ... Table 1) is arranged by taxonomic rank (obtained from NCBI), which users can access by clicking the species name in the taxonomic tree or using a keyword search. This General Information provides details on genome assembly, annotation, and taxonomy. For eukaryotic genomes, distinct versions of the genome assembly and annotation have been provided, so each version is stored separately (Figure 2B). For prokaryotic genomes, genomic information is separated by strain to support metagenomics analyses. ... Table 2). Taxonomic information in Genome Archive is stored in a taxonomy database, and general information on genome assembly and annotation is stored in a genome report databank.
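As a rough illustration of the arrangement described above, the following sketch stores one record per eukaryotic assembly version and one per prokaryotic strain, places the records in a tree by taxonomic lineage, supports a keyword search on species names, and serializes the hits as JSON of the kind an Ajax request could return. All record fields, lineage labels, and function names are hypothetical; the actual Prometheus schema is not given in the text.

```python
import json

# Hypothetical archive records; field names are illustrative only.
ARCHIVE = [
    {"species": "Species A", "lineage": ["Eukaryota", "Phylum X", "Genus A"],
     "assembly": "v1", "annotation": "v1.0"},
    {"species": "Species A", "lineage": ["Eukaryota", "Phylum X", "Genus A"],
     "assembly": "v2", "annotation": "v2.0"},
    {"species": "Species B", "lineage": ["Bacteria", "Phylum Y", "Genus B"],
     "strain": "strain 1", "assembly": "v1", "annotation": "v1.0"},
    {"species": "Species B", "lineage": ["Bacteria", "Phylum Y", "Genus B"],
     "strain": "strain 2", "assembly": "v1", "annotation": "v1.0"},
]


def build_tree(records):
    """Nest records by lineage; each species leaf holds a list of records."""
    tree = {}
    for rec in records:
        node = tree
        for rank in rec["lineage"]:
            node = node.setdefault(rank, {})
        # Each assembly version (eukaryotes) or strain (prokaryotes)
        # is kept as its own entry under the species name.
        node.setdefault(rec["species"], []).append(rec)
    return tree


def keyword_search(records, keyword):
    """Case-insensitive substring match on the species name."""
    kw = keyword.lower()
    return [rec for rec in records if kw in rec["species"].lower()]


def to_json_response(records):
    """Serialize hits as a JSON string, as an Ajax endpoint might return."""
    return json.dumps({"count": len(records), "results": records})


tree = build_tree(ARCHIVE)
response = to_json_response(keyword_search(ARCHIVE, "species b"))
print(response)
```

Keeping versions and strains as separate records, rather than overwriting older entries, is what allows a user to select a specific assembly or strain after navigating to a species.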