Essential genes, which form the basis of life activities, are crucial for the survival of organisms. Essential genes tend to be located in operons, but how they are distributed in operons is still unclear for most prokaryotes. In order to clarify the general rule of position preference of essential genes in operons, an index of the average position of genes in an operon was proposed, and the distributions of essential and non-essential genes in operons in 51 bacterial genomes and two archaeal genomes were analyzed based on this new index. Consequently, essential genes were found to preferentially occupy the front positions of the operons, which tend to be expressed at higher levels.
Citation: Liu T, Luo H, Gao F (2021) Position preference of essential genes in prokaryotic operons. PLoS ONE 16(4): e0250380. https://doi.org/10.1371/journal.pone.0250380
Editor: Kang Ning, Huazhong University of Science and Technology, CHINA
Received: November 5, 2020; Accepted: April 5, 2021; Published: April 22, 2021
Copyright: © 2021 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting information files.
Funding: FG;2018YFA0903700; National Key Research and Development Program of China; https://service.most.gov.cn/ FG;21621004,31571358; National Natural Science Foundation of China; http://www.nsfc.gov.cn/ HL;31801104; National Natural Science Foundation of China; http://www.nsfc.gov.cn/ The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: COG, cluster of orthologous group; EG, essential gene; NEG, non-essential gene
Essential genes usually refer to genes whose inactivation or loss causes either severe growth impairment, irreversible growth arrest, or cell death . Essential genes are necessary for cells or organisms to survive under specific conditions [2, 3]. These genes constitute the minimal gene set required for living cells. Therefore, the functions encoded by this gene set are considered the basis of life [4, 5]. The study of essential genes has become a hot topic, as it is helpful to explore the origin and evolution of life, as well as provide an important basis for discovery of drug targets [6, 7], treatment of diseases [1, 8], and design of minimal genomes [9, 10]. Currently, essential genes can be identified through a series of experimental methods, including transposon mutagenesis , antisense RNA silencing , single-gene knockout technology , and other methods. An increasing number of essential genes have been genome-widely identified, and this facilitates the study of characteristic differences between essential and non-essential genes. For example, in prokaryotes, essential genes are found to be preferentially located on the leading strand of chromosomes [14, 15], and further studies have shown that only those with certain COG functional subclasses are preferentially located on the leading strand [16, 17]. Proteins corresponding to essential genes were enriched in the cytoplasm, and the proportion of non-essential genes in the plasma membrane, periplasm, outer membrane, cell wall, and extracellular space is significantly higher than that of essential genes . Essential genes in genomic islands are significantly fewer than those outside of genomic islands . Compared with non-essential genes, bacterial essential genes tend to encode core functions related to transcription, translation and replication [4, 20], and have a higher ratio of enzymes . In addition, essential genes have higher expression levels than non-essential genes [22, 23] and are more evolutionarily conserved [24, 25].
An operon is the set of one or several genes and their associated regulatory elements, which are transcribed as a polycistronic unit [26, 27]. Operons are widely used as basic transcriptional and functional units . Regarding operon formation, the most widely accepted theory is the co-regulation hypothesis, which assumes that operons are formed by rearranging two or more genes together, while maintaining this structure by selecting a coordinated transcriptional regulation and translation of functionally related proteins [29, 30]. Regarding the evolution of operons, the regulatory model and selfish model are two generally accepted models . The former emphasizes the advantage of co-transcription for regulatory purposes, while the latter emphasizes the advantage of genome proximity for co-transfer of adjacent functions . Other proposed operon evolution models have received less attention, mainly because they do not conform to the existing evidence . According to the co-regulation hypothesis, essential genes are preferentially located in operons, which has been confirmed in Escherichia coli [29, 30, 34]. In addition, studies have found that essential genes are not only preferentially located in operons, but also often occupy the first position in operons . However, this research has certain limitations, such as the relatively small number of prokaryotic genomes analyzed, and conclusions drawn without considering the influence of the proportion of essential genes in an operon on which gene occupies the first position. In particular, focusing only on the preference of the first operon position does not lead to a general conclusion on the position preference of essential genes in operons.
With the wide application of high-throughput experimental technologies in the identification of essential genes, essential genes data has increased rapidly, and the essential genes database DEG is also constantly updated to include these essential genes data. However, at present, the distribution of essential genes in most prokaryotic operons listed in DEG 15 is not clear. As reliable information in the operons database becomes available for more prokaryotic genomes, a systematic study on the distribution of essential genes in operons in prokaryotic genomes is possible.
In the present work, the preferences of essential and non-essential genes for special positions in operons were studied for 53 prokaryotic genomes, including 51 bacteria and 2 archaea. By analyzing the distribution of essential genes in operons, it was found that essential genes preferentially occupy the first position of operons, as reported in a previous study. However, after removing operons in which all genes are essential genes, the rule becomes invalid. Here, an index of the average position of genes in an operon is proposed to measure the position preference of essential genes in operons. By comparing the average positions of essential and non-essential genes in operons, it was found that essential genes tend to occupy the front positions of operons compared to non-essential genes, which was also confirmed by analyzing the proportion of essential genes located in the first half of operons.
Materials and methods
The essential genes data of the 53 prokaryotic genomes studied here were downloaded from the DEG database (version 15)  (http://essentialgene.org/). For some genomes, essential genes have been identified through different experimental methods. In this study, only one essential genes set was reserved by considering the reliability of the method used or the results. The corresponding operons data were obtained from the DOOR database  (http://220.127.116.11/DOOR3). For the prokaryotic genome with multiple chromosomes, only the essential genes on the main chromosome were studied. For the operons data in the DOOR database, only multi-gene operons were regarded as operons.
Determination of DNA strands
Index of average position of genes in an operon
Assuming that an operon contains n genes, including n1 essential genes and n2 non-essential genes (1≤n1<n, 1≤n2<n), the position occupied by a certain gene is x, and the average position of genes in an operon is defined as (1)
Only operons containing at least one essential gene were considered. It should be noted that if all the genes in an operon are essential genes, the position is all occupied by an essential gene. Therefore, only the positions in operons in which both essential and non-essential genes exist were analyzed.
Results and discussion
Position preference of essential genes in operons
Position preference of essential and non-essential genes in special positions of operons.
Essential genes in E. coli have been found to be enriched in operons , but whether this is a common feature of other bacteria and archaea needs to be verified. There was a clear trend for essential genes to occupy operons across 44 prokaryotic genomes (P ≤ 0.05, Fisher’s exact test) (S1 Table in S1 File). Further, the statistical significance was very high in 33 of these conditions (P < 2.0 × 10−4, Fisher’s exact test) (S1 Table in S1 File).
It was also found that most of the essential genes preferentially occupied the first position of the operon they were located in (Fig 1). Among them, in 44 genomes, there are more than 50% of operons in which the essential genes occupy the first position (S2 Table in S1 File), consistent with previous results. Among 39 genomes, compared with non-essential genes, essential genes tend to occupy the first position of the operon (P ≤ 0.05, Fisher’s exact test) (S2 Table in S1 File). We also studied the distribution of essential genes in operons containing two and three genes, and performed a chi-squared test, which confirmed that essential genes preferentially occupy the first position in operons of most species (P ≤ 0.05; S3 Table in S1 File). In addition, the distribution of non-essential genes in the operons was analyzed. Consequently, in 53 prokaryotic genomes, non-essential genes were found to frequently occupy the last position of the operon (Fig 1). Among them, in 51 genomes, in more than 50% of operons, non-essential genes occupy the last position (S2 Table in S1 File). In 37 genomes, compared with essential genes, non-essential genes tend to occupy the last position of the operon (P ≤ 0.05, Fisher’s exact test) (S2 Table in S1 File).
The heatmap was plotted using the heatmap function in the R package. The cells in the heatmap correspond to the proportion of genes under different conditions, and the value range is displayed in different colors. The color bar on the left side of the heatmap corresponds to the phylum classification of the species. Hierarchical clustering of analysis results in two dimensions is represented by a tree diagram. Species whose distribution of essential genes occupies the first position in less than 50% of operons are shown in red square boxes b-d, and species whose distribution of non-essential genes occupies the last position in less than 50% of operons are shown in red square box a.
We found that the positions occupied by essential and non-essential genes were related to the proportion of essential genes out of all the genes in operons (Fig 1). As can be seen from Fig 1, the essential genes of Mycoplasma genitalium G37 and Mycoplasma pneumoniae M129 account for a higher proportion of the genes in operons, resulting in a lower proportion of non-essential genes occupying the last position of the operon (box a in Fig 1). The essential genes of Staphylococcus aureus N315, Bacteroides thetaiotaomicron VPI-5482, Streptococcus pneumoniae TIGR4, Pseudomonas aeruginosa UCBPP-PA14, Campylobacter jejuni NCTC 11168, Bacillus thuringiensis BMB171, Helicobacter pylori 26695, Salmonella enterica serovar Typhimurium 14028S, and Salmonella Typhimurium LT2 account for a low proportion of the genes in operons, resulting in a low proportion of essential genes occupying the first position of operons (boxes b-d in Fig 1). The Pearson correlation coefficient  between the proportion of essential genes occupying the first position of operons and the proportion of essential genes in operons was 0.88, while the Pearson correlation coefficient between the proportion of non-essential genes occupying the last position of operons and the proportion of essential genes in operons was −0.52. From these 53 prokaryotic genomes, the rule can be summarized as follows: the higher the proportion of essential genes in the genes in operons, the higher the proportion of essential genes occupying the first position of operons, and the lower the proportion of non-essential genes occupying the last position of operons. Conversely, the lower the proportion of essential genes in the genes in operons, the lower the proportion of essential genes occupying the first position of operons, and the higher the proportion of non-essential genes occupying the last position of operons.
Position preference of essential genes in general positions of operons.
It should be noted that if all the genes in an operon are essential genes, the first position is occupied by an essential gene. Therefore, operons whose genes are exclusively essential genes were removed from analysis, and then the distribution of essential genes in hybrid operons (operons containing both essential and non-essential genes), was analyzed again (S2 Table in S1 File). It was found that among 53 prokaryotic genomes, the number of genomes in which essential genes occupy the first position in more than 50% of the operons was reduced from 44 to 19 under this analysis (S2 Table in S1 File). The average position of essential genes in hybrid operons and the proportion of essential genes in the first half of the hybrid operons were studied (Table 1). Consequently, by analyzing the average positions of essential and non-essential genes in hybrid operons of 53 prokaryotic genomes, it was found that essential genes preferentially occupied the front positions of operons compared to non-essential genes (P = 0.004257, Student’s t-test). We also calculated the DEG, the relative position of the essential genes in operons, which is defined in Eq (4). If the relative position DEG is negative, it means that the average position of essential genes is in front of the average position of all genes, whereas if the relative position DEG is positive, it means that the average position of essential genes is behind the average position of all genes. As shown in Fig 2, the relative positions of essential genes in most genomes were negative, indicating that essential genes were biased toward the front positions of operons. Compared with the random arrangement result, the relative position of essential genes is different from zero, and essential genes tend to be located in the front positions of operons (P = 9.772e-07, Student’s t-test).
In the left part of the figure, the size of the dot represents the number of essential genes occupying the first half of operons, and the color of the dot represents the proportion of essential genes occupying the first half of operons. The part on the left is sorted according to the proportion of essential genes in the first half of operons from high to low. In the right part of the figure, the size of the dot represents the number of operons, and the color of the dot represents the relative positions of essential genes.
We also studied the proportion of essential genes in the first half of hybrid operons. Please note that if the number of genes in the operon is odd, the middle gene is considered to be in the first half of the operon. The bubblechart of the relative position of essential genes in operons and the proportion of essential genes occupying the first half of operons is shown in Fig 2. It was found that the relative positions of essential genes in the genomes with a lower proportion of essential genes occupying the first half of operons tended to be positive. The Pearson correlation coefficient between them was −0.78. By analyzing the relative position of essential genes in operons and the proportion of essential genes occupying the first half of operons in 53 prokaryotic genomes, it was confirmed that essential genes tend to occupy the front positions of operons. Moreover, the Pearson correlation coefficients between DEG and the proportion of essential genes in operons was only 0.02, while the Pearson correlation coefficients between the proportion of essential genes occupying the first half of operons and the proportion of essential genes in operons was −0.12. This indicates that these results are independent of the proportion of essential genes in operons. Therefore, compared to the previous result that essential genes tend to occupy the first position of operons , the present conclusion on the position preference of essential genes in operons is more general and reliable.
The possible reason for position preference of essential genes in operons
Depending on whether the operon contains essential genes, operons can be divided into three categories: operons containing only essential genes, operons containing both essential and non-essential genes, and operons containing only non-essential genes. By analyzing these three types of operons in 53 prokaryotic genomes, we found that essential genes have an impact on both gene number and the location of operons. Operons containing essential genes were more biased to be on the leading strand, and the average gene number of operons containing essential and non-essential genes was higher (S4 Table in S1 File).
Previous studies have shown that there is a strong relationship between gene expression and the number, length, and order of genes in operons . In operons, the distance from the start of the gene to the end of the operon is defined as the transcription distance. Gene expression increases with an increase in the transcription distance; that is, gene expression increases with an increase in the length of the operon [42, 43]. Changes in the order of genes in operons also affect gene expression. The gene farthest from the end of the operon (or the gene closer to the promoter) was always more expressed. That is, the expression level of the gene in the first position is higher than that of the same gene at other positions . In 46 prokaryotic genomes, the average position of essential genes is generally in front of the average position of non-essential genes, which indicates that essential genes tend to have a higher expression level than non-essential genes (Table 1). Operons containing essential and non-essential genes have more genes, thereby increasing the expression of genes in operons. This is consistent with the fact that essential genes are crucial genes with higher expression levels and encode proteins that perform important functions. It also explains the fact that essential genes tend to be located in operons rather than alone. This work will be of great significance for understanding the functional basis of genome organization and the practical application of synthetic biology.
In the present study, the position preference of essential genes in prokaryotic operons was explored systematically. The result of a previous study showed that essential genes tend to occupy the first position of operons was related to the proportion of essential genes in operons. To solve this problem, a new index, the average position of genes in an operon, is proposed, which better reflects the position preference of essential genes in operons. Thus, previous shortcomings were avoided, and more general and reliable conclusions were reached. Our work provides new insights into related research on synthetic biology, such as the construction of cell factories and the design of artificial genomes.
The authors would like to thank Prof. Chun-Ting Zhang for the invaluable assistance and inspiring discussion.
- 1. Rancati G, Moffat J, Typas A, Pavelka N. Emerging and evolving concepts in gene essentiality. Nature Reviews Genetics. 2018;19(1):34. pmid:29033457
- 2. Koonin EV. How many genes can make a cell: the minimal-gene-set concept. Annual review of genomics and human genetics. 2000;1(1):99–116. pmid:11701626
- 3. Bartha I, di Iulio J, Venter JC, Telenti A. Human gene essentiality. Nature Reviews Genetics. 2018;19(1):51. pmid:29082913
- 4. Kobayashi K, Ehrlich SD, Albertini A, Amati G, Andersen K, Arnaud M, et al. Essential Bacillus subtilis genes. Proceedings of the National Academy of Sciences. 2003;100(8):4678–83. pmid:12682299
- 5. Itaya M. An estimation of minimal genome size required for life. FEBS letters. 1995;362(3):257–60. pmid:7729508
- 6. Galperin MY, Koonin EV. Searching for drug targets in microbial genomes. Current Opinion in Biotechnology. 1999;10(6):571–8. pmid:10600691
- 7. Yan F, Gao F. A systematic strategy for the investigation of vaccines and drugs targeting bacteria. Computational and Structural Biotechnology Journal. 2020;18:1525–38. pmid:32637049
- 8. Chen P, Wang D, Chen H, Zhou Z, He X. The nonessentiality of essential genes in yeast provides therapeutic insights into a human disease. Genome Research. 2016;26(10):1355–62. pmid:27440870
- 9. Juhas M, Eberl L, Glass JI. Essence of life: essential genes of minimal genomes. Trends in Cell Biology. 2011;21(10):562–8. pmid:21889892
- 10. Hutchison CA, Chuang R-Y, Noskov VN, Assad-Garcia N, Deerinck TJ, Ellisman MH, et al. Design and synthesis of a minimal bacterial genome. Science. 2016;351: aad6253. pmid:27013737
- 11. Hutchison CA, Peterson SN, Gill SR, Cline RT, White O, Fraser CM, et al. Global transposon mutagenesis and a minimal Mycoplasma genome. Science. 1999;286(5447):2165–9. pmid:10591650
- 12. Forsyth RA, Haselbeck RJ, Ohlsen KL, Yamamoto RT, Xu H, Trawick JD, et al. A genome-wide strategy for the identification of essential genes in Staphylococcus aureus. Molecular Microbiology. 2002;43(6):1387–400. pmid:11952893
- 13. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Molecular Systems Biology. 2006;2(1):2006.0008. pmid:16738554
- 14. Rocha EP, Danchin A. Essentiality, not expressiveness, drives gene-strand bias in bacteria. Nature Genetics. 2003;34(4):377–8. pmid:12847524
- 15. Repar J, Warnecke T. Non-random inversion landscapes in prokaryotic genomes are shaped by heterogeneous selection pressures. Molecular Biology and Evolution. 2017;34(8):1902–11. pmid:28407093
- 16. Lin Y, Gao F, Zhang C-T. Functionality of essential genes drives gene strand-bias in bacterial genomes. Biochemical and Biophysical Research Communications. 2010;396(2):472–6. pmid:20417622
- 17. Price MN, Alm EJ, Arkin AP. Interruptions in gene expression drive highly expressed operons to the leading strand of DNA replication. Nucleic Acids Research. 2005;33(10):3224–34. pmid:15942025
- 18. Peng C, Gao F. Protein localization analysis of essential genes in prokaryotes. Scientific Reports. 2014;4:6001. pmid:25105358
- 19. Zhang X, Peng C, Zhang G, Gao F. Comparative analysis of essential genes in prokaryotic genomic islands. Scientific Reports. 2015;5(1):12561. pmid:26223387
- 20. Mushegian AR, Koonin EV. A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proceedings of the National Academy of Sciences. 1996;93(19):10268–73. pmid:8816789
- 21. Gao F, Zhang RR. Enzymes are enriched in bacterial essential genes. PloS One. 2011;6(6):e21683. pmid:21738765
- 22. Chen H, Zhang Z, Jiang S, Li R, Li W, Zhao C, et al. New insights on human essential genes based on integrated analysis and the construction of the HEGIAP web-based platform. Briefings in Bioinformatics. 2020;21(4):1397–410. pmid:31504171
- 23. Wang T, Birsoy K, Hughes NW, Krupczak KM, Post Y, Wei JJ, et al. Identification and characterization of essential genes in the human genome. Science. 2015;350(6264):1096–101. pmid:26472758
- 24. Luo H, Gao F, Lin Y. Evolutionary conservation analysis between the essential and nonessential genes in bacterial genomes. Scientific Reports. 2015;5(1):13210. pmid:26272053
- 25. Jordan IK, Rogozin IB, Wolf YI, Koonin EV. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Research. 2002;12(6):962–8. pmid:12045149
- 26. Jacob F, Monod J. Genetic regulatory mechanisms in synthesis of proteins. J Mol Biol. 1961;3(3):318–56. pmid:13718526
- 27. Huerta AM, Salgado H, Thieffry D, Collado-Vides J. RegulonDB: a database on transcriptional regulation in Escherichia coli. Nucleic Acids Research. 1998;26(1):55–9. pmid:9399800
- 28. Mao X, Ma Q, Zhou C, Chen X, Zhang H, Yang J, et al. DOOR 2.0: presenting operons and their functions through dynamic and integrated views. Nucleic Acids Research. 2014;42(D1):D654–9. pmid:24214966
- 29. Pál C, Hurst LD. Evidence against the selfish operon theory. Trends in Genetics. 2004;20(6):232–4. pmid:15145575
- 30. Price MN, Huang KH, Arkin AP, Alm EJ. Operon formation is driven by co-regulation and not by horizontal gene transfer. Genome Research. 2005;15(6):809–19. pmid:15930492
- 31. Lawrence JG, Roth JR. Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics. 1996;143(4):1843–60. pmid:8844169
- 32. Rocha EP. The organization of the bacterial genome. Annual Review of Genetics. 2008;42:211–33. pmid:18605898
- 33. Lawrence JG. Gene organization: selection, selfishness, and serendipity. Annual Reviews in Microbiology. 2003;57(1):419–40.
- 34. Okuda S, Kawashima S, Kobayashi K, Ogasawara N, Kanehisa M, Goto S. Characterization of relationships between transcriptional units and operon structures in Bacillus subtilis and Escherichia coli. BMC Genomics. 2007;8(1):48. pmid:17298663
- 35. Grazziotin AL, Vidal NM, Venancio TM. Uncovering major genomic features of essential genes in Bacteria and a methanogenic Archaea. The FEBS Journal. 2015;282(17):3395–411. pmid:26084810
- 36. Luo H, Lin Y, Liu T, Lai F-L, Zhang C-T, Gao F, et al. DEG 15, an update of the Database of Essential Genes that includes built-in analysis tools. Nucleic Acids Research. 2021;49(D1):D677–86. pmid:33095861
- 37. Luo H, Gao F. DoriC 10.0: an updated database of replication origins in prokaryotic genomes including chromosomes and plasmids. Nucleic Acids Research. 2019;47(D1):D74–7. pmid:30364951
- 38. Gao F, Luo H, Zhang C-T. DoriC 5.0: an updated database of oriC regions in both bacterial and archaeal genomes. Nucleic Acids Research. 2012;41(D1):D90–3. pmid:23093601
- 39. Price MN, Arkin AP, Alm EJ. The life-cycle of operons. PLoS Genet. 2006;2(6):e96. pmid:16789824
- 40. Cohen I, Huang Y, Chen J, Benesty J. Pearson Correlation Coefficient. 2009;(Chapter 5):In Noise Reduction in Speech Processing (Benesty J, Chen J, Huang Y and Cohen I, eds), pp. 1–4. Springer Berlin Heidelberg, Berlin, Heidelberg.
- 41. Lim HN, Lee Y, Hussein R. Fundamental relationship between operon organization and gene expression. Proceedings of the National Academy of Sciences of the United States of America. 2011;108(26):10626–31. pmid:21670266
- 42. Nishizaki T, Tsuge K, Itaya M, Doi N, Yanagawa H. Metabolic engineering of carotenoid biosynthesis in Escherichia coli by ordered gene assembly in Bacillus subtilis. Applied and Environmental Microbiology. 2007;73(4):1355–61. pmid:17194842
- 43. Kovács K, Hurst LD, Papp B. Stochasticity in protein levels drives colinearity of gene order in metabolic operons of Escherichia coli. PLoS Biol. 2009;7(5):e1000115. pmid:19492041