Position preference of essential genes in prokaryotic operons

Essential genes, which form the basis of life activities, are crucial for the survival of organisms. Essential genes tend to be located in operons, but how they are distributed in operons is still unclear for most prokaryotes. In order to clarify the general rule of position preference of essential genes in operons, an index of the average position of genes in an operon was proposed, and the distributions of essential and non-essential genes in operons in 51 bacterial genomes and two archaeal genomes were analyzed based on this new index. Consequently, essential genes were found to preferentially occupy the front positions of the operons, which tend to be expressed at higher levels.


Introduction
Essential genes usually refer to genes whose inactivation or loss causes either severe growth impairment, irreversible growth arrest, or cell death [1]. Essential genes are necessary for cells or organisms to survive under specific conditions [2,3]. These genes constitute the minimal gene set required for living cells. Therefore, the functions encoded by this gene set are considered the basis of life [4,5]. The study of essential genes has become a hot topic, as it is helpful to explore the origin and evolution of life, as well as provide an important basis for discovery of drug targets [6,7], treatment of diseases [1,8], and design of minimal genomes [9,10]. Currently, essential genes can be identified through a series of experimental methods, including transposon mutagenesis [11], antisense RNA silencing [12], single-gene knockout technology [13], and other methods. An increasing number of essential genes have been identified genome-wide, and this facilitates the study of characteristic differences between essential and non-essential genes. For example, in prokaryotes, essential genes are found to be preferentially located on the leading strand of chromosomes [14,15], and further studies have shown that only those with certain COG functional subclasses are preferentially located on the leading strand [16,17]. Proteins corresponding to essential genes were enriched in the cytoplasm, and the proportion of non-essential genes in the plasma membrane, periplasm, outer membrane, cell wall, and extracellular space is significantly higher than that of essential genes [18]. Essential genes in genomic islands are significantly fewer than those outside of genomic islands [19]. Compared with non-essential genes, bacterial essential genes tend to encode core functions related to transcription, translation and replication [4,20], and have a higher ratio of enzymes [21]. In addition, essential genes have higher expression levels than non-essential genes [22,23] and are more evolutionarily conserved [24,25].
An operon is the set of one or several genes and their associated regulatory elements, which are transcribed as a polycistronic unit [26,27]. Operons are widely used as basic transcriptional and functional units [28]. Regarding operon formation, the most widely accepted theory is the co-regulation hypothesis, which assumes that operons are formed by rearranging two or more genes together, while maintaining this structure by selecting a coordinated transcriptional regulation and translation of functionally related proteins [29,30]. Regarding the evolution of operons, the regulatory model and selfish model are two generally accepted models [31]. The former emphasizes the advantage of co-transcription for regulatory purposes, while the latter emphasizes the advantage of genome proximity for co-transfer of adjacent functions [32]. Other proposed operon evolution models have received less attention, mainly because they do not conform to the existing evidence [33]. According to the co-regulation hypothesis, essential genes are preferentially located in operons, which has been confirmed in Escherichia coli [29,30,34]. In addition, studies have found that essential genes are not only preferentially located in operons, but also often occupy the first position in operons [35]. However, this research has certain limitations, such as the relatively small number of prokaryotic genomes analyzed, and conclusions drawn without considering the influence of the proportion of essential genes in an operon on which gene occupies the first position. In particular, focusing only on the preference of the first operon position does not lead to a general conclusion on the position preference of essential genes in operons.
With the wide application of high-throughput experimental technologies in the identification of essential genes, the amount of essential gene data has increased rapidly, and the essential genes database DEG is constantly updated to include these data. However, at present, the distribution of essential genes in the operons of most prokaryotes listed in DEG 15 is not clear. As reliable operon annotations become available for more prokaryotic genomes, a systematic study on the distribution of essential genes in operons in prokaryotic genomes is possible.
In the present work, the preferences of essential and non-essential genes for specific positions in operons were studied for 53 prokaryotic genomes, including 51 bacteria and 2 archaea. By analyzing the distribution of essential genes in operons, it was found that essential genes preferentially occupy the first position of operons, as reported in a previous study. However, after removing operons in which all genes are essential genes, this rule no longer holds. Here, an index of the average position of genes in an operon is proposed to measure the position preference of essential genes in operons. By comparing the average positions of essential and non-essential genes in operons, it was found that essential genes tend to occupy the front positions of operons compared to non-essential genes, which was also confirmed by analyzing the proportion of essential genes located in the first half of operons.

Data source
The essential gene data of the 53 prokaryotic genomes studied here were downloaded from the DEG database (version 15) [36] (http://essentialgene.org/). For some genomes, essential genes have been identified through different experimental methods. In this study, only one essential gene set was retained per genome, chosen by considering the reliability of the method used or of the results. The corresponding operon data were obtained from the DOOR database [28] (http://161.117.81.224/DOOR3). For prokaryotic genomes with multiple chromosomes, only the essential genes on the main chromosome were studied. Among the operon data in the DOOR database, only multi-gene operons were regarded as operons.

Determination of DNA strands
The replication origins and termini were derived from the DoriC database [37,38] (http://tubic.tju.edu.cn/doric/), based on which the leading and lagging strands for each genome can be determined.
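As an illustration of how strand assignment can follow from an origin and a terminus, the sketch below classifies a gene on a circular chromosome. It is not the authors' procedure: the function name, the 1-based coordinates, the single-point origin and terminus, and the use of the gene midpoint are all assumptions made for this example. It relies only on the standard convention that a gene transcribed in the same direction as replication fork movement lies on the leading strand.

```python
def replication_strand(gene_start, gene_end, gene_strand, ori, ter, genome_len):
    """Classify a gene as 'leading' or 'lagging' on a circular chromosome.

    Assumptions (hypothetical, not from the paper): 1-based coordinates,
    gene_start <= gene_end (the gene does not wrap around coordinate 1),
    ori and ter are single positions, gene_strand is '+' or '-'.
    """
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    mid = (gene_start + gene_end) // 2
    # Distances measured in the direction of increasing coordinates from ori.
    d_gene = (mid - ori) % genome_len
    d_ter = (ter - ori) % genome_len
    # On this replichore the fork moves towards increasing coordinates.
    on_forward_replichore = d_gene < d_ter
    # '+' genes are co-oriented with the fork on the forward replichore,
    # '-' genes are co-oriented with the fork on the other replichore.
    co_oriented = (gene_strand == "+") == on_forward_replichore
    return "leading" if co_oriented else "lagging"


if __name__ == "__main__":
    # Toy chromosome of 1000 bp with ori at 100 and ter at 600.
    print(replication_strand(200, 300, "+", ori=100, ter=600, genome_len=1000))
    print(replication_strand(700, 800, "+", ori=100, ter=600, genome_len=1000))
```

For an operon, the same call can be applied to the operon's span, since all of its genes share one transcription direction.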

Index of average position of genes in an operon
Assuming that an operon contains $n$ genes, including $n_1$ essential genes and $n_2$ non-essential genes ($1 \le n_1 < n$, $1 \le n_2 < n$), and that the position occupied by a given gene is $x$, the average position of genes in an operon is defined as

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (1)$$

Similarly, the average position of essential genes in an operon is

$$\bar{x}_{EG} = \frac{1}{n_1}\sum_{i \in EG} x_i \qquad (2)$$

and the average position of non-essential genes in an operon is

$$\bar{x}_{NEG} = \frac{1}{n_2}\sum_{i \in NEG} x_i \qquad (3)$$

The relative position of essential genes in an operon is calculated as follows:

$$D_{EG} = \bar{x}_{EG} - \bar{x} \qquad (4)$$

Only operons containing at least one essential gene were considered. It should be noted that if all the genes in an operon are essential genes, every position is occupied by an essential gene. Therefore, only the positions in operons in which both essential and non-essential genes exist were analyzed.
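The index can be illustrated with a short sketch. It assumes that positions are numbered 1 to n in transcription order and that the relative position of essential genes is the mean position of essential genes minus the mean position of all genes; that sign convention matches the interpretation given later in the text (negative means essential genes sit towards the front), but any normalization the original definition may apply is not reproduced here. The function name and input format are hypothetical.

```python
def operon_position_index(is_essential):
    """Average-position indices for one operon.

    is_essential: list of booleans in transcription order; element 0 is the
    gene at position 1. The operon must contain both essential and
    non-essential genes (a "hybrid" operon).
    """
    n = len(is_essential)
    eg = [pos for pos, e in enumerate(is_essential, start=1) if e]
    neg = [pos for pos, e in enumerate(is_essential, start=1) if not e]
    if not eg or not neg:
        raise ValueError("operon must contain essential and non-essential genes")
    mean_all = (n + 1) / 2          # mean of positions 1..n
    mean_eg = sum(eg) / len(eg)     # average position of essential genes
    mean_neg = sum(neg) / len(neg)  # average position of non-essential genes
    return {
        "mean_all": mean_all,
        "mean_eg": mean_eg,
        "mean_neg": mean_neg,
        "d_eg": mean_eg - mean_all,  # assumed form of the relative position
    }


if __name__ == "__main__":
    # Four-gene operon whose first two genes are essential.
    print(operon_position_index([True, True, False, False]))
```

In the four-gene example, the essential genes average position 1.5 against an overall mean of 2.5, so the relative position is −1.0, i.e. towards the front of the operon.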

Position preference of essential genes in operons
Position preference of essential and non-essential genes in special positions of operons. Essential genes in E. coli have been found to be enriched in operons [39], but whether this is a common feature of other bacteria and archaea needs to be verified. There was a clear trend for essential genes to be located in operons across 44 prokaryotic genomes (P ≤ 0.05, Fisher's exact test) (S1 Table in S1 File). Further, the statistical significance was very high in 33 of these genomes ($P < 2.0 \times 10^{-4}$, Fisher's exact test) (S1 Table in S1 File).
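For readers who want to reproduce this kind of enrichment test, the following is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2×2 table. The table layout (essential versus non-essential genes against inside versus outside operons) and the counts in the usage example are assumptions for illustration only; the paper does not state how its contingency tables were built or which software was used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(pk for pk in map(prob, range(k_min, k_max + 1))
            if pk <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical counts: rows = essential / non-essential genes,
    # columns = in operons / not in operons.
    p_value = fisher_exact_two_sided(250, 50, 1800, 1900)
    print(f"P = {p_value:.3g}")
```

The same routine applies to the first-position and last-position comparisons, with the columns redefined as "occupies the position" versus "does not".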
It was also found that most of the essential genes preferentially occupied the first position of the operon they were located in (Fig 1). In 44 genomes, essential genes occupy the first position in more than 50% of operons (S2 Table in S1 File), consistent with previous results. In 39 genomes, essential genes tend to occupy the first position of the operon compared with non-essential genes (P ≤ 0.05, Fisher's exact test) (S2 Table in S1 File). We also studied the distribution of essential genes in operons containing two and three genes, and performed a chi-squared test, which confirmed that essential genes preferentially occupy the first position in operons of most species (P ≤ 0.05; S3 Table in S1 File). In addition, the distribution of non-essential genes in the operons was analyzed. In the 53 prokaryotic genomes, non-essential genes were found to frequently occupy the last position of the operon (Fig 1). In 51 genomes, non-essential genes occupy the last position in more than 50% of operons (S2 Table in S1 File). In 37 genomes, non-essential genes tend to occupy the last position of the operon compared with essential genes (P ≤ 0.05, Fisher's exact test) (S2 Table in S1 File).
Fig 1. Hierarchical clustering of analysis results in two dimensions is represented by a tree diagram. Species whose distribution of essential genes occupies the first position in less than 50% of operons are shown in red square boxes b-d, and species whose distribution of non-essential genes occupies the last position in less than 50% of operons are shown in red square box a. https://doi.org/10.1371/journal.pone.0250380.g001
PLOS ONE
We found that the positions occupied by essential and non-essential genes were related to the proportion of essential genes among all genes in operons (Fig 1). As can be seen from Fig 1, in Mycoplasma genitalium G37 and Mycoplasma pneumoniae M129, essential genes account for a high proportion of the genes in operons, resulting in a lower proportion of non-essential genes occupying the last position of the operon (box a in Fig 1). In Staphylococcus aureus N315, Bacteroides thetaiotaomicron VPI-5482, Streptococcus pneumoniae TIGR4, Pseudomonas aeruginosa UCBPP-PA14, Campylobacter jejuni NCTC 11168, Bacillus thuringiensis BMB171, Helicobacter pylori 26695, Salmonella enterica serovar Typhimurium 14028S, and Salmonella Typhimurium LT2, essential genes account for a low proportion of the genes in operons, resulting in a low proportion of essential genes occupying the first position of operons (boxes b-d in Fig 1). The Pearson correlation coefficient [40] between the proportion of essential genes occupying the first position of operons and the proportion of essential genes in operons was 0.88, while the Pearson correlation coefficient between the proportion of non-essential genes occupying the last position of operons and the proportion of essential genes in operons was −0.52. Across these 53 prokaryotic genomes, the rule can be summarized as follows: the higher the proportion of essential genes among the genes in operons, the higher the proportion of essential genes occupying the first position of operons and the lower the proportion of non-essential genes occupying the last position of operons, and vice versa.
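The correlation analysis above uses the ordinary Pearson coefficient, which can be sketched in a few lines of standard-library Python. The per-genome values in the usage example are made up for illustration and are not taken from the paper's tables.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-genome values: proportion of essential genes in
    # operons, and proportion of operons led by an essential gene.
    prop_essential = [0.10, 0.25, 0.40, 0.80]
    prop_first_pos = [0.35, 0.50, 0.55, 0.90]
    print(round(pearson_r(prop_essential, prop_first_pos), 3))
```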
Position preference of essential genes in general positions of operons. It should be noted that if all the genes in an operon are essential genes, the first position is necessarily occupied by an essential gene. Therefore, operons whose genes are exclusively essential genes were removed from the analysis, and the distribution of essential genes in hybrid operons (operons containing both essential and non-essential genes) was analyzed again (S2 Table in S1 File). Among the 53 prokaryotic genomes, the number of genomes in which essential genes occupy the first position in more than 50% of the operons was reduced from 44 to 19 under this analysis (S2 Table in S1 File). The average position of essential genes in hybrid operons and the proportion of essential genes in the first half of the hybrid operons were then studied (Table 1). By analyzing the average positions of essential and non-essential genes in hybrid operons of the 53 prokaryotic genomes, it was found that essential genes preferentially occupied the front positions of operons compared to non-essential genes (P = 0.004257, Student's t-test). We also calculated $D_{EG}$, the relative position of the essential genes in operons, which is defined in Eq (4). If $D_{EG}$ is negative, the average position of essential genes is in front of the average position of all genes, whereas if $D_{EG}$ is positive, the average position of essential genes is behind the average position of all genes. As shown in Fig 2, the relative positions of essential genes in most genomes were negative, indicating that essential genes were biased toward the front positions of operons. Compared with the result expected for a random arrangement, the relative position of essential genes differs from zero, and essential genes tend to be located in the front positions of operons ($P = 9.772 \times 10^{-7}$, Student's t-test).
We also studied the proportion of essential genes in the first half of hybrid operons. Note that if the number of genes in the operon is odd, the middle gene is considered to be in the first half of the operon. The bubble chart of the relative position of essential genes in operons and the proportion of essential genes occupying the first half of operons is shown in Fig 2. The relative positions of essential genes tended to be positive in the genomes with a lower proportion of essential genes occupying the first half of operons. The Pearson correlation coefficient between the two quantities was −0.78. By analyzing the relative position of essential genes in operons and the proportion of essential genes occupying the first half of operons in 53 prokaryotic genomes, it was confirmed that essential genes tend to occupy the front positions of operons. Moreover, the Pearson correlation coefficient between $D_{EG}$ and the proportion of essential genes in operons was only 0.02, while the Pearson correlation coefficient between the proportion of essential genes occupying the first half of operons and the proportion of essential genes in operons was −0.12. This indicates that these results are independent of the proportion of essential genes in operons. Therefore, compared to the previous result that essential genes tend to occupy the first position of operons [35], the present conclusion on the position preference of essential genes in operons is more general and reliable.
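The first-half statistic can be sketched as follows. The sketch applies the stated rule that the middle gene of an odd-length operon counts towards the first half, and it restricts the calculation to hybrid operons. Pooling the counts over all hybrid operons of a genome is an assumption of this example (the text does not say whether the proportion is pooled or averaged per operon), and the function name and input format are hypothetical.

```python
def first_half_proportion(operons):
    """Proportion of essential genes located in the first half of hybrid operons.

    operons: list of operons, each a list of booleans in transcription order
    (True = essential). Operons that are entirely essential or entirely
    non-essential are skipped.
    """
    in_first_half = 0
    total_essential = 0
    for genes in operons:
        if all(genes) or not any(genes):
            continue  # not a hybrid operon
        # For odd n the middle gene belongs to the first half: ceil(n / 2).
        half = (len(genes) + 1) // 2
        in_first_half += sum(genes[:half])
        total_essential += sum(genes)
    if total_essential == 0:
        raise ValueError("no essential genes in hybrid operons")
    return in_first_half / total_essential


if __name__ == "__main__":
    toy_genome = [
        [True, False, False],                # essential gene in first half
        [False, True],                       # essential gene in second half
        [True, True],                        # all essential: skipped
        [False, False, True, True, False],   # one of two in first half
    ]
    print(first_half_proportion(toy_genome))
```

In the toy genome, two of the four essential genes in hybrid operons fall in the first half, so the proportion is 0.5.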

The possible reason for position preference of essential genes in operons
Depending on whether the operon contains essential genes, operons can be divided into three categories: operons containing only essential genes, operons containing both essential and non-essential genes, and operons containing only non-essential genes. By analyzing these three types of operons in 53 prokaryotic genomes, we found that essential genes are associated with both the gene number and the strand location of operons. Operons containing essential genes were more likely to be on the leading strand, and the average gene number of operons containing both essential and non-essential genes was higher (S4 Table in S1 File). Previous studies have shown that there is a strong relationship between gene expression and the number, length, and order of genes in operons [41]. In operons, the distance from the start of the gene to the end of the operon is defined as the transcription distance. Gene expression increases with an increase in the transcription distance; that is, gene expression increases with an increase in the length of the operon [42,43]. Changes in the order of genes in operons also affect gene expression. The gene farthest from the end of the operon (that is, the gene closest to the promoter) was always more highly expressed. In other words, the expression level of a gene in the first position is higher than that of the same gene at other positions [41]. In 46 prokaryotic genomes, the average position of essential genes is generally in front of the average position of non-essential genes, which indicates that essential genes tend to have a higher expression level than non-essential genes (Table 1). Operons containing essential and non-essential genes have more genes, thereby increasing the expression of genes in operons. This is consistent with the fact that essential genes are crucial genes with higher expression levels and encode proteins that perform important functions. It also explains the fact that essential genes tend to be located in operons rather than alone. This work will be of great significance for understanding the functional basis of genome organization and the practical application of synthetic biology.
Fig 2. In the left part of the figure, the size of the dot represents the number of essential genes occupying the first half of operons, and the color of the dot represents the proportion of essential genes occupying the first half of operons. The part on the left is sorted according to the proportion of essential genes in the first half of operons from high to low. In the right part of the figure, the size of the dot represents the number of operons, and the color of the dot represents the relative positions of essential genes. https://doi.org/10.1371/journal.pone.0250380.g002

Conclusion
In the present study, the position preference of essential genes in prokaryotic operons was explored systematically. The previously reported tendency of essential genes to occupy the first position of operons was found to depend on the proportion of essential genes in operons. To solve this problem, a new index, the average position of genes in an operon, is proposed, which better reflects the position preference of essential genes in operons. Thus, previous shortcomings were avoided, and more general and reliable conclusions were reached. Our work provides new insights into related research on synthetic biology, such as the construction of cell factories and the design of artificial genomes.