Discerning evolutionary trends in post-translational modification and the effect of intrinsic disorder: Analysis of methylation, acetylation and ubiquitination sites in human proteins

Intrinsically disordered regions (IDRs) of proteins play significant biological functional roles despite lacking a well-defined 3D structure. For example, IDRs provide efficient housing for large numbers of post-translational modification (PTM) sites in eukaryotic proteins. Here, we study the distribution of more than 15,000 experimentally determined human methylation, acetylation and ubiquitination sites (collectively termed ‘MAU’ sites) in ordered and disordered regions, and analyse their conservation across 380 eukaryotic species. Conservation signals for the maintenance and novel emergence of MAU sites are examined at 11 evolutionary levels from the whole eukaryotic domain down to the ape superfamily, in both ordered and disordered regions. We discover that MAU PTM is a major driver of conservation for arginines and lysines in both ordered and disordered regions, across the 11 levels, most significantly across the mammalian clade. Conservation of human methylatable arginines is very strongly favoured for ordered regions rather than for disordered, whereas methylatable lysines are conserved in either set of regions, and conservation of acetylatable and ubiquitinatable lysines is favoured in disordered over ordered. Notably, we find evidence for the emergence of new lysine MAU sites in disordered regions of proteins in deuterostomes and mammals, and in ordered regions after the dawn of eutherians. For histones specifically, MAU sites demonstrate an idiosyncratic significant conservation pattern that is evident since the last common ancestor of mammals. Similarly, folding-on-binding (FB) regions are highly enriched for MAU sites relative to either ordered or disordered regions, with ubiquitination sites in FBs being highly conserved at all evolutionary levels back as far as mammals. This investigation clearly demonstrates the complex patterns of PTM evolution across the human proteome and that it is necessary to consider conservation of sequence features at multiple evolutionary levels in order not to get an incomplete or misleading picture.

In many proteins, IDRs exhibit low amino-acid sequence conservation [22] and tandem repeats are more abundant in IDRs than in ordered regions [23,24]. Insertions and deletions are more common in IDRs [25,26] and they contain more amino acid substitutions than the ordered regions of the same proteins [22]. Furthermore, some disordered regions in proteins show conservation for chemical composition, but not detailed amino-acid sequence conservation [27]. Studies on the evolution of ordered and disordered regions have revealed that disordered regions generally evolve differently from ordered regions, but in some cases similarly to ordered regions [22,[26][27][28][29][30][31]. Hence, understanding the evolution of disordered regions in comparison to ordered regions has been challenging.
IDRs are involved in protein-protein interaction [11], including binding to kinases [32], transcription factors [33], and translation inhibitors [34], and they also mediate interaction with nucleic acids [33,35]. Numerous receptors and enzymes with disordered regions acquire structure when binding to a partner molecule [4,[36][37][38]. Proteins with such folding on binding (FB) regions exhibit high specificity and low affinity towards a partner molecule [1,39]. Compared to other disordered regions, they are enriched in hydrophobic residues, and positively charged amino acids [40] and are more conserved [31]. Post-translational modifications (PTMs) can induce their disorder-to-order transitions [41] [42]. Furthermore, PTMs in disordered regions have a significant role in signalling and regulation [42]. Experimental and computational studies suggest that PTMs including phosphorylation methylation and ubiquitination are enriched within IDRs, [6,7,[42][43][44][45] whereas analysis of acetylation has shown contradictory results [46]. Furthermore, the phosphorylation sites present in disordered regions have been suggested to facilitate the evolution of transcriptional regulation [45,47,48]. Methylation, Acetylation, and Ubiquitination (abbreviated here collectively as 'MAU') are the three major PTMs, next to phosphorylation and glycosylation, which regulate the function of many eukaryotic proteins. Crosstalk between MAU sites facilitates complex regulatory programs in both histone and non-histone proteins [49]. However, the evolution of MAU sites in IDRs across eukaryotic species is not well understood [50][51][52][53]. Therefore, a comparative study on the conservation of human MAU site residues in ordered and disordered regions will illuminate their importance across the eukaryotic domain. Analysis of conservation across a large panel of genome-sequenced eukaryotes can give us more comprehensive insights into the evolutionary history of PTMs [45,47,48], while avoiding issues of data set completeness that may be a problem for experimental analysis of a variety of multi-cellular species.
We have performed a large-scale analysis of >15,000 experimentally-verified MAU sites from the ordered and disordered regions of >7,000 human proteins. We compiled four such data sets for both ordered and disordered regions: (i) methylated arginines, (ii) methylated lysines, (iii) acetylated lysines and (iv) ubiquitinated lysines. We studied the distribution and conservation of MAU-site residues in ordered and disordered regions across 380 eukaryotic organisms. Conservation signals for the maintenance and novel emergence of MAU sites were analysed at 11 evolutionary levels from the whole eukaryotic domain down to the level of the ape superfamily. We observed significant conservation attributable to lysine and arginine PTMs in both ordered and disordered regions across the 11 levels, and also some signals for the novel emergence of new MAU sites. Furthermore, we have pinpointed trends for biologically important subsets of IDRs, such as FB regions and prion-like domains. For example, we observed that MAU and other PTM sites are highly enriched in FB regions relative to both ordered and disordered regions generally and at evolutionary depths back as far as the emergence of the mammal class. eukaryotic organisms were identified using the reciprocal best hit method with BLASTP and e-value threshold <1e-04 [62]. Multiple sequence alignment of human proteins with MAU sites and their orthologs in the 380 organisms was performed using ClustalOmega [63]. For the evolutionary analysis, human proteins with an orthologue in at least one of the organisms in a clade are included and the human proteins without an orthologue in at least one of the organisms are discarded. We used ZORRO, a probabilistic masking program to evaluate the alignment quality of individual positions [64]. In doing this, the aligned positions with low ZORRO score were discarded, and the positions within the recommended score range of five to ten were retained for conservation analysis. For comparison, the alignment program KMAD was also applied in some cases [65].
Enrichment analyses of gene ontology (GO) molecular function categories was performed using the GOrilla tool to identify GO terms enriched in different clades [66].

Identification of ordered and disordered regions in proteins
We performed protein BLASTP [version 2.2.28] [62] against the ASTRAL non-redundant protein domain database (95% identity threshold) [67]. We used PDB atom records of proteins from ASTRAL domain database to identify the experimentally validated position of ordered regions in human proteins and the disordered regions in human proteins were annotated with DISOPRED and IUPRED per-residue prediction scores, using default parameters [18,19]. Since ASTRAL domains are experimentally validated structures, we considered the region given by ASTRAL BLAST hits as ordered region for the cases that are also predicted as disordered. To keep the analysis and presentation of results manageable, regions un-classified in this way were not analysed.
Human prion-like proteins are annotated disordered regions that have a bias for asparagine or glutamine residues (using the fLPS program [68], run with default parameters except for a binomial P-value threshold of 1e-10, as used in previous studies [69][70][71]).

Conservation & statistical analysis
A Python script was written to find the conserved MAU sites in ordered and disordered regions by calculating the completely conserved lysine/arginine residues in the multiple sequence alignment at each clade. Newly-emerged conserved residues are those that are completely conserved in a clade but not across a more ancient, wider clade. To test the significance of conservation, we performed enrichment analysis of the conserved MAU-site residues at each evolutionary level as subsets of the total sets of conserved residues of the same type, with appropriate corrections for multiple hypotheses.
Hypergeometric probability tests were used to find these enrichments of MAU-site residues in ordered and disordered regions for the different evolutionary levels. A Bonferroni correction for multiple hypothesis testing was applied for all tests for a given background population. The details of the enrichment calculations are given in the introductory page of S1 File. All enrichment and statistical analyses are performed using the R language [72].

Results and discussion
First, we overview the distribution of methylation, acetylation and ubiquitination (MAU) sites in ordered and disordered regions, and include some specific analysis and discussion of MAU sites in folding-on-binding (FB) regions, prion-like proteins and homopeptides (which are common features of disordered regions [73]).
Then, we examine the effect of MAU sites on the evolutionary behaviour of lysine and arginine residues. To what extent do MAU sites drive the conservation of these residues and the appearance of new conserved residues at different points in eukaryotic evolution? Is there evidence for the appearance of new conserved lysines in evolutionarily old proteins because of MAU site status?
These questions are examined for each of methylation, acetylation and ubiquitination separately in turn. In doing so, we also consider the effects of: (i) allowing mutation to other possible residue types for the same modification (e.g., allowing mutation between arginine and lysine for methylation); (ii) alignment quality on the results (through applying the program ZORRO, as described in Methods); (iii) removal of histones (which are known to have high levels of MAU).
The evolution of MAU sites is also specifically examined for histones, and for folding-onbinding proteins as subsets. Finally, we briefly consider the evolutionary behaviour of sites that are 'multiple-MAU' (i.e., that can have more than one different type of MAU modification).

Distribution of MAU sites in ordered and disordered regions
The MAU site contents in the ordered and disordered regions are summarized in Fig 1A. Specific lysine residues can be sites for multiple PTMs, including MAU ( Fig 1B). For MAU sites in ordered and disordered regions, the observed overlap between acetylation and ubiquitination sites correlates with an established regulatory relationship [74], and it is also interesting to note the high proportion of methylation sites (~51%) specifically in ordered regions that have other PTMs, in comparison to any other MAU in either ordered or disordered regions ( Fig 1B).
In general, PTM sites have been reported to be abundant in the disordered regions of eukaryotic proteins [7,75]. However, not all PTMs show a preference for disordered regions. We examined the distribution in ordered and disordered regions of human proteins of experimentally-verified MAU sites, along with phosphorylation sites for comparison (as listed in Methods).
We observe that acetylation and ubiquitination sites and methylated lysine sites generally have a significant preference for ordered regions (Fig 2). It is known that lysine methylation in disordered regions blocks site-specific lysine ubiquitination to increase protein half-life [76]. This may contribute to the relative abundance of ubiquitination sites in ordered regions. In comparison, phosphorylation sites prefer disordered regions, as expected [7,75] (Fig 2).
Previous studies have suggested that MAU sites are enriched in disordered regions [6,7,[42][43][44] and acetylated lysines have no preference for either ordered or disordered regions [46]. In contrast, our analysis here shows that experimentally-verified MAU lysines are significantly relatively enriched in ordered regions rather than in disordered ones, whereas the opposite is true for phosphorylation sites (Fig 2).

FB regions as display areas for PTMs
FB regions in proteins are known to interact with multiple and diverse partners [1,39], and are associated with PTMs [41,42]. Previously, we found that FB regions are more conserved than contiguous disordered regions that are not known to exhibit disorder-to-order transition [31]. We have analysed the enrichment of MAU sites and other PTMs in FB regions (in 172 human proteins, data taken from the IDEAL database [77]). Phosphorylation sites are highest in number in FB regions, followed by MAU sites (Fig 3A).
We observed that the major PTMs phosphorylation, methylation, acetylation, and ubiquitination are highly significantly enriched in FB regions treated as a sample either of ordered or of disordered regions (Fig 3A). In addition, two other less numerous PTMs namely O-linked glycosylation on threonines (P-values 3E-05) and sumoylation on lysines (P-value 6.7E-15) are significantly enriched in FB regions treated as a sample of either ordered or disordered regions (not depicted in the figure). Hence, MAU / phosphorylation site enrichment is a distinctive feature of FB regions relative to other (dis)ordered regions. Furthermore, we calculated the percentage distribution of MAU and phosphorylation sites in FB and non-FB/ unclassified regions, and these sites show preference for FB regions, however the number of sites are higher in non-FB/unclassified regions ( Fig 3A). PTMs have been reported to induce disorder-to-order transition and facilitate binding to multiple partners [42]. In addition, PTM sites and 'multiple-MAU' sites (i.e., individual sites that can have multiple different MAU modifications) have been previously reported to show a preference for molecular recognition features (MoRFs) [44]. MoRFs are short (10-70 residues) structured regions within disordered regions, that are thought to undergo disorder-to-order transition on partner binding [78], whereas FB regions are of varying length within both ordered and disordered regions. We analysed the enrichment of multiple-MAU sites within FB regions (Table J in S1 File). We found a highly significant enrichment, treated as a sample of either ordered or disordered regions (P<1e-20). FB regions could be involved in many significant functions due to the prevalence of long disordered regions (>50 residues) in eukaryotic proteins [79]. Indeed, FB proteins with multiple-MAU sites such as flap endonuclease 1 (FEN1), α-synuclein, HMG-I and p53 are involved in DNA/RNA binding. For example, acetylation regulates the activity of FEN1 through p300 [80] and N-terminal acetylation leads to the α-helical oligomerization of α-synuclein [81]. Generally, FB regions are known to be involved in many interactions with high specificity and low affinity towards a partner molecule. Hence, FB PTMs could be crucial for facilitating these interactions.

PTMs are depleted in homopeptides and prion-like proteins
Homopeptide repeats are common in eukaryotic proteins, and they tend to occur in disordered regions [82]. These repeats occur in a variety of nucleic-acid-binding domains linked to Post-translational modification, intrinsic disorder and evolution signalling and transcriptional processes [83]. We calculated the occurrence of PTMs in homopeptides (!3 amino acids) in ordered and disordered regions. Among the major PTMs, a higher proportion of serine phosphorylation and lysine acetylation sites are present in the homopeptides of disordered regions ( Fig 3B). However, enrichment/depletion analyses show that MAU sites are generally significantly depleted in both ordered-and disordered-region homopeptides, although phosphorylated tyrosines may be enriched in disordered-region homopeptides ( Fig 3B). Other PTMs analysed do not show significant enrichment/depletion (i.e., P-values are non-significant); this might be due to their very limited experimental data. We suggest that the homopeptide lack of PTMs is due to the rapid evolution of amino-acid repeats [84], and also because they do not well accommodate required sequence motifs. performed for the FB set as a sample of total ordered or disordered regions. Due to the limited experimental data, other PTM sites were detected only at very low levels or were not present: nitrosylated cysteines 2 sites, O-linked glycosylation (serine, 1 site and threonine, 5 sites), prenylated cysteine (2 sites), sulfated tyrosine (2 sites) and sumoylated lysines (24 sites), whereas carboxylation, myristoylation, palmitoylation sites are not present in the FB regions. We used hypergeometric probability tests to perform the enrichment/depletion analyses of PTM sites in FB regions. The critical P-value to test the significance is P<0.0014 (to correct for multiple hypotheses). (B) Distribution of MAU and phosphorylation sites in homopeptides. The enrichment and depletion analyses are calculated for homopeptides present in the ordered (olive green) and disordered (peach) regions. The statistical test and critical P-value is as for part (A). (C) Distribution of MAU and phosphorylation sites in Human prion-like proteins (grey). Enrichment analysis is performed for lysines or arginines in the prion-like protein set as a sample of total lysines or arginines in the disordered set, as appropriate. The statistical test and critical P-value is as for part (B). https://doi.org/10.1371/journal.pcbi.1006349.g003 The intrinsically disordered nature of prion-like proteins and the role of PTMs such as Nglycosylation in changing the conformation and stability of prion proteins [42,[85][86][87] motivated us to study PTM occurrence in 1269 human prion-like proteins. We performed the analyses as mentioned above ( Fig 3C). As for homopeptides, there is a general trend for significant depletion. We hypothesize that PTMs may get in the way of regular side-chain hydrogenbonding patterns that are essential for prion amyloid formation. Notably also, prion-like proteins do not show a significantly high proportion of N-glycosylation sites, even though they tend to be N-rich (i.e., P-values are non-significant).

Evolutionary behaviour of MAU sites at eleven evolutionary levels
The main goal of this work is to reveal to what extent the evolutionary behaviour of lysine and arginine amino acids is driven by MAU post-translational modification and by presence in intrinsic disorder. To this end, we analysed the evolutionary sequence variation of experimentally verified methylation (lysine: 1009 and arginine: 1676), lysine acetylation (10,044) and lysine ubiquitination (14,396) sites in human proteins.
We analysed the conservation trends at eleven evolutionary levels: (i) Apes, (ii) Primates, (iii) Supraprimates (primates + rodents + lagamorphs), (iv) Eutherians, (v) Mammals, (vi) Tetrapods, (vii) Vertebrates, (viii) Chordates, (ix) Deuterostomes, (x) Metazoans, and (xi) Eukaryotes (all 380 eukaryotes species examined) ( Fig 4A). Conservation of MAU-site residues was investigated in the ordered and disordered regions across the 380 eukaryotic organisms using the pipeline of methods illustrated in Fig 4B. An illustrative example of a protein alignment (for 'human chromobox protein homolog 3') indicating the positions of MAU sites in ordered and disordered regions is shown in Fig 5. When we talk about conservation of PTM sites in the following analysis, it is the conservation for the amino-acid residues that is under consideration, and not for PTMs explicitly. It is discovered below that there is sufficient sequence information to discover conservation signals that indicate the maintenance and emergence of new MAU sites during the evolutionary ancestry of humans.
We examined the degree of conservation of arginines and lysines that are human MAU sites at each of the 11 evolutionary levels. We analysed: (i) the MAU site residues that are conserved (out of the total number of conserved arginines and lysines) for each of these 11 clades, and (ii) the MAU site residues that are newly emerged residues for that specific clade and are conserved right across it. To test the significance of conservation, we performed enrichment analysis of the conserved MAU sites at each evolutionary level, with appropriate corrections for multiple hypotheses. The fractions of conserved residues that are MAU sites at different evolutionary stages are shown on schematic species trees in S2 File. A summary schematic of the major results is shown in Fig 6. In general, we found that, of proteins with MAU sites, 7.3% in ordered and 1.0% in disordered regions have conserved sites across all eukaryotes, with 3.0% of sites in ordered and 0.5% of sites in disordered being completely conserved in this way (Table 1). For example, the abundant eukaryotic DEAD-box protein p68 contains such completely conserved acetylatable (ordered: K-351) and ubiquitinatable (ordered: K-351, disordered: K-375) residues in both ordered and disordered regions. PTMs such as acetylation and ubiquitination are reported to regulate transcriptional coactivation and increase the stability of p68 [89]. The presence of conserved acetylation-and ubiquitination-site residues suggest an essential role of very specific PTMs in p68 across all eukaryotes.
Evidence for methylation as a driver of lysine conservation during eukaryotic evolution, and for the emergence of new lysine methylation sites. The fraction of conserved methylation-site lysine residues in each clade is shown in Figure A in S2 File, ordered regions being shown in green and, disordered regions in peach colour. The bubble size indicates the fraction of conservation. We find substantial evidence for significant conservation of lysine methylation sites across most of the 11 levels (P-values = 0.004 to 5e-21) except in apes, primates and vertebrates for ordered regions and apes, primates, supraprimates, vertebrates and across all eukaryotic organisms for disordered regions) (Figure A in S2 File, top and bottom left panels, and Table A in S1 File). This strong persistent conservation signal across most of the levels suggests that methylation is a major driver of lysine conservation in both ordered and disordered regions across eukaryote evolution.
In addition, for each clade we studied newly emerged lysines that are methylated in humans. By doing so, we can ask: Is lysine methylation also a driver for conservation for newly emerged lysine residues? We observed evidence for a significant enrichment of new lysine methylation sites in the ordered regions of eutherians (P = 6.9e-06), and in the disordered regions of mammals (P = 9.6e-04) and deuterostomes (P = 0.0011) (Figure A in S2 File / Table A in S1 File). Specifically, we observed a conservation signal for a significant number of Multiple sequence alignment of human chromobox protein homolog 3 and its primate orthologs, depicted using JalView [88], showing methylation, acetylation (purple) and ubiquitination (yellow) sites in ordered (green) and disordered (peach) regions. The sites with both acetylation and methylation sites are highlighted in brown, sites with both acetylation and ubiquitination sites are highlighted in cyan and the sites with acetylation, methylation and ubiquitination sites are highlighted in red.
https://doi.org/10.1371/journal.pcbi.1006349.g005 evolutionarily new methylation sites appearing at various epochs in old proteins, i.e. proteins that emerged earlier in eukaryotic evolution. The significant enrichment of new sites in old proteins is similar to the above general results except that new sites are more highly enriched in the disordered regions of deuterostomes (P = 5e-04) (Table E in S1 File). Examples of such proteins in mammals are microtubule-associated protein tau and chromodomain Y-like protein (CDYL1). In tau proteins, methylatable residues K-163 and K-267 in disordered regions are conserved across mammals. K-267 residue methylation is reported to increase frequency of phosphorylation at S-262, and K-163 is identified as a site for both methylation and acetylation [90]. Moreover, methylation at these sites may play important roles in pathological conditions [90]. In mammals, in the protein CDYL1 methylatable K-135 in a disordered region is conserved, and is reported to regulate chromodomain binding to H3K9me3 [91]. These conservation signals for emergence of new lysine methylation sites suggest that clade-specific changes in modifying enzymes might cause progressive addition of more PTM sites to specific proteins in complex organisms.
All conservation signals for new emergent lysine methylation sites appear to be due to new sites in evolutionarily old proteins, i.e., there appears to be no significant contribution from new proteins (such as those arising from new gene duplications). This is also observed generally for all the MAU sites analysed further below.
We also examined the conservation of human lysine methylation sites while allowing for mutation to arginine (i.e., since arginines can also be methylated) and vice versa. This analysis also yields significant conservation signals at various evolutionary levels, with a few differences ( Table B in S1 File). For example, specifically in eutherians, a signal for the emergence of new sites is observed in both ordered and disordered regions ( Table B in S1 File). This indicates that methylated lysine sites could have been mutated to arginines in the epoch after eutherian emergence. Furthermore, in general the conservation analyses of aligned positions for human lysine methylation sites after applying the alignment quality filtering program ZORRO give similar results, but with increased significance (Table C in S1 File). Also, overall, there is little Fig 6. Summary of significantly enriched conserved MAU sites in ordered and disordered regions at 11 evolutionary clades. Evolutionary levels with significant enrichment (after correction for multiple hypotheses) are labelled with four different shapes: lysine methylation (square), arginine methylation (circle), lysine acetylation (star) and ubiquitination (triangle) sites. The ordered and disordered regions with enriched MAU sites are labelled in olive green and peach respectively and the sites with significant enrichment in both disordered and ordered regions are coloured blue. Where there are conservation signals for newly emerged MAU sites in ordered and disordered regions the symbols are marked with black and red borders respectively. The results are depicted in more detail (with P-values, specific thresholds and total numbers of sites) in S1 File. Table A in S1 File is for the total data set, and Table B  difference in the results upon removal of histones (Table D in S1 File), with just three results switching significance status in three of the analysed levels. In addition, we checked the effect of using an alternative alignment tool called KMAD, that has some features designed to apply to alignment of disordered proteins [65] (Table K in S1 File). This tool produced considerably less aligned positions overall at all evolutionary levels, but led to increased significance or acquisition of significance in the enrichments detected for 9 of the 11 levels, and decreases in significance for two of them (Deuterostomes and Metazoan). We also calculated the significant conservation of methylation sites in the disordered regions predicted by IUPRED software (Table L in S1 File), for comparison. IUPRED annotates fewer disordered regions than DIS-OPRED, however only one significance result changes (conservation at the primate level becomes significant) ( Table K in S1 File). Arginine methylation conservation is highly favoured in ordered regions across human evolutionary descent in eukaryotes. Arginine methylation has been extensively studied in both histones and non-histones, and generally involved in signal transduction, mRNA splicing, transcription factors and DNA repair (reviewed in [92]). Protein arginine methyltransferases have been identified in many non-mammalian organisms such as invertebrate chordates, arthropods and nematodes [92]. We find here evidence that right across eukaryotic evolution human methylated arginine sites have had significant conservation, almost exclusively in ordered regions ( Figure B in S2 File, top left panel and Table A in S1 File). The human methylated arginines in ordered regions show a higher fraction of conservation than in disordered regions at almost all evolutionary levels. There are no significant conservation signals for the emergence of new methylated arginine sites during eukaryotic evolution. However, methylated arginine residues, when allowed to mutate to lysine, show potential emergence of new sites in metazoans, indicating potential allowance of such mutation (Table B in S1 File). Similar conservation results are obtained for IUPRED-predicted disordered regions, with additional enrichment in metazoans (Table K in S1 File). In addition, filtering for alignment quality using ZORRO or application of the KMAD tool yields similar results as for methylated Ks, i.e., increased and more pervasive significance, with additional enrichments in clades such as primates, eutherians, tetrapods and vertebrates. In general, since such quality filtering gives higher scoring for conserved positions, ordered regions tend to gain higher scores than disordered regions; however, generally in our analyses we see further significant conservation in disordered regions as well (Table C in S1 File). Also, similar results are obtained here when histones are removed from the data sets (Table D in S1 File).
In the analysis for newly emerged arginine methylation sites at various evolutionary levels, we looked specifically for a conservation signal indicating the emergence of new arginine methylation sites in evolutionarily old proteins (Table E in S1 File). We found a significant enrichment of such methylated arginines in the ordered regions of old proteins in tetrapods (P = 0.028). In tetrapods, these sites are identified in the ordered regions of proteins such as heterogeneous ribonucleoproteins hnRNP A2/B1 and A0. Arginine methylation sites in hnRNPs A2/B1 and hnRNP A0 are involved in cellular signaling and maturation of hnRNPs [93]. Furthermore, methylation-site arginine residues show conservation in the disordered regions of hnRNP H3 in tetrapods. hnRNP isoforms confer various splicing functions, and hnRNP is reported to transactivate tyrosine hydrolase gene transcription in tetrapods [94]. Thus, methylation-site arginine residue conservation correlates with their vital role in tetrapod hnRNPs.
Human acetylated lysines are favoured for significant conservation in disordered regions rather than in ordered regions across eukaryote evolution. To explore the conservation of lysine acetylation in ordered and disordered regions, we performed the same analysis as for methylation. Here, we find that human lysine acetylation sites are significantly enriched (P<0.00417) among conserved lysines in disordered regions at 7 out of the 11 evolutionary levels, more so than in ordered regions (4/11 levels) (Fig 6A). Notably, human acetylated lysines are significantly enriched among conserved lysines in disordered regions at several levels (P<1e-20) (Table A in S1 File and Figure C in S2 File, bottom left panel). Strong conservation evidence for the emergence of new disordered-region lysine acetylation sites is observed in Deuterostomes (P = 3e-21). There is no conservation signal for the emergence of new lysine acetylation sites in ordered regions at any evolutionary level (Fig 6A and Table A in S1 File), except that when mutation to other possible acetylation sites is allowed, it is observed in eutherians (Table B in S1 File).
Since there is a conservation signal for new lysine acetylation sites in disordered regions across deuterostomes, we examined a few proteins that may have acquired new sites in this evolutionary epoch. For example, new conservation at MAU sites is found in the disordered regions of CREB-binding protein (CBP) and p300 HAT. Six acetylated K residues are conserved in CBP IDRs. CBP is hypothesized to increase the acetylation of H3 and H4 histones and NcoA3 [95]. In p300 HAT, we found eight conserved acetylatable K residues in IDRs in the p300 loop region. The autoacetylation of K residues within this region is proposed to regulate the p300 HAT domain [95].
We analysed for evidence of new lysine acetylation sites in 'new' proteins (i.e. proteins that arose in each clade) and in 'old' proteins (i.e., proteins that arose earlier in evolution). We find conservation signals for new lysine acetylation sites in old proteins (Table E in S1 File) in both ordered (P = 0.0046) and disordered (P = 1e-21) regions of old proteins in deuterostomes.
As above for methylation, we checked whether the results are affected by the application of several criteria. Firstly, we compared the results to the case where the conservation of K acetylation sites as other residue types is allowed (i.e., substitution of acetylated K by A, G, M, S, or T; these are amino acids which can also be acetylated). We observed that the two datasets exhibit little or no difference (Table B in S1 File). This result suggests that the overall trend for conservation of human acetylation sites is robust to substitution of acetyl lysine to other possible acetylatable residues. In addition, IUPRED-predicted disordered regions show similar significances but with decreased significance in supraprimates, eutherians and tetrapods, and additional enrichment in primates (Table K in S1 File). As above, applying the ZORRO alignment quality filter or the KMAD tool, or removal of histones give similar or more highly significant enrichments.
Ubiquitination-site residue conservation is favoured in disordered regions of eukaryotic proteins. We analysed ubiquitination sites as above. We find that 4 out of 11 eukaryotic levels show significant enrichment of conserved ubiquitination sites in both ordered and disordered regions, and furthermore in apes, eutherians and vertebrates, only disordered regions exhibit significant conservation (P<0.0025) of these sites (Table A in S1 File and Figure D in S2 File). In deuterostomes, we found a significant signal for new sites in disordered regions (P<0.00417). Moreover, when we focused on potential new sites in evolutionarily old proteins, we found similar enrichment for disordered regions, with all the potential additional sites found in deuterostomes present in such old proteins (P<1e-10) (Table E in S1 File).
For example, the human ubiquitinated K-56 residue in IDRs is newly conserved in RNA helicase p68 across deuterostomes. The poly-ubiquitination of overexpressed p68 is reported in colorectal neoplasms [96]. Moreover, mutation of sumoylation sites is reported to increase polyubiquitination, therefore resulting in p68 aggregation [96]. In addition, ubiquitinatable K-207 is newly conserved across deuterostomes in the disordered regions of MCM3, an essential DNA replication licensing factor. K-207 in MCM3 is reported to be ubiquitinated by KEAP1 and KEAP1-mediated MCM3 ubiquitination sites are stated to be on predicted exposed surfaces of the C-terminal domain in MCM3 [97]. Such conservation suggests that these ubiquitinatable sites in the disordered regions could have facilitated macromolecular interactions since the dawn of deuterostomes.
Previously, for a much smaller data set, it has been observed that ubiquitination sites are more conserved than unmodified lysines in both ordered and disordered regions in mammals, whereas these sites are not more significantly conserved than unmodified sites in yeast [50]. Here, we discover that such conservation has been maintained throughout various stages of human eukaryotic ancestry. Also, we find a conservation signal for the emergence of new ubiquitination sites during deuterostome evolution (Fig 6A, Table A in S1 File). Furthermore, similar conservation results are observed for the IUPRED-predicted disordered regions but with loss of significance for two clades (Table K in S1 File). As above, filtering with the ZORRO program or application of the KMAD program (Tables C and L in S1 File) in general accentuate the conservation results with additional enrichments in several further clades, and removing histones makes little or no difference (Table D in S1 File).

Conservation signals for MAU sites in histones
Histone proteins are highly conserved in all eukaryotes, and their regulatory activity is intimately linked to MAU and phosphorylation. These modifications provide several functions to histones and can modify nucleosome shape and stability. For example, acetylation and phosphorylation alter the charge of histone proteins. Methylation is more complex, i.e. lysine can be mono-, di-or tri-methylated, and ubiquitination provides a much larger covalent modification [98]. Most histone modifications occur within the disordered N-terminal tails, where they are linked to regulation of chromatin structure and recruitment of enzymes to reposition nucleosomes [98]. Furthermore, ordered regions of histones are highly conserved and modifications in these regions are also observed. Extensive study of the cross-talk between PTMs in histone tails has given rise to the term "histone code", wherein histone tails exhibit sites for multiple PTM types and function in transcriptional regulation [41,99]. Hence, we wished to compare the conservation behaviour of MAU sites in the ordered and disordered regions of histones.
We examined the MAU sites in histones that are significantly enriched in each clade. The percentage of histones in the total proteins analysed is 0.69%, which almost triples (to 1.74%) for proteins with conserved MAU sites across all eukaryotes. We found a similar pattern of significant conservation signals across three evolutionary levels (mammals, eutherians and supraprimates), i.e., for lysine methylation sites and ubiquitination sites in disordered regions, and for acetylation sites in ordered regions, (P-values = 2e-03 to 2.8e-07) (Table F in S1 File and Fig 6B).
Methylation site lysine residues in the disordered regions of linker H1 and H3 variants are conserved as far back as mammals. Histones have a significant enrichment for conserved methylation-site residues in disordered regions in mammalian, eutherian and supraprimates clade alignments (Table F in S1 File). Hence, we examined some individual cases for further perspectives. In mammals, we found a notable number of conserved methylation site residues in the disordered regions of Histone H1 variants H1.0 (K-12, K-102 and K-108) and H1.3 (K-17, K-107 and K-169) and of Histone H3 variants H3.2 and H3.3 (7 conserved site residues each). The linker histone H1.1 binds between the nucleosomes and is part of higherorder chromatin structure. H1 variant PTMs might be involved in modulating DNA binding [100]. Lysine acetylation in the H1 N-terminal region reduces H1 affinity to chromatin, and also recruits TAF1 to activate transcription [101]. In addition, lysine methylation in the N-terminal region of Histone H3 has been linked with strong cognitive abilities [102]. Thus, conservation of methylatable lysine residues in the disordered regions of histone H1 and H3 variants might facilitate cell-specific transcription and exhibit vital roles in neurodegenerative diseases.
Ubiquitination sites in H2A and H3 variants in mammalian histones. In mammalian histones, conserved ubiquitination-site lysines are significantly enriched in disordered regions (Table F in S1 File). The highest number of conserved ubiquitinatable site residues are observed in Histone H2A variants such as Histone H2A.1 (positions 120, 126, 128 and 130) and Histone H2A type 2-B (positions 119, 120, 125, 128 and 130). This could be linked to monoubiquitination being common in H2A and H2B, and present in all cells of higher organisms [103]. PTMs in intrinsically disordered histone tail domains have diverse functional impacts. For example, during spermatogenesis, proteasome-mediated degradation of histones may facilitate chromatin condensation [103]. Also, ubiquitinated H2A is involved in gene silencing and suppresses transcription initiation by inhibiting methylation of H3 at K-4 [103]. Hence, the results suggest that modifications on the disordered regions of histone variants that altered nucleosome stability were consolidated in the epoch of evolution since the dawn of mammals.

Sites with multiple MAU PTMs
Multiple PTMs can occur on the same residue in a protein. Histone proteins are the bestknown example of this; they have such 'multiple-MAU' sites in their N-terminal tail regions. The association between multiple-MAU sites and signalling is also observed in other proteins, e.g., α-tubulin, RNA polymerase II, p300/CBP and Cdc25C phosphatases [44]. PTM cross-talking at these sites such as between phosphorylation/acetylation, phosphorylation/sumoylation, hydroxylation/O-linked-glycosylation, and acetylation/ubiquitination has been reported [74,99]. As shown in Fig 1B, our analysis shows the pronounced co-occurrence of acetylation and ubiquitination that plays a major regulatory role [74]. Previously it was shown that multiple-MAU sites show a strong preference for disordered regions [44].
We checked whether having multiple MAU modifications at one site is linked to increased sequence conservation for the 11 evolutionary levels. This would also be a further strong indicator that the conservation signals we have observed are due to conservation of PTMs at various evolutionary depths. PTM sites in human proteins with more than one MAU modification were separated into ordered (1836 sites) and disordered regions (676 sites). We found evidence for significant conservation of multiple-MAU sites in disordered regions in apes (P = 0.009) and supraprimates (P = 1e-05), and in ordered regions in apes (P = 0.006), eutherians (P = 9e-26), chordates (P = 1.5e-43) and across all eukaryotes (P = 3.5e-04). Also, there are conservation signals that appear due to the emergence of new conserved sites, e.g., in chordate, eutherian and supraprimate clades for ordered regions, and very high significance is found in supraprimates (P = 1e-82) for disordered regions (Table H in S1 File). Many of the P-values for these are smaller than the P-value for any relevant individual PTM enrichment, indicating potential increased conservation due to their multiple-MAU status.

Ubiquitination is a major driver of conservation of lysines in folding-onbinding (FB) regions
Analysis of PTM sites in folding-on-binding regions showed that phosphorylation and MAU sites are significantly enriched (Fig 3A). So, we analysed the conservation of MAU sites in FB regions across the 11 evolutionary levels (Table I in S1 File). Here, we treated the FB regions as a sample of both ordered regions and disordered regions.
In eutherians, we observe significant enrichments of conserved lysines/arginines for all types of MAU in the FB regions (as samples of either disordered or ordered regions) ( Table I in S1 File). Ubiquitination-site residues are the most enriched, with a persistence of enrichment back as far the mammalian clade (P = 4e-3), followed by acetylated lysines (Table I in S1 File). Examples of human proteins with ubiquitination-site residues in FB regions that have such conservation in mammals are: Myc proto-oncogene, histone H3.3, Ras-related C3 botulinum toxin substrate-1 (Rac1), and Protein CASC3.
For example, the Myc proto-oncogene, a transcription factor, is shown to undergo phosphorylation on T-58 and S-62 prior to its degradation by ubiquitination. The interaction of Fbw7 on T-58 is reported to promote degradation of Myc protein, and the mutation on this site results in decreased degradation [104]. The ubiquitinated K-18 in the FB region of the histone H3 tail is identified to mediate DNA methylation by interacting with the N-terminal regulatory domain of DNMT1 [105]. Furthermore, it has been reported that the ubiquitinated Rac1 might be involved in the internalization of Rac1 from peripheral membrane and relocation of Rac1 towards endocytic vesicles. In addition, mutations at evolutionarily conserved ubiquitination sites are identified to be enriched in cancer [105]. Therefore, these results suggest that the enriched ubiquitination sites in FB or disordered regions could be due to the shorter half-life of these proteins.

Functional trends in MAU-site containing proteins
We checked for any interesting functional trends in the conservation data through examining Gene Ontology (GO) annotation [106]. Specifically, we were interested in the functional trends for proteins that have newly emerged conserved MAU-site residues at each evolutionary level. We see high enrichments for very general GO categories involved in 'binding', such as 'nucleotide binding', 'RNA binding', 'protein binding', 'anion binding', etc. These functional trends tend to be detectable from the tetrapod clade down to primates, but not so much outside of this range. These results tally well with a general role for MAU PTMs in modifying binding specificities and modalities ( Figure A in S3 File). Interestingly, when compared to the GO category enrichments for the whole set of MAU-modified proteins ( Figure B in S3 File), there are a few missing categories, e.g., 'drug binding', 'organic cyclic compound binding', and 'microtubule binding', suggesting that MAU sites on proteins with these functions do not undergo concerted changes in conservation across deep eukaryotic evolutionary time.

Concluding remarks
Intrinsically disordered regions can house large numbers of post-translational modifications, such as the MAU sites which are the focus of this study. By examining for conservation of these sites in ordered and disordered regions separately, we have discovered that MAU is an important driver of arginine/lysine conservation throughout different stages of eukaryotic evolution, and that there is evolutionary evidence for key moments in human ancestry where new MAU sites have arisen in existing proteins, particularly during the epochs of deuterostome and eutherian evolution. The conservation signals for emergence of new PTM sites suggest that clade-specific changes in modifying enzymes might cause the progressive addition of more PTM sites to specific proteins in complex organisms. There is a surprising variety of conservation patterns for MAU-site residues when comparing disordered and ordered regions to each other. The four types of MAU site (methylatable K and R, acetylatable K and ubiquinatable K) each have distinct conservation patterns, with conservation of methylatable Rs being strongly favoured in ordered regions. In contrast, methylatable Ks are conserved in either set of regions, and conservation of acetylatable and ubiquitinatable Ks is favoured in disordered regions over ordered. The strongest conservation signals occur across the mammalian clade, indicating its appropriate use as a baseline conservation level for further analyses (Fig 7A and  7B), but for newly-emerged cases, the signals are strongest in other clades, indicating specific epochs of evolutionary emergence (Fig 7C and 7D). Distinct patterns of MAU-site evolution are observed in histones during eukaryote evolution, as compared to non-histones. However, removal of histones from the data makes little or no difference to the overall results. Also, in general filtering for alignment quality increases significances, in both ordered and disordered regions.
Examining the scenario where mutation to other possible MAU sites yields some interesting variant results. For example, a conservation signal for an emergent allowance of mutation between methylated arginine (R) and lysine (K) is observed in a certain epoch. This suggests that some sites switched between R and K and left a trace of this in the conservation pattern.
Folding-on-binding regions have highly significant enrichments in MAU sites (particularly ubiquitination sites) relative to other ordered or disordered regions, that persist back as far as mammalian emergence. Also, in some cases 'multiple-MAU' sites, i.e., sites that can be modified in any of the three ways, demonstrate highly significant conservation that is much more significant than for any corresponding single PTM type. The number of conserved sites gets smaller when conservation is analysed for larger, wider clades. Conversely, however, the residues that are PTMed in human are a larger fraction of these deeply conserved residues, so significant conservation is still detected. This investigation demonstrates that analysis of conservation across a large panel of genome-sequenced eukaryotes can give us more comprehensive insights into the evolutionary history of PTMs, while avoiding issues of data set completeness that may be a problem for experimental analysis of such a variety of multi-cellular species. Also it is clear that we need to consider conservation of sequence features at multiple levels in order not to get an incomplete or misleading picture.