Conceived and designed the experiments: JL. Performed the experiments: DJ-M. Analyzed the data: DJ-M JL. Contributed reagents/materials/analysis tools: DJ-M JL. Wrote the paper: JL DJ-M.
The authors have declared that no competing interests exist.
As one of the two classes of integral membrane proteins,
The architecture and amino acid make-up of
A remaining challenging task is the detection and quantification of evolutionary patterns of residues embedded in the TM region. The amino acid sequences of
Characterizing amino acid substitutions can also be used to develop scoring matrices specific for
To capture the pattern of amino acid substitutions of
This paper is organized as follows. We first describe the pattern of amino acid substitutions found for TM segments of bacterial
We use a set of 11
Columns 2-4: # of Residues and TM Strands. Columns 5-7: Hydrophobicity Index (GES).
PDB | TM | TM | TM | TM | TM | TM |
1A0S | 172/413/18 | 84 | 87 | −0.54 | −1.66 | 0.52 |
1BXW | 84/172/8 | 42 | 42 | −0.05 | −1.76 | 1.66 |
1E54 | 139/332/16 | 70 | 69 | −0.33 | −1.8 | 1.17 |
1FEP | 206/724/22 | 102 | 104 | −0.67 | −2.25 | 0.87 |
1I78 | 102/297/10 | 50 | 51 | −0.11 | −1.99 | 1.71 |
1KMO | 217/774/22 | 108 | 109 | −0.94 | −2.6 | 0.7 |
1NQE | 220/549/22 | 111 | 109 | −0.87 | −2.47 | 0.77 |
1QD6 | 124/240/12 | 59 | 64 | −0.63 | −2.64 | 1.16 |
1QJ8 | 75/148/8 | 35 | 40 | 0.2 | −1.02 | 1.27 |
2MPR | 178/427/16 | 90 | 87 | −0.75 | −2.5 | 1.04 |
2OMF | 153/340/16 | 76 | 77 | −0.66 | −2.38 | 1.04 |
Mean | 152/401/16 | 75 | 76 | −0.49 | −2.10 | 1.08 |
TM
The transmembrane segments as a whole have moderate polarity (−0.49 by the GES scale
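The Mean row of the table above can be checked directly from the per-protein entries. The following is a minimal sketch, assuming only the three GES columns as printed. Because the column subscripts were lost in the table header, the columns are referred to here by position, and the names `ges_rows` and `column_means` are illustrative.

```python
# GES hydrophobicity values copied from the three rightmost table columns,
# one tuple per PDB entry, in table order (column labels are not assumed).
ges_rows = {
    "1A0S": (-0.54, -1.66, 0.52),
    "1BXW": (-0.05, -1.76, 1.66),
    "1E54": (-0.33, -1.80, 1.17),
    "1FEP": (-0.67, -2.25, 0.87),
    "1I78": (-0.11, -1.99, 1.71),
    "1KMO": (-0.94, -2.60, 0.70),
    "1NQE": (-0.87, -2.47, 0.77),
    "1QD6": (-0.63, -2.64, 1.16),
    "1QJ8": (0.20, -1.02, 1.27),
    "2MPR": (-0.75, -2.50, 1.04),
    "2OMF": (-0.66, -2.38, 1.04),
}


def column_means(rows):
    """Return the per-column arithmetic mean, rounded to two decimals."""
    rows = list(rows)
    n = len(rows)
    return tuple(round(sum(col) / n, 2) for col in zip(*rows))


means = column_means(ges_rows.values())
print(means)
```

The first value corresponds to the −0.49 quoted in the text for the transmembrane segments as a whole.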
The general pattern of amino acid residue substitutions observed for residues in the TM region is shown in
Estimated instantaneous rates of substitution for residues in the TM segments and at different TM interfaces from 11 template
Small polar residues S and T substitute mostly with each other (S-T: 38) and with the small residues A (T-A: 25, S-A: 18) and G (S-G: 9, T-G: 3). Exchanges also occur with N (T-N: 6, S-N: 15).
Among large polar residues, Q shows fewer substitutions overall, but with a broader range of residue types,
Aromatic residues most likely substitute among themselves (
The most abundant residue in the transmembrane segments of
The pattern of substitutions for residues facing the outer membrane lipids (TM
L is the most abundant residue in the TM
Residue A has a broad range of substitutions at the TM
Among aromatic residues, Y is well conserved at the lipid-facing surface of
The predominant polar residue at the TM
Finally, the substitutions observed for G at this interface are mainly with A (46) and, to a lesser extent, with T (4) and S (3).
The pattern of substitutions for residues facing the interior of the barrel (TM
Ionizable residues such as E, R, K and D are more abundant in the TM
The pattern of substitutions for hydrophobic residues is somewhat different at this interface. Although V, I, L and M mostly exchange amongst themselves, they also exchange more frequently with polar residues such as T, in contrast to what is found in the TM
The most abundant residue at this interface is G. Its substitution pattern shows some similarities with G at the lipid interface, although a larger number of substitutions is observed with S (S-G:19).
To identify residues that behave similarly in their patterns of substitutions, we carried out clustering analysis based on the substitution profiles of the 20 amino acids. For each amino acid residue, we collected the substitution rates of replacing this residue type with each of the other 19 residue types. These rates form a 19-dimensional vector. As each of the twenty amino acid types has its own vector, we obtained a set of twenty vectors and calculated the Euclidean distances between all pairs of vectors. We then carried out single-linkage hierarchical clustering analysis. This was repeated for each interface region and for the entire TM region. The resulting clustering trees are shown in
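The clustering procedure described above can be sketched as follows. This is an illustrative, pure-Python version rather than the implementation used in the study. The four-letter alphabet and the values in `toy_rates` are made-up placeholders, not estimated rates, and the function names are hypothetical. How the coordinates of different residues' profiles were ordered is not stated in the text; the sketch simply keeps alphabet order.

```python
import math
from itertools import combinations


def substitution_profile(rates, residue, alphabet):
    """Rates of replacing `residue` with every other residue type, in alphabet order."""
    return [rates[residue][other] for other in alphabet if other != residue]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def single_linkage(profiles):
    """Agglomerative single-linkage clustering.

    `profiles` maps a label to its vector. Returns the merges in order,
    each as (members_of_cluster_a, members_of_cluster_b, linkage_distance).
    """
    clusters = [frozenset([label]) for label in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Single linkage: distance between the closest pair of members.
            d = min(euclidean(profiles[x], profiles[y]) for x in a for y in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a | b)
        merges.append((sorted(a), sorted(b), d))
    return merges


# Toy example with a hypothetical 4-residue alphabet (the study uses all 20,
# giving 19-dimensional profiles). Values are placeholders for illustration.
alphabet = ["A", "G", "I", "V"]
toy_rates = {
    "A": {"G": 5.0, "I": 1.0, "V": 2.0},
    "G": {"A": 5.0, "I": 0.5, "V": 1.5},
    "I": {"A": 1.0, "G": 0.5, "V": 9.0},
    "V": {"A": 2.0, "G": 1.5, "I": 9.0},
}
profiles = {r: substitution_profile(toy_rates, r, alphabet) for r in alphabet}
merges = single_linkage(profiles)
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

With these placeholder values, A and G merge first, then I and V, and the two pairs are joined last. The sequence of merges is what a clustering tree depicts.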
There is clear grouping of residues in the clustering tree for the TM
The general pattern for the TM
The hierarchical tree for the TM
The estimated amino acid substitution rates can be used to construct scoring matrices for sequence alignment and for large scale database search of homologs of
Three sets of scoring matrices were derived from the estimated substitution rates (see
To obtain objective evaluation, we constructed a “true-positive” data set containing known and predicted
Two additional data sets were constructed from the UniProt database
We use the concatenated transmembrane segments of 11 proteins from which the scoring matrices were derived, along with an additional 9
We first carry out a test of specificity. Results of Blast searches against the randomized database are shown in
B | P | S | P | | | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 20 | 0 | |
0 | 0 | 0 | 0 | 0 | 52 | 0 | |
0 | 0 | 0 | 0 | 0 | 142 | 5 | |
0 | 0 | 0 | 0 | 5 | 319 | 42 | |
0 | 0 | 0 | 0 | 45 | 689 | 181 |
Cumulative number of random sequences incorrectly identified as homologs of
The scoring matrix P
When searches were carried out against the data set of other membrane proteins (
B | P | S | P | | | |
0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | |
0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | |
0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | |
0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 2/2 | 0/0 | |
0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 21/3 | 0/0 | |
0/0 | 0/0 | 0/0 | 0/0 | 0/1 | 98/5 | 1/1 | |
0/0 | 0/0 | 0/0 | 0/0 | 3/6 | 457/13 | 3/2 | |
0/0 | 0/0 | 0/0 | 0/0 | 13/26 | 1780/42 | 28/31 |
Cumulative number of sequences of membrane proteins with other architecture and globular protein sequences incorrectly identified as homologs of
We conclude that the scoring matrices P
Next we performed Blast searches against the “true-positive” database of outer membrane proteins. Search results are shown in
B | P | S | P | | | |
49 | 62 | 56 | 5 | 48 | 46 | 8 | |
116 | 106 | 121 | 32 | 121 | 119 | 41 | |
122 | 121 | 129 | 42 | 133 | 130 | 79 | |
128 | 127 | 143 | 83 | 141 | 143 | 102 | |
138 | 131 | 147 | 95 | 148 | 145 | 107 | |
146 | 139 | 168 | 109 | 176 | 170 | 119 | |
153 | 144 | 206 | 120 | 200 | 202 | 136 | |
191 | 166 | 245 | 126 | 272 | 260 | 202 |
Cumulative number of proteins identified as homologs of 20 template
Finally, we also performed Blast searches against the non-redundant NCBI database (
B | P | S | P | | | |
821 | 934 | 897 | 65 | 605 | 608 | 103 | |
1556 | 1579 | 1977 | 294 | 1781 | 1832 | 416 | |
2020 | 1879 | 2211 | 504 | 2120 | 2749 | 649 | |
2201 | 2135 | 2327 | 650 | 2309 | 4040 | 812 | |
2262 | 2212 | 2377 | 708 | 2385 | 5516 | 1142 | |
2322 | 2288 | 2464 | 856 | 2477 | 7495 | 1475 | |
2407 | 2437 | 2602 | 1198 | 2570 | 8538 | 1677 | |
2573 | 2573 | 2757 | 1503 | 2799 | 9192 | 1966 |
Cumulative number of proteins identified as homologs of the 20 template
In summary, the
It was estimated that a large number of
266 | 277 | 269 | |
335 | 324 | 348 | |
355 | 354 | 360 | |
364 | 361 | 371 | |
369 | 364 | 373 | |
378 | 370 | 381 | |
381 | 376 | 384 | |
383 | 379 | 388 |
Cumulative number of proteins identified as homologs of the human mitochondrial
An important implication of our results is that we can now reliably detect remote homologs of
There are altogether 2,619 proteins in the OMP
The estimated substitution rates reveal characteristic patterns common to all
The most frequently observed substitutions in this region are among branched aliphatic or small hydrophobic residues (
Substitutions of polar residues frequently occur among themselves, and also with A, G and V. They are likely to be involved in the maintenance of inter-strand polar-polar motifs as described in a previous study
With the exception of residue E, ionizable residues in the TM
The overall pattern of substitution of the TM
There are physical constraints on allowed substitutions due to the requirement of folding and stability of
Anti-parallel strands are arranged with all hydrophobic residues on the side of the barrel facing the lipid interface. Residues L, V, A, F, I and W are frequently found in this interface, which is in agreement with the GES and RW hydrophobicity scales
The aromatic girdle represents another structural constraint, where W and Y are enriched. Both W and Y residues at the aromatic girdle are important for the
The result that the abundant residue G is strongly conserved is consistent with the findings of an earlier study, which showed that the substitution of a residue is only weakly influenced by amino acid composition but depends strongly on the constraints of carrying out biological functions and maintaining structural integrity
The lipid-water interface at the end of the
Since the interior of the barrel is the location where these proteins interact with ions, metabolites, and substrates, amino acids in this interface are under strong selection pressure to carry out specific biological functions. As a consequence, there are limited substitutions for residues in this interface (
Aromatic residues facing the TM
Although depicting our results in the form of a Receiver Operating Characteristic (ROC) curve is appealing, a number of difficulties prevent us from using one. First, the numbers of true positives and true negatives in any of the data sets are not known for each of the query sequences. The total number of sequences in the outer membrane database (3,079) is not the same as the number of true positives when only the sequences of a small number of known structures are used as queries. Second, although the shuffled sequences are most likely unrelated to the query proteins, one cannot in principle rule out the presence of some sequences that happen to be homologous to the query sequences by random chance. For these reasons, the numbers of true negatives are also not known.
Despite the relatively remote phylogenetic relationship and overall differences, as the proto-mitochondrion probably entered the primitive eukaryotic cell between two and three billion years ago
Our finding is consistent with a recent hypothesis that no eukaryote-specific signals for the translocation into mitochondria evolved in mitochondrial
The estimated substitution patterns of residues in the TM region of
Sequences of bacteria and mitochondria are rapidly accumulating from efforts such as metagenomics projects
In summary, we have characterized the substitution pattern of residues in the transmembrane segments of
We carried out B
The substitution rates of residues in the transmembrane segments were estimated following the approach of Tseng and Liang
Using a Bayesian approach, we describe the instantaneous substitution rate
In this study,
Once the initial
The
We have made available a set of tools to perform Blast searches for
We thank Drs. Larisa Adamian and Linda Kenney for helpful discussions, and the members of the Liang lab for stimulating comments and suggestions. We are also very thankful to Barbara W. Boockmeier for a careful reading of the manuscript.