Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Incorporation of Local Structural Preference Potential Improves Fold Recognition

  • Yun Hu ,

    Contributed equally to this work with: Yun Hu, Xiaoxi Dong

    Affiliations National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China, Graduate University of Chinese Academy of Sciences, Beijing, China

  • Xiaoxi Dong ,

    Contributed equally to this work with: Yun Hu, Xiaoxi Dong

    Affiliations National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China, Graduate University of Chinese Academy of Sciences, Beijing, China

  • Aiping Wu,

    Affiliation National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China

  • Yang Cao,

    Affiliations National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China, Graduate University of Chinese Academy of Sciences, Beijing, China

  • Liqing Tian,

    Affiliations National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China, Graduate University of Chinese Academy of Sciences, Beijing, China

  • Taijiao Jiang

    taijiao@moon.ibp.ac.cn

    Affiliation National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China

Incorporation of Local Structural Preference Potential Improves Fold Recognition

  • Yun Hu, 
  • Xiaoxi Dong, 
  • Aiping Wu, 
  • Yang Cao, 
  • Liqing Tian, 
  • Taijiao Jiang
PLOS
x

Abstract

Fold recognition, or threading, is a popular protein structure modeling approach that uses known structure templates to build structures for those of unknown. The key to the success of fold recognition methods lies in the proper integration of sequence, physiochemical and structural information. Here we introduce another type of information, local structural preference potentials of 3-residue and 9-residue fragments, for fold recognition. By combining the two local structural preference potentials with the widely used sequence profile, secondary structure information and hydrophobic score, we have developed a new threading method called FR-t5 (fold recognition by use of 5 terms). In benchmark testings, we have found the consideration of local structural preference potentials in FR-t5 not only greatly enhances the alignment accuracy and recognition sensitivity, but also significantly improves the quality of prediction models.

Introduction

Modeling of protein structures based on structure templates found from experimentally determined structures, called template-based modeling (TBM), is currently the most effective way to build a 3-D structure for a protein of unknown structure. To build a structure model for a target protein sequence, the TBM process consists of four major steps: identification of structural templates, alignment of target sequence to structural templates (or sequence-structure alignment), model building, and model quality evaluation. The first two steps are the key steps in the TBM process, improvement of which can greatly improve the quality of the final predicted model [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13]. For target sequences with high sequence similarity to those of structure templates, the structural templates can be easily identified and the target sequences can be reliably aligned to the structural templates by those methods that use sequence information alone such as PSIBLAST [14] and HMMER [15]. However, for target sequences with low sequence similarity, the reliable identification of structural templates and accurate sequence-structure alignment requires a much more complex process called threading or fold recognition that integrates many other types of information with sequence profile information.

The secondary structure information is probably the most popular one that has been integrated with sequence profile information in most of the existing fold recognition methods [9], [10], [11], [16], [17], [18], [19]. Other types of structural information such as contact information, solvent accessibility, predicted backbone torsion angles and structure profiles have also been explored to improve the accuracy of fold recognition [20], [21], [22], [23], [24], [25]. Arguably, the integration of a proper type of structural information can significantly improve fold recognition, particularly for those target sequences with low sequence similarity to structural templates of similar fold.

In this work, we introduce another type of structural information, local structural preference information, in fold recognition. The structure preferences of 3-residue and 9-residue fragments were derived as potential-like terms from known structures. We have shown that integration of these terms with the three widely used information, sequence profile, secondary structure and hydrophobic score allows us to develop an effective fold recognition method, called FR-t5, an abbreviation of fold recognition with 5 terms.

Results

Overview of the FR-t5, a novel fold recognition approach by considering local structure preference potentials (LSPPs)

We have derived local structural preference potentials (LSPPs) to capture the structure preference of sequence fragments of short length. Fragments of 3- and 9-amino acids are considered in our work. To calculate the 3-residue and 9-residue LSPPs, we first divide the conformers of 3-residue and 9-residues fragments into a number of bins, then compute the distribution of these binned conformers among known structures (Details see Methods). By combining 3-residue and 9-residue LSPPs with the three widely used information, sequence profile, secondary structure and hydrophobic score, we further develop a new threading algorithm called FR-t5. In FR-t5, dynamic programming (DP) [26], [27] is used to make alignments between the query and the templates. Then the templates are selected to build the structure models for the query protein sequence using MODELLER [28]. A detailed description of the method is given in Methods.

In the following results, we will first show based on different tests that the incorporation of LSPPs indeed improves the fold recognition of FR-t5 in both the threading alignment and the sensitivity of fold recognition by comparing to the method without considering LSPPs which we called as FR-t3 for convenience. Then, we will compare the FR-t5 to the state-of-the-art fold recognition methods. Finally, we will demonstrate the performance of FR-t5 in the recent CASP9 of 2010. The consideration of LSPPs has enabled us to develop an effective fold recognition approach.

LSPPs improves the threading alignment in FR-t5

To test whether the incorporation of LSPPs improves the alignment accuracy, the performance of FR-t5 was evaluated by comparing to FR-t3 on two datasets: SALIGN [29] and MUSTER190 [19]. The SALIGN dataset consists of 200 pairs of structurally similar proteins with 65% of equivalent Cα atoms superposed within an RMSD of 3.5 Å. But the sequence similarity of these SALIGN protein pairs is low, ∼20% sequence identity on average. The Muster190 dataset contains 190 protein pairs whose structural similarities are indicated by SCOP hierarchical structure classification, 120 of them having same folds but in different superfamilies and 70 of them belonging to same superfamilies but not same family. The structural alignments used as gold standards were carried out by the TM-align program [30]. The MUSTER190 dataset could be more difficult to align than SALIGN, because the protein pairs of MUSTER190 (average TM-score  = 0.536) are less structurally similar than those of SALIGN (average TM-score  = 0.653).

To compare the alignment accuracy of FR-t5 and FR-t3 on the two datasets, for each protein pair, we align the query with its template using the threading program FR-t5 and FR-t3, respectively. The alignment accuracy is calculated as the percentage of correctly aligned positions by comparing to the gold standards generated by the TM-align program (see Table 1).

thumbnail
Table 1. The alignment accuracies for FR-t5 and FR-t3 on SALIGN and MUSTER190 datasets.

https://doi.org/10.1371/journal.pone.0017215.t001

Table 1 summarizes the alignment accuracies of FR-t5 and FR-t3 on both datasets. Obviously, FR-t5 achieves better alignments than FR-t3 (58.9% vs 57.1% on SALIGN and 36.0% vs 35.1% on MUSTER190). These tests demonstrate that the consideration of LSPPs in fold recognition can improve the alignment accuracy.

LSPPs improves fold recognition sensitivity in FR-t5

To further investigate whether the consideration of LSPPs is able to improve the fold recognition sensitivity, we compared FR-t5 with the method without considering LSPPs, FR-t3, on the Lindahl dataset [31] which is a widely used dataset for benchmarking the sensitivity of other threading programs [3], [6], [7], [10], [11], [31], [32], [33]. The Lindahl dataset includes 976 proteins, of which 555, 434 and 321 proteins have at least one match with the others in the dataset at the family, superfamily and fold levels, respectively. To evaluate the contribution of LSPPs in fold recognition sensitivity by comparing FR-t5 to FR-t3, each protein was aligned with the other 975 proteins. The fold recognition sensitivity is measured as the percentages of the true hits identified as the first rank or as one of the top five ranks (see Table 2).

thumbnail
Table 2. The benchmarking of the sensitivity of FR-t5 and FR-t3 on Lindahl dataset.

https://doi.org/10.1371/journal.pone.0017215.t002

As shown in Table 2, the FR-t5 outperforms FR-t3 in Top1 by 2.4%, 4.9%, and 5.4%, at the level of family, superfamily, and fold, respectively, indicating that the consideration of LSPPs can improve the sensitivity of fold recognitions at all SCOP levels. But compared to the improvement at family level (84.0% versus 81.6% in Top1 and 90.2% versus 89.7% in Top5), the improvements at the superfamily level (49.1% versus 54.0% in Top1 and 64.7% versus 71.9% in Top5) and the fold level (29.6% versus 35.0% in Top1 and 58.4% versus 65.5% in Top5) are even more significant. This shows the advantage of the incorporation of LSPPs for fold recognition in its ability to significantly improve fold recognitions for proteins sharing low sequence similarity.

The consideration of LSPPs in FR-t5 significantly improves the quality of structure modeling in CASP8 test set

To gain more comprehensive insights into the contribution of LSPPs in fold recognition, the methods with (FR-t5) and without (FR-t3) consideration of LSPPs were more rigorously compared by applying them to find structure templates and make structure prediction for the CASP 8 targets [34]. In CASP 8, 164 domains from 121 target proteins to be predicted were used to evaluate the server prediction performance [35]. Of the 164 domains, 13 were defined as free modeling (FM) targets and 154 as template-based (TBM) targets (including 3 FM targets). Of the 154 TBM targets, 50 were further defined as the high-accuracy (TBM-HA) targets. The above classification was based on sequence and structure similarity [35]. To ensure a blind prediction, we only used the non-redundant (NR) sequence database (ftp://ftp.ncbi.nih.gov/blast/db) and PDB database [36] generated before the start of CASP8. The prediction performance is evaluated by the TM-score of the first model and Top 5 models.

As shown in Table 3, when the first models for all 164 targets were considered, FR-t5 outperforms FR-t3 in both TBM targets and FM targets by an improvement of TM-score about 1.9% on average. The improvement of FR-t5 over FR-t3 is more significant for the difficult FM targets of no detectable templates: a 12.5% (0.025/0.2) increase of TM score. While for the easy 50 TBM-HA domains, there is no significant improvement, emphasizing the contribution of LSPPs to the structure prediction beyond sequence similarity.

Comparison with other methods

As shown above, the incorporation of LSPPs can significantly improve both alignment accuracy and sensitivity of fold recognition. Here we ask whether FR-t5 which simply incorporates local structural preference information into the three widely used terms (sequence profile, secondary structure and hydrophobic score) can achieve a satisfactory performance that is comparable to the existing popular fold recognition programs. In developing fold recognition methods, the SALIGN dataset [29] and Lindahl dataset [31] have been widely used to test alignment accuracy and fold recognition sensitivity, respectively. In order to compare FR-t5 with the existing popular fold recognition methods directly, we carried out the tests based on these two datasets that had been previously used to test or develop these existing methods.

Based on the dataset SALIGN that was used to assess the performance of BLAST [37], COMPASS [38], SALIGN [29], SPARKS [7], SP3 [9] and UNI-FOLD [39], we compared the alignment accuracy of FR-t5 with the alignment accuracies of these methods reported in the literature [39]. As shown in table 4, FR-t5 is slightly better than UNI-FOLD, the best of these methods (58.9% vs 57.4%).

Based on the Lindahl dataset, we also compared the fold recognition sensitivities between FR-t5 and the existing 9 threading methods that demonstrated good performance in previous CASPs, namely SAMT98 [40], FUGUE [3], RAPTOR [6], SPARKS [7], HHpred [18], FOLDpro [32], SP3 [9], SP4 [10], SP5 [11]. Table 5 shows that in terms of fold recognition sensitivity, FR-t5 is comparable to the best of these existing 9 threading methods in finding structural templates for proteins with a wide range of sequence similarities to their template structures (from the family level to fold level).

thumbnail
Table 5. The comparison of FR-t5 with other methods for fold recognition on the Lindahl benchmark.

https://doi.org/10.1371/journal.pone.0017215.t005

Participation of FR-t5 in the recent CASP 9

Our newly developed FR-t5 has participated in the recent CASP9 of 2010 under the name of Jiang_THREADER. As a server group, Jiang_THREADER made structure prediction for all the 147 domain targets provided by CASP9. Based on the evaluation, our program Jiang_THREADER was ranked 24th among all 81 structural modeling programs (http://predictioncenter.org/casp9/CD/data/html/groups.2.html), demonstrating the relative good performance of our FR-t5 in structural modeling, which is comparable to most of the state-of-the-art structural modeling programs.

The prediction results from all participated methods have been released online (http://predictioncenter.org/download_area/CASP9/server_predictions/), allowing us to make comparisons based on individual predictions. Here we would like to show some successful examples predicted by FR-t5 (See Figure 1). One example is T0549 of 84 AA, the FR-t5 predicts the model with a TM-score of 0.662 which has the best performance among all prediction methods;Another example is T0592 of 144 AA, the FR-t5 predicts the model with a TM-score of 0.771, which comes next to Raptor (with a TM-score of 0.789). For the more difficult target T0553 of 141 AA, FR-t5′s prediction is the best among the fold recognition methods, which predicted a model with a TM-score of 0.332 that is comparable to the de novo prediction method BAKER-ROSETTASERVER with a TM-score of 0.331. We note that it is hard to do a fair comparison with other prediction methods based on the prediction models submitted to the CASP prediction center. First, as pointed out by Wu and Zhang [19], the threading performance is usually sensitive to the template library which varies greatly between different methods. Second, in CASP, some predictions could combine several threading methods (so called meta-threading methods) [5], [41], [42], [43], integrate multiple templates [44], [45], [46], [47], perform optimizations such as all-atom refinement [48], [49] and employ ab initio prediction when the correct templates are ambiguous [45], [48], [50]. However, our FR-t5 prediction does not incorporate the results of other methods of same kind or perform any further refinement. Nonetheless, the relative good performance of FR-t5 in CASP9 has demonstrated its potential application to structure modeling.

thumbnail
Figure 1. Modeled structures for three CASP9 targets, T0549, T0592 and T0553, by FR-t5.

(a) The superposition between the native structure of T0549 (green) and the top1 model (red) predicted by FR-t5. (b) The superposition between the native structure of T0592 (green) and the top1 model (red) predicted by FR-t5. (c) The superposition between the native structure of T0553 (green) and the top1 model (red) generated by FR-t5.

https://doi.org/10.1371/journal.pone.0017215.g001

Discussion

In this work, we have developed a new threading method FR-t5 by combining the information of local structural preference for 3-residue and 9-residue fragments with sequence profile, predicted secondary structure, and hydrophobic scoring. The incorporation of the two new terms is intended to capture the local structure stiffness when the template structure is aligned with the query sequence. To explore the effects of the two new terms on the improvement of fold recognition, the FR-t5 method is compared with the method FR-t3 that only considers sequence profile, predicted secondary structure and hydrophobic scoring. Based on testings on three public benchmarks, we have shown that the incorporation of the two terms improves both the alignment accuracy and recognition sensitivity in fold recognition. Moreover, when testing on the CASP8 targets, we found that incorporation of the two new terms can significantly improve the structure modeling for the targets of low similarity, with an improvement of TM-score of 12.5% for the 13 hard targets.

Many pioneering studies have analyzed the characteristics of recurring local structural fragments and their mappings to local sequence properties [51], [52], [53], [54], [55], [56], [57], [58], [59], [60]. The mappings between local sequence and structure have been used to improve protein structure modeling. The Rosetta program [59] uses the information of 3-residue and 9-residue fragments for de novo protein structure prediction and refinement of protein prediction models. In the SP3/SP4/SP5 [9], [10], [11] and MUSTER [19], the local structural profiles derived from 25 top fragments in the comparison of the 9-residue fragment with all same-size fragments in the structural database have also been shown to contribute much to the good performance of fold recognition. Recently, Zhou and Skolnick showed that use of fragment comparison and template comparison which provide local and global quality evaluation of the prediction model, respectively can better rank and assess the prediction model [24]. These studies have demonstrated the direct use of local sequence and structure mappings in terms of fragment library to improve protein structure modeling. In our study, inspired by the work of Shakhnovich group that used a local sequence-energy term for protein structure de novo prediction [61], we have derived a statistics-based local structural preference potential (LSPP) for 3-residue and 9-residue fragments for fold recognition. Indeed, the integration of 3-residue and 9-residue LSPPs into the three widely used information, sequence profile, secondary structure and hydrophobic score has led us to develop the effective fold recognition program, FR-t5. We believe the development of the local structural preference potential will be of great benefits for application, because it is independent of databases and requires no structural comparison which is computationally expensive. Therefore, the local structural preference potential we developed can be easily incorporated into other threading methods.

Consideration of individual terms that are independent is very important for development of an effective knowledge-based scoring function. In our work, we have considered two types of structural information. One is the secondary structure information and the other is local structural preference information. Although the two types of structural information can be highly correlated, they are different, which can capture different aspects of structural feature. The secondary structure term is intended to capture the secondary structure propensity of a residue, which is based on three crude categories: alpha helix, beta sheet and coil regions. While the LSPPs used in our study is able to capture more detailed local structural conformation at short fragment level. In our testing, we found that the incorporation of LSPPs significantly improve the performance, suggesting the complementary nature of the two types of structural information in fold recognition.

Although we have shown that consideration of local structure information in potential-like forms has significantly improved fold recognition. There is still much room to improve. First, the fragment sizes have not been extensively explored. In our study, for simplicity, we only attempted fragments of 3-residue and 9-residue. Second, since different representations of the structural fragments can reveal different features of the local structures, finding better way of structure representation could dramatically improve fold recognition. Lastly, a more systematic and comparative analysis is needed to look for discretization of the local structural space, which will generate more appropriate bin numbers for the improvement of fold recognition. Despite this, our consideration of local structural preference information has led us to develop an effective fold recognition method, FR-t5, which can achieve a comparable performance to the existing well-established threading methods.

Methods

Local Structural Preference Potential of 3-residue Fragments

The local structural preference potential of the 3-residue fragment is computed by following Yang et al's method [61] with adaptation. Let us suppose a 3-residue fragment that consists of three amino acids , , and , the four variables ,, and are used to represent the conformation space of the 3-residue fragment, where is the angle between and , is the angle between and , respectively (see Figure 2). The width of bins was 60°, 60°, 30°, 30° for ,, and , respectively. The potential of the 3-residue fragment is obtained from the Potential Database (see below) by:

thumbnail
Figure 2. A Schematic Diagram of Spatial Representation and Conformational Constraints of a 3-residue Fragment.

The bold letters and denote the bisecting vector of and , the bisecting vector of and , respectively. and denote the vectors in planes defined by three backbone atoms (,, and ), and (,, and ), respectively.

https://doi.org/10.1371/journal.pone.0017215.g002

(1)where and are the number of observations in the j-th bin and the total number of observations not in the j-th bin, respectively. The normalization process requires the careful choice of the value of µ (0<µ<1) to balance the contribution of the positive counts and the negative counts for all the bins in the database. Because the total number of observations not in the j-th bin, is always far larger than number of observations in the j-th bin, , a big weight (µ>0.5) should be given to the positive counts. The value of µ should be chosen to make (net interaction energy, i.e., the average of energies for the bins that have at least one positive count.). In the case of 3-residue fragment, the value of is chosen to make the net interaction zero. The potentials of all 8000 3-residue fragments over the binned conformers constitute an energy table.

In threading, we assume a 3-residue fragment to adopt its template conformation, then its local structural preference potential can be obtained from above energy table.

The Local Structural Preference Potential of 9-residue Fragment

The local structural preference potentials of a 9-residue fragment describe the statistical distributions of its binned conformers. Given the myriad of conformers of 9-residue fragments, to avoid the undersampling issue, we introduce a coarse-grained model described as follows: First, to reduce the sequence space of 9-residue fragments, the 20 amino acids were re-represented as three alphabets based on their hydrophobic-polar properties: H for hydrophobic residues F, W, Y, C, M, I, L and V; N for neutral residues A, G, T, S and P; and P for polar or hydrophilic residues N, Q, D, E, H, R and K [62]. Then, to decrease degrees of freedom in the conformation of a 9-residue fragment, each residue is represented by its . Supposing is the atom of the residue (where is the HNP type of a residue i), Figure 3 illustrates the coarse-grained model of a 9-residue fragment that centers on the residue i. In the coarse-grained model, the conformation of 9-residue fragment has same number of degrees of freedom as 3-residue fragment (comparing Figure 3 and Figure 2). Therefore, by following the conformation annotation of 3-residue fragment as shown in Figure 2, we introduce to represent the vector bisecting two vectors ( and ) and to denote a vector in a plane defined by three continuous atoms (,, and ). Therefore, the four variables , ,, and can also be used to describe the reduced conformation space of 9-residue fragment, where is the virtual dihedral angle defined by four continuous atoms (,,, and ); is the virtual dihedral angle defined by four continuous atoms (,,, and ); is the angle between and ; and is the angle between and . , , , and are further binned at intervals of 60°, 60°, 30° and 30°, respectively. The total number of bins is 39*(360/60)*(360/60)*(180/30)*(180/30) ≈25.5 million, is greater than the number of 9-residue fragments (about 1.5 million) in the structural template database. Because most of bins are inaccessible due to the position features of the Cα atoms of the protein backbone [63], [64], the actual number of bins (i.e., the size of the energy table) is significantly reduced and thus the undersampling issue can be avoided. The structural preference potential of 9-residue fragment is obtained from the Potential Database (see below) by:

thumbnail
Figure 3. Coarse-grained Structure Model of a 9-residue Fragment.

The coarse-grained structure of a 9-residue fragment consists of nine points, each of which represents an amino acid and is denoted as the Cα atom of the residue. A link between two Cα atoms is a virtual bond that connects the two residues. Thus, the description of the coarse-grained structure of a 9-residue fragment follows that for 3-residue fragment (see Figure 2). The bold letters , denote the bisecting vector of and , the bisecting vector of and , respectively. , denote the vectors in a plane defined by three continuous atoms (,, and ), and (,, and ), respectively.

https://doi.org/10.1371/journal.pone.0017215.g003

(2)where and are the number of observations in the j-th bin and the total number of observations not in the j-th bin. The value of is chosen to make the net interaction zero.

In threading, we assume a 9-residue fragment to adopt its template conformation, then its local structural preference potential can be obtained from above energy table.

The Potential Database

The parameters of local structural preference potentials for 3-residue and 9-residue fragments are derived from the PDB database released before CASP8 beginning date of May 3rd of 2008. The non-redundant PDB library of sequence identity ≤30% was generated with PISCES [65]. The sequences that share sequence identity greater than 30% to the sequence in the training dataset ProSup [66] were further removed, resulting in 6298 sequences whose structures were determined by X-RAY with resolution higher than 3.0 Å. If there are any chain breaks in the fragment, the value of the energy is set to a reference value of 0.

Scoring Functions

The local structural preference potentials for 3-residue (Efrag3) and 9-residue fragment (Efrag9) are combined with the three widely used terms, sequence profile (Eseq,seq), secondary structure (E2nd) and hydrophobic score (Ehydro) to make up a scoring function for template-based modeling. The shift constant (Eshift) is introduced to avoid the alignment of unrelated residues in the local regions [19]. Thus, the score E(i,j) for measuring the extent/quality of alignment between the ith residue of a query sequence and the jth residue of a template sequence of known structure is a linear combination of the above five terms and the shift constant, which is given as follows:(3)where wi are the weights of the equation, which were obtained by training the equation on the dataset ProSup (see below). The calculation of Eseq,seq, E2nd and Ehydro sees below. For simplicity, fold recognition or template-based modeling by using the scoring function with 5 terms is called FR-t5, while the method that uses the scoring function consisting of the three terms, sequence profile, secondary structure, and hydrophobic score is denoted as FR-t3.

Sequence Profile, Eseq,seq

For a given sequence, its sequence profile was built by using PSIBLAST [14] to search against the non-redundant (NR) sequence database. The PSIBLAST was run at e-value cutoff 0.001 with 3 iterations.

The term in Equation 1 is the sequence profile match score between query sequence and template sequence,which is computed as:(4)where is the frequency of the presence of residue k of the template sequence at the position i of the query sequence, is the log-odds profile value (Position-Specific Substitution Matrix in PSIBLAST) of the residue k at position j of template sequence.

Secondary Structure, E2nd

The term in Equation 1 is the match score between the predicted secondary structure of the query sequence and the observed secondary structure of the template structure, which is given below:(5)where is the predicted second structure of the query sequence at position i and is the observed second structure of the template at position j. The secondary structures of query sequences are predicted by the program PSIPRED [67]. The secondary structures of template structures are assigned by DSSP [68]. The secondary structures are represented by three states, Helix (H), Strand (E), and Coil (C).

Hydrophobic score, Ehydro

The term is the match score of the hydrophobic patterns between the query sequence and template sequence, which is given below:(6)where is the residue type of the query sequence at position i and is the residue type of the template at position j. The hydrophobic scoring matrix is taken from Silva [23].

The Gap Model

The gap model in the threading algorithm is an important factor that affects the alignment accuracy. Many different gap models were introduced previously. For example, SP3/SP4 [9], [10] and MUSTER [19] used a position-dependent gap penalty model which depends on the type of secondary structure. SP5 [11] used a profile-based gap model, which depends on the multiple sequence alignment made by PSIBLAST. More recently, Peng and Xu [69] used a more complicated gap model, which uses both context-specific and position-specific gap penalty. In our work, we employed the position-dependent gap penalty model in the dynamic programming, which operates as follows:

  1. No gaps are allowed in the region where the predicted secondary structure and the secondary structure of the template are in the same state of helix or sheet;
  2. The end gap penalty is neglected;
  3. Affine gap opening (go) and gap extension penalties (ge) are applied to other regions.

Dynamic Programming

We use the Needleman-Wunsch global alignment algorithm [26] to optimize the matching score between the query sequence and template structure based on Eq. (1) with the position-dependent gap penalty model described above.

Parameterization of the Scoring functions

The parameters used in FR-t5 and FR-t3 were trained on the ProSup dataset [66] that consists of 127 protein pairs of significant structure similarity but of low sequence identity less than 30%. The alignments of these protein pairs are given by ProSup program and used as gold standards in the parameterization. There are 5 and 7 parameters (, ), in FR-t3 and FR-t5, respectively, which were parameterized by following the same training procedure used by Zhou and Zhou [9]. In brief, to optimize the parameters, we maximized the number of matches between the gold standard alignment and the alignment made by the threading method.

Template Selection

The template rankings are based on two normalized scores: Sn1 and Sn2. Sn1 is the raw score S normalized by the length of alignment including gaps between the query and template sequences. Sn2 is the raw score normalized by the alignment length excluding query ending gaps. To rank the templates of a query sequence, if the maximal Sn1 is greater than or equal to the maximal Sn2, they will be ranked by Sn1, and otherwise by Sn2.

Author Contributions

Conceived and designed the experiments: TJ. Performed the experiments: YH XD AW. Analyzed the data: YH XD AW YC LT TJ. Wrote the manuscript: YH XD TJ.

References

  1. 1. Rice DW, Eisenberg D (1997) A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence. J Mol Biol 267: 1026–1038.DW RiceD. Eisenberg1997A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence.J Mol Biol26710261038
  2. 2. Jones DT (1999) GenTHREADER: An efficient and reliable protein fold recognition method for genomic sequences. Journal of Molecular Biology 287: 797–815.DT Jones1999GenTHREADER: An efficient and reliable protein fold recognition method for genomic sequences.Journal of Molecular Biology287797815
  3. 3. Shi J, Blundell TL, Mizuguchi K (2001) FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 310: 243–257.J. ShiTL BlundellK. Mizuguchi2001FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties.J Mol Biol310243257
  4. 4. An YL, Friesner RA (2002) A novel fold recognition method using composite predicted secondary structures. Proteins-Structure Function and Genetics 48: 352–366.YL AnRA Friesner2002A novel fold recognition method using composite predicted secondary structures.Proteins-Structure Function and Genetics48352366
  5. 5. Ginalski K, Elofsson A, Fischer D, Rychlewski L (2003) 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics 19: 1015–1018.K. GinalskiA. ElofssonD. FischerL. Rychlewski20033D-Jury: a simple approach to improve protein structure predictions.Bioinformatics1910151018
  6. 6. Xu J, Li M, Kim D, Xu Y (2003) RAPTOR: optimal protein threading by linear programming. J Bioinform Comput Biol 1: 95–117.J. XuM. LiD. KimY. Xu2003RAPTOR: optimal protein threading by linear programming.J Bioinform Comput Biol195117
  7. 7. Zhou H, Zhou Y (2004) Single-body residue-level knowledge-based energy score combined with sequence-profile and secondary structure information for fold recognition. Proteins 55: 1005–1013.H. ZhouY. Zhou2004Single-body residue-level knowledge-based energy score combined with sequence-profile and secondary structure information for fold recognition.Proteins5510051013
  8. 8. Ginalski K, Grishin NV, Godzik A, Rychlewski L (2005) Practical lessons from protein structure prediction. Nucleic Acids Res 33: 1874–1891.K. GinalskiNV GrishinA. GodzikL. Rychlewski2005Practical lessons from protein structure prediction.Nucleic Acids Res3318741891
  9. 9. Zhou HY, Zhou YQ (2005) Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments. Proteins-Structure Function and Bioinformatics 58: 321–328.HY ZhouYQ Zhou2005Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments.Proteins-Structure Function and Bioinformatics58321328
  10. 10. Liu S, Zhang C, Liang SD, Zhou YQ (2007) Fold recognition by concurrent use of solvent accessibility and residue depth. Proteins-Structure Function and Bioinformatics 68: 636–645.S. LiuC. ZhangSD LiangYQ Zhou2007Fold recognition by concurrent use of solvent accessibility and residue depth.Proteins-Structure Function and Bioinformatics68636645
  11. 11. Zhang W, Liu S, Zhou YQ (2008) SP5: Improving Protein Fold Recognition by Using Torsion Angle Profiles and Profile-Based Gap Penalty Model. PLoS One 3: e2325.W. ZhangS. LiuYQ Zhou2008SP5: Improving Protein Fold Recognition by Using Torsion Angle Profiles and Profile-Based Gap Penalty Model.PLoS One3e2325
  12. 12. Jaroszewski L, Rychlewski L, Li ZW, Li WZ, Godzik A (2005) FFAS03: a server for profile-profile sequence alignments. Nucleic Acids Research 33: W284–W288.L. JaroszewskiL. RychlewskiZW LiWZ LiA. Godzik2005FFAS03: a server for profile-profile sequence alignments.Nucleic Acids Research33W284W288
  13. 13. Fernandez-Fuentes N, Rai BK, Madrid-Aliste CJ, Fajardo JE, Fiser A (2007) Comparative protein structure modeling by combining multiple templates and optimizing sequence-to-structure alignments. Bioinformatics 23: 2558–2565.N. Fernandez-FuentesBK RaiCJ Madrid-AlisteJE FajardoA. Fiser2007Comparative protein structure modeling by combining multiple templates and optimizing sequence-to-structure alignments.Bioinformatics2325582565
  14. 14. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.SF AltschulTL MaddenAA SchafferJ. ZhangZ. Zhang1997Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.Nucleic Acids Res2533893402
  15. 15. Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14: 755–763.SR Eddy1998Profile hidden Markov models.Bioinformatics14755763
  16. 16. McGuffin LJ, Jones DT (2003) Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics 19: 874–881.LJ McGuffinDT Jones2003Improvement of the GenTHREADER method for genomic fold recognition.Bioinformatics19874881
  17. 17. Ginalski K, Pas J, Wyrwicz LS, von Grotthuss M, Bujnicki JM, et al. (2003) ORFeus: detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Research 31: 3804–3807.K. GinalskiJ. PasLS WyrwiczM. von GrotthussJM Bujnicki2003ORFeus: detection of distant homology using sequence profiles and predicted secondary structure.Nucleic Acids Research3138043807
  18. 18. Soding J (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21: 951–960.J. Soding2005Protein homology detection by HMM-HMM comparison.Bioinformatics21951960
  19. 19. Wu ST, Zhang Y (2008) MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information. Proteins-Structure Function and Bioinformatics 72: 547–556.ST WuY. Zhang2008MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information.Proteins-Structure Function and Bioinformatics72547556
  20. 20. Panchenko AR, Marchler-Bauer A, Bryant SH (2000) Combination of threading potentials and sequence profiles improves fold recognition. J Mol Biol 296: 1319–1331.AR PanchenkoA. Marchler-BauerSH Bryant2000Combination of threading potentials and sequence profiles improves fold recognition.J Mol Biol29613191331
  21. 21. Karchin R, Cline M, Mandel-Gutfreund Y, Karplus K (2003) Hidden Markov models that use predicted local structure for fold recognition: Alphabets of backbone geometry. Proteins-Structure Function and Bioinformatics 51: 504–514.R. KarchinM. ClineY. Mandel-GutfreundK. Karplus2003Hidden Markov models that use predicted local structure for fold recognition: Alphabets of backbone geometry.Proteins-Structure Function and Bioinformatics51504514
  22. 22. Pettitt CS, McGuffin LJ, Jones DT (2005) Improving sequence-based fold recognition by using 3D model quality assessment. Bioinformatics 21: 3509–3515.CS PettittLJ McGuffinDT Jones2005Improving sequence-based fold recognition by using 3D model quality assessment.Bioinformatics2135093515
  23. 23. Silva PJ (2008) Assessing the reliability of sequence similarities detected through hydrophobic cluster analysis. Proteins 70: 1588–1594.PJ Silva2008Assessing the reliability of sequence similarities detected through hydrophobic cluster analysis.Proteins7015881594
  24. 24. Zhou H, Skolnick J (2010) Improving threading algorithms for remote homology modeling by combining fragment and template comparisons. Proteins 78: 2041–2048.H. ZhouJ. Skolnick2010Improving threading algorithms for remote homology modeling by combining fragment and template comparisons.Proteins7820412048
  25. 25. Torda AE, Procter JB, Huber T (2004) Wurst: a protein threading server with a structural scoring function, sequence profiles and optimized substitution matrices. Nucleic Acids Res 32: W532–535.AE TordaJB ProcterT. Huber2004Wurst: a protein threading server with a structural scoring function, sequence profiles and optimized substitution matrices.Nucleic Acids Res32W532535
  26. 26. Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48: 443–453.SB NeedlemanCD Wunsch1970A general method applicable to the search for similarities in the amino acid sequence of two proteins.J Mol Biol48443453
  27. 27. Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147: 195–197.TF SmithMS Waterman1981Identification of common molecular subsequences.J Mol Biol147195197
  28. 28. Sali A, Blundell TL (1993) Comparative Protein Modeling by Satisfaction of Spatial Restraints. Journal of Molecular Biology 234: 779–815.A. SaliTL Blundell1993Comparative Protein Modeling by Satisfaction of Spatial Restraints.Journal of Molecular Biology234779815
  29. 29. Marti-Renom MA, Madhusudhan MS, Sali A (2004) Alignment of protein sequences by their profiles. Protein Science 13: 1071–1087.MA Marti-RenomMS MadhusudhanA. Sali2004Alignment of protein sequences by their profiles.Protein Science1310711087
  30. 30. Zhang Y, Skolnick J (2005) TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Research 33: 2302–2309.Y. ZhangJ. Skolnick2005TM-align: a protein structure alignment algorithm based on the TM-score.Nucleic Acids Research3323022309
  31. 31. Lindahl E, Elofsson A (2000) Identification of related proteins on family, superfamily and fold level. J Mol Biol 295: 613–625.E. LindahlA. Elofsson2000Identification of related proteins on family, superfamily and fold level.J Mol Biol295613625
  32. 32. Cheng J, Baldi P (2006) A machine learning information retrieval approach to protein fold recognition. Bioinformatics 22: 1456–1463.J. ChengP. Baldi2006A machine learning information retrieval approach to protein fold recognition.Bioinformatics2214561463
  33. 33. Yan RX, Si JN, Wang C, Zhang ZD (2009) DescFold: A web server for protein fold recognition. BMC Bioinformatics 10: 416.RX YanJN SiC. WangZD Zhang2009DescFold: A web server for protein fold recognition.BMC Bioinformatics10416
  34. 34. Moult J, Fidelis K, Kryshtafovych A, Rost B, Tramontano A (2009) Critical assessment of methods of protein structure prediction-Round VIII. Proteins-Structure Function and Bioinformatics 77: 1–4.J. MoultK. FidelisA. KryshtafovychB. RostA. Tramontano2009Critical assessment of methods of protein structure prediction-Round VIII.Proteins-Structure Function and Bioinformatics7714
  35. 35. Tress ML, Ezkurdia I, Richardson JS (2009) Target domain definition and classification in CASP8. Proteins-Structure Function and Bioinformatics 77: 10–17.ML TressI. EzkurdiaJS Richardson2009Target domain definition and classification in CASP8.Proteins-Structure Function and Bioinformatics771017
  36. 36. Sussman JL, Lin DW, Jiang JS, Manning NO, Prilusky J, et al. (1998) Protein Data Bank (PDB): Database of three-dimensional structural information of biological macromolecules. Acta Crystallographica Section D-Biological Crystallography 54: 1078–1084.JL SussmanDW LinJS JiangNO ManningJ. Prilusky1998Protein Data Bank (PDB): Database of three-dimensional structural information of biological macromolecules.Acta Crystallographica Section D-Biological Crystallography5410781084
  37. 37. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403–410.SF AltschulW. GishW. MillerEW MyersDJ Lipman1990Basic local alignment search tool.J Mol Biol215403410
  38. 38. Sadreyev R, Grishin N (2003) COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. J Mol Biol 326: 317–336.R. SadreyevN. Grishin2003COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance.J Mol Biol326317336
  39. 39. Poleksic A, Fienup M (2008) Optimizing the size of the sequence profiles to increase the accuracy of protein sequence alignments generated by profile-profile algorithms. Bioinformatics 24: 1145–1153.A. PoleksicM. Fienup2008Optimizing the size of the sequence profiles to increase the accuracy of protein sequence alignments generated by profile-profile algorithms.Bioinformatics2411451153
  40. 40. Karplus K, Barrett C, Hughey R (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics 14: 846–856.K. KarplusC. BarrettR. Hughey1998Hidden Markov models for detecting remote protein homologies.Bioinformatics14846856
  41. 41. Lundstrom J, Rychlewski L, Bujnicki J, Elofsson A (2001) Pcons: A neural-network-based consensus predictor that improves fold recognition. Protein Science 10: 2354–2362.J. LundstromL. RychlewskiJ. BujnickiA. Elofsson2001Pcons: A neural-network-based consensus predictor that improves fold recognition.Protein Science1023542362
  42. 42. Wu S, Zhang Y (2007) LOMETS: a local meta-threading-server for protein structure prediction. Nucleic Acids Res 35: 3375–3382.S. WuY. Zhang2007LOMETS: a local meta-threading-server for protein structure prediction.Nucleic Acids Res3533753382
  43. 43. Bennett-Lovsey RM, Herbert AD, Sternberg MJE, Kelley LA (2008) Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre. Proteins-Structure Function and Bioinformatics 70: 611–625.RM Bennett-LovseyAD HerbertMJE SternbergLA Kelley2008Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre.Proteins-Structure Function and Bioinformatics70611625
  44. 44. Fernandez-Fuentes N, Madrid-Aliste CJ, Rai BK, Fajardo JE, Fiser A (2007) M4T: a comparative protein structure modeling server. Nucleic Acids Res 35: W363–368.N. Fernandez-FuentesCJ Madrid-AlisteBK RaiJE FajardoA. Fiser2007M4T: a comparative protein structure modeling server.Nucleic Acids Res35W363368
  45. 45. Xu J, Peng J, Zhao F (2009) Template-based and free modeling by RAPTOR++ in CASP8. Proteins 77: Suppl 9133–137.J. XuJ. PengF. Zhao2009Template-based and free modeling by RAPTOR++ in CASP8.Proteins77Suppl 9133137
  46. 46. Zhang Y (2009) I-TASSER: fully automated protein structure prediction in CASP8. Proteins 77: Suppl 9100–113.Y. Zhang2009I-TASSER: fully automated protein structure prediction in CASP8.Proteins77Suppl 9100113
  47. 47. Wang Z, Eickholt J, Cheng J (2010) MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP8. Bioinformatics 26: 882–888.Z. WangJ. EickholtJ. Cheng2010MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP8.Bioinformatics26882888
  48. 48. Das R, Qian B, Raman S, Vernon R, Thompson J, et al. (2007) Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home. Proteins 69: Suppl 8118–128.R. DasB. QianS. RamanR. VernonJ. Thompson2007Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home.Proteins69Suppl 8118128
  49. 49. Kanou K, Iwadate M, Hirata T, Terashi G, Umeyama H, et al. (2009) FAMSD: A Powerful Protein Modeling Platform that Combines Alignment Methods, Homology Modeling, 3D Structure Quality Estimation and Molecular Dynamics. Chemical & Pharmaceutical Bulletin 57: 1335–1342.K. KanouM. IwadateT. HirataG. TerashiH. Umeyama2009FAMSD: A Powerful Protein Modeling Platform that Combines Alignment Methods, Homology Modeling, 3D Structure Quality Estimation and Molecular Dynamics.Chemical & Pharmaceutical Bulletin5713351342
  50. 50. Zhou H, Skolnick J (2007) Ab initio protein structure prediction using chunk-TASSER. Biophys J 93: 1510–1518.H. ZhouJ. Skolnick2007Ab initio protein structure prediction using chunk-TASSER.Biophys J9315101518
  51. 51. Camproux AC, Tuffery P, Chevrolat JP, Boisvieux JF, Hazout S (1999) Hidden Markov model approach for identifying the modular framework of the protein backbone. Protein Eng 12: 1063–1073.AC CamprouxP. TufferyJP ChevrolatJF BoisvieuxS. Hazout1999Hidden Markov model approach for identifying the modular framework of the protein backbone.Protein Eng1210631073
  52. 52. Hunter CG, Subramaniam S (2003) Protein fragment clustering and canonical local shapes. Proteins-Structure Function and Genetics 50: 580–588.CG HunterS. Subramaniam2003Protein fragment clustering and canonical local shapes.Proteins-Structure Function and Genetics50580588
  53. 53. Hunter CG, Subramaniam S (2003) Protein local structure prediction from sequence. Proteins-Structure Function and Genetics 50: 572–579.CG HunterS. Subramaniam2003Protein local structure prediction from sequence.Proteins-Structure Function and Genetics50572579
  54. 54. Camproux AC, Gautier R, Tuffery P (2004) A hidden markov model derived structural alphabet for proteins. J Mol Biol 339: 591–605.AC CamprouxR. GautierP. Tuffery2004A hidden markov model derived structural alphabet for proteins.J Mol Biol339591605
  55. 55. Sander O, Sommer I, Lengauer T (2006) Local protein structure prediction using discriminative models. BMC Bioinformatics 7: 14.O. SanderI. SommerT. Lengauer2006Local protein structure prediction using discriminative models.BMC Bioinformatics714
  56. 56. Boomsma W, Mardia KV, Taylor CC, Ferkinghoff-Borg J, Krogh A, et al. (2008) A generative, probabilistic model of local protein structure. Proceedings of the National Academy of Sciences of the United States of America 105: 8932–8937.W. BoomsmaKV MardiaCC TaylorJ. Ferkinghoff-BorgA. Krogh2008A generative, probabilistic model of local protein structure.Proceedings of the National Academy of Sciences of the United States of America10589328937
  57. 57. Schenk G, Margraf T, Torda AE (2008) Protein sequence and structure alignments within one framework. Algorithms for Molecular Biology 3: 4.G. SchenkT. MargrafAE Torda2008Protein sequence and structure alignments within one framework.Algorithms for Molecular Biology34
  58. 58. Zimmermann O, Hansmann UHE (2008) LOCUSTRA: Accurate prediction of local protein structure using a two-layer support vector machine approach. Journal of Chemical Information and Modeling 48: 1903–1908.O. ZimmermannUHE Hansmann2008LOCUSTRA: Accurate prediction of local protein structure using a two-layer support vector machine approach.Journal of Chemical Information and Modeling4819031908
  59. 59. Simons KT, Kooperberg C, Huang E, Baker D (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 268: 209–225.KT SimonsC. KooperbergE. HuangD. Baker1997Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions.J Mol Biol268209225
  60. 60. Hou Y, Hsu W, Lee ML, Bystroff C (2003) Efficient remote homology detection using local structure. Bioinformatics 19: 2294–2301.Y. HouW. HsuML LeeC. Bystroff2003Efficient remote homology detection using local structure.Bioinformatics1922942301
  61. 61. Yang JS, Chen WW, Skolnick J, Shakhnovich EI (2007) All-atom ab initio folding of a diverse set of proteins. Structure 15: 53–63.JS YangWW ChenJ. SkolnickEI Shakhnovich2007All-atom ab initio folding of a diverse set of proteins.Structure155363
  62. 62. Li TP, Fan K, Wang J, Wang W (2003) Reduction of protein sequence complexity by residue grouping. Protein Engineering 16: 323–330.TP LiK. FanJ. WangW. Wang2003Reduction of protein sequence complexity by residue grouping.Protein Engineering16323330
  63. 63. Zhang Y, Kolinski A, Skolnick J (2003) TOUCHSTONE II: a new approach to ab initio protein structure prediction. Biophys J 85: 1145–1164.Y. ZhangA. KolinskiJ. Skolnick2003TOUCHSTONE II: a new approach to ab initio protein structure prediction.Biophys J8511451164
  64. 64. Rotkiewicz P, Skolnick J (2008) Fast procedure for reconstruction of full-atom protein models from reduced representations. J Comput Chem 29: 1460–1465.P. RotkiewiczJ. Skolnick2008Fast procedure for reconstruction of full-atom protein models from reduced representations.J Comput Chem2914601465
  65. 65. Wang G, Dunbrack RL Jr (2003) PISCES: a protein sequence culling server. Bioinformatics 19: 1589–1591.G. WangRL Dunbrack Jr2003PISCES: a protein sequence culling server.Bioinformatics1915891591
  66. 66. Lackner P, Koppensteiner WA, Sippl MJ, Domingues FS (2000) ProSup: a refined tool for protein structure alignment. Protein Engineering 13: 745–752.P. LacknerWA KoppensteinerMJ SipplFS Domingues2000ProSup: a refined tool for protein structure alignment.Protein Engineering13745752
  67. 67. Jones DT (1999) Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 292: 195–202.DT Jones1999Protein secondary structure prediction based on position-specific scoring matrices.J Mol Biol292195202
  68. 68. Kabsch W, Sander C (1983) Dictionary of Protein Secondary Structure - Pattern-Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 22: 2577–2637.W. KabschC. Sander1983Dictionary of Protein Secondary Structure - Pattern-Recognition of Hydrogen-Bonded and Geometrical Features.Biopolymers2225772637
  69. 69. Peng J, Xu J (2010) Low-homology protein threading. Bioinformatics 26: i294–300.J. PengJ. Xu2010Low-homology protein threading.Bioinformatics26i294300