Accurate prediction of functional, structural, and stability changes in PITX2 mutations using in silico bioinformatics algorithms

Mutations in PITX2 have been implicated in several genetic disorders, particularly Axenfeld-Rieger syndrome. In order to determine the most reliable bioinformatics tools to assess the likely pathogenicity of PITX2 variants, the results of bioinformatics predictions were compared to the impact of variants on PITX2 structure and function. The MutPred, Provean, and PMUT bioinformatic tools were found to have the highest performance in predicting the pathogenicity effects of all 18 characterized missense variants in PITX2, all with sensitivity and specificity >93%. Applying these three programs to assess the likely pathogenicity of 13 previously uncharacterized PITX2 missense variants predicted 12/13 variants as deleterious, except A30V which was predicted as benign variant for all programs. Molecular modeling of the PITX2 homoedomain predicts that of the 31 known PITX2 variants, L54Q, F58L, V83F, V83L, W86C, W86S, and R91P alter PITX2’s structure. In contrast, the remaining 24 variants are not predicted to change PITX2’s structure. The results of molecular modeling, performed on all the PITX2 missense mutations located in the homeodomain, were compared with the findings of eight protein stability programs. CUPSAT was found to be the most reliable in predicting the effect of missense mutations on PITX2 stability. Our results showed that for PITX2, and likely other members of this homeodomain transcription factor family, MutPred, Provean, PMUT, molecular modeling, and CUPSAT can reliably be used to predict PITX2 missense variants pathogenicity.

Identifying new disease-associated variants is becoming increasingly important for genetic testing and it is leading to a significant change in the scale and sensitivity of molecular genetic analysis [14]. One of the most frequent approaches for detecting novel variants in target genes is using direct gene sequencing. However, due to increasing number of newly identified missense variants, it is often difficult to interpret the pathogenicity of these variants as not all the mutations alter protein function, and the ones that do may also have different functional impacts in disease [15,16]. Thus, prior to detailed analyses, novel variants cannot be easily classified as either deleterious or neutral, because of their unknown functional and phenotypic consequences. Therefore, further research should be conducted to validate the genetic diagnosis when a novel missense variant is discovered. Preferably, in vitro characterization of novel variants should be undertaken; however, due to facility limitation, it is often not practicable to experimentally verify the impact of large number of mutations on protein function [17]. Another robust approach to substantiate the pathogenicity is using animal models by generating the homologous mutation that recapitulates the human phenotype; but, similar to in vitro studies, these are time-consuming, labor-intensive, difficult and expensive, making this approach unfeasible to experimentally determine the pathogenicity effects of all novel identified variants [18]. To circumvent the above mentioned limitations and to provide fast and efficient methods for predicting the functional effect of nonsynonymous variants on protein stability, structure, and function, several computational tools have been developed [19][20][21].
Protein stability and structure are key factors affecting function, activity, and regulation of proteins. Conformational changes are necessary for many proteins' function and disease-causing variants can impair protein folding and stability. Missense variants are also capable of impairing protein structure, likely by affecting protein folding, protein-protein interaction, solubility or stability of protein molecules. The structural effect of mutational changes can be examined in silico on the basis of three-dimensional structure, multiple alignments of homologous sequences, and molecular dynamics [22][23][24]. Therefore, analysing sequence data in silico first and detecting a small number of predicted deleterious mutations for further experimental characterization is a key factor in today's genetic and genomic studies.
In general, bioinformatics prediction methods obtain information on amino acid conservation through alignment with homologous and distantly related sequences. The most common criteria considered in many bioinformatics programs for predicting the functional effect of an amino acid substitution are amino acid sequence conservation across multiple species, physicochemical properties of the amino acids involved, database annotations, and potential protein structural changes [23,25,26]. As mentioned above, resources for in vitro and in vivo functional analysis of novel variants are constrained in most clinical laboratories. Therefore, identifying and reporting novel variants that are likely to be pathogenic often requires accurate prediction using computational tools.
In a previous study, we examined the effect of FOXC1 variants on protein structure and function by combining laboratory experiments and in silico techniques. Our results showed that integration of different algorithms with in vitro functional characterization serves as a reliable means of prioritizing, and then functional analyzing, candidate FOXC1 variants [27]. Unlike most previous studies that focused on using only PolyPhen and SIFT to predict the pathogenicity of missense mutations, here, we investigated the predictive value of SIFT, Poly-Phen and nine other prediction tools by comparing their predictions to in vitro functional data for PITX2 variants. The bioinformatics programs found to be most reliable were then used to predict the likely consequences of 13 functionally-uncharacterized PITX2 variants. We also performed molecular modeling on all the PITX2 missense mutations located in the homeodomain and compared the results with the findings of protein stability algorithms to identify the most reliable tools in predicting the effect of missense mutations on PITX2 stability. To the best of our knowledge, this is the first study that incorporates the results of functional studies in conjunction with bioinformatics approaches for predicting the pathogenicity of mutations in PITX2 gene.

Source of missense variants
Lists of PITX2 missense variants were assembled from the previous literature and a search using the ClinVar [28], Human Gene Mutation Database (HGMD) [29], the Genome Aggregation Database (gnomAD), and the single nucleotide polymorphism database (dbSNP). This study found 47 PITX2 missense variants; 31 of which were described in the literature as being associated with ARS or coronary artery disease (CAD), while the remaining 16 variants, were considered as benign variants (Fig 1). Eighteen of the 31 variants were classified as pathogenic based on functional studies utilizing site-directed mutagenesis, expression studies, and other functional analysis ( Table 1). Thirteen of 31 variants were described as associated with ARS and CAD in the absence of functional analyses on PITX2 structure or function. Sixteen SNPs, with population allele frequencies > 0.0005 were identified from the gnomAD and the Clin-Var. Based upon the allele frequency (approximately 10-fold greater than the disease frequency of ARS) these have been considered benign polymorphisms. Nucleotide numbering of the mutations herein indicates cDNA numbering with +1 as the A of the ATG translation initiation codon in the NCBI reference sequence NM_000325.5, while the amino positions are based on the corresponding NCBI reference sequence NP_000316.2. This study is a retrospective case report that does not require ethics committee approval at our institution. All patients' mutations and phenotypes were obtained from previously published studies.
These programs were used to analyse 18 functionally characterised PITX2 missense variants plus 13 additional, functionally uncharacterized PITX2 missense variants.
SIFT program provides functional predictions for coding variants, based on the degree of conservation of amino acid residues in sequence alignments derived from closely related sequences, collected by PSI-BLAST algorithm [60]. The PolyPhen-2 (Polymorphism phenotyping-2) server predicts possible effect of an amino acid change on the structure and function of a protein using several sources of information such as straightforward physical and comparative considerations [61]. PANTHER-PSEP is a new application that analyses the length of time a given amino acid has been conserved in the lineage leading to the protein of interest. There is a direct association between the conservation time and the likelihood of functional impact [62]. MutPred is a free web-based application that utilizes a random forest algorithm with data based upon the probabilities of loss or gain of properties relating to many protein structures and dynamics, predicted functional properties, and amino acid sequence and evolutionary information [52]. MutationTaster is a tool that combines information derived from various biomedical databases and uses established analysis programs. Unlike SIFT or Poly-Phen-2 which work on DNA level, MutationTaster processes substitutions of single amino  Table 2 for more information on the prediction tools used in this study.

Molecular modeling of the mutant protein structure
The NMR structure of the homeodomain of PITX2 complexed with a TAATCC DNA binding site (PDB: 2LKX) were analyzed by the SWISS-MODEL server (http://www.expasy.org/spdbv/ ; provided in the public domain by the Swiss Institute of Bioinformatics, Geneva, Switzerland). Model structures of wild-type and mutants were created in Swiss-Pdb Viewer and investigated using the ANOLEA server (http://melolab.org/anolea). For structure predictions of PITX2, sequence in FASTA format was obtained from NCBI database (NP_001191327.1).

Calculating changes in protein stability
Eight different protein stability programs (DUET, SDM, mCSM I-Mutant3.0, MUpro, iPTREE-STAB, CUPSAT, and iStable) were used to predict the effects of missense mutations on the stability of PITX2 protein. DUET is a web server that uses integrated computational approach to predict effect of missense mutations on protein stability [66]. DUET calculation is based on complementary data regarding the mutation including secondary structure [67] and a pharmacophore vector [68]. SDM, a computational method, has been demonstrated as the most appropriate method to use along with many other programs. SDM assesses the amino acid substitution occurring at specific structural environment that are tolerated within the family of homologous proteins of defined three dimensional structures and change them into substitution probability tables [69]. mCSM relies on graph-based signature concept and predicts not only the effect of single-point mutations on protein stability, but also protein-protein and protein-nucleic acid binding [70]. I-Mutant3.0 is a neural-network-based web server that predicts automatically protein stability changes upon single point protein mutations based on either protein sequence or protein structure. I-Mutant3.0 can predict the severity effect of a mutation on the stability of the folded protein [71]. MUpro is a set of machine learning programs that accurately calculates protein stability alterations based on primary sequence information particularly where the tertiary structure is unrevealed, overcoming a major restriction of previous methods which are based on the tertiary structure [72]. iPTREE-STAB is a web service and mainly provides two function modules of services including discriminating the stability of a protein upon single amino acid substitutions and predicting their numerical stability values [73]. CUPSAT uses protein environment specific mean force potentials (through solvent accessibility and secondary structure specificity) to analyse and predict protein stability changes upon point mutations [74]. iStable, a combined predictor, was designed by using sequence information and prediction data from various element predictors. iStable is available with two different input types: structural and sequential [75]. Please see Table 3 for more information on the stability predictors used in this study.

Bioinformatics functional predictions
The protein sequence and/or protein structure with mutational position and amino acid residue of 18 previously functionally characterized pathogenic PITX2 missense variants, plus 16 SNPs with a population frequency of higher than 0.05% (thus considered benign polymorphisms), were used to test the predictive value of eleven common bioinformatics prediction programs; SIFT, PolyPhen-2, PANTHER-PSEP, MutPred, MutationTaster, Provean, PMUT, FATHMM, nsSNPAnalyzer, Align GV-GD, and REVEL (Table 4 and Table 5). To evaluate the performances of the programs, seven measures (sensitivity, specificity, accuracy, precision, positive predictive value (PPV), negative predictive value (NPV), and Matthews correlation coefficient (MCC)) were calculated by comparing the results of all programs with previously generated functional data. For PITX2, MutPred, Provean, and PMUT were the most reliable of the bioinformatics tools in predicting the pathogenicity effects of all 18 functionally characterized missense variants in PITX2, with sensitivity and specificity of > 93% (Fig 2). Then, REVEL tool showed high sensitivity and specificity, 94.44% and 87.50%, respectively. Analysis of the sensitivity and specificity SIFT showed that this program had good sensitivity (72.22%) but low specificity (43.75%). Although PolyPhen-2, MutationTaster, PANTHER-PSEP, FATHMM, and Align GV-GD exhibited over 83% sensitivity, they were unable to identify the benign polymorphisms, showing specificity of 37.50%, 6.25%, 43.75%, 6.25%, and 6.25%, respectively. The predictive value of nsSNPAnalayzer was similar to that of SIFT program, with sensitivity and specificity of 66.67% and 43.75%, respectively.
The most reliable programs found in this study's analyses (MutPred, Provean, and PMUT) were then used to predict the likely pathogenicity of 13 PITX2 missense variants for which functional testing has not been performed (Table 6). Interestingly, the A30V variant unanimously was predicted as benign by all three programs. The remaining 12 PITX2 variants were predicted to be disease-associated mutations by all programs.

Molecular modeling of PITX2
Molecular models for the homeodomain of wild-type and variant-containing PITX2 proteins were designed using threading algorithms to assess impairment of PITX2 structure by missense variants.
Three functionally characterised variants, N100D, L105V, and N108T, were excluded from these molecular modeling analyses since they are not located in the homeodomain, which is the only portion of PITX2 with a known structure. Wild-type amino acids were changed to variant residues to determine putative structural effects of the remaining 15 functionally analysed PITX2 variants through ANOLEA mean force potential calculations. The molecular modeling identified three mutations as high-risk (L54Q, V83L, and R91P) to change the structure of PITX2, particularly in the H1, H2, and H3 subdomains (Fig 3). The R91P variant was predicted to grossly disrupt the non-local amino acid side chain contacts. Similar, although less profound, effects were predicted when L54 and V83 were altered to glutamine and leucine, respectively. In contrast, the remaining twelve amino acid variants showed no predicted substantially altered pairwise interactions, indicating that these missense variants are predicted to have minor or no effects on PITX2's structure (S1 Fig). Molecular modeling was also performed on the nine functionally uncharacterised PITX2 missense mutations located in the homeodomain. Four mutations (F58L, V83F, W86C, W86S) were predicted to change the structure of PITX2 (Fig 4), while, the remaining five variants (R62H, P64L, P64R, R69C, and R90P) were predicted to have minor or no impact on PITX2's structure (S2 Fig).

Evaluation of the different algorithms in predicting stability changes
To assess the performance of eight different stability predictor programs (DUET, SDM, mCSM, I-Mutant3.0, MUpro, iPTREE-STAB, CUPSAT, and iStable) in predicting the effect of missense mutations on PITX2 protein stability, the change in protein stability (ΔΔG) were computed for all 24 PITX2 homeodomain variants (15 functionally characterised and 9 functionally uncharacterised mutations) (Table 7). Of these eight programs, CUPSAT was the most consistent with the results of our molecular modeling, by identifying 5 of 7 destabilizing mutations that were also predicted to be destabilizing by molecular modeling (V83L, V83F, W86S, W86C, and R91P).  Computational analysis of PITX2 mutations SDM also showed high consistency with the results of our molecular modeling, by detecting 4 of 7 destabilizing mutations that were also predicted to be destabilizing by molecular modeling (L54Q, R91P, W86S, and W86C). DUET, mCSM, and I-Mutant3.0 identified 3 and iPTREE-STAB detected 2 of 7 destabilizing mutations detected by molecular modeling. MUpro and iStable were unable to identify any of the 7 destabilizing mutations predicted by molecular modeling.

Discussion
Although in silico programs are not a substitute for wet-lab experiments, they can provide a supportive role in the experimental validation of disease-associated alleles and can help further diagnostic strategies by prioritizing the most likely pathogenic novel variants.
While many tools are available for assessing the functional significance of variants, determining the reliability of prediction results is challenging. In this context, the current study investigated the combination of experimental findings, molecular modeling, in silico mutation prediction programs, and stability prediction software to assess the pathogenicity of PITX2 missense variants. In silico methods that correctly identify deleterious variants do not always inevitably work well for benign predictions. The methods determined by this study to be preferred for analyses of PITX2 variants were those best able to distinguish both pathogenic and benign variants, thus yielding the highest accuracy. Our results showed that MutPred, Provean, and PMUT tools were the most accurate in predicting pathogenicity of PITX2 missense variants (Fig 2). The sensitivity and specificity of these three tools in recognizing PITX2 disease-causing variants were over 93%, indicating the strong performance of these programs in identifying as pathogenic only PITX2 variants with significant functional defects. After these three tools, REVEL showed highest sensitivity and specificity, 94.44% and 87.50%, respectively. SIFT showed good sensitivity (72.22%) but low specificity (43.75%). PolyPhen-2, MutationTaster and PANTHER-PSEP, FATHMM, and Align GV-GD demonstrated > 83% sensitivity, but, they were unable to identify the benign polymorphisms, showing the specificity of 37.50%, 6.25%, 43.75%, 6.25%, and 6.25%, respectively. The predictive value of nsSNPAnalayzer was similar to that of SIFT program, with sensitivity and specificity of 66.67% and 43.75%, respectively. Our results showed, therefore, that Computational analysis of PITX2 mutations MutPred, Provean, and PMUT can be utilized with high confidence to test whether or not a PITX2 missense variant is likely to be deleterious. Interestingly, MutPred was the only in silico program that ranked in the top three programs in identifying both pathogenic and benign PITX2 and FOXC1 variants [27]. A likely explanation for MutPred's high ranking is that it evaluates the most factors in making assessments. However, since the number of variants available for testing in this study were small, a larger dataset would confirm that our results are reproducible and generally applicable.
The three programs that were found to be the most reliable (MutPred, Provean, and PMUT) were then used to assess the likely pathogenicity of thirteen PITX2 missense variants for which functional analyses have not been performed, but which have been associated with ARS or CAD (Table 6). Our results showed that MutPred, Provean, and PMUT predicted as pathogenetic 12/13 of the variants. The A30V variant was scored as non-pathogenetic/benign by all three programs. While it is possible that A30V is an example of a false negative for all three programs, it is likely that this variant is instead benign. Functional testing of the A30V variant is needed to determine which of these possibilities is accurate.
Various intramolecular interactions are involve in stabilizing and folded state of protein, including hydrophobic, electrostatic, and hydrogen-bonding [95][96][97][98]. The stability state of a protein is key factor in its proper functionality. In fact, up to 80% of Mendelian disease-causing mutations in protein coding regions are predicted to be caused by altering protein stability [99]. In recent years, due to the availability of high-throughput array-based genotyping methods [100] and next generation sequencing platforms [101,102], a large number of SNPs has been reported. However, the association of missense variants with protein stability has often been difficult to predict. Fortunately, recent advances in computational prediction of protein stability offers potential insight into this question. We used two parallel prediction methods to investigate the possible effects on PITX2 protein structure and stability of missense variants.
Knowledge of a protein's 3D structure can be used to predict the functionality of protein and the possible impact of variants on protein conformation and structure. We thus first used molecular modelling analyses to assess and compared the total energy difference between native and mutated modeled structure of PITX2 proteins. The results predicted that while most PITX2 variants did not dramatically affect the protein tertiary structure, seven variants (L54Q, F58L, V83F, V83L, W86C, W86S, and R91P) altered the total energy level in comparison with the native structure, suggesting that these amino acid substitutions changed the structure of the PITX2 protein. Molecular modeling of the PITX2 homeodomain predicted that these variants impair the required energy to maintain the proper folding of helix 1-3 and cause global destabilization of the structure of PITX2. These seven amino acids are either invariant (e.g., W86) or highly conserved in the approximately 300 homeobox proteins analyzed, consistent with a pivotal role of these residues in the homeodomain [103][104][105]. These seven amino acids are tightly packed hydrophobic amino acids responsible for holding helices  of the PITX2 homeodomain together, supporting our molecular modeling predicting that mutations of these amino acids disrupt PITX2 structure. For F58L, V83F, and V83L, the native wild-type residues and the introduced mutant residues differ in size, probably causing loss of hydrophobic interactions in the core of the protein, particularly involving helix 1-3. For L54Q, W86C, W86S, and R91P, the wild-type residues and the mutant residues are different in both size and charge, likely disturb the local structure of protein thereby altering protein structure and function. Residues V83, W86, and R91 are located within the third helix which is specifically responsible for binding with the major groove of the DNA [106]. Thus, the prediction that these mutations impair the capacity of this helix to interact with DNA is consistent with this knowledge and with previous functional characterizations that showed reduced DNA-binding capacities of the V83L and R91P mutant PITX2 proteins [5,107]. Consistent with bioinformatics predictions of deleterious affects of mutation of W86, mutations of the neighboring amino acids (R84W and K88E) have been shown to decrease the ability of the mutant proteins to interact with DNA [39,108].
Residues L54 and F58 are located in helix 1 of the homeodomain, responsible for contacting with the minor groove of the DNA. Molecular modeling of L54Q is consistent with the Computational analysis of PITX2 mutations suggestion that mutations in these highly-conserved residues in helix 1 of the homeodomain might disturb the DNA-protein binding affinity. Our prediction is supported by the fact that changing the leucine to a glutamine (L54Q) disrupts DNA-protein complex, indicating the necessity of leucine at position 54 for PITX2 binding ability [109]. Thus, consistent with our recent studies on FOXC1 protein [110], the results of molecular modeling of PITX2 are strongly consistent with the functional characterization of PITX2 missense variants. The results from our molecular modeling analysis were also compared to the predictions of eight stability predictor methods (DUET, SDM, mCSM, I-mutant3.0, MUpro, iPTREE-STAB, CUPSAT, and iStable). Based on our analyses, it appears that CUPSAT performs the best of the seven methods evaluated here in predicting the effect of missense mutations on PITX2 protein stability, with SDM, DUET, mCSM, and I-Mutant3.0, performing weaker, consistent with the results of previous studies [111,112]. Our results indicate that further studies are required to improve ΔΔG predictions, especially for buried amino acids.
In this study, for the first time, we evaluated the impact of missense variants on PITX2 stability, structure and function by integrating stability prediction algorithms, bioinformatics mutation prediction tools, and molecular modeling. Our results showed that MutPred, Provean, PMUT, molecular modeling, and CUPSAT are reliable methods to assess PITX family missense variants in the absence of laboratory experiments. However, for our analyses, it must be noted that we used sixteen SNPs as non-pathogenetic control variants to investigate the performance of prediction programs. Although we considered SNPs with a population frequency of >0.05% as benign, we cannot formally exclude that these SNPs might have un-documented pathogenic effects on PITX2. In addition, while the prediction methods used in this study are not gene-specific, generalization of the performance of these programs to other human genes may be inappropriate without additional study. When assessing the pathogenicity of missense variants, it is necessary to be cautious on depending merely on in silico programs without wetlab experiments. According to standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology, in silico predictions only serve as one supporting factor, whereas functional tests are frequently needed to assess the pathogenicity of missense variants. In particular, as per clinical guidelines for the interpretation of single substitution variants, the output of computational tools should be interpreted in the light of functional studies results, population frequency data and segregation in affected families.