Evolutionary history of calcium-sensing receptors unveils hyper/hypocalcemia-causing mutations

doi:10.1371/journal.pcbi.1012591


Fig 6

Gradient Boosting Trees Machine Learning Approach to Predict the Mutation Types in CaSR.

(A) Model architecture. We took 94 GoF and 243 LoF mutations from the literature. To prevent information leakage, we randomly divided the subfamily alignments and mutations into 80% training and 20% test data before creating the feature matrices. For cross-validation, 25% of the training data was randomly selected as validation data, five times. For each dataset split, we used the scikit-learn train_test_split function with the stratify option to keep the LoF-to-GoF ratio nearly the same across the datasets. We generated features from the MSAs of CaSR, CaSR-likes, GPRC6A and TAS1Rs, together with amino acid physico-chemical features and domain information. We performed 50 replications. (B) Performance and feature importance of the XGBoost algorithm. The AUROC and AUPR values of the 50 replications are shown. The average AUROC values over the 50 replications are 0.83 and 0.78 for the training and test sets, respectively. The average AUPR values over the 50 replications are 0.93 and 0.9 for the training and test sets, respectively. Shapley values show the contribution of each feature to the XGBoost model output for the pathogenicity-type classification. aa0: the amino acid found in the human CaSR, aa1: substituted amino acid, AF: average flexibility, TMT: TM tendency, ZP: Zimmerman polarity, B: BLOSUM62, AWR: atomic weight ratio, TM: transmembrane domain. Further details about these features can be found in the Materials and Methods section.
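The splitting scheme in panel A can be illustrated with a short sketch. The caption states that scikit-learn's train_test_split with the stratify option was used. The code below is not that pipeline but a standard-library stand-in that shows the same idea, under stated assumptions: the helper name stratified_split, the seed, and the per-class rounding rule are hypothetical, and only the mutation labels are split (the subfamily alignments and feature matrices are not modelled).

```python
import random
from collections import Counter

N_GOF, N_LOF = 94, 243  # mutation counts stated in the caption


def stratified_split(indices, labels, holdout_fraction, rng):
    """Split indices into (kept, held_out), sampling each class separately.

    Hypothetical helper: the per-class rounding here is an assumption and
    may allocate samples slightly differently from scikit-learn.
    """
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    kept, held_out = [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's data is not reordered
        rng.shuffle(members)
        n_out = round(len(members) * holdout_fraction)
        held_out.extend(members[:n_out])
        kept.extend(members[n_out:])
    return sorted(kept), sorted(held_out)


labels = ["GoF"] * N_GOF + ["LoF"] * N_LOF
rng = random.Random(0)  # arbitrary seed, chosen only for reproducibility

# 80% training / 20% test, preserving the LoF-to-GoF ratio in both parts.
train_idx, test_idx = stratified_split(range(len(labels)), labels, 0.20, rng)

# Five random validation draws, each holding out 25% of the training data.
validation_splits = [
    stratified_split(train_idx, labels, 0.25, rng) for _ in range(5)
]

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test: ", dict(test_counts))
```

Because the test indices are set aside before any validation draw, no validation or fitting subset can contain a test mutation. One replication in the caption's sense would repeat the whole procedure with a different seed.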

doi: https://doi.org/10.1371/journal.pcbi.1012591.g006