Machine learning to design integral membrane channelrhodopsins for efficient eukaryotic expression and plasma membrane localization
Fig 1
General approach to machine learning of protein (ChR) structure-function relationships: diversity generation, measurements on a training set, and modeling.
(1) Structure-guided SCHEMA recombination is used to select block boundaries for shuffling protein sequences to generate a sequence-diverse ChR library starting from three parent ChRs (shown in red, green, and blue). (2) A subset of the library serves as the training set. Genes for these chimeras are synthesized and cloned into a mammalian expression vector, and the transfected cells are assayed for ChR expression and localization. (3) Two different models, classification and regression, are trained using the training data and then verified. The classification model is used to explore diverse sequences predicted to have ‘high’ localization. The regression model is used to design ChRs with optimal localization to the plasma membrane.
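The diversity-generation step in (1) can be illustrated with a minimal sketch of block-wise recombination. It assumes the parents are already aligned to equal length and that the block boundaries are given; it does not implement SCHEMA's structure-guided boundary selection. The names `make_chimera` and `enumerate_library`, the toy sequences, and the boundary positions are hypothetical and do not come from the source.

```python
from itertools import product


def make_chimera(parents, boundaries, assignment):
    """Assemble one chimera: block i is copied from parents[assignment[i]]."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    edges = [0, *boundaries, length]
    if len(assignment) != len(edges) - 1:
        raise ValueError("need exactly one parent index per block")
    return "".join(
        parents[p][start:end]
        for p, start, end in zip(assignment, edges[:-1], edges[1:])
    )


def enumerate_library(parents, boundaries):
    """Yield (assignment, sequence) for every block-wise parent combination."""
    n_blocks = len(boundaries) + 1
    for assignment in product(range(len(parents)), repeat=n_blocks):
        yield assignment, make_chimera(parents, boundaries, assignment)


# Toy aligned "parents" (placeholders, not real ChR sequences).
parents = ["AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGG"]
# Hypothetical block boundaries; two boundaries give three blocks.
boundaries = [3, 7]

# Map each block assignment (one parent index per block) to its sequence.
library = dict(enumerate_library(parents, boundaries))
```

With three parents and three blocks, this toy library holds 3^3 = 27 sequences, including the three parents themselves. The tuple of parent indices per block is also a natural starting point for encoding a chimera as model input, although the caption does not specify the encoding used.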