Fig 1.
Steps followed for building QSAR models.
QSAR: Quantitative Structure-Activity Relationship; GPMT: Guinea Pig Maximization Test; HSDB: Hazardous Substances Data Bank; LLNA: Local Lymph Node Assay; REACH: Registration, Evaluation, Authorisation and Restriction of Chemicals; MLP: Multi-Layer Perceptron; RF: Random Forest; SL: Simple Logistic; SMO: Sequential Minimal Optimization. Numbers in curly brackets represent the counts of the respective entities (i.e., molecules, descriptors and fingerprints).
Table 1.
Datasets used for building QSAR models.
Fig 2.
Descriptor sets used for QSAR models.
Fig 3.
Integration of QSAR models, similarity information and sub-structure pattern into prediction workflows (PWs).
Blue and red colors depict components that differ between the two prediction workflows, PW-1 and PW-2. Components in black and grey are common to both PW-1 and PW-2. QSAR: Quantitative Structure-Activity Relationship; MLP: Multi-Layer Perceptron; SMO: Sequential Minimal Optimization; Eo: Energy-optimized dataset; RTS: Representative test set; Challenge-1: Challenge set-1; Challenge-2: Challenge set-2; m2, m3, m4, s_similarity and s_substr are predictions from QSAR models 2, 3 and 4, similarity information and sub-structure pattern, respectively, and w_m2, w_m3, w_m4, w_similarity and w_substr are their corresponding weights.
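As an illustration of how weighted component predictions might be merged into a single call, the sketch below takes a weighted average of the five component scores named in the caption. The caption does not state the actual combination rule, so the averaging scheme, the 0.5 threshold, the function name `combine_predictions` and the example weights are all assumptions made for illustration; the weights actually used are those reported in Table 2.

```python
def combine_predictions(scores, weights, threshold=0.5):
    """Weighted average of component scores (assumed rule, not the authors' formula).

    scores  : dict mapping component name -> score in [0, 1], or None if the
              component produced no prediction for this molecule
    weights : dict mapping component name -> non-negative weight
    Returns (combined_score, label).
    """
    # Skip components that did not produce a prediction.
    used = {name: s for name, s in scores.items() if s is not None}
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        raise ValueError("no component with non-zero weight produced a score")

    combined = sum(weights[name] * s for name, s in used.items()) / total_weight
    label = "sensitizer" if combined >= threshold else "non-sensitizer"
    return combined, label


# Hypothetical scores and equal weights, for illustration only.
example_scores = {"m2": 1, "m3": 0, "m4": 1, "similarity": 1, "substr": 0}
example_weights = {"m2": 1, "m3": 1, "m4": 1, "similarity": 1, "substr": 1}
combined, label = combine_predictions(example_scores, example_weights)
```

Normalizing by the sum of the weights that were actually used keeps the combined score in [0, 1] even when a component, such as the similarity search, returns nothing for a molecule.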
Table 2.
Weights used for components of prediction workflows in knowledge-based optimization.
Fig 4.
Percent prediction accuracy of short-listed variants of models.
Color-coded scale from green to red indicates decreasing prediction accuracy. RTS and Challenge-1 sets are expanded to show the prediction accuracy for each category of sensitizers and non-sensitizers. Internal: Internal test set; RTS: Representative test set; Challenge-1: Challenge set-1; Both: Internal & RTS; X: Extreme; St: Strong; S: Sensitizer with unknown potency; M: Moderate; W: Weak; N: Non-sensitizer.
Table 3.
Performance of prediction workflows with machine learning methods and knowledge-based optimization.
Table 4.
Leave-one-out analysis to assess the contributions of QSAR models, similarity information and sub-structure pattern to the prediction performance of prediction workflows.
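A leave-one-out analysis of this kind can be sketched as follows: each component is removed in turn, the workflow is re-scored, and the change in accuracy relative to the full workflow is recorded. This is a minimal sketch under stated assumptions: the weighted-vote rule, the 0.5 threshold, the use of plain accuracy as the metric, and every name and data value in the example are hypothetical and not taken from the paper.

```python
def weighted_vote(scores, weights, threshold=0.5):
    """Return True (sensitizer) if the weighted mean score reaches the threshold."""
    total = sum(weights[name] for name in scores)
    mean = sum(weights[name] * s for name, s in scores.items()) / total
    return mean >= threshold


def accuracy(dataset, weights, skip=None):
    """Fraction of molecules classified correctly, optionally leaving one component out."""
    correct = 0
    for scores, is_sensitizer in dataset:
        kept = {name: s for name, s in scores.items() if name != skip}
        if weighted_vote(kept, weights) == is_sensitizer:
            correct += 1
    return correct / len(dataset)


def leave_one_out(dataset, weights):
    """Change in accuracy when each component is dropped (negative = component helps)."""
    baseline = accuracy(dataset, weights)
    return {name: accuracy(dataset, weights, skip=name) - baseline for name in weights}


# Toy data: (component scores, true label). Values are made up for illustration.
toy_dataset = [
    ({"m2": 1, "similarity": 1, "substr": 0}, True),
    ({"m2": 0, "similarity": 1, "substr": 1}, True),
    ({"m2": 0, "similarity": 0, "substr": 1}, False),
    ({"m2": 0, "similarity": 1, "substr": 0}, False),
]
toy_weights = {"m2": 1, "similarity": 1, "substr": 1}
contributions = leave_one_out(toy_dataset, toy_weights)
```

A large negative value for a component suggests the workflow depends on it; a value near zero suggests the remaining components compensate for its absence.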
Fig 5.
Comparative performance of our prediction workflows with VEGA v1.08.
Panel A: Molecules of challenge set-1 processed by our prediction workflows (n = 74) and by VEGA v1.08 (n = 69) were used for computation; Panel B: Only the 69 molecules of challenge set-1 processed by both our prediction workflows and VEGA v1.08 were used for computation; Panel C: Molecules of challenge set-2 processed by our prediction workflows (n = 77) and by VEGA v1.08 (n = 68) were used for computation; Panel D: Only the 68 molecules of challenge set-2 processed by both our prediction workflows and VEGA v1.08 were used for computation. VEGA v1.08: orange bars; PW-1: blue bars; PW-2: green bars. CCR: Correct Classification Rate.
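The caption does not give a formula for CCR. In QSAR work it is commonly computed as the mean of sensitivity and specificity (also called balanced accuracy), and the sketch below assumes that definition; the function name and the example labels are illustrative only.

```python
def correct_classification_rate(y_true, y_pred):
    """CCR = 0.5 * (sensitivity + specificity), assuming the common QSAR definition.

    y_true, y_pred : sequences of 1 (sensitizer) and 0 (non-sensitizer)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sensitizers found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sensitizers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-sensitizers found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("CCR needs at least one molecule of each class")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


# Made-up labels: 3 sensitizers (2 caught), 2 non-sensitizers (1 caught).
ccr = correct_classification_rate([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Because each class contributes equally, this measure is less affected than raw accuracy by an imbalance between sensitizers and non-sensitizers in a test set.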
Fig 6.
The table on the left shows the SMILES of the input molecules; the ‘Predictions’ section shows the prediction result for the selected molecule along with the predicted reaction mechanism and domain information; ‘Molecular Visualization’ depicts the structure of the selected molecule, with any skin protein reactive sub-structure(s) highlighted in cyan; ‘Similarity Search Result’ shows the parent set molecules found similar to the selected input molecule, along with details such as the Tanimoto coefficient; ‘Export Type’ offers various options for exporting SkinSense results.
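The Tanimoto coefficient reported in the similarity search is the ratio of shared to total fingerprint bits between two molecules. The sketch below shows the calculation on fingerprints represented as sets of "on" bit positions, followed by a simple ranked search. The fingerprint representation, the `min_similarity` cutoff and the names `tanimoto` and `similarity_search` are assumptions for illustration, not a description of how SkinSense is implemented.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of 'on' bit positions."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treated as dissimilar by convention here
    return len(a & b) / union


def similarity_search(query_fp, parent_set, min_similarity=0.7):
    """Return (name, coefficient) pairs above the cutoff, most similar first.

    parent_set     : dict mapping molecule name -> fingerprint (set of bit positions)
    min_similarity : hypothetical cutoff; the value used by the tool is not given here
    """
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in parent_set.items()]
    hits = [(name, score) for name, score in hits if score >= min_similarity]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints, for illustration only.
toy_parent_set = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {9}}
toy_hits = similarity_search({1, 2, 3, 4}, toy_parent_set, min_similarity=0.5)
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets; the arithmetic stays the same.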