Integrated Computational Solution for Predicting Skin Sensitization Potential of Molecules

doi:10.1371/journal.pone.0155419

Fig 1.

Steps followed for building QSAR models.

QSAR: Quantitative Structure-Activity Relationship; GPMT: Guinea Pig Maximization Test; HSDB: Hazardous Substance DataBase; LLNA: Local Lymph Node Assay; REACH: Registration, Evaluation and Authorization of Chemicals; MLP: Multi-Layer Perceptron; RF: Random Forest; SL: Simple Logistic; SMO: Sequential Minimal Optimization. Numbers in curly brackets give the count of the respective entities (i.e., molecules, descriptors and fingerprints).

Table 1.

Datasets used for building QSAR models.

Fig 2.

Descriptor sets used for QSAR models.

Fig 3.

Integration of QSAR models, similarity information and sub-structure pattern into prediction workflows (PWs).

Blue and red colors depict components that differ between the two prediction workflows, PW-1 and PW-2. Components in black and grey are common to both PW-1 and PW-2. QSAR: Quantitative Structure-Activity Relationship; MLP: Multi-Layer Perceptron; SMO: Sequential Minimal Optimization; E_o: Energy-optimized dataset; RTS: Representative test set; Challenge-1: Challenge set-1; Challenge-2: Challenge set-2. m₂, m₃, m₄, s_similarity and s_substr are the predictions from QSAR models 2, 3 and 4, similarity information and sub-structure pattern, respectively; w_m2, w_m3, w_m4, w_similarity and w_substr are their corresponding weights.
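
The caption names five component predictions and their weights but does not state how they are aggregated. As a minimal sketch only, assuming each component votes +1 (sensitizer) or -1 (non-sensitizer) and that the workflow takes the sign of the weighted sum, the combination could look like the following. The function name, the vote encoding, the tie-breaking rule and all numeric weights are hypothetical and are not taken from the paper.

```python
from typing import Dict


def combine_predictions(votes: Dict[str, int], weights: Dict[str, float]) -> str:
    """Weighted vote over component predictions.

    Assumption: each vote is +1 (sensitizer) or -1 (non-sensitizer).
    Assumption: a weighted sum of zero or less is called non-sensitizer.
    """
    missing = set(votes) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    # Weighted sum of the component votes (m2, m3, m4, similarity, substr).
    score = sum(weights[name] * vote for name, vote in votes.items())
    return "sensitizer" if score > 0 else "non-sensitizer"


# Hypothetical votes for one molecule; the keys mirror the caption's symbols.
votes = {"m2": 1, "m3": -1, "m4": 1, "similarity": 1, "substr": -1}
equal_weights = {name: 1.0 for name in votes}
print(combine_predictions(votes, equal_weights))
```

Raising the weight of one component (for example the sub-structure pattern) lets it override the other votes, which is the kind of effect a weight table would control.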

Table 2.

Weights used for components of prediction workflows in knowledge-based optimization.

Fig 4.

Percent prediction accuracy of short-listed variants of models.

Color-coded scale from green to red indicates decreasing prediction accuracy. RTS and Challenge-1 sets are expanded to show the prediction accuracy for each category of sensitizers and non-sensitizers. Internal: Internal test set; RTS: Representative test set; Challenge-1: Challenge set-1; Both: Internal & RTS; X: Extreme; St: Strong; S: Sensitizer with unknown potency; M: Moderate; W: Weak; N: Non-sensitizer.

Table 3.

Performance of prediction workflows with machine learning methods and knowledge-based optimization.

Table 4.

Leave-one-out analysis to assess the contributions of QSAR models, similarity information and sub-structure pattern to the prediction performance of prediction workflows.

Fig 5.

Comparative performance of our prediction workflows with VEGA v1.08.

Panel A: computed on the molecules of challenge set-1 that each tool could process (our prediction workflows: 74; VEGA v1.08: 69); Panel B: computed on the 69 molecules of challenge set-1 processed by both our prediction workflows and VEGA v1.08; Panel C: computed on the molecules of challenge set-2 that each tool could process (our prediction workflows: 77; VEGA v1.08: 68); Panel D: computed on the 68 molecules of challenge set-2 processed by both our prediction workflows and VEGA v1.08. VEGA v1.08: orange bars; PW-1: blue bars; PW-2: green bars. CCR: Correct Classification Rate.
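
The caption does not give a formula for CCR. A minimal sketch, assuming the definition common in the QSAR literature (the mean of sensitivity and specificity), is shown below; the function name and the 1/0 label encoding are illustrative choices, not taken from the paper.

```python
from typing import Sequence


def correct_classification_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """CCR assumed to be (sensitivity + specificity) / 2.

    Labels: 1 = sensitizer, 0 = non-sensitizer (an illustrative encoding).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to compute CCR")
    # True positives and true negatives among the paired labels.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    sensitivity = tp / n_pos
    specificity = tn / n_neg
    return (sensitivity + specificity) / 2


# Toy labels: one of two sensitizers is missed, both non-sensitizers are correct.
print(correct_classification_rate([1, 1, 0, 0], [1, 0, 0, 0]))
```

Under this definition, a tool that labels everything as one class scores 0.5 regardless of class balance, which is why the measure suits sets with unequal numbers of sensitizers and non-sensitizers.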

Fig 6.

SkinSense–Result Screen.

The table on the left shows the SMILES of the input molecules; the ‘Predictions’ section shows the prediction result for the selected molecule along with the predicted reaction mechanism and domain information; ‘Molecular Visualization’ depicts the structure of the selected molecule, with any skin protein reactive sub-structure(s) highlighted in cyan; ‘Similarity Search Result’ shows parent set molecules found to be similar to the selected input molecule, along with details such as the Tanimoto coefficient; ‘Export Type’ offers various options to export the SkinSense result.
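
For readers unfamiliar with the Tanimoto coefficient mentioned above: for two binary fingerprints it is the number of bits set in both divided by the number of bits set in either. The sketch below represents each fingerprint as a Python set of on-bit indices; this representation, the function name, and the choice to return 0.0 for two empty fingerprints are illustrative assumptions and say nothing about how SkinSense computes fingerprints.

```python
from typing import Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as having no similarity.
        return 0.0
    return len(fp_a & fp_b) / union


# Toy fingerprints: two shared bits out of four distinct bits overall.
print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

Values range from 0 (no shared bits) to 1 (identical fingerprints).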
