
Fig 1.

An overview of the workflow.

Pink box: structure curation and preparation of the test sets and of the datasets used for training set construction. Light blue box: selection of training set inactives and model building. Dark blue box: prediction of the external validation test set, the cross-validation sets and the REACH set with the four models. Green box: inter-model comparison of predictive performance in the external validations and of coverage of the REACH set.


Table 1.

Overview of the datasets and their distributions of active and inactive experimental results.


Table 2.

The results from the 10 times 20% out LPDM cross-validations of the three modelling approaches applied to the 2:1 training set (within the structural and probability AD).


Table 3.

The results from the two times five-fold DTU Food cross-validation procedure of the cocktail models with different active-to-inactive ratios.


Table 4.

The results from the external validation of the models, including the size of each model's AD for the test set.


Fig 2.

The most significant activity and inactivity structural features occurring in the Rational-final model.

(A) Structural features alerting for activity in the Rational-final model. (B) Structural features alerting for inactivity in the Rational-final model. Activity features were selected by ranking on $|0.2 - \bar{a}| \cdot \chi^2$, where $\bar{a}$ is the mean activity of all training set structures containing the feature. Inactivity features were selected by significance ($\chi^2$) among the ‘pure’ inactivity features, i.e. those appearing only in inactive substances. In both cases $\chi^2$ denotes the chi-square test of independence with one degree of freedom and Yates’ correction.
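
The ranking described in this caption can be illustrated with a small sketch. It assumes that activity is coded as 1 (active) or 0 (inactive), so that the mean activity of the structures containing a feature is simply the fraction of them that are active, and that the chi-square statistic is computed on the 2x2 table of feature presence against activity. The function names and the table layout are hypothetical and are not taken from the article or from the modelling software.

```python
def yates_chi2(a, b, c, d):
    """Yates-corrected chi-square statistic (1 d.f.) for the 2x2 table

                      active   inactive
    feature present     a         b
    feature absent      c         d
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the test is undefined; report 0.
        return 0.0
    # Yates' continuity correction: shrink |ad - bc| by n/2, floored at 0.
    diff = max(abs(a * d - b * c) - n / 2.0, 0.0)
    return n * diff ** 2 / denom


def activity_rank_score(a, b, c, d, reference=0.2):
    """Score |reference - mean_activity| * chi2 for one structural feature.

    mean_activity is the fraction of feature-containing structures that are
    active (an assumption of this sketch: activity coded as 0/1).
    """
    if a + b == 0:
        return 0.0  # feature never occurs in the training set
    mean_activity = a / (a + b)
    return abs(reference - mean_activity) * yates_chi2(a, b, c, d)


def is_pure_inactivity_feature(a, b):
    """True if the feature occurs only in inactive substances."""
    return a == 0 and b > 0
```

For a hypothetical feature found in 10 actives and no inactives, in a training set with 10 further inactives lacking the feature, `yates_chi2(10, 0, 0, 10)` gives 16.2 and the ranking score is 0.8 times that value.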


Fig 3.

The most significant activity and inactivity structural features occurring in the Random-final model.

(A) Structural features alerting for activity in the Random-final model. (B) Structural features alerting for inactivity in the Random-final model. Activity features were selected by ranking on $|0.2 - \bar{a}| \cdot \chi^2$, where $\bar{a}$ is the mean activity of all training set structures containing the feature. Inactivity features were selected by significance ($\chi^2$) among the ‘pure’ inactivity features, i.e. those appearing only in inactive substances. In both cases $\chi^2$ denotes the chi-square test of independence with one degree of freedom and Yates’ correction.


Fig 4.

Performance of QSAR2:1, QSAR3:1, QSAR4:1 and QSAR4:1R vs. REACH coverage.

The performance is described by Sensitivity (A), Specificity (B) and Balanced Accuracy (C). The following tokens correspond to the rational selection approach: a yellow diamond for QSAR2:1, a yellow triangle for QSAR3:1 and a yellow square for QSAR4:1. The blue circle corresponds to the random selection approach for QSAR4:1-R.
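
As a reminder of how the three plotted measures relate to each other, the following sketch computes them from confusion-matrix counts using the standard definitions. The function name and the example counts are hypothetical and do not come from the article's tables.

```python
def performance(tp, fn, tn, fp):
    """Return (sensitivity, specificity, balanced accuracy).

    tp, fn: experimentally active substances predicted active / inactive.
    tn, fp: experimentally inactive substances predicted inactive / active.
    Only predictions inside the model's applicability domain would normally
    be counted (an assumption of this sketch).
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one active and one inactive substance")
    sensitivity = tp / (tp + fn)   # fraction of actives correctly predicted
    specificity = tn / (tn + fp)   # fraction of inactives correctly predicted
    balanced_accuracy = (sensitivity + specificity) / 2.0
    return sensitivity, specificity, balanced_accuracy


# Hypothetical counts, for illustration only.
sens, spec, ba = performance(tp=8, fn=2, tn=90, fp=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} balanced accuracy={ba:.2f}")
```

Because balanced accuracy averages the two rates, it is not dominated by the majority class when inactives greatly outnumber actives.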


Table 5.

Number of substances covered (% of screened REACH substances), number of predicted actives (% of covered) and number of predicted inactives (% of covered) from predicting the REACH set of 80,086 substances.
