Fig 1.
Compound selection process.
Fig 2.
Similarity of pairs in the 52.5K compound set.
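The caption does not state which fingerprint or similarity metric was used. Below is a minimal sketch of pairwise similarity scoring. It assumes Tanimoto (Jaccard) similarity on fingerprints stored as sets of on-bit indices. The fingerprints are hypothetical toy values, not data from the 52.5K set.

```python
from itertools import combinations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention for two empty fingerprints (an assumption)
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def pairwise_similarities(fps: list[set[int]]) -> list[float]:
    """Similarity of every unordered pair: n * (n - 1) / 2 values."""
    return [tanimoto(x, y) for x, y in combinations(fps, 2)]


# Hypothetical toy fingerprints (on-bit indices only)
fps = [{1, 4, 9, 12}, {1, 4, 10}, {2, 3, 5}]
sims = pairwise_similarities(fps)
print([round(s, 3) for s in sims])
```

For roughly 52,500 compounds, all unordered pairs number about 1.4 billion. In practice, sampling pairs or a vectorized fingerprint toolkit would likely be needed instead of this pure-Python loop.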
Fig 3.
Property differences between simple and complex structures in a subset of the Enamine REAL database, and diversity analysis.
(A) Property differences between simple (S) and complex (M) compounds. (B) Murcko framework analysis comparing the added value of 6,000 additional complex compounds with that of 6,000 additional simple compounds, each added to a base of 24,000 simple compounds. The complex compounds add 15% more Murcko scaffolds and 50% more Murcko frameworks than the simple compounds.
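Panel (B) compares how many distinct scaffolds each 6,000-compound addition contributes on top of the 24,000-compound base. The sketch below shows that bookkeeping. It assumes each compound has already been reduced to a Murcko scaffold (or framework) identifier string, for example with RDKit's `MurckoScaffold` module. The helper names and toy identifiers are hypothetical. The relative-gain formula (novel scaffolds from the complex addition divided by novel scaffolds from the simple addition, minus one) is only one possible reading of "15% more" and is not necessarily the authors' calculation.

```python
def novel_scaffolds(base: list[str], added: list[str]) -> int:
    """Count distinct scaffold identifiers in `added` that are absent from `base`."""
    return len(set(added) - set(base))


def relative_gain(base: list[str], complex_add: list[str], simple_add: list[str]) -> float:
    """Fractional excess of novel scaffolds from the complex vs the simple addition.

    Assumption: "X% more" is read as novel_complex / novel_simple - 1.
    """
    n_complex = novel_scaffolds(base, complex_add)
    n_simple = novel_scaffolds(base, simple_add)
    if n_simple == 0:
        raise ValueError("simple addition contributes no novel scaffolds")
    return n_complex / n_simple - 1.0


# Hypothetical scaffold identifiers standing in for real Murcko scaffolds
base = ["A", "B", "C"]
simple_add = ["A", "D", "E"]   # two scaffolds not in the base
complex_add = ["F", "G", "H"]  # three scaffolds not in the base
print(f"{relative_gain(base, complex_add, simple_add):.0%}")
```

Running the same comparison once on scaffold identifiers and once on framework identifiers would give the two percentages that the caption reports separately.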
Fig 4.
Composition of the final library in terms of simple and complex structures.
Fig 5.
(A) Binned MWt 320–380; (B) binned sLogP 1–3; (C) binned TPSA 20–140; (D) binned SFI 2–6; (E) charge class; (F) number of HBD 0–3; (G) number of HBA 2–8; (H) synthetic type: M (complex), S (simple); (I) number of aromatic rings 0–4; (J) number of rotatable bonds 1–7.
Fig 6.
Properties of the first GHCDL, the Small Polar Library and GHCDL-V2. (A–C) Binned MWt; (D–F) binned sLogP; (G–I) binned TPSA; (J–L) binned sp3 ratio.
Fig 7.
PCA-t-SNE analysis of the first GHCDL, GHCDL-V2 and the Small Polar Library. (A) Distribution of the three libraries (GHCDL, GHCDL-V2 and Small Polar) in the chemical space explored by t-SNE (PCA); histograms show the distribution along each dimension. (B) K-means clustering of the three libraries and their distribution along each dimension of the t-SNE (PCA) analysis. (C) Contribution of each library to each cluster; count is the number of compounds in each cluster.
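Panel (C) reports how many compounds from each library fall into each K-means cluster. The sketch below covers only that tally. It assumes cluster labels have already been assigned, for example by K-means on the t-SNE coordinates, so the step reduces to counting (cluster, library) pairs. The function name and the labels are hypothetical.

```python
from collections import Counter


def cluster_contributions(clusters: list[int], libraries: list[str]) -> dict[int, Counter]:
    """Count compounds from each library within each cluster.

    clusters[i] and libraries[i] describe the same compound i.
    """
    if len(clusters) != len(libraries):
        raise ValueError("clusters and libraries must have equal length")
    table: dict[int, Counter] = {}
    for cluster, library in zip(clusters, libraries):
        table.setdefault(cluster, Counter())[library] += 1
    return table


# Hypothetical cluster assignments for six compounds
clusters = [0, 0, 1, 1, 1, 2]
libraries = ["GHCDL", "GHCDL-V2", "GHCDL-V2", "Small Polar", "GHCDL-V2", "GHCDL"]
for cluster, counts in sorted(cluster_contributions(clusters, libraries).items()):
    print(cluster, dict(counts), "total:", sum(counts.values()))
```

A `Counter` returns 0 for a library absent from a cluster, so missing combinations need no special handling when the table is plotted.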
Fig 8.
Box plots showing the different physicochemical properties of the original GHCDL, the new GHCDL-V2 and the Small Polar Library.