Fig 1.
Number of U.S.-based clinical trials per year.
Fig 2.
Distribution of enrollment rate category over time.
Table 1.
Characteristics of the clinical trials.
Summary statistics (percentages for categorical variables, medians with interquartile ranges for continuous variables) are shown for the features in each enrollment rate category. The “missing” level represents missing values. Only the 5 most prevalent MeSH terms are shown.
Table 2.
The feature sets used for information content analysis.
Fig 3.
Predictive performance of models constructed with various classification methods using the complete set of 4,636 features.
The predictive performance of the various classifiers is similar, and each outperforms the dummy classifier. Note: On most plots, the logistic classifier is not visible because its performance is the same as that of the elastic net.
Fig 4.
Predictive performance of random forest models with different feature subsets.
Fig 5.
Predictive performance of random forest models with and without domain adaptation on the dataset with reduced MeSH features.