Table 1.
Demographics of the NHIS dataset used to develop our ANN.
We show means and standard deviations for the continuous variables, means (that is, proportions) for the binary variables, and the percentage for each race.
Fig 1.
Each line is a weight connecting one layer to the next, and each circle is either an input, a neuron, or an output. The bias terms are analogous to intercepts in a regression model, and they improve the model’s performance.
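To make the roles of the weights and bias terms concrete, the following is a minimal sketch of a forward pass through a small fully connected network. The layer sizes, the sigmoid activation, and the random weights are assumptions chosen for illustration only; they are not taken from the architecture in the figure.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Propagate inputs through each layer: a = sigmoid(a @ W + b).

    Each W holds the weights (the lines in the diagram) between two
    layers; each b is the bias vector, acting like an intercept.
    """
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)
    return a

rng = np.random.default_rng(0)
layer_sizes = [4, 3, 1]  # hypothetical: 4 inputs, 3 hidden neurons, 1 output
weights = [rng.normal(size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

x = rng.normal(size=(5, 4))          # 5 synthetic individuals, 4 inputs each
risk = forward(x, weights, biases)   # one output score per individual
```

With all weights and biases set to zero, every output is exactly 0.5; a nonzero bias shifts that baseline, which is the sense in which it behaves like an intercept.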
Table 2.
A description of the inputs used in our ANN.
Fig 2.
The sensitivity and specificity for the training and validation datasets as functions of the cutoff values.
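As a sketch of how such curves can be computed, the snippet below sweeps a cutoff over model scores and records the sensitivity and specificity at each value. The labels and scores are made-up toy data, and the function name `sensitivity_specificity` is hypothetical rather than part of our pipeline.

```python
import numpy as np

def sensitivity_specificity(y_true, scores, cutoff):
    """Classify score >= cutoff as positive and return (sensitivity, specificity)."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(scores, dtype=float) >= cutoff
    tp = np.sum(pred & y_true)     # cases correctly flagged
    fn = np.sum(~pred & y_true)    # cases missed
    tn = np.sum(~pred & ~y_true)   # non-cases correctly cleared
    fp = np.sum(pred & ~y_true)    # non-cases wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)

# Toy data (illustrative only): 1 = cancer, 0 = no cancer.
y_true = [0, 0, 0, 1, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.65, 0.8, 0.9]

cutoffs = np.linspace(0.0, 1.0, 11)
curve = [sensitivity_specificity(y_true, scores, c) for c in cutoffs]
```

Plotting sensitivity against 1 − specificity over the same sweep of cutoffs gives the corresponding ROC curve.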
Fig 3.
An ROC plot for our ANN’s training and validation datasets.
Fig 4.
An ROC plot for our ANN’s training and validation datasets, shown alongside the performance of the random forest and support vector machine models.
Fig 5.
Cumulative distribution functions for high risk (solid lines) and low risk (dashed lines), shown for the population without cancer (orange) and the population with cancer (blue) in the validation dataset.
Allowing for a 1% misclassification rate (black line), we can divide individual cancer risk into three categories: high (red), medium (yellow), and low (green; this band, at the left of the figure, is too narrow to see).
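One possible way to turn a tolerated misclassification rate into a three-tier stratification is sketched below. It assumes the low-risk cutoff is the score below which only 1% of people with cancer fall, and the high-risk cutoff is the score above which only 1% of people without cancer fall. This reading of the caption, the function names, and the synthetic beta-distributed scores are all assumptions made for illustration; the procedure actually used for the figure may differ.

```python
import numpy as np

def risk_cutoffs(scores, has_cancer, misclass_rate=0.01):
    """Return (low_cut, high_cut) from empirical quantiles of the scores.

    Assumed rule: at most `misclass_rate` of cancer cases score below
    low_cut, and at most `misclass_rate` of non-cases score above high_cut.
    """
    scores = np.asarray(scores, dtype=float)
    has_cancer = np.asarray(has_cancer, dtype=bool)
    low_cut = np.quantile(scores[has_cancer], misclass_rate)
    high_cut = np.quantile(scores[~has_cancer], 1.0 - misclass_rate)
    return low_cut, high_cut

def stratify(scores, low_cut, high_cut):
    """Label each score 'low', 'medium', or 'high' (assumes low_cut <= high_cut)."""
    scores = np.asarray(scores, dtype=float)
    return np.where(scores < low_cut, "low",
                    np.where(scores > high_cut, "high", "medium"))

# Synthetic scores (illustrative only): non-cases skew low, cases skew high.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.beta(2, 5, 1000), rng.beta(5, 2, 1000)])
has_cancer = np.concatenate([np.zeros(1000, dtype=bool),
                             np.ones(1000, dtype=bool)])

low_cut, high_cut = risk_cutoffs(scores, has_cancer)
tiers = stratify(scores, low_cut, high_cut)
```

When the two score distributions overlap heavily, the low-risk band defined this way can be very narrow, consistent with a green region that is hard to see.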
Table 3.
Risk stratification of the NHIS 2016 data by our ANN.
Table 4.
The various screening methods, with their sensitivities and specificities.