Table 1.
Summary of related work across the tabular ML pipeline.
Table 2.
Descriptive statistics for selected variables in the raw dataset.
Table 3.
Summary statistics of raw features before transformation.
Fig 1.
Three-panel view of the key numeric features and the target.
These plots illustrate the relationships and distributional patterns among age, waiting_days, and the target label show_up. (a) Scatter: age vs. waiting_days by show_up. (b) Distribution of waiting_days across classes. (c) Distribution of age across classes.
Table 4.
Deterministic controls ensuring reproducible LLaMA-7B prompt behavior.
Table 5.
Summary of preprocessing measures applied to dataset irregularities.
Table 6.
Deterministic controls used to ensure reproducible LLaMA-7B behavior.
Fig 2.
End-to-end pipeline architecture showing the major modules from input data ingestion to profiling and final reporting.
Fig 3.
Detailed data flow diagram outlining each transformation step, branching logic, and artifact generation during the pipeline lifecycle.
Table 7.
Comparison of popular LLMs for local tabular preprocessing tasks.
Table 8.
Final dataset profile and runtime environment.
Table 9.
Comparative classification performance: LLaMA-7B vs. Mistral-7B on medical no-show prediction.
Table 10.
Final classification performance: Logistic Regression vs. XGBoost on medical no-show prediction.
Fig 4.
Spearman correlation heatmap across numeric and encoded categorical features.
A strong negative correlation is observed between WaitingDays and the appointment-related features.
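Spearman's coefficient is the Pearson correlation of rank-transformed values, so it captures monotonic rather than strictly linear association. The minimal stdlib sketch below assigns tied values their average rank, a common convention; the toy arrays are hypothetical and not drawn from the dataset.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values (not from the paper's data).
waiting_days = [0, 2, 5, 9, 14, 30]
other_feature = [12, 10, 9, 5, 3, 1]
print(spearman(waiting_days, other_feature))  # approximately -1.0: ranks fully reversed
```

Because only ranks enter the computation, a long right tail in a variable such as waiting days does not dominate the coefficient the way it would in a Pearson correlation.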
Fig 5.
Boxen plot illustrating the distribution of waiting days by gender and no-show status.
Longer wait times are more common among no-show patients, particularly among males.
Table 11.
Categorical feature distributions.
The complete categorical distribution table is provided in S1 Table.
Fig 6.
Missing value matrix confirming complete data coverage across all key features.
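A missing-value matrix is a row-by-column grid of booleans marking absent entries; "complete coverage" means every column's count of missing entries is zero. The sketch below assumes, hypothetically, that records are dicts in which a missing value is either absent or None; the column names and rows are illustrative only.

```python
def missing_matrix(records, columns):
    """Row-by-column booleans: True where the value is missing (absent or None)."""
    return [[rec.get(col) is None for col in columns] for rec in records]


def missing_counts(records, columns):
    """Number of missing entries per column."""
    matrix = missing_matrix(records, columns)
    return {col: sum(row[j] for row in matrix) for j, col in enumerate(columns)}


# Hypothetical rows; the real dataset's schema may differ.
rows = [
    {"age": 34, "waiting_days": 5, "show_up": 1},
    {"age": 61, "waiting_days": 0, "show_up": 0},
    {"age": None, "waiting_days": 12, "show_up": 1},
]
print(missing_counts(rows, ["age", "waiting_days", "show_up"]))
# {'age': 1, 'waiting_days': 0, 'show_up': 0}
```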
Fig 7.
Unique values per feature.
Fig 8.
Heatmap of appointment counts across neighborhoods.
Fig 9.
Scheduled appointments over time.
Fig 10.
Distribution of appointment dates.
Fig 11.
Empirical cumulative distribution function (ECDF) of age.
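An ECDF assigns to each value x the fraction of observations less than or equal to x, so it shows the whole age distribution without choosing histogram bins. A minimal stdlib sketch follows; the ages are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F with F(x) = (number of observations <= x) / n."""
    data = sorted(sample)
    n = len(data)

    def F(x):
        # bisect_right counts how many sorted observations are <= x.
        return bisect_right(data, x) / n

    return F


ages = [3, 18, 18, 45, 62, 80]  # hypothetical ages
F = ecdf(ages)
print(F(18))  # 3 of 6 ages are <= 18, so 0.5
```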
Fig 12.
Distribution of waiting days.
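The captions do not define waiting_days. A common construction for appointment no-show data, assumed here rather than confirmed by the source, is the number of whole calendar days between the moment an appointment was booked and the appointment date. The function name and the example timestamps are hypothetical.

```python
from datetime import date, datetime


def waiting_days(scheduled: datetime, appointment: date) -> int:
    """Whole calendar days from booking to appointment (assumed definition)."""
    # Compare calendar dates only, so the time of day at booking is ignored.
    return (appointment - scheduled.date()).days


print(waiting_days(datetime(2016, 4, 29, 18, 38), date(2016, 5, 6)))  # 7
```

Dropping the booking time means a same-day appointment gets 0 rather than a negative fractional value, which keeps the feature non-negative for valid records.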
Fig 13.
Correlation among encoded features.
The feature encoding map used to generate the encoded matrix is provided in S2 Table.
Fig 14.
Missingness after feature engineering.
The full missingness report is provided in S3 Table.
Fig 15.
Scaled numeric features (violin plot).
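The caption does not name the scaling method. The sketch below assumes z-score standardization (subtract the mean, divide by the population standard deviation, the convention used by scikit-learn's StandardScaler), which centres each feature at 0 with unit spread so that features can share one violin-plot axis. The input values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std. dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


scaled = standardize([0, 2, 4, 6, 8])  # hypothetical waiting_days values
print(scaled)
```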
Fig 16.
Confusion matrix of predictions.
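The caption does not state which class is treated as positive, so the sketch below takes the positive label as a parameter (an assumption) and tallies the four cells of a binary confusion matrix. The helper name confusion_matrix_binary is illustrative, not a library function, and the labels are hypothetical.

```python
def confusion_matrix_binary(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["TP" if t == positive else "FP"] += 1
        else:
            counts["FN" if t == positive else "TN"] += 1
    return counts


# Hypothetical labels and predictions.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_matrix_binary(y_true, y_pred))
# {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```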
Table 12.
Class distribution and support counts for the fine-tuned LLaMA-7B model.
Fig 17.
Precision–recall curve of the logistic regression classifier (AP = 0.87).
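Average precision (AP) summarizes a precision–recall curve as one number. A common step-wise definition, the one documented for scikit-learn's average_precision_score, is AP = Σ_n (R_n − R_{n−1}) P_n, where P_n and R_n are precision and recall at the n-th score threshold. The stdlib sketch below implements that definition, treating tied scores as a single threshold; the labels and scores are hypothetical.

```python
def average_precision(y_true, scores):
    """Step-wise AP: sum over thresholds of (recall gain) * precision."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("AP is undefined without positive labels")
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])  # highest score first
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(pairs):
        # All samples sharing this score enter together as one threshold.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


# Hypothetical labels (1 = positive class) and classifier scores.
print(average_precision([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.4, 0.3]))  # 29/36, about 0.806
```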
Fig 18.
ROC curve with AUC = 0.65.
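The ROC AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting half. An AUC of 0.65 therefore means the classifier ranks a random positive above a random negative in roughly 65% of such pairs. The pairwise sketch below computes exactly that quantity on hypothetical data; it costs O(n_pos · n_neg), so a rank-based formula is preferable at full dataset scale.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores: 4 of 6 positive/negative pairs are ordered correctly.
print(roc_auc([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.4, 0.3]))
```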
Fig 19.
Explained variance per PCA component.
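The quantity plotted per component is conventionally the explained variance ratio. Under the standard definition (assumed here, since the caption does not say whether the features were scaled beforehand), with X_c the n × p centred feature matrix and λ_1 ≥ … ≥ λ_p ≥ 0 the eigenvalues of its sample covariance matrix:

```latex
\Sigma = \frac{1}{n-1} X_c^{\top} X_c, \qquad
\mathrm{EVR}_k = \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}, \quad k = 1, \dots, p
```

The 1/(n−1) factor cancels in the ratio. For example, two components with eigenvalues 3 and 1 have explained variance ratios 0.75 and 0.25.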
Fig 20.
t-SNE visualization of LLaMA embeddings.
Partial separation between ‘Show’ and ‘No-show’ is visible, but the two groups overlap, indicating potential for further tuning.
Fig 21.
SHAP dependence plot for age.
Fig 22.
SHAP summary plot showing global feature importance and the distribution of per-sample SHAP values for each feature.
Fig 23.
SHAP interaction plot between waiting_days and age.
Table 13.
Top-ranked features by mean SHAP value (importance).
The complete feature ranking is provided in S4 Table.
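Signed SHAP values of opposite sign cancel when averaged, so global importance is conventionally computed as the mean absolute SHAP value per feature, which is what the SHAP library's bar-style summary plot displays. The sketch below assumes that convention for a ranking like Table 13; the SHAP matrix and the placeholder name other_feature are hypothetical.

```python
def rank_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean |SHAP| across samples, largest first."""
    n = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical SHAP matrix: 3 samples x 3 features (not the paper's values).
shap_matrix = [
    [0.40, -0.10, 0.05],
    [-0.30, 0.20, 0.00],
    [0.20, -0.15, -0.05],
]
print(rank_by_mean_abs_shap(shap_matrix, ["waiting_days", "age", "other_feature"]))
```

Taking absolute values first means a feature that pushes some predictions up and others down still registers as influential, instead of averaging out to near zero.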
Fig 24.
Violin plots by show-up status.
Fig 25.
PCA scatterplot of features.
Fig 26.
Feature-wise memory usage.