Generative AI mitigates representation bias and improves model fairness through synthetic health data
Fig 2
Two-dimensional representations of the sepsis dataset for Black patients, including marginal distributions of the principal components.
Top panels: PCA two-dimensional representation of real (red) and synthetic (blue) data, where CA-GAN provides more coverage than SMOTE (especially in the top right and bottom left part of the panel), while WGAN-GP* provides the lowest coverage. Bottom panels: t-SNE two-dimensional representation of real data (red) and synthetic data (blue) for the three methods SMOTE, WGAN-GP, and CA-GAN. It can be seen that SMOTE follows an interpolation pattern, while CA-GAN expands into latent space, generating authentic data points while remaining within the clusters identified by t-SNE. Data generated by WGAN-GP* fall outside of the real data.