Voxel-wise deep learning segmentation of hydroxyapatite and iodine in spectral photon-counting CT: A quantitative phantom study

doi:10.1371/journal.pone.0346825

Fig 1.

Methodology overview.

(A) Input multi-energy SPCCT data and dataset preparation (including material/placement variability and grid-puzzle augmentation). (B) Deep learning models evaluated (six U-Net–based variants) under a unified training configuration. (C) Outputs and evaluation, including macro-averaged overall and per-class metrics (Dice, sensitivity, specificity, precision, IoU), qualitative overlays, and slice-wise Dice-error plots. Abbreviations: SPCCT, spectral photon-counting CT; IoU, intersection over union.

Fig 2.

Overview of the phantom datasets and augmentation.

Left: matrix view with scans as rows and energy bins (E₁ to E₅) as columns. Right: grid puzzle augmentation on a representative slice; the original grayscale DICOM image and the corresponding ground truth (voxels color-coded by class) are shown.
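The caption does not spell out how the grid puzzle augmentation works. A minimal sketch, assuming it cuts each slice into a regular grid of tiles and shuffles them with the same permutation applied to every energy bin and to the ground-truth mask, could look like the following. The function name `grid_puzzle`, the `grid` parameter, and the `(F, H, W)` array layout are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def grid_puzzle(image, label, grid=2, rng=None):
    """Shuffle grid tiles of a slice; image is (F, H, W), label is (H, W).

    Assumed behaviour (not confirmed by the source): one random tile
    permutation is shared by all energy bins and the label mask, so
    voxel-to-label correspondence is preserved.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = label.shape
    if height % grid or width % grid:
        raise ValueError("slice size must be divisible by the grid size")
    th, tw = height // grid, width // grid
    perm = rng.permutation(grid * grid)

    out_img = np.empty_like(image)
    out_lab = np.empty_like(label)
    for dst, src in enumerate(perm):
        dr, dc = divmod(dst, grid)   # destination tile row/column
        sr, sc = divmod(src, grid)   # source tile row/column
        d = (slice(dr * th, (dr + 1) * th), slice(dc * tw, (dc + 1) * tw))
        s = (slice(sr * th, (sr + 1) * th), slice(sc * tw, (sc + 1) * tw))
        out_img[(Ellipsis,) + d] = image[(Ellipsis,) + s]
        out_lab[d] = label[s]
    return out_img, out_lab
```

Passing a seeded `np.random.default_rng(seed)` makes the shuffle reproducible across runs.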

Table 1.

Final training setups for baseline models. Hyperparameters reported in the original papers are retained; missing values for optimizer (Opt.), learning rate (LR), schedule (Sched.), momentum (Mom.), or weight decay (WD) were set to common defaults and marked with †. SGD: stochastic gradient descent; CE: cross-entropy; Dice: Dice loss. Warm+Cos = linear warm-up followed by cosine annealing of the learning rate.
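As an illustration of the Warm+Cos schedule named above, the sketch below computes a learning rate that ramps up linearly and then follows a cosine decay. The function name and the parameters `warmup_steps`, `total_steps`, and `min_lr` are hypothetical; the table's actual warm-up length and floor value are not given here.

```python
import math

def warm_cos_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at a 0-indexed step: linear warm-up, then cosine annealing."""
    if step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the annealing phase completed, clipped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps - 1)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

In this form the rate equals `base_lr` at the end of warm-up and reaches `min_lr` on the final step.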

Fig 3.

Architecture of the proposed SPFF–UNet.

A four-level 3D UNet preserves the spectral axis by using anisotropic down/upsampling (no pooling along energy F). Two custom modules—EnergyFiLM (per-energy affine modulation) and FourierGate (rFFT-based spectral gating)—are inserted after each double-convolution block in the encoder, bottleneck, and decoder, while skip connections retain full spectral resolution. At the network output, the preserved spectral information is fused to produce 13 class logits per spatial voxel, followed by softmax to generate dense voxel-wise labels. Inputs are SPCCT volumes with five energy bins; outputs are dense voxel-level labels for 13 material classes (hydroxyapatite and iodine concentrations, soft-tissue surrogates). SPCCT: spectral photon-counting CT.
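To make the two module descriptions concrete, here is a minimal NumPy sketch of what "per-energy affine modulation" and "rFFT-based spectral gating" could mean for a feature map of shape `(C, F, H, W)`. This is an interpretation of the caption only: the function names, the parameter shapes, and the sigmoid gate are assumptions, and the paper's learned modules may differ.

```python
import numpy as np

def energy_film(x, gamma, beta):
    """Scale and shift each energy bin independently.

    x: (C, F, H, W) features; gamma, beta: (F,) per-energy parameters
    (assumed learnable in a real network).
    """
    return x * gamma[None, :, None, None] + beta[None, :, None, None]

def fourier_gate(x, gate_logits):
    """Gate the frequency components of the signal along the energy axis.

    x: (C, F, H, W); gate_logits: (F // 2 + 1,), one value per rFFT bin.
    A sigmoid maps each logit to a gate in (0, 1) (an assumed choice).
    """
    n_energy = x.shape[1]
    spectrum = np.fft.rfft(x, axis=1)            # real FFT over energy bins
    gate = 1.0 / (1.0 + np.exp(-gate_logits))
    spectrum = spectrum * gate[None, :, None, None]
    return np.fft.irfft(spectrum, n=n_energy, axis=1)
```

With five energy bins the rFFT has three components. Closing the gate on all but the first leaves only the mean over energy, while fully open gates return the input unchanged.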

Table 2.

Macro-averaged segmentation performance on the external test set. Metrics are computed over foreground classes only (background excluded; background = all voxels not belonging to the 12 target classes). Values are mean ± SD across three seeds. IoU = Intersection-over-Union. The first five rows are baseline models; SPFF–UNet is the proposed model.
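The macro averaging described above can be sketched as follows: compute each metric per foreground class from the confusion counts, then take the unweighted mean over classes. The function name, the background label `0`, and the choice to skip classes absent from the ground truth are assumptions made for illustration.

```python
import numpy as np

METRICS = ("dice", "sensitivity", "specificity", "precision", "iou")

def macro_metrics(y_true, y_pred, num_classes=13, background=0):
    """Per-class and macro-averaged metrics over foreground classes.

    y_true, y_pred: integer label arrays of identical shape.
    Classes with no ground-truth voxels are skipped (assumed convention).
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    per_class = {}
    for c in range(num_classes):
        if c == background:
            continue
        t, p = y_true == c, y_pred == c
        if not t.any():
            continue
        tp = np.sum(t & p)
        fp = np.sum(~t & p)
        fn = np.sum(t & ~p)
        tn = np.sum(~t & ~p)
        per_class[c] = {
            "dice": 2 * tp / (2 * tp + fp + fn),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
            "precision": tp / (tp + fp) if (tp + fp) else 0.0,
            "iou": tp / (tp + fp + fn),
        }
    macro = {k: float(np.nanmean([m[k] for m in per_class.values()]))
             for k in METRICS}
    return macro, per_class
```

Repeating this per seed and taking the mean and SD of the macro values would give entries in the table's mean ± SD form.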

Fig 4.

Per-class Dice heatmaps for six models (five published baselines and the proposed SPFF–UNet).

Each cell shows mean ± SD across three seeds on the external test set. Background (BG; all voxels not belonging to the 12 target classes) is shown for reference only. The 12 target classes are HA800, HA400, HA200, HA100, HA50, I15, I10, I5, water, adipose, liver, and lung; cell color encodes the per-class Dice mean. Rows correspond to models and columns to classes. Cells labeled N/A (light gray) indicate classes that are present in the training set but absent from the external test set; these are excluded from all summary statistics. A consistent colormap and class order are used across panels. HA: hydroxyapatite; I: iodine.

Fig 5.

Qualitative overlays of voxel-level segmentations on the external test scan.

Columns (left to right): 3D UNet, UNETR, R2UNet3D, Swin UNETR, ResUNet++, and SPFF–UNet (proposed). Predicted labels are argmax maps color-coded on grayscale SPCCT slices; all panels use identical windowing and a shared class colormap/legend. HA: hydroxyapatite; I: iodine. E: Energy bin (7-12, 12-15, 15-18, 18-21, 21-120 keV).

Fig 6.

Slice-wise Dice error on the external test scan for three comparators.

(A), (D): baseline 3D UNet; (B), (E): highest-Dice published baseline (ResUNet++); (C), (F): proposed SPFF–UNet. Top row (A)–(C) corresponds to the average of hydroxyapatite classes (HA800, HA400, HA200, HA100); bottom row (D)–(F) corresponds to the average of iodine classes (I15, I10, I5). For each model, per-slice errors are plotted with a solid mean line and dashed ±1.96 SD band, averaged over three seeds. The x-axis represents the test slice index; the y-axis shows the error, defined as 1 − Dice. Insets provide zoomed views for ResUNet++ and SPFF–UNet in the hydroxyapatite row, and for SPFF–UNet in the iodine row. These are not classical Bland-Altman plots: they show the per-slice deviation from perfect overlap (Dice = 1) as a diagnostic view of segmentation consistency across slices.
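A rough sketch of how such a slice-wise plot could be computed is given below. It takes the error to be 1 − Dice per slice, averaged over a class group. It also assumes that the mean line and ±1.96 SD band are horizontal summaries taken across slices after averaging over seeds; the caption leaves this open, and the function names are hypothetical.

```python
import numpy as np

def slice_dice_error(y_true, y_pred, classes):
    """1 - Dice per slice, averaged over the given classes.

    y_true, y_pred: (slices, H, W) label volumes. A class missing from
    both truth and prediction in a slice is skipped; a slice with no
    usable class yields NaN.
    """
    errors = np.full(y_true.shape[0], np.nan)
    for z in range(y_true.shape[0]):
        vals = []
        for c in classes:
            t, p = y_true[z] == c, y_pred[z] == c
            denom = t.sum() + p.sum()
            if denom == 0:
                continue
            vals.append(1.0 - 2.0 * np.sum(t & p) / denom)
        if vals:
            errors[z] = np.mean(vals)
    return errors

def mean_and_band(errors_per_seed, z=1.96):
    """Average over seeds, then summarise across slices.

    errors_per_seed: (seeds, slices). Returns the seed-averaged per-slice
    errors, their mean, and the lower/upper limits mean -/+ z * SD.
    """
    per_slice = np.nanmean(np.asarray(errors_per_seed, dtype=float), axis=0)
    mean = np.nanmean(per_slice)
    sd = np.nanstd(per_slice, ddof=1)
    return per_slice, mean, mean - z * sd, mean + z * sd
```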

Table 3.

Ablation on a spectral-preserving UNet backbone. Macro segmentation metrics (mean ± SD over three seeds) on the external test set. All variants retain the energy axis via 1×2×2 down/upsampling; background is excluded from macro averages. Rows are ordered a → e (plain→full). IoU = Intersection-over-Union.
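The 1×2×2 down/upsampling can be illustrated with a small NumPy sketch in which the energy axis keeps its length while the two spatial axes are halved or doubled. Max pooling and nearest-neighbour upsampling are assumed here for simplicity; the caption does not say which operators the backbone uses.

```python
import numpy as np

def downsample_1x2x2(x):
    """Max-pool an (F, H, W) array with kernel and stride (1, 2, 2)."""
    f, h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("H and W must be even")
    # Group each 2x2 spatial block, then reduce over the block axes only.
    return x.reshape(f, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample_1x2x2(x):
    """Nearest-neighbour upsampling by (1, 2, 2): energy axis untouched."""
    return x.repeat(2, axis=1).repeat(2, axis=2)
```

For a five-bin input of shape `(5, 8, 8)`, downsampling gives `(5, 4, 4)` and upsampling restores `(5, 8, 8)`, so the spectral dimension is preserved at every level.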
