CQ-CNN: A lightweight hybrid classical–quantum convolutional neural network for Alzheimer’s disease detection using 3D structural brain MRI | PLOS One

Fig 1.

Subfigure (a) shows the 3D MRI volume represented as voxels in a three-dimensional coordinate system; (b) presents example 2D slices from the axial, coronal, and sagittal planes; and (c) displays corresponding MRI images from these planes with non-skull-stripped images in the top row and skull-stripped images in the bottom row.

Fig 2.

Visualization of the 3D-to-2D slice extraction strategy from volumetric MRI data (axial view).
The slice interval i, calculated using Eq 1, defines the spacing between the selected slices. To exclude boundary regions that primarily contain empty space or non-brain tissue, the first k₁ and the last k₂ slices are discarded. The remaining central slices, calculated using Eq 2, represent the feature-rich region.
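
Eq 1 and Eq 2 are not reproduced in this caption, so the following Python sketch only illustrates the general idea under assumed formulas. The function name `select_slices`, the parameter `n_slices`, and both formulas (central count = total − k₁ − k₂, interval = floor of central count divided by the desired number of slices) are assumptions for illustration and may differ from the paper's definitions.

```python
def select_slices(total, k1, k2, n_slices):
    """Return (interval, indices) for evenly spaced slices in the central region.

    Assumed formulas (the paper's Eq 1 and Eq 2 may differ):
      central  = total - k1 - k2          # slices left after trimming both ends
      interval = max(1, central // n_slices)
    """
    if k1 < 0 or k2 < 0 or n_slices <= 0:
        raise ValueError("k1 and k2 must be >= 0 and n_slices must be > 0")
    central = total - k1 - k2
    if central <= 0:
        raise ValueError("no central slices remain after discarding k1 + k2")
    interval = max(1, central // n_slices)
    # Walk the central region in steps of `interval`, keeping at most n_slices.
    indices = list(range(k1, total - k2, interval))[:n_slices]
    return interval, indices


# Hypothetical volume: 128 axial slices, 20 discarded at each end, 8 kept.
interval, indices = select_slices(total=128, k1=20, k2=20, n_slices=8)
print(interval, indices)
```

Every returned index lies in the half-open range [k₁, total − k₂), so the discarded boundary slices are never selected.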

Fig 3.

Schematic depiction of a classical neural network (a) and a quantum neural network (b) for binary classification.
In subfigure (a), the m input neurons represent the input features. The hidden layer consists of n neurons, each labeled with a superscript [1] that indicates the first hidden layer and a subscript that identifies the specific neuron within that layer. The output layer neurons, representing the predicted probabilities for each class given the input features, are denoted by y₁ and y₂. In subfigure (b), the input and output layers are similar to those in subfigure (a), but the classical hidden layers are replaced by a 3-qubit parameterized quantum circuit (PQC). The classical features are first reduced to match the number of qubits, with three black dots indicating the qubits. These features are then encoded into quantum states through data encoding. A parameterized ansatz then applies quantum operations to capture complex relationships. Afterward, quantum measurements are performed, and the PQC outputs a classical probability. This probability passes through an intermediate linear layer, denoted as o₁. Finally, o₁ is mapped to the output probability using Eq 9.

Fig 4.

The schematic depicts our PQC using ZZFeatureMap encoding, with subfigure (a) showing a 2-qubit circuit and subfigure (b) showing a 3-qubit circuit.
Each qubit is initialized with a Hadamard gate H, followed by phase rotations to encode classical data into a quantum state. Entanglement is then introduced through controlled-Z (CZ) gates, which create correlations between qubits by applying phase shifts based on their classical values. A phase rotation is applied to introduce further phase shifts based on the classical values. The ansatz circuit applies trainable single-qubit rotations to further refine the quantum state.

Fig 5.

The illustration depicts our CQ-CNN architecture for binary image classification.
The input is a grayscale 2D MRI slice of size 1x128x128, which passes through a convolutional layer with a 5x5 filter, a stride of 1, and no padding, producing 2x124x124 feature maps, followed by 2x2 max-pooling, which reduces it to 2x62x62. A second convolutional layer with the same filter settings generates 4x58x58 feature maps, which are then reduced to 4x29x29 through max-pooling. A dropout layer is applied for regularization, and the output is flattened for the fully connected (dense) layer. The processed data is then fed into the PQC, where classical data is encoded into quantum states, followed by ansatz layers with learnable parameters updated using the gradient descent algorithm defined in Eq 8, and finally measured to produce classification probabilities, resulting in the output vector γ.
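
The feature-map sizes quoted above follow from the standard output-size rule for convolution and pooling. The short Python sketch below re-derives them; the helper names `conv_out`, `pool_out`, and `trace_shapes` are illustrative and are not taken from the paper's code.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Spatial output size of non-overlapping max-pooling."""
    return size // window


def trace_shapes(side=128):
    """Follow a 1 x side x side input through conv-pool-conv-pool."""
    shapes = [(1, side, side)]
    s = conv_out(side, kernel=5)   # first 5x5 conv, stride 1, no padding
    shapes.append((2, s, s))
    s = pool_out(s)                # first 2x2 max-pool
    shapes.append((2, s, s))
    s = conv_out(s, kernel=5)      # second 5x5 conv, same settings
    shapes.append((4, s, s))
    s = pool_out(s)                # second 2x2 max-pool
    shapes.append((4, s, s))
    return shapes


shapes = trace_shapes()
channels, height, width = shapes[-1]
flattened = channels * height * width
print(shapes)      # [(1, 128, 128), (2, 124, 124), (2, 62, 62), (4, 58, 58), (4, 29, 29)]
print(flattened)   # 3364
```

The flattened length of 3364 (4 × 29 × 29) is the number of values passed to the fully connected layer before the PQC.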

Table 1.

Layer-by-layer configuration and parameter count of the proposed CQ-CNN model, where ω represents the number of qubits in the PQC, and indicates the number of trainable parameters within the PQC ansatz.

Table 2.

Training configurations for the segmentation, diffusion, and classification models.

Fig 6.

The graph depicts the training progress of the segmentation model, showing the Dice and IoU coefficients over 30 epochs.
The Dice coefficient (orange) increases rapidly and stabilizes around 0.985, while the IoU coefficient (gray) converges to around 0.97.
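
For a single pair of binary masks, the two coefficients are linked by the identity Dice = 2·IoU / (1 + IoU), so the plateau values quoted above can be checked against each other. The Python sketch below is illustrative only: the function names are assumptions, and when scores are averaged over many images the identity holds only approximately.

```python
def dice_and_iou(pred, target):
    """Dice and IoU for two flat binary masks (sequences of 0/1)."""
    intersection = sum(p & t for p, t in zip(pred, target))
    pred_sum, target_sum = sum(pred), sum(target)
    union = pred_sum + target_sum - intersection
    if union == 0:
        # Both masks are empty: treat as a perfect match by convention.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou


def dice_from_iou(iou):
    """Convert an IoU value to the equivalent Dice value."""
    return 2 * iou / (1 + iou)


# Toy masks, made up for illustration.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, iou)

# An IoU of about 0.97 corresponds to a Dice of about 0.985.
print(round(dice_from_iou(0.97), 3))  # 0.985
```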

Fig 7.

The visuals present the training loss curves over 800 epochs for three distinct diffusion models.
The upper section displays the progression of generated images at different stages of training, showcasing the refinement of details as training advances. The lower graph presents the training loss curves for the three models. The y-axis, shown on a logarithmic scale, highlights the sharp decline in loss during the early stages of training. All three models follow a similar convergence pattern, with losses stabilizing around 700 epochs.

Fig 8.

Distribution of training and testing images for the moderate dementia and non-dementia classes in the OASIS-2 dataset, shown across four variations with images from different planes: (a) axial, (b) coronal, (c) sagittal, and (d) 3-plane (a combined set containing samples from all three individual planes).

Table 3.

Performance analysis of CQ-CNN models across axial, coronal, sagittal, and combined 3-plane views.
Key evaluation metrics, including precision, F1-score, specificity, accuracy, and training time, are provided for models using both 2-qubit and 3-qubit configurations, where the index i identifies the dataset variation on which an experiment was conducted. Each metric is reported as the mean and standard deviation over multiple runs. The analysis also examines the impact of skull-stripping on model performance and compares results based on whether the models were trained with single-plane (2D) or multi-plane (3D) images. Boldface numbers indicate the best performance. The symbol ↑ denotes that a higher value is better, while ↓ signifies that a lower value is better.
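
As a reminder of how the reported metrics relate to a binary confusion matrix, here is a minimal Python sketch. The confusion counts and per-run accuracies are made up for illustration, the function names are assumptions, and the sample standard deviation is used although the caption does not say whether the paper uses the sample or population form.

```python
from statistics import mean, stdev


def binary_metrics(tp, fp, tn, fn):
    """Precision, recall, specificity, accuracy, and F1 from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


def summarize(values):
    """Mean and sample standard deviation over repeated runs."""
    return mean(values), stdev(values)


# Hypothetical counts for one run (not taken from the paper).
print(binary_metrics(tp=40, fp=10, tn=45, fn=5))

# Hypothetical accuracies from three runs (not taken from the paper).
print(summarize([0.90, 0.92, 0.94]))
```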

Fig 9.

Radar plots compare the performance of models with different qubit configurations across evaluation metrics: accuracy (ACC), specificity (SPEC), F1-score (F1), precision (PRE), and training time (T. Time).
Each subplot compares a 2-qubit model with its corresponding 3-qubit model, where both models are trained on the same dataset i. The radar plots show that moving from 2-qubit to 3-qubit models yields only minimal overall performance improvements, whereas training time increases significantly with the addition of qubits.

Fig 10.

The graphs present the training and validation accuracy curves for the CQ-CNN models across different MRI planes (axial, coronal, sagittal, and 3-plane) and model configurations (classical, 2-qubit, and 3-qubit), with and without skull-stripping, over several epochs.
The classical CNN (top row) shows steady, step-by-step improvement in accuracy with each epoch. In contrast, the CQ-CNN models (middle and bottom rows) exhibit slow convergence during the initial phase of training but then rapidly achieve high accuracy after a few more epochs.

Table 4.

Comparison of our classical–quantum and pure classical models with recent literature approaches for AD detection, highlighting key attributes such as dataset, number of classes, model type, GPU support, segmentation usage, accuracy, parameter count, and model size.

Fig 11.

Control experiments illustrate feature separability and training stability between MNIST binary pairs (0v1, 2v3, 4v5) and the OASIS-2 MRI dataset for AD classification across axial, coronal, and sagittal views.
The top row presents t-SNE visualizations of learned features, where MNIST control tasks yield well-separated clusters, while the OASIS-2 MRI views show entangled distributions between non-dementia and moderate dementia cases. The bottom row plots training accuracy across five independent runs, annotated with ANOVA F-statistics and p-values to assess variability. Consistently low variability and stable convergence in MNIST (non-significant p-values) contrast with significant variability in the axial (p = 0.0027) and coronal (p = 0.0041) views, while the sagittal view remains marginal (p = 0.0513).
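
The F-statistics annotated in the bottom row come from a one-way ANOVA. The Python sketch below shows how such a statistic is computed, assuming each independent run is treated as one group of accuracy values; that grouping, the function name `one_way_anova_f`, and the toy numbers are assumptions, since the caption does not describe the exact setup.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the samples around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data, made up for illustration: three groups of three values each.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

The p-value is the upper-tail probability of the F distribution with these degrees of freedom; `scipy.stats.f_oneway` returns both the statistic and the p-value if SciPy is available.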
