Abstract
Quantum optical neurons (QONs) are emerging as promising computational units that leverage photonic interference to perform neural operations in an energy-efficient and physically grounded manner. Building on recent theoretical proposals, we introduce a family of QON architectures based on Hong–Ou–Mandel (HOM) and Mach–Zehnder (MZ) interferometers, incorporating different photon modulation strategies—phase, amplitude, and intensity. These physical setups yield distinct pre-activation functions, which we implement as fully differentiable software modules. We evaluate these QONs both in isolation and as building blocks of multilayer networks, training them on binary and multiclass image classification tasks using the MNIST and FashionMNIST datasets. Each experiment is repeated over five independent runs and assessed under both ideal and non-ideal conditions to measure accuracy, convergence, and robustness. Across settings, MZ-based neurons exhibit consistently stable behavior—including under noise—while HOM amplitude modulation performs competitively in deeper architectures, in several cases approaching classical performance. In contrast, phase- and intensity-modulated HOM-based variants show reduced stability and greater sensitivity to perturbations. These results highlight the potential of QONs as efficient and scalable components for future quantum-inspired neural architectures and hybrid photonic–electronic systems. The code is publicly available at https://github.com/gvessio/quantum-optical-neurons.
Citation: Andrisani A, Vessio G, Sgobba F, Di Lena F, Santamaria LA, Castellano G (2026) Modeling and benchmarking quantum optical neurons for efficient neural computation. PLoS One 21(3): e0341545. https://doi.org/10.1371/journal.pone.0341545
Editor: Uma Maheswari Rajagopalan, Shibaura Institute of Technology, JAPAN
Received: August 30, 2025; Accepted: January 8, 2026; Published: March 19, 2026
Copyright: © 2026 Andrisani et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All datasets used in this study are publicly available. The MNIST dataset is available at: http://yann.lecun.com/exdb/mnist/. The FashionMNIST dataset is available at: https://github.com/zalandoresearch/fashion-mnist. Both datasets are also directly accessible through the PyTorch ecosystem via the torchvision.datasets module.
Funding: This study was supported by the Italian Ministry of University and Research in the form of a grant awarded to GC (Project CLOSE – ARS01_00141). The specific roles of this author are articulated in the ‘author contributions’ section. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors declare that they have no competing interests.
1 Introduction
Artificial intelligence (AI) based on large-scale neural networks has achieved remarkable success in various fields, including computer vision, language processing, and pattern recognition. This progress is primarily due to the availability of massive datasets and increased computational power. However, the performance of such models comes at the cost of significant energy consumption, prompting interest in alternative hardware architectures that are both efficient and scalable.
One promising direction is optical computing, where information is processed using light instead of electricity [1,2]. Optical systems can perform key neural operations, such as linear combinations and nonlinear activations, using components like interferometers, modulators, and lenses [3–6]. These systems can, in principle, implement artificial neurons with high speed and low power consumption [7].
In parallel, quantum machine learning has investigated how quantum systems can efficiently perform learning tasks by exploiting entanglement [8–11]. Approaches based on physical systems that emulate qubits, such as Bose–Einstein condensation [12,13], spin glasses [14], or ensembles of polarized photons [9,15], have been shown to potentially implement algorithms for segmentation or Boltzmann machine training with an exponential speed-up compared to their classical counterparts. More recently, quantum kernels evaluated through Hong–Ou–Mandel (HOM) interference [16–23] between pairs of time- or frequency-modulated photons, which effectively encode qu-words, have been proposed [24]. Remarkably, despite relying on a minimal system composed of only two particles, this approach still achieves exponential speed-up.
A further advance toward more efficient AI was presented by Roncallo et al. [25], who introduced the concept of a quantum optical neuron (QON). In this framework, input and weight vectors are encoded into single-photon wavefronts using spatial light modulators. The scalar product between these quantum states is then estimated through HOM interference, enabling a hardware-level realization of a neuron’s pre-activation, namely the internal combination of inputs and weights prior to the activation function. Significantly, this approach achieves a computational cost independent of input dimensionality (super-exponential speed-up) while exploiting the inherently bosonic nature of photons.
In this work, we extend and generalize the QON model by exploring new optical configurations, particularly a Mach–Zehnder (MZ) [26–33] interferometer setup with dual modulation paths. We investigate amplitude-, phase-, and intensity-based modulation schemes, yielding a broader family of physically realizable pre-activation functions that differ nonlinearly from classical inner products.
Unlike quantum kernel methods [24,34], our approach does not involve high-dimensional Hilbert space embeddings. Instead, we focus on quantum-inspired functions that are experimentally implementable and potentially more efficient in training and inference. QONs are simulated here in software as fully differentiable modules, enabling end-to-end training with gradient-based optimization. We also introduce a layered quantum optical neural network (QONN) architecture built entirely from QONs, and demonstrate its ability to scale to multiclass classification problems.
Beyond expanding the family of physically grounded pre-activation functions, we provide a systematic and statistically robust assessment of QON performance. All experiments are repeated across multiple independent runs and evaluated under both ideal and non-ideal conditions, allowing us to quantify accuracy, convergence stability, and sensitivity to realistic optical imperfections. Performing simulations under non-ideal conditions, accounting for decoherence, noise, and photon losses, could yield valuable insights for the practical implementation of a quantum artificial neuron based on photons. In this regard, we mention that quantum machine learning implemented with photon technologies has already been experimentally verified in [35].
Our contributions can be summarized as follows:
- We develop a unified theoretical and computational framework for QONs based on HOM and MZ interferometric architectures.
- We derive a family of physically grounded pre-activation functions and integrate them into modular, fully differentiable QON and QONN designs.
- We provide a systematic and statistically robust comparison of QON variants—under both ideal and non-ideal conditions and relative to a classical baseline—clarifying their accuracy, stability, and robustness properties.
These results support the idea that QONs are not only theoretically interesting but also practically viable for future low-power and high-speed AI applications, including autonomous vision, wearable health monitoring, remote sensing, and fast scientific imaging.
2 Mathematical framework for quantum optical neurons
2.1 Model and pre-activation
Quantum optical neurons, introduced by Roncallo et al. [25], represent a paradigm in which the input data and weight parameters are independently encoded onto the quantum states of two photons. These states interfere within a quantum optical system, and the resulting interference pattern defines the neuron’s pre-activation. Remarkably, this mechanism enables a computational cost that does not scale with the number of parameters, as the scalar product is physically estimated through photon interference.
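Let $\mathbf{x} = (x_1, \dots, x_N)$ and $\mathbf{w} = (w_1, \dots, w_N)$ denote the input and weight vectors, where N is the input dimensionality. Light modulators apply transformations $\hat{U}_x$ and $\hat{U}_w$ to an initial photon state $|\psi_0\rangle$, yielding two states:
$$|\psi_x\rangle = \hat{U}_x|\psi_0\rangle = \sum_{k=1}^{N} \langle k|\psi_x\rangle\,|k\rangle, \qquad |\psi_w\rangle = \hat{U}_w|\psi_0\rangle = \sum_{k=1}^{N} \langle k|\psi_w\rangle\,|k\rangle \tag{1}$$
where $\{|k\rangle\}$ is an orthonormal photon state basis. The scalar product of these states defines the pre-activation:
$$\langle\psi_x|\psi_w\rangle = \sum_{k=1}^{N} \langle\psi_x|k\rangle\langle k|\psi_w\rangle \tag{2}$$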
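Depending on the optical setup, the output activation can take two forms:
$$f = h\left(|\langle\psi_x|\psi_w\rangle|^2 + b\right) \tag{3}$$
or alternatively,
$$f = h\left(\alpha\,\mathrm{Re}\langle\psi_x|\psi_w\rangle + \beta\,\mathrm{Im}\langle\psi_x|\psi_w\rangle + b\right) \tag{4}$$
where h denotes the activation function (e.g., sigmoid), b is a bias term, and, for (4), $\alpha$ and $\beta$ are two additional scalar hyperparameters to be fixed.
For comparison, in conventional artificial neurons, the output activation is:
$$f = h\left(\mathbf{w}\cdot\mathbf{x} + b\right) \tag{5}$$
where the scalar product has a cost that scales linearly with N.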
Roncallo et al. [25] proposed a HOM interferometer to measure $|\langle\psi_x|\psi_w\rangle|^2$. In Section 3.1, we show that a Mach–Zehnder interferometer allows a broader range of pre-activations with potentially greater flexibility. Indeed, with MZ bunching probabilities, we can recover $\mathrm{Re}\langle\psi_x|\psi_w\rangle$ and $\mathrm{Im}\langle\psi_x|\psi_w\rangle$ separately, in place of $|\langle\psi_x|\psi_w\rangle|^2$ as in the HOM interference case, thus allowing the estimation of output expressions like (4).
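The difference between the two QON output forms and the classical neuron can be illustrated with a minimal sketch. It assumes that form (3) applies the activation to the squared modulus of the scalar product plus a bias, and that form (4) applies it to a linear combination of the real and imaginary parts; the names `alpha` and `beta` for the two scalar hyperparameters, and all function names, are illustrative assumptions rather than the authors' implementation.

```python
import math

def sigmoid(z):
    """Logistic activation, used here as the example choice of h."""
    return 1.0 / (1.0 + math.exp(-z))

def overlap(psi_x, psi_w):
    """Scalar product <psi_x|psi_w> of two state vectors given as sequences."""
    return sum(a.conjugate() * b for a, b in zip(psi_x, psi_w))

def qon_output_modulus(psi_x, psi_w, b, h=sigmoid):
    """Assumed output form (3): h(|<psi_x|psi_w>|^2 + b)."""
    return h(abs(overlap(psi_x, psi_w)) ** 2 + b)

def qon_output_linear(psi_x, psi_w, b, alpha, beta, h=sigmoid):
    """Assumed output form (4): h(alpha*Re + beta*Im + b).

    alpha and beta are hypothetical names for the two hyperparameters.
    """
    s = overlap(psi_x, psi_w)
    return h(alpha * s.real + beta * s.imag + b)

def classical_output(x, w, b, h=sigmoid):
    """Classical neuron: h(w . x + b), with an O(N) dot product."""
    return h(sum(xi * wi for xi, wi in zip(x, w)) + b)
```

In a physical QON the value returned by `overlap` would be estimated from detection statistics rather than summed term by term; the explicit sum here only serves the software simulation.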
2.2 Gradient evaluation and backpropagation
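Training a QON involves computing the gradient of f with respect to $\mathbf{w}$. For the output form (3), the gradient is:
$$\nabla_{\mathbf{w}} f = 2\,h'\!\left(|\langle\psi_x|\psi_w\rangle|^2 + b\right)\,\mathrm{Re}\!\left[\langle\psi_x|\psi_w\rangle^{*}\,\nabla_{\mathbf{w}}\langle\psi_x|\psi_w\rangle\right] \tag{6}$$
For (4), we have:
$$\nabla_{\mathbf{w}} f = h'\!\left(\alpha\,\mathrm{Re}\langle\psi_x|\psi_w\rangle + \beta\,\mathrm{Im}\langle\psi_x|\psi_w\rangle + b\right)\left[\alpha\,\mathrm{Re}\,\nabla_{\mathbf{w}}\langle\psi_x|\psi_w\rangle + \beta\,\mathrm{Im}\,\nabla_{\mathbf{w}}\langle\psi_x|\psi_w\rangle\right] \tag{7}$$
The main computational effort lies in evaluating the term $\nabla_{\mathbf{w}}\langle\psi_x|\psi_w\rangle$, whose form depends on the chosen photon modulation strategy. Unlike $|\langle\psi_x|\psi_w\rangle|^2$, which is measured with a HOM interferometer as demonstrated in [25], or $\mathrm{Re}\langle\psi_x|\psi_w\rangle$ and $\mathrm{Im}\langle\psi_x|\psi_w\rangle$, which are measured with an MZ interferometer as discussed in Section 3.1, this term cannot be obtained through a straightforward physical measurement. Instead, it must be computed mathematically.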
In a classical artificial neuron, the gradient takes the well-known form:
$$\frac{\partial f}{\partial w_k} = h'\!\left(\mathbf{w}\cdot\mathbf{x} + b\right)x_k, \qquad k = 1, \dots, N \tag{8}$$
The computational cost of (8) scales with N, as there are N derivatives to calculate. The first derivative has a cost of $O(N)$, because it requires the scalar product $\mathbf{w}\cdot\mathbf{x}$. Each subsequent derivative, however, can be computed at a cost that is independent of N, because $h'(\mathbf{w}\cdot\mathbf{x} + b)$ is a common factor among all the derivatives.
When QONs are embedded in deeper architectures, as discussed in Section 3.3, additional gradient terms appear due to the dependencies between layers: for instance, the derivatives with respect to both $\mathbf{w}$ and $\mathbf{x}$ must be propagated through successive QON modules.
2.3 Photon state modulation
The effectiveness and feasibility of QONs depend critically on how $\mathbf{x}$ and $\mathbf{w}$ are encoded onto photon states. This process is performed via light modulators, which apply specific amplitude and/or phase transformations to a reference photon wavefront.
In this section, we describe two primary strategies for modulation: amplitude modulation and phase modulation in the space domain. Each leads to a different mathematical form of the scalar product and its gradient, ultimately influencing the dynamics of training.
Modulation in the spatial domain can be achieved using spatial light modulators (SLMs), which locally alter the amplitude and/or phase components of the photon wavefronts. These modifications occur uniformly within small regions of the wavefront, denoted as pixels, which define the vector basis associated with the SLM action in (1). If the incoming wavefronts are plane and the pixel dimensions are much larger than the photon wavelength, diffraction effects can be neglected. In this case, for $k = 1, \dots, N$ one can write:
$$\langle\mathbf{r}|k\rangle = \Pi_L(\mathbf{r} - \mathbf{r}_k) \tag{9}$$
where $\Pi_L$ is the normalized two-dimensional rectangular box function centered at the origin with side length L, and $\mathbf{r}_k$ are the pixel centers. We assume that photon wavefronts lie on a plane parallel to the $xy$ plane, with pixels of rectangular shape. The state functions in (9) form an orthonormal set.
When the pixel dimensions become comparable to the photon wavelength, diffraction effects can no longer be ignored, and (9) must be interpreted as a boundary condition for the propagation of the electromagnetic wave functions in the interference region. However, if photon interference occurs in the far-field regime—either at sufficiently large distances from the SLMs or at the common focal plane of two imaging lenses with the SLMs as objects—the states are given by the Fourier transform of the right-hand side of (9), in accordance with Fraunhofer diffraction. Without loss of generality, (9) can still be used in the far-field limit, since by Parseval’s theorem the scalar products between two functions remain unchanged when the functions are replaced by their Fourier transforms, up to a constant multiplicative factor.
2.3.1 Amplitude/intensity modulation.
In amplitude modulation, the local amplitude of the photon’s wavefront inside each pixel is scaled according to the corresponding entry of the input or weight vector. Taking into account the normalization condition of quantum states, a natural choice is to set each vector to unit $\ell_2$-norm:
so that the modulated quantum states yield the scalar product:
The derivative of (11) with respect to $w_k$ is:
Alternatively, if intensities (rather than amplitudes) are modulated, the quantum state components become proportional to the square roots of vector entries. In this case, the scalar product reads:
The corresponding gradient is:
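Observe that since $x_k$ and $w_k$ are positive values, associated with the pixel transmission coefficients in the SLMs, the following relationship holds:
$$\langle\psi_x|\psi_w\rangle = \mathrm{Re}\langle\psi_x|\psi_w\rangle = \sqrt{|\langle\psi_x|\psi_w\rangle|^2} \tag{15}$$
This means that $\mathrm{Re}\langle\psi_x|\psi_w\rangle$ can be obtained from $|\langle\psi_x|\psi_w\rangle|^2$. Consequently, the quantities measured through MZ interference can also be obtained using HOM interference. Therefore, when limited to amplitude or intensity modulations, HOM interference is preferable to MZ interference, which requires a more complex hardware setup.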
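The two encodings can be compared numerically with a short sketch. It assumes that amplitude modulation uses the vector entries, rescaled to unit ℓ2-norm, as state components, and that intensity modulation uses the square roots of entries rescaled to sum to one; both normalizations are written out explicitly here as assumptions, and the function names are illustrative.

```python
import math

def amplitude_state(v):
    """Amplitude modulation: components proportional to the entries (unit l2-norm)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def intensity_state(v):
    """Intensity modulation: components are square roots of entries summing to one."""
    total = sum(v)
    return [math.sqrt(c / total) for c in v]

def overlap(a, b):
    """Scalar product of two real, non-negative state vectors."""
    return sum(p * q for p, q in zip(a, b))

def re_from_hom(squared_overlap):
    """For non-negative components the overlap is real, so Re = sqrt(|.|^2)."""
    return math.sqrt(squared_overlap)

# Example transmission coefficients (positive by construction).
x = [0.2, 0.5, 0.3]
w = [0.6, 0.1, 0.3]
s_amp = overlap(amplitude_state(x), amplitude_state(w))
s_int = overlap(intensity_state(x), intensity_state(w))
```

Because all components are non-negative, `re_from_hom(s_amp ** 2)` returns `s_amp` itself, which is the reason a HOM measurement of the squared modulus is sufficient for these two modulation schemes.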
2.3.2 Phase modulation.
In phase modulation, each component of the photon’s wavefront is assigned a phase based on the input or weight vector. A typical encoding uses:
so that the scalar product becomes:
About the linear combination appearing in (4), it reads:
where and
. In the following, we set
and
and just leave a tunable phase offset θ.
The gradients in this case are:
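As a numerical illustration of phase modulation, the sketch below assumes the encoding $e^{i v_k}/\sqrt{N}$ for each component and sets the two coefficients of the linear combination in (4) to $\cos\theta$ and $\sin\theta$. Both choices are assumptions made for this example. Under them, the linear combination reduces to a mean of cosines with phase offset θ, which the code checks numerically.

```python
import cmath
import math

def phase_state(v):
    """Assumed phase encoding: component k is exp(i*v_k)/sqrt(N)."""
    n = len(v)
    return [cmath.exp(1j * c) / math.sqrt(n) for c in v]

def overlap(psi_x, psi_w):
    """Scalar product <psi_x|psi_w>."""
    return sum(a.conjugate() * b for a, b in zip(psi_x, psi_w))

def mz_phase_preactivation(x, w, theta):
    """cos(theta)*Re<psi_x|psi_w> + sin(theta)*Im<psi_x|psi_w> (bias omitted)."""
    s = overlap(phase_state(x), phase_state(w))
    return math.cos(theta) * s.real + math.sin(theta) * s.imag

def mean_cosine(x, w, theta):
    """Closed form under the assumptions above: (1/N) * sum_k cos(w_k - x_k - theta)."""
    return sum(math.cos(wk - xk - theta) for xk, wk in zip(x, w)) / len(x)

x = [0.1, 0.7, 1.3]
w = [0.4, -0.2, 2.0]
theta = 0.3
assert math.isclose(mz_phase_preactivation(x, w, theta), mean_cosine(x, w, theta))
```

The identity follows from $\cos\theta\cos\varphi + \sin\theta\sin\varphi = \cos(\varphi - \theta)$ applied to each phase difference $\varphi_k = w_k - x_k$.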
3 Mach–Zehnder optical neurons and quantum optical neural networks
In the original QON model proposed by Roncallo et al. [25], the Hong–Ou–Mandel effect is used to define the pre-activation as the squared overlap between two modulated quantum states:
$$|\langle\psi_x|\psi_w\rangle|^2 \tag{22}$$
which appears as an argument of h in (3), where $|\psi_x\rangle$ and $|\psi_w\rangle$ are single-photon states entering the two input ports of a beam splitter. The probability of detecting two photons in separate output ports (coincidence event) is given by:
$$P_{\mathrm{c}} = \frac{1}{2}\left(1 - |\langle\psi_x|\psi_w\rangle|^2\right) \tag{23}$$
so that the pre-activation function can be obtained from a quantum measurement with minimal computational cost:
$$|\langle\psi_x|\psi_w\rangle|^2 = 1 - 2P_{\mathrm{c}} \tag{24}$$
On the other hand, with a HOM-based QON, it is not possible to efficiently reproduce the pre-activation function of Eq. (25), which appears by (19) as the argument of h in (4), since $\mathrm{Re}\langle\psi_x|\psi_w\rangle$ and $\mathrm{Im}\langle\psi_x|\psi_w\rangle$ cannot be directly extracted from $|\langle\psi_x|\psi_w\rangle|^2$ and would instead require calculations with computational cost $O(N)$.
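The HOM measurement described above can be mimicked with a small Monte Carlo sketch. It assumes an ideal 50:50 beam splitter, for which the coincidence probability is half of one minus the squared overlap, and it replaces the detectors with a seeded pseudo-random generator. The function names and the number of shots are illustrative assumptions.

```python
import random

def hom_coincidence_probability(psi_x, psi_w):
    """Ideal 50:50 beam splitter: P_c = (1 - |<psi_x|psi_w>|^2) / 2."""
    s = sum(a.conjugate() * b for a, b in zip(psi_x, psi_w))
    return max(0.0, 0.5 * (1.0 - abs(s) ** 2))

def estimate_squared_overlap(p_coinc, shots, rng):
    """Estimate |<psi_x|psi_w>|^2 = 1 - 2*P_c from a finite number of photon pairs."""
    hits = sum(1 for _ in range(shots) if rng.random() < p_coinc)
    return 1.0 - 2.0 * hits / shots

rng = random.Random(0)
identical = estimate_squared_overlap(
    hom_coincidence_probability([0.6, 0.8], [0.6, 0.8]), 20000, rng)
orthogonal = estimate_squared_overlap(
    hom_coincidence_probability([1.0, 0.0], [0.0, 1.0]), 20000, rng)
```

Identical states never produce coincidences, so the first estimate is exactly 1, while the estimate for orthogonal states fluctuates around 0 with a spread that shrinks as the number of shots grows. The cost of the estimate depends on the number of shots, not on the length of the state vectors.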
In this section, we extend the concept by employing a Mach–Zehnder interferometer. This alternative configuration allows the full recovery of both the real and imaginary parts of the scalar product directly from photon detection statistics, thereby enabling an efficient implementation of the pre-activation function of Eq. (25). After comparing the computational costs of the two QON architectures, we finally illustrate how such neurons can be employed as building blocks for multilayer quantum optical neural networks.
3.1 Mach–Zehnder-based quantum optical neuron
The core component of the MZ QON is the interferometer shown in Fig 1. A two-photon state , where both photons are initially in the same state
, enters port 1 of the first beam splitter:
where creates a photon in the state
at port 1.
Fig 1. The occupation number options for the two photons at different stages are shown. The interferometer includes: (1) two 50:50 beam splitters with input ports (red: 1,2) and output ports (red: 3,4); (2) two photon-number resolving detectors; (3) two light modulators implementing $\hat{U}_x$ and $\hat{U}_w$; (4) two mirrors; (5) a coincidence counter detecting: (i) two photons at port 3, (ii) two at port 4, or (iii) one at each port. This counter determines the output function f.
Assuming a symmetric 50:50 beam splitter with transfer matrix
the input transforms as:
and the two-photon state becomes:
After modulation by two light modulators:
with
the commutation rules are:
After the first beam splitter and modulation, the state becomes:
3.1.1 Passage through the second beam splitter.
The second symmetric beam splitter acts as:
leading to:
3.1.2 Orthogonal basis for
.
To simplify the evaluation of detection probabilities, we define the normalized state orthogonal to :
with
Introducing creation/annihilation operators ,
for
, the commutation rules are:
3.1.3 Detection probabilities.
We now define the output probabilities:
Using the expressions above and computing the inner products, we obtain the final results:
which satisfy the normalization:
3.1.4 Recovering the scalar product from measurements.
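From (42) and (43), we can directly extract $\mathrm{Re}\langle\psi_x|\psi_w\rangle$, as given in Eq. (45).
To retrieve $\mathrm{Im}\langle\psi_x|\psi_w\rangle$, we perform a second measurement session with the modulators modified as in Eq. (46), and define the corresponding detection probabilities accordingly. The imaginary part then follows as in Eq. (47).
Thus, the scalar product can be reconstructed solely from detection statistics, Eq. (48), and likewise its modulus, Eq. (49).
With reference to the pre-activation function defined in (25), we obtain Eq. (50).
With an MZ interferometer, we can also efficiently evaluate $|\langle\psi_x|\psi_w\rangle|^2$, as given in Eq. (51).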
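The two-session reconstruction can be sketched numerically under explicit assumptions. The sketch assumes that each of the two identical photons exits at one port with probability $(1 + \mathrm{Re}\langle\psi_x|\psi_w\rangle)/2$ and at the other with $(1 - \mathrm{Re}\langle\psi_x|\psi_w\rangle)/2$, so that the bunching probabilities are the squares of these values. It also assumes that the second session multiplies one modulated state by $-i$. The sign and port labeling depend on the beam-splitter convention and may differ from Eqs. (41)–(47).

```python
def bunching_probabilities(re_part):
    """Assumed ideal MZ statistics for two identical photons.

    Returns (both at port 3, both at port 4, one at each port).
    """
    p3 = 0.5 * (1.0 + re_part)
    p4 = 0.5 * (1.0 - re_part)
    return p3 * p3, p4 * p4, 2.0 * p3 * p4

def reconstruct_overlap(s):
    """Rebuild a complex overlap s from two simulated measurement sessions."""
    # Session 1: the difference of bunching probabilities equals Re(s).
    p33, p44, _ = bunching_probabilities(s.real)
    re_part = p33 - p44
    # Session 2: an extra -i phase on one state maps Im(s) onto the real part.
    q33, q44, _ = bunching_probabilities((-1j * s).real)
    im_part = q33 - q44
    return complex(re_part, im_part)

true_overlap = 0.3 - 0.4j
recovered = reconstruct_overlap(true_overlap)
```

In this idealized picture the three probabilities of each session sum to one, and the squared modulus follows from the recovered real and imaginary parts without any sum over the N components.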
3.1.5 Alternative setups.
If both beam splitters are replaced with dielectric 50:50 beam splitters, described by the transfer matrix
all formulas remain unchanged. Conversely, if only one of the two is replaced with a dielectric beam splitter, the real part in Eqs. (41)–(43) is replaced by the imaginary part. This yields an alternative procedure for retrieving $\mathrm{Im}\langle\psi_x|\psi_w\rangle$ in the second measurement session, without the need to modify the light modulators as in (46).
3.1.6 Decoherence and noise effects.
Observations of phenomena in the quantum domain are generally affected by decoherence [36–38]. Decoherence arises from the interaction between a quantum system and its environment, or equivalently from the transfer of information from the system to the environment [37]. As a consequence, phase coherence in a superposition of physical states associated with definite outcomes of a given observable progressively degrades, and the inherent quantum indeterminacy is effectively converted into the classical indeterminacy of statistical mixtures [39–42]. From a mathematical perspective, this transition is described by an exponential decay in time of the off-diagonal elements of the system density matrix.
In the MZ interferometer setup depicted in Fig 1, such off-diagonal elements give rise, in the probabilities (41), (42), and (43), to terms proportional to and
. These contributions are genuinely quantum in nature, originating from the which-path indeterminacy of the photons inside the interferometer and having no classical counterpart. Decoherence attenuates these quantum interference terms through a multiplicative factor V, commonly referred to as the visibility of the interferometer [43]:
Visibility takes values in the interval $[0, 1]$: values of V close to unity correspond to nearly ideal constructive and destructive quantum interference, while values approaching zero indicate the onset of classical behavior. The visibility of a given interferometer can be experimentally estimated through preliminary calibration measurements.
The probabilities in Eqs. (52)–(54) remain properly normalized, and their relationship with the real part of the scalar product, $\mathrm{Re}\langle\psi_x|\psi_w\rangle$, is given by
An analogous relation holds for the imaginary part of the scalar product, which can be reconstructed from the probabilities measured in the complementary configuration:
Quantum measurements are also affected by noise, often at a more intrinsic level than classical measurements, since quantum phenomena are inherently probabilistic. In addition to statistical errors arising from finite sampling of probabilities by event occurrences, other relevant noise sources include dark counts—i.e., detector clicks not originating from photons produced by the interferometer—finite detector efficiency, and the possible lack of photon-number-resolving capabilities. Moreover, environmental fluctuations can induce variability in the interferometer visibility V.
By replacing probabilities with the experimentally observed detection frequencies and
, Eqs. (55) and (56) become
where ,
, and
denote the fluctuations of
,
, and V around their mean values. To first order, these uncertainties can be approximated as
Using Eq. (23), an analogous expression can be derived for the HOM interferometer:
where
3.2 Computational cost of QONs
In this section, we analyze the computational costs of the output functions and their gradients for the QONs previously introduced. For simplicity, we consider the case of an ideal QON. From (3), (4), (19), (23), (45), (47), the output functions read:
In all these cases, the computational cost during the inference phase, i.e., when querying a trained QON, is independent of N, the number of parameters. This results in a super-exponential speed-up compared to the output function of a classical neuron, whose evaluation scales as $O(N)$.
Concerning the computational costs of an artificial neuron during training, which mainly involve computing the derivatives of these output functions, they must be analyzed according to the chosen photon modulation scheme.
3.2.1 Amplitude/Intensity modulation.
In the case of amplitude modulation, from (6), (11), and (12) the gradients read:
so that, by (23):
Although the term $\langle\psi_x|\psi_w\rangle$ is estimated from the HOM measurement, the computational cost of each gradient component remains $O(N)$. This is because determining the two remaining factors in the expression above requires $O(N)$ computational resources in both cases. Since these calculations are common to all N components of the gradient, the overall computational cost is still $O(N)$. Thus, when it comes to calculating the gradients, QONs do not have any advantages over classical neurons. A similar analysis applies to the intensity modulation case.
3.2.2 Phase modulation.
From (7) and (20), the partial derivatives of are given by:
Using the HOM interference measurement results from (23), Eq. (63) simplifies to:
The computational cost of evaluating this derivative is $O(N)$, since it requires explicitly computing $\langle\psi_x|\psi_w\rangle$, which cannot be derived directly from its modulus $|\langle\psi_x|\psi_w\rangle|$. However, because $\langle\psi_x|\psi_w\rangle$ is a common term for all first-order derivatives, the total cost for computing the gradient $\nabla_{\mathbf{w}} f$ remains $O(N)$, matching that of a classical neuron.
Alternatively, using the MZ interference measurement results from (45), (47), and (49), Eq. (63) becomes:
In this case, the computational cost of each derivative is constant, i.e., independent of N. Nevertheless, since N derivatives must be computed, the overall time complexity for the full gradient $\nabla_{\mathbf{w}} f$ remains $O(N)$.
A similar analysis applies to . Using (7) and (21), its derivatives are given by:
which, using the MZ measurement results from (45) and (47), becomes:
In conclusion, training a QON under phase modulation can require significantly less computation per derivative compared to a classical neuron, especially when using MZ interferometry, since the quantities of interest are extracted directly from detection statistics. However, as the gradient has N components, the overall time complexity for backpropagation remains $O(N)$.
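The cost argument for phase modulation can be made concrete with a sketch of the gradient of the squared overlap. It assumes the phase encoding $e^{i v_k}/\sqrt{N}$, which is an assumption of this example, so that the overlap is the mean of $e^{i(w_k - x_k)}$. The overlap is evaluated once and reused as a common factor, after which each of the N derivatives costs a constant amount of work. In a physical MZ-based QON that common factor would come from detection statistics instead of the explicit sum used here.

```python
import cmath

def overlap_phase(x, w):
    """Overlap under the assumed phase encoding: (1/N) * sum_k exp(i*(w_k - x_k))."""
    n = len(x)
    return sum(cmath.exp(1j * (wk - xk)) for xk, wk in zip(x, w)) / n

def grad_squared_overlap(x, w):
    """Gradient of |S|^2 with respect to w, reusing the shared factor S.

    d|S|^2/dw_k = 2*Re(conj(S) * dS/dw_k) with dS/dw_k = (i/N)*exp(i*(w_k - x_k)),
    which simplifies to -(2/N) * Im(conj(S) * exp(i*(w_k - x_k))).
    """
    n = len(x)
    s = overlap_phase(x, w)  # computed once, shared by all N components
    s_conj = s.conjugate()
    return [(-2.0 / n) * (s_conj * cmath.exp(1j * (wk - xk))).imag
            for xk, wk in zip(x, w)]

x = [0.3, 1.1, -0.4, 2.0]
w = [0.5, -0.7, 0.9, 1.4]
grad = grad_squared_overlap(x, w)
```

A central finite-difference check on `abs(overlap_phase(x, w)) ** 2` reproduces each component of `grad`, which is a convenient sanity test before plugging such a rule into a training loop.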
3.3 Quantum optical neural network architecture
We now extend the QON model to define a multilayer quantum optical neural network. Each layer is indexed by $l = 1, \dots, L$, and contains $n_l$ neurons. A neuron in position p of layer l is denoted by the pair $(l, p)$.
Each neuron is characterized by a set of parameters , from which the complex weight coefficients
are derived. The inputs to each neuron are denoted by
, shared across neurons in the same layer, while the outputs of layer l are collected into the vector
.
The quantum states interfering at the node are
and
, and the output of each neuron is given by a function f of their scalar product. For
, the dependencies are as follows:
3.3.1 Gradient computation for training.
To apply gradient descent during training, we compute the derivative of the final output with respect to a generic parameter
. From (66), we obtain:
This recursive formulation can be conveniently expressed in matrix form:
where:
is the weight matrix;
;
;
is a diagonal matrix with entries
on the diagonal.
By recursively applying (68), we obtain the general formula:
where indicates the product of a sequence in which the factors with higher indices are positioned to the left of those with lower indices, while
3.3.2 Physical setup and beam-splitter configuration.
Fig 2 illustrates a schematic implementation of a feed-forward, fully-connected QONN based on MZ interferometers. Other network architectures, such as recurrent or convolutional neural networks, can also be considered, potentially replacing MZ with HOM interferometers. Each neuron (one highlighted in red) receives light from a common coherent laser source (bottom left), which is distributed by a cascade of beam splitters. The laser intensity is adjusted so that each QON receives at most a pair of photons at a time—identical due to the coherent nature of the laser beam—as required in Section 3.1. Inputs (bottom right) and weights (left) are encoded into quantum states via light modulators.
Blue lines indicate electrical connections; green lines indicate laser paths. Each layer is color-coded. In the first layer (pink), the right-side light modulators are governed by input data, while weights govern the left-side light modulators. In deeper layers, right-side light modulators are driven by the outputs of the previous layer. QON outputs also feed the weight module for training. For clarity, not all connections are shown.
In the first layer, the right-side modulators are driven by input data, while the left-side modulators encode the weights. In subsequent layers, the inputs to each neuron correspond to the outputs of QONs from the previous layer. All outputs are also routed to the weight update module for gradient estimation.
Beam splitters are organized at two levels: global, distributing light across layers, and local, distributing light to individual neurons within each layer. To ensure proper synchronization, all neurons must receive the same photon flux F. Let $M = \sum_l n_l$ be the total number of neurons, where $n_l$ is the number of neurons in layer l, and $\Phi$ the total flux from the laser. Then, the flux $\Phi_l$ entering the l-th layer must satisfy:
$$\Phi_l = n_l F = \frac{n_l}{M}\,\Phi \tag{70}$$
To fulfill condition (70), the global reflection coefficients must be set as in Eq. (71).
Similarly, within each layer, the local beam splitters must satisfy Eq. (72).
Note that the reflection coefficient equals 1 for the last global beam splitter and for the last local beam splitter of each layer, ensuring that the last layer and the last neuron within each layer receive all the remaining light.
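The equal-flux requirement can be checked with a short sketch. It assumes lossless beam splitters arranged in a cascade, where each splitter diverts a power fraction of the light still in the beam and passes the rest on. The fractions below are one choice that satisfies the equal-flux condition under these assumptions; whether the coefficients in the text refer to power or amplitude is not assumed here, and the function names are illustrative.

```python
def global_fractions(layer_sizes):
    """Power fraction diverted to each layer from the light still in the main beam."""
    remaining = sum(layer_sizes)
    fractions = []
    for n in layer_sizes:
        fractions.append(n / remaining)
        remaining -= n
    return fractions

def local_fractions(n):
    """Power fraction diverted to each neuron from the light still in the layer beam."""
    return [1.0 / (n - p) for p in range(n)]

def neuron_fluxes(layer_sizes, total_flux):
    """Propagate the laser flux through the two-level cascade of beam splitters."""
    fluxes = []
    beam = total_flux
    for n, frac in zip(layer_sizes, global_fractions(layer_sizes)):
        layer_beam = beam * frac
        beam -= layer_beam
        row = []
        for r in local_fractions(n):
            tapped = layer_beam * r
            layer_beam -= tapped
            row.append(tapped)
        fluxes.append(row)
    return fluxes

# Hypothetical network with 3, 2, and 1 neurons and a total flux of 6 units.
fluxes = neuron_fluxes([3, 2, 1], 6.0)
```

With these fractions every neuron receives one unit of flux, and the last splitter at each level has a fraction of exactly 1, matching the remark that the last layer and the last neuron in each layer take all the remaining light.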
4 Practical implementation of a QON
Although practical, implementing a QON presents several technical challenges. In particular, the use of spatial light modulators and spatially resolved detectors typically requires the optical setup to be realized almost entirely in free space. Moreover, the highest-performing SLMs—both in phase and amplitude modulation—are usually reflective architectures [44], which are characterized by limited acceptance angles. This constraint significantly complicates the design of interferometric configurations when compared to schemes based on transmissive SLMs, such as the idealized MZ setup depicted in Fig 1 or the HOM-based configuration described in [25]. Reflective geometries generally require additional optical components, including beam splitters, mirrors, and relay optics, to properly direct and recombine optical paths. As a consequence, careful optical design and precise alignment are necessary to maximize interference visibility.
In addition, in the MZ-based QON, the reconstruction of the bunching probabilities appearing in Eqs. (45) and (47) requires photon-number-resolving detectors [45], which are more complex than the non-resolving detectors sufficient for the HOM-based QON.
Beyond physical feasibility, practical deployment of QONs as artificial neurons also requires sufficiently short processing times. In Section 5, we perform simulations assuming an event-counting time of 1 s. Under typical laboratory conditions—considering factors such as laser pulse repetition rate and detector efficiency—shorter observation times may lead to significant statistical deviations between measured event frequencies and their underlying probabilities. Preliminary simulations indicate that QON performance degrades markedly at high noise levels. However, an observation time of 1 s per scalar product would be impractical for an artificial neuron, particularly during training.
To mitigate this limitation, careful optimization of the optical apparatus is required, including the use of high-efficiency detectors, low-loss beam splitters, and stable optical components to minimize photon losses. In addition, operating at higher laser repetition rates can provide sufficient statistics within much shorter acquisition times. For instance, by employing superconducting nanowire single-photon detectors [46] with efficiencies of approximately and photon emission rates on the order of 1 MHz—rather than single-photon avalanche diode (SPAD) detectors [47] with efficiencies around
and emission rates of
Hz, as assumed in the simulations of Section 5—the required observation time could be reduced to approximately
s. Finally, as indicated by Eqs. (58) and (60), maintaining high and stable visibility, achievable through effective environmental isolation of the optical setup, is a critical requirement for reliable QON operation.
5 Benchmarking QON architectures: Performance of pre-activation variants
We introduced different implementations of quantum optical neurons, which vary both in the experimental setup (Mach–Zehnder or Hong–Ou–Mandel) and in the type of photon modulation (phase, amplitude, or intensity). While the modulation strategy determines how the input and weight parameters $(\mathbf{x}, \mathbf{w})$ contribute to the scalar product $\langle\psi_x|\psi_w\rangle$, the interferometric architecture dictates how this scalar product is transformed into a pre-activation.
These choices result in different mathematical forms of the pre-activation function, which acts as the core computation within the QON. Below, we summarize the quantum-inspired variants explored in this study:
- MZ interferometer with phase modulation (from (18) and (25) with
):
(73)
- HOM (or MZ) interferometer with phase modulation (from (17) and (22)):
(74)
- HOM interferometer with amplitude modulation (from (11) and (22)):
(75)
- HOM interferometer with intensity modulation (from (13) and (22)):
(76)
In each case, b is a bias term, while θ in (73) is a tunable phase offset (we considered and
). These functions can be compared to the standard linear pre-activation of a classical artificial neuron:
We consider both the ideal formulation of each QON variant and a non-ideal version incorporating decoherence and noise effects, as described in Section 3.1.6. This allows assessing not only accuracy and convergence, but also the robustness of quantum-inspired pre-activations under realistic imperfections.
As discussed in Section 3.1.6, decoherence and noise effects induce fluctuations in the estimation of the real and imaginary parts, or of the squared modulus, of the scalar product between two photon quantum states, as inferred from the interference patterns of an MZ or HOM interferometer, respectively. Since the pre-activation functions defined in Eqs. (73)–(76) depend explicitly on these quantities (apart from the bias term b), their evaluation through a QON is inevitably affected by such fluctuations.
From Eqs. (58) and (60), it follows that the uncertainty in arises from two main contributions: fluctuations in the interferometer visibility and statistical deviations of the observed event frequencies from the corresponding probabilities, both of which depend on the mean visibility value. For the HOM interferometer, Tsujimoto et al. [48] report a visibility of
. For the MZ interferometer, Kim et al. [49] report
for coincidence events, but do not provide an uncertainty estimate. We therefore assume the same relative uncertainty as in the HOM case and set
.
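Assuming the same relative uncertainty for two measurements amounts to scaling the reference uncertainty by the ratio of the two mean values. The sketch below shows this arithmetic; the function name is hypothetical and the numbers are placeholders, not the visibilities reported in [48] and [49].

```python
def transfer_relative_uncertainty(v_ref, sigma_ref, v_target):
    """Absolute uncertainty on v_target that gives it the same relative
    uncertainty (sigma / value) as the reference measurement."""
    if v_ref <= 0:
        raise ValueError("reference visibility must be positive")
    return v_target * (sigma_ref / v_ref)


# Placeholder numbers for illustration only; they are NOT the values
# reported in refs. [48] and [49].
v_hom, sigma_hom = 0.90, 0.02
v_mz = 0.80

sigma_mz = transfer_relative_uncertainty(v_hom, sigma_hom, v_mz)
print(f"assumed MZ visibility: {v_mz} +/- {sigma_mz:.4f}")
```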
Under these assumptions, Eqs. (58) and (60) allow us to model visibility-induced fluctuations of as zero-mean Gaussian noise with standard deviation
for the HOM interferometer and
for the MZ interferometer.
Additional fluctuations arise from finite sampling statistics and detector imperfections, and depend on the measurement apparatus and observation time. Assuming a photon emission rate of Hz, detector efficiency of
for typical SPAD detectors, dead time of
s, and dark count rate of
Hz per detector, preliminary simulations indicate that, for an observation time of 1 s, these effects can be approximated by zero-mean Gaussian noise with standard deviation
for the HOM interferometer and
for the MZ interferometer.
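The order of magnitude of the sampling contribution can be illustrated with a rough estimate. The sketch below is not the preliminary simulation mentioned above: it assumes a non-paralyzable dead-time model and a simple binomial standard error, ignores coincidence logic between detectors, and uses placeholder parameter values and hypothetical function names.

```python
import math


def detected_rate(emission_rate_hz, efficiency, dead_time_s, dark_rate_hz):
    """Expected recorded count rate of one detector.

    Assumption: non-paralyzable dead time, so an incoming event rate n
    is recorded as n / (1 + n * dead_time).
    """
    incoming = emission_rate_hz * efficiency + dark_rate_hz
    return incoming / (1.0 + incoming * dead_time_s)


def frequency_std(p, n_events):
    """Binomial standard error of an observed frequency that estimates p."""
    return math.sqrt(p * (1.0 - p) / n_events)


# Placeholder parameters for illustration only (not the paper's values).
rate = detected_rate(emission_rate_hz=1e5, efficiency=0.5,
                     dead_time_s=1e-7, dark_rate_hz=100.0)
observation_time_s = 1.0
n_events = rate * observation_time_s

print(frequency_std(0.5, n_events))
```

Under these assumptions the statistical spread shrinks as the inverse square root of the number of recorded events, so a longer observation time or a brighter source reduces this noise term.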
By assuming these two noise sources to be independent, the output pre-activation of a realistic QON can be modeled as a Gaussian random variable with mean value given by Eqs. (73)–(76) and standard deviation
for the HOM-based QONs, and
for the MZ-based QONs, for both and
.
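A minimal sketch of this noise model follows. It assumes that the two independent Gaussian contributions combine in quadrature, which is the standard result for a sum of independent Gaussian variables; the noise levels are placeholders and the function names are hypothetical.

```python
import math
import random


def combined_sigma(sigma_visibility, sigma_sampling):
    """Standard deviation of the sum of two independent Gaussian noises."""
    return math.hypot(sigma_visibility, sigma_sampling)


def noisy_preactivation(ideal_value, sigma_total, rng):
    """Draw one noisy pre-activation centred on the ideal value."""
    return rng.gauss(ideal_value, sigma_total)


# Placeholder noise levels for illustration only (not the paper's values).
sigma_total = combined_sigma(0.03, 0.04)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [noisy_preactivation(0.5, sigma_total, rng) for _ in range(10_000)]
sample_mean = sum(samples) / len(samples)
print(sigma_total, sample_mean)
```

In a training loop, the same draw would be applied to each ideal pre-activation value before it is passed to the subsequent normalization and activation stages.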
In the remainder of this section, we experimentally evaluate the six pre-activation functions above—five quantum-inspired and one classical—using two types of classification tasks: (i) binary classification using a single QON, and (ii) multiclass classification using a full feed-forward network composed of QON layers. The goal is to investigate how the design of the pre-activation function affects learning dynamics, accuracy, convergence, and robustness across tasks of increasing complexity.
All experiments presented in this study were conducted as software simulations of QONs implemented in PyTorch. Each quantum optical model was translated into a differentiable pre-activation function that follows the mathematical formulation derived from its physical interpretation. No physical optical hardware was used: the aim was to evaluate the computational potential of these physically inspired neurons within standard learning frameworks.
5.1 Binary classification with a single neuron
To evaluate the computational effectiveness of different quantum-inspired pre-activations, we conducted controlled experiments using a deliberately simple setup: a single QON trained end-to-end on a binary classification task.
We used two well-known image recognition datasets: MNIST [50] and FashionMNIST [51], each restricted to a binary setting with class labels 0 and 1. For MNIST, this corresponds to handwritten digits 0 and 1, while for FashionMNIST, the labels refer to T-shirts/tops and trousers, respectively. Each grayscale image was flattened into a vector
and normalized to lie in
. This basic preprocessing was sufficient, as normalization specific to each QON variant was embedded in their respective formulations.
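The preprocessing steps can be sketched as follows. This is a framework-free illustration on synthetic data, not the authors' pipeline: the helper names are hypothetical, and rescaling 8-bit pixels by 255 (giving values in [0, 1]) is an assumption of the sketch.

```python
def to_binary_subset(images, labels, keep=(0, 1)):
    """Keep only the samples whose label is in `keep`."""
    return [(img, lab) for img, lab in zip(images, labels) if lab in keep]


def flatten_and_scale(image, max_value=255.0):
    """Flatten a 2-D grid of pixel intensities row by row and rescale.

    Assumption: dividing by `max_value` maps 8-bit pixels into [0, 1].
    """
    return [pixel / max_value for row in image for pixel in row]


# Tiny synthetic stand-in for the dataset: four 28x28 checkerboard images.
images = [[[255 * ((r + c + k) % 2) for c in range(28)] for r in range(28)]
          for k in range(4)]
labels = [0, 1, 7, 1]

subset = to_binary_subset(images, labels)
vectors = [flatten_and_scale(img) for img, _ in subset]
print(len(subset), len(vectors[0]))
```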
The QON receives the input and computes a scalar pre-activation using a learnable weight vector of matching dimension. For the classical neuron, we include the additive bias term. In contrast, we found that including the bias term in quantum-inspired pre-activations consistently degraded performance.

We hypothesize that this behavior is primarily related to optimization difficulties rather than to a fundamental limitation of quantum optical neurons. QONs encode information through highly nonlinear pre-activation functions that often rely on normalized or bounded interference-based quantities, reflecting relative and coherent relationships between optical fields. In this context, the introduction of a purely linear and additive bias term may disrupt the intrinsic normalization underlying the interference process, effectively altering the geometry of the loss landscape. Such a mismatch can hinder gradient propagation, leading to less stable convergence and weaker generalization. Based on these empirical observations, we excluded the bias term from all QON variants and retained it only for the classical baseline, as this choice consistently improved training stability while preserving the physical interpretability of the optical analogy. Developing a bias formulation that could further enhance QON performance is left to future investigation.
All scalar pre-activations were followed by batch normalization [52] and a final sigmoid activation to output a probability. Batch normalization was consistently found to improve stability and convergence across all models.
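The forward pass described above (pre-activation, batch normalization, sigmoid) can be sketched without any framework. The example uses the classical linear pre-activation as a stand-in; a QON variant would replace `linear_preactivation` with the corresponding function from Eqs. (73)–(76). All names are hypothetical, and the batch normalization shown is the training-mode form with a biased variance estimate, which is an assumption of the sketch.

```python
import math


def linear_preactivation(x, w, b=0.0):
    """Classical pre-activation: inner product plus bias."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalization of one scalar feature."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(batch, w, b=0.0, preactivation=linear_preactivation):
    """Map a batch of input vectors to class-1 probabilities."""
    z = [preactivation(x, w, b) for x in batch]
    return [sigmoid(v) for v in batch_norm(z)]


# Toy batch of four 2-dimensional inputs (illustrative values only).
batch = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
probs = forward(batch, w=[0.5, -0.25], b=0.1)
print(probs)
```

Passing the pre-activation as an argument keeps the normalization and activation stages identical across variants, so that differences in behavior can be attributed to the pre-activation alone.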
We implemented six differentiable PyTorch modules, each corresponding to a different pre-activation function, as defined above. These include the classical inner product and five quantum-inspired functions derived from Mach–Zehnder or Hong–Ou–Mandel interferometry, using various encoding schemes (phase, amplitude, intensity). For MZ-based neurons, we tested both the vanilla cosine formulation and a phase-shifted variant with a nonzero phase offset θ.
All models were trained using the Adam optimizer [53] with a learning rate of 0.01, batch size 64, and binary cross-entropy loss, for a total of 25 epochs. To assess the robustness of each pre-activation variant, every experiment was repeated five times independently under identical training conditions. All reported curves correspond to the mean across runs, while the shaded regions denote the standard deviation.
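The aggregation over repeated runs can be sketched as follows. The accuracy values are synthetic, the function name is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which estimator was used.

```python
import statistics


def aggregate_runs(curves):
    """Per-epoch mean and standard deviation across independent runs.

    `curves` is a list of runs, each a list of per-epoch values of
    equal length.
    """
    per_epoch = list(zip(*curves))
    means = [statistics.fmean(vals) for vals in per_epoch]
    # Assumption: sample standard deviation (n - 1 in the denominator).
    stds = [statistics.stdev(vals) for vals in per_epoch]
    return means, stds


# Synthetic test accuracies: five runs, three epochs each.
runs = [
    [0.90, 0.95, 0.97],
    [0.92, 0.96, 0.98],
    [0.88, 0.94, 0.97],
    [0.91, 0.95, 0.98],
    [0.89, 0.95, 0.97],
]

means, stds = aggregate_runs(runs)
for epoch, (m, s) in enumerate(zip(means, stds), start=1):
    print(f"epoch {epoch}: {m:.3f} +/- {s:.3f}")
```

The per-epoch means give the plotted curves and the per-epoch standard deviations give the half-width of the shaded bands.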
Under ideal conditions, most models converged rapidly and achieved excellent generalization on MNIST (Fig 3, top row), with final test accuracies close to or above 99%. MZ-based neurons (both cosine and phase-shifted) displayed the most stable training dynamics and the smallest variance bands. HOM-based neurons showed more heterogeneous behavior: the intensity-based encoding converged more slowly, while the amplitude-modulated HOM neuron consistently underperformed relative to the other QON variants.
Fig 3. Ideal (top) and non-ideal (bottom) scenarios are shown. Curves represent the mean over five runs; shaded bands denote the standard deviation.
In the non-ideal setting (Fig 3, bottom row), the impact of noise and decoherence became evident. MZ neurons remained comparatively robust, maintaining stable convergence profiles. HOM-based neurons were significantly more affected: phase and intensity variants showed increased variance and reduced accuracy, reflecting their greater sensitivity to perturbations in the underlying interference patterns.
The FashionMNIST results in Fig 4 confirm the same qualitative trends but with amplified differences among variants. In particular, HOM phase modulation exhibited sharp oscillations and larger uncertainty bands in the non-ideal scenario, whereas MZ neurons preserved stable generalization across runs.
Fig 4. Ideal (top) and non-ideal (bottom) scenarios are shown. Mean over five runs with standard deviation.
5.2 Multiclass classification with a layered neural network
To assess the applicability of quantum optical neurons in more complex scenarios, we extended our experiments to multiclass classification using a layered neural network.
Unlike the single-neuron setup, this architecture consists of three fully connected QON layers: two hidden layers followed by an output layer. Each neuron in the network uses the same type of pre-activation function, drawn from the previously defined variants.
The input is a flattened grayscale image from either MNIST or FashionMNIST, normalized to
and passed through the QON stack. The first two layers contain 128 and 64 neurons, respectively, each followed by batch normalization and ReLU activation. The final layer outputs 10 unnormalized logits, one per class, which are converted into probabilities via the softmax function. The network is trained using the cross-entropy loss, appropriate for multiclass tasks.
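The output stage and the layer sizes can be illustrated with a short framework-free sketch. The input size of 784 assumes flattened 28 × 28 images, the weight count assumes one weight per input per neuron and excludes batch normalization parameters, and all names are hypothetical.

```python
import math

# Input size assumes flattened 28x28 images; the remaining sizes follow the text.
LAYER_SIZES = [784, 128, 64, 10]


def softmax(logits):
    """Convert unnormalized logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target class, from raw logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


def count_weights(sizes, bias=False):
    """Number of weights in a stack of fully connected layers."""
    return sum(n_in * n_out + (n_out if bias else 0)
               for n_in, n_out in zip(sizes, sizes[1:]))


# Illustrative logits for a single sample whose true class is 3.
logits = [0.0] * 10
logits[3] = 2.0

probs = softmax(logits)
loss = cross_entropy(logits, target=3)
print(probs[3], loss, count_weights(LAYER_SIZES))
```

Computing the loss directly from the logits avoids taking the logarithm of a probability that has underflowed to zero, which is why the softmax is usually folded into the cross-entropy during training.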
All previously described QON variants were applied within this more advanced architecture. Based on earlier findings, the bias term was again excluded from non-classical variants, as its presence degraded performance. Batch normalization was retained in all layers for consistency and improved stability.
Training was conducted for 25 epochs using the Adam optimizer with a learning rate of 0.01 and a batch size of 64. No further preprocessing was applied beyond pixel normalization. As in the binary setting, each multiclass experiment was repeated five times independently, and all reported curves show the mean value with a shaded band indicating the standard deviation.
For MNIST, the classical network and the HOM amplitude-modulated variant achieved the strongest performance under ideal conditions (Fig 5, top row). MZ-based neurons achieved competitive accuracy with stable convergence, though they did not surpass the classical baseline. HOM phase and intensity variants again performed less consistently, with higher variance and less stable loss trajectories.
Fig 5. Training and test accuracy and loss under ideal (top) and non-ideal (bottom) conditions. Curves represent the mean over five runs with standard deviation.
Under non-ideal conditions (Fig 5, bottom row), the HOM amplitude-modulated variant remained the most robust QON variant, while MZ neurons degraded moderately but retained stability. HOM phase and intensity showed the most significant drop in performance.
FashionMNIST further accentuated these differences (Fig 6). The classical model achieved the best performance overall, followed by the amplitude-modulated HOM neuron. MZ neurons maintained reasonable accuracy but showed greater sensitivity to noise in the deeper architecture. HOM phase and intensity models again exhibited the weakest stability.
Fig 6. Training and test accuracy and loss under ideal (top) and non-ideal (bottom) conditions. Mean over five runs with standard deviation.
6 Conclusions
Building on the foundational work of Roncallo et al. [25], we extended the study of quantum optical neurons by investigating alternative implementations based on different photon modulation strategies and interferometric architectures. In particular, we introduced a Mach–Zehnder-based design that enables access to both the real and imaginary parts of the scalar product between modulated wave functions, thereby complementing the original formulation based on the simpler HOM interferometer and allowing for additional QON versions. From these physically motivated models, we derived a family of pre-activation functions and systematically evaluated their learning capabilities on a range of classification tasks.
Our experiments show that HOM amplitude-modulated and MZ phase-shifted neurons often achieve strong performance across both binary and multiclass image classification tasks, demonstrating competitive accuracy and stable convergence. By evaluating all models across multiple independent runs and under both ideal and non-ideal conditions, we obtained a comprehensive picture of their robustness and sensitivity to realistic imperfections. This unified analysis indicates that MZ-based neurons retain comparatively stable behavior even in the presence of noise, while HOM amplitude modulation remains competitive—particularly in deeper architectures. In contrast, phase- and intensity-modulated HOM-based variants exhibit reduced stability and a higher susceptibility to noise. The comparison with a classical neuron baseline further shows that several QON variants can approach classical performance while offering physically grounded operations that may translate into practical advantages in photonic implementations.
This study also presents several strengths and limitations that contextualize its findings. Among the strengths, our evaluation is comprehensive: all QON variants are assessed within a unified mathematical and experimental framework, across multiple runs, under both ideal and non-ideal conditions, and in direct comparison with a classical baseline. This provides statistically grounded and physically meaningful insights into the behavior of quantum-inspired optical neurons. However, the analysis is limited by its reliance on software simulations rather than physical implementations, and by the use of relatively shallow architectures and standard image benchmarks. While these controlled settings allow for clear, interpretable comparisons, they do not capture all hardware constraints or the broader range of workloads relevant to photonic accelerators. These limitations point toward natural extensions of this work, including the study of deeper QONNs, hardware-in-the-loop experimentation, and application-specific tasks where optical systems may offer distinctive advantages.
Looking ahead, future work will investigate deeper QON architectures, hybrid optical–electronic designs, and potential hardware realizations. These research directions could pave the way toward integrating QONs as fundamental building blocks for the next generation of energy-efficient and physically grounded AI systems.
Acknowledgments
The authors would like to thank Giovanni Tempesta and Michele Ciro Di Carlo for their valuable support during the preliminary stages of this work. The authors also acknowledge the use of Grammarly for assisting in the revision of the manuscript’s language and style.
References
- 1. Ahn J, Fiorentino M, Beausoleil RG, Binkert N, Davis A, Fattal D, et al. Devices and architectures for photonic chip-scale integration. Appl Phys A. 2009;95(4):989–97.
- 2. Caulfield HJ, Dolev S. Why future supercomputing requires optics. Nature Photon. 2010;4(5):261–3.
- 3. Zhang H, Gu M, Jiang XD, Thompson J, Cai H, Paesani S, et al. An optical neural chip for implementing complex-valued neural network. Nat Commun. 2021;12(1):457. pmid:33469031
- 4. Lin X, Rivenson Y, Yardimci NT, Veli M, Luo Y, Jarrahi M, et al. All-optical machine learning using diffractive deep neural networks. Science. 2018;361(6406):1004–8. pmid:30049787
- 5. Zuo Y, Li B, Zhao Y, Jiang Y, Chen Y-C, Chen P, et al. All-optical neural network with nonlinear activation functions. Optica. 2019;6(9):1132.
- 6. Sui X, Wu Q, Liu J, Chen Q, Gu G. A Review of Optical Neural Networks. IEEE Access. 2020;8:70773–83.
- 7. Anthony M. Discrete Mathematics of Neural Networks. Society for Industrial and Applied Mathematics; 2001.
- 8. Biamonte J, Wittek P, Pancotti N, Rebentrost P, Wiebe N, Lloyd S. Quantum machine learning. Nature. 2017;549(7671):195–202. pmid:28905917
- 9. Lloyd S, Garnerone S, Zanardi P. Quantum algorithms for topological and geometric analysis of data. Nat Commun. 2016;7:10138. pmid:26806491
- 10. Havlíček V, Córcoles AD, Temme K, Harrow AW, Kandala A, Chow JM, et al. Supervised learning with quantum-enhanced feature spaces. Nature. 2019;567(7747):209–12. pmid:30867609
- 11. Ortolano G, Napoli C, Harney C, Pirandola S, Leonetti G, Boucher P. Quantum-Enhanced Pattern Recognition. Phys Rev Appl. 2023;20:024072.
- 12. Bary G, Ru P, Zhang W-N. Normalized multi-pion Hanbury–Brown–Twiss correlation functions of pion-emitting sources with Bose–Einstein condensation. J Phys G: Nucl Part Phys. 2019;46(11):115107.
- 13. Mohseni M, Vasyuchka VI, L’vov VS, Serga AA, Hillebrands B. Classical analog of qubit logic based on a magnon Bose–Einstein condensate. Commun Phys. 2022;5(1).
- 14. Kusumoto T, Mitarai K, Fujii K, Kitagawa M, Negoro M. Experimental quantum kernel trick with nuclear spins in a solid. npj Quantum Inf. 2021;7(1).
- 15. Cai XD, Wu D, Su ZE, Chen MC, Wang XL, Li LL. Entanglement-Based Machine Learning on a Quantum Computer. Phys Rev Lett. 2015;114:110504.
- 16. Hong C, Ou Z, Mandel L. Measurement of subpicosecond time intervals between two photons by interference. Phys Rev Lett. 1987;59(18):2044–6. pmid:10035403
- 17. Bouchard F, Sit A, Zhang Y, Fickler R, Miatto FM, Yao Y, et al. Two-photon interference: the Hong-Ou-Mandel effect. Rep Prog Phys. 2021;84(1):012402. pmid:33232945
- 18. Pittman T, Strekalov D, Migdall A, Rubin M, Sergienko A, Shih Y. Can Two-Photon Interference be Considered the Interference of Two Photons?. Phys Rev Lett. 1996;77(10):1917–20. pmid:10061812
- 19. Branning D, Migdall AL, Sergienko AV. Simultaneous measurement of group and phase delay between two photons. Phys Rev A. 2000;62(6).
- 20. Lyons A, Knee GC, Bolduc E, Roger T, Leach J, Gauger EM, et al. Attosecond-resolution Hong-Ou-Mandel interferometry. Sci Adv. 2018;4(5):eaap9416. pmid:29736414
- 21. Sgobba F, Andrisani A, Dello Russo S, Siciliani de Cumis M, Santamaria Amato L. Attosecond-Level Delay Sensing via Temporal Quantum Erasing. Sensors (Basel). 2023;23(18):7758. pmid:37765818
- 22. Sgobba F, Andrisani A, Santamaria Amato L. Photon Phase Delay Sensing with Sub-Attosecond Uncertainty. Sensors. 2024;24(7).
- 23. Triggiani D, Tamma V. Estimation with Ultimate Quantum Precision of the Transverse Displacement between Two Photons via Two-Photon Interference Sampling Measurements. Phys Rev Lett. 2024;132(18):180802. pmid:38759164
- 24. Bowie C, Shrapnel S, Kewming MJ. Quantum kernel evaluation via Hong–Ou–Mandel interference. Quantum Sci Technol. 2023;9(1):015001.
- 25. Roncallo S, Morgillo AR, Macchiavello C, Maccone L, Lloyd S. Quantum optical classifier with superexponential speedup. Commun Phys. 2025;8(1).
- 26. Zehnder L. Ein neuer Interferenzrefraktor. Z Instrumentenkd. 1891;11.
- 27. Mach L. Ueber einen Interferenzrefraktor. Z Instrumentenkd. 1892;12.
- 28. Zhang J, Lu ZH, Wang LJ. Precision refractive index measurements of air, N2, O2, Ar, and CO2 with a frequency comb. Appl Opt. 2008;47(17):3143–51. pmid:18545287
- 29. Scott H, Branford D, Westerberg N, Leach J, Gauger EM. Noise limits on two-photon interferometric sensing. Phys Rev A. 2021;104:053704.
- 30. Qi F, Wang Z, Xu W, Chen XW, Li ZY. Towards simultaneous observation of path and interference of a single photon in a modified Mach-Zehnder interferometer. Photon Res. 2020;8(4):622–9.
- 31. Kim H, Kwon O, Moon HS. Two-photon interferences of weak coherent lights. Sci Rep. 2021;11(1):20555. pmid:34654865
- 32. Doughan I, Oyemakinwa K, Ovaskainen O, Roussey M. Strip-loaded Mach-Zehnder interferometer for absolute refractive index sensing. Sci Rep. 2024;14(1):3064. pmid:38321087
- 33. Kim D, Kim M, Moon HS. Which-path information of path-polarization hybrid state in polarization-based nested Mach-Zehnder interferometer. Sci Rep. 2025;15(1):12664. pmid:40221438
- 34. Hofmann T, Schölkopf B, Smola AJ. Kernel methods in machine learning. Ann Statist. 2008;36(3).
- 35. Wang K, Xiao L, Yi W, Ran SJ, Xue P. Experimental realization of a quantum image classifier via tensor-network-based machine learning. Photon Res. 2021;9(12):2332–40.
- 36. Zurek WH. Information Transfer in Quantum Measurements: Irreversibility and Amplification. In: Meystre P, Scully MO, editors. Information Transfer in Quantum Measurements: Irreversibility and Amplification. Boston (MA): Springer US; 1983. p. 87–116.
- 37. Zurek WH. Decoherence, einselection, and the quantum origins of the classical. Rev Mod Phys. 2003;75:715–75.
- 38. Schlosshauer M. Decoherence and the Quantum-to-Classical Transition. The Frontiers Collection. Berlin, Heidelberg: Springer; 2007.
- 39. Brune M, Hagley E, Dreyer J, Maître X, Maali A, Wunderlich C, et al. Observing the Progressive Decoherence of the “Meter” in a Quantum Measurement. Phys Rev Lett. 1996;77(24):4887–90. pmid:10062660
- 40. Sonnentag P, Hasselbach F. Measurement of decoherence of electron waves and visualization of the quantum-classical transition. Phys Rev Lett. 2007;98(20):200402. pmid:17677675
- 41. D’Auria V, Lee N, Amri T, Fabre C, Laurat J. Quantum decoherence of single-photon counters. Phys Rev Lett. 2011;107(5):050504. pmid:21867054
- 42. Vepsäläinen AP, Karamlou AH, Orrell JL, Dogra AS, Loer B, Vasconcelos F, et al. Impact of ionizing radiation on superconducting qubit coherence. Nature. 2020;584(7822):551–6. pmid:32848227
- 43. Biswas T, García Díaz M, Winter A. Interferometric visibility and coherence. Proc R Soc A. 2017;473(2203):20170170.
- 44. Harriman J, Serati S, Stockley J. Comparison of transmissive and reflective spatial light modulators for optical manipulation applications. In: Dholakia K, Spalding GC, editors. Optical Trapping and Optical Micromanipulation II. vol. 5930. International Society for Optics and Photonics. SPIE; 2005. p. 59302D.
- 45. Jönsson M, Björk G. Evaluating the performance of photon-number-resolving detectors. Phys Rev A. 2019;99:043822.
- 46. Natarajan CM, Tanner MG, Hadfield RH. Superconducting nanowire single-photon detectors: physics and applications. Supercond Sci Technol. 2012;25(6):063001.
- 47. Acerbi F, Gundacker S. Understanding and simulating SiPMs. Nucl Instrum Methods Phys Res A. 2019;926:16–35.
- 48. Tsujimoto Y, Sugiura Y, Tanaka M, Ikuta R, Miki S, Yamashita T, et al. High visibility Hong-Ou-Mandel interference via a time-resolved coincidence measurement. Opt Express. 2017;25(11):12069–80. pmid:28786565
- 49. Kim TS, Kim HO, Ko JH, Park GD. Two-Photon Interference Experiment in a Mach-Zehnder Interferometer. J Opt Soc Korea. 2003;7(2):113–8.
- 50. LeCun Y, Cortes C, Burges C. MNIST handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist. 2010;2.
- 51. Xiao H, Rasul K, Vollgraf R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. CoRR. 2017.
- 52. Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning. PMLR; 2015. p. 448–456.
- 53. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.