Deep neural networks to recover unknown physical parameters from oscillating time series

Deep neural networks are widely used in pattern-recognition tasks for which a human-comprehensible, quantitative description of the data-generating process cannot be obtained. In the process, neural networks often produce an abstract (entangled and non-interpretable) representation of the data-generating process. This may be one of the reasons why neural networks are not yet used extensively in physics-experiment signal processing: physicists generally require their analyses to yield quantitative information about the systems they study. In this article we use a deep neural network to disentangle components of oscillating time series. To this end, we design and train the neural network on synthetic oscillating time series to perform two tasks: a regression of the signal latent parameters and signal denoising by an Autoencoder-like architecture. We show that the regression and denoising performance is similar to that of least-square curve fits supplied with the true latent parameters as initial guesses, even though the neural network needs no initial guesses at all. We then explore various applications in which we believe our architecture could prove useful for time-series processing when prior knowledge is incomplete. As an example, we employ the neural network as a preprocessing tool to inform least-square fits when initial guesses are unknown. Moreover, we show that the regression can be performed on some latent parameters while ignoring the existence of others. Because the Autoencoder needs no prior information about the physical model, the remaining unknown latent parameters can still be captured, thus making use of partial prior knowledge while leaving space for data exploration and discoveries.


INTRODUCTION
Deep neural networks (DNNs) have been successfully used in a wide variety of tasks, such as regression, classification (e.g., in image or speech recognition [1,2]), and time-series analysis. They are known for their ability to construct useful higher-level features from lower-level ones in many applications; however, these feature representations frequently remain incomprehensible to humans. This property is one of the reasons why DNNs are not more widely used in physics, in which the approach to data exploration is usually drastically different.
Most systems studied in physics are well described by physical models, generally referred to as equations of motion. The experimental data are analysed with respect to a particular model. In doing so, the equations of motion are analytically or numerically solved, yielding a theoretical description of the data-generating process. The resulting model generally includes a set of mathematical variables that can be adjusted to span the data. The true values of these variables are generally unknown and must be recovered. For that reason, we refer to them as latent parameters. The true latent parameters are approximated by comparing the data to the model, typically by fitting the model to the data.
With this in mind, the ability of DNNs to find abstract representations of the data features, rather than a quantitative generating process, is generally seen by physicists as a limitation rather than an advantage. For that reason, DNNs are still often viewed as black boxes in physics and have started to be used in the field only in recent years [3].
We find this to be a missed opportunity for the physics community. With physical models at hand, one can generate arbitrarily large volumes of synthetic data to train DNNs that later process real-world signals [4]. This circumvents a central challenge of supervised learning, which requires training data for which the true latent parameters (the labels) are known. Making full use of this possibility, DNNs were recently trained on synthetic nuclear magnetic resonance (NMR) spectroscopic data, simulated by accurate physical models [5]. The large amount of labeled data generated this way enables convergence of the DNN, which is then used to process real NMR data with great accuracy. A similar approach has become popular in robotics and autonomous driving.
Moreover, extensive work has been done to disentangle and make sense of DNN representations. A notable example is the β-variational autoencoder architecture [6]. Correlation loss penalties can also be used during DNN training, without prior knowledge of the data-generating process [7,8]. These methods consist of penalizing the DNN if its feature representation becomes entangled during training, thereby encouraging the DNN to produce an efficient, disentangled feature representation. Even when disentangled, the representations achieved through these methods are not readily interpretable and usually require further analysis.
Nonetheless, DNNs are being increasingly used in physics data processing, in particular for signal classification, in which unusual datasets are flagged for further analysis. It was shown that Autoencoders can effectively be trained on Large Hadron Collider particle-jet data to detect events or anomalies [9]. In this instance, the DNN successfully increases the events' signal-to-noise ratio by a factor of 6. Other searches in high-energy physics, including [10,11], have recently been performed with the aim of detecting data displacement from a null hypothesis (no anomalies). All these searches seek to perform data analyses in a model-independent setting, that is, with minimal prior information or bias. More recently, DNNs have been applied to time-series processing in nano-NMR [12]. In nano-NMR settings the noise model is complex and noise overpowers the weak signals, rendering standard data analyses inefficient. The DNN was tasked to classify signals (i.e., to discriminate between two frequencies) and outperformed full-Bayesian methods.
While often achieving great successes, to our knowledge most applications of DNNs in physics are geared toward classification problems. In addition, DNNs are still rarely employed for time-series analyses, although time series are the most common form of data acquired during physics experiments. In this article, we propose to use a DNN to disentangle components of monochromatic, amplitude- and frequency-modulated sine waves (AM- and FM-sine waves, respectively), arguably the most prevalent forms of time-domain signals in physics. The method yields performance similar to that of more standard analyses such as least-square curve fittings (LS-fits), during which the data-generating process is assumed to be known and a least-squares regression is performed to predict the signal's latent parameters.
LS-fits, however, require the user to input latent-parameter initial guesses prior to regression. These initial guesses are the prior estimates of the true latent parameters and provide a starting point for the LS-fit gradient descent. The trained DNN, however, needs no initial guesses, thus requiring less prior information about the data-generating process. Indeed, we show that, precisely because DNNs find abstract data representations, they can be used in settings where prior knowledge exists but is not complete, as is particularly the case in "new-physics" searches [13], thus leaving space for data exploration and discoveries.
The first part of this article describes the synthetic data that we generate and use throughout this work, i.e., monochromatic, AM- and FM-sine-wave time series, and their relevance to real-world physics experiments. We then describe our DNN architecture, which incorporates two tasks: a Regressor [23] DNN performs a regression of the signal's latent parameters that are known to be present in the data-generating process, while an Autoencoder [14] denoises the signals by learning an approximation of the unknown latent parameters. As a benchmark, we evaluate the DNN by comparing its performance to an LS-fit with true initial guesses. We later employ the DNN in realistic settings, where prior knowledge about the data-generating process is incomplete: LS-fit fidelity is typically highly sensitive to the initial guesses, requiring the user to perform preprocessing work or to possess prior information in order to perform optimally. As a first application, we show that the DNN can be used to predict initial guesses for the model fit. While consistently converging to optimal solutions, the technique circumvents the usual difficulties arising from fitting signals, such as the need for initial-guess exploration.
Next, we show that the DNN can be used when the user does not know whether the time series are monochromatic, AM-, or FM-sine waves, but still wishes to recover their main frequency component. In such settings, the user is generally required to repeat the analysis by exploring the space of data-generating processes and initial guesses. Using our architecture enables the user to input only the known information when performing the analysis. That is, the Regressor is tasked to recover the user-expected latent parameters while ignoring the existence of others. Because the Autoencoder needs no prior information, it is still able to capture the unknown information.

Data description and generation procedure
The time series studied throughout the article are exponentially decaying monochromatic, FM- and AM-sine waves. Gaussian noise is linearly added to the pure signals. An example of an FM-signal is shown in Fig. 1 (top) alongside its sub-components (decaying carrier, frequency-modulation signal, and noise).
Decaying monochromatic sine waves are prevalent across all fields of physics. They arise from solving the equations of motion of the two-level quantum system or of the classical harmonic oscillator, to which a multitude of other physical systems can be mathematically reduced. Notable examples include the spin-1/2 particle in a DC magnetic field, the orbital motion of planets, and RLC circuits. In information theory, the two-level quantum system also provides a complete description of the qubit. Frequency and amplitude modulation generally arise from external factors such as oscillating magnetic or electric fields applied by the experimenters. Amplitude and frequency modulation of a carrier frequency are also the most common schemes in communication links. Some form of Gaussian noise, while not necessarily always dominant, is in general present in every real-world signal. The Gaussian noise formalism provides an accurate description of electronic thermal noise, quantum shot noise, black-body radiation, and of white noise in general.
All time series used throughout the article are 512 s long, sampled once per second. The latent parameters used to generate the monochromatic sine waves are the carrier frequency F_c and phase φ, in addition to the coherence time τ. The AM- and FM-sine waves are generated by adding a modulation function to the carrier. The modulation function's latent parameters are the modulation frequency and amplitude, F_m and I_m, respectively. Noise is linearly added to the pure signals by sampling a Gaussian distribution with zero mean and standard deviation σ. The carrier amplitude is normalized to 1 such that the signal-to-noise ratio is solely given by σ. The mathematical descriptions of the monochromatic, AM- and FM-sine waves are given in the Supplementary Materials.
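As an illustration, noisy samples of this kind can be generated with a few lines of NumPy. The sketch below uses the textbook parameterization of a decaying FM-sine wave; the exact expressions and parameter ranges used in the article are those of its Supplementary Materials and may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(512.0)  # 512 s, sampled once per second

def fm_sample(F_c, phi, tau, F_m, I_m, sigma):
    """Return (noisy, clean) decaying FM-sine waves.

    Textbook FM parameterization, used here only for illustration.
    """
    clean = np.exp(-t / tau) * np.sin(2 * np.pi * F_c * t + phi
                                      + I_m * np.sin(2 * np.pi * F_m * t))
    noisy = clean + rng.normal(0.0, sigma, t.size)  # linear Gaussian noise
    return noisy, clean

# Illustrative latent-parameter values (not taken from the article)
noisy, clean = fm_sample(F_c=0.05, phi=0.3, tau=300.0, F_m=0.01, I_m=1.0, sigma=0.5)
```

Setting I_m = 0 recovers a decaying monochromatic sine wave, mirroring the reduction described below.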
Before each sample generation, the latent parameters are randomly and uniformly sampled within their respective allowed ranges, also given in the Supplementary Materials. The range of F_c ensures the carrier frequency remains well within the Fourier and Nyquist limits. The modulation amplitude range ensures the majority of the signal's power remains in its first sidebands and carrier.
Despite requiring only six latent parameters to generate the samples, these ranges enable a wide scope of functions to be realized. AM/FM-signals with minimum I_m reduce to decaying monochromatic sine waves and reach 100% modulation with maximum I_m. The coherence-time range is wide enough to span underdamped signals up to virtually non-decaying signals. These latent-parameter ranges are wide enough to encompass many foreseeable real-world signals. A random selection of FM-signals with and without noise is shown in Fig. 1 (bottom), illustrating the richness of the data in a more qualitative manner.
The choice of studying monochromatic, AM-, and FM-sine waves is not only motivated by their richness and prevalence in real-world physics experiments. Indeed, despite originating from different physical models and having different mathematical descriptions, the time series share similar visual features. As a result, within some range of parameters, even expert users could mistake the three generating processes. This is especially the case for weak modulations in the presence of noise, for which visual discrimination in the time or frequency domain (inspecting the spectrum) may be impossible. For all the reasons cited above, monochromatic, AM-, and FM-sine waves appear as good representative signals on which to perform our study. Nevertheless, the methods presented in this article can be applied to other types of signals as well.
Most DNN implementations require input and target data to be normalized so as to avoid exploding and vanishing gradients during training [15,16]. All signals and latent parameters are normalized to lie within the 0-to-1 range prior to the application of the DNN. The phase φ is mapped to two separate parameters, φ ↦ ((sin φ + 1)/2, (cos φ + 1)/2), so as to account for phase periodicity during loss computation while keeping both targets properly normalized. All other latent parameters are normalized using their respective ranges.
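The phase mapping above is a direct computation and can be sketched as:

```python
import numpy as np

def encode_phase(phi):
    """Map a phase to two [0, 1]-normalized targets, (sin(phi)+1)/2 and
    (cos(phi)+1)/2, so that the MSE loss respects phase periodicity."""
    return (np.sin(phi) + 1) / 2, (np.cos(phi) + 1) / 2
```

Two phases differing by 2π produce identical targets, so equivalent phases incur no loss penalty; the original phase can be recovered from the pair via `atan2`.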

Deep neural network architecture
The latent-parameters regression and signal denoising are performed by two separate architectures described (in Python code) in the Supplementary Material.
Denoising is performed by an Autoencoder architecture [14] composed of an Encoder followed by a Decoder. Noisy signals are first passed through the Encoder. The Encoder output layer has 64 neurons and thus produces a compressed representation of the input signal. Following this step, the Encoder output is passed through the Decoder, which decompresses the signal to its original size. This type of [Encoder-Decoder] architecture, called an autoencoder, is widely used, inter alia, for data denoising [17]. As the Encoder output dimension is smaller than the dimension of the input data, the Encoder's output layer acts as an information bottleneck, or more specifically a dimensionality reduction, thus encouraging the network to capture relevant latent features while discarding noise and redundant information [14].
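To make the bottleneck shapes concrete, the forward pass can be sketched with a single linear layer on each side; this toy stand-in only illustrates the 512 → 64 → 512 dimensions, whereas the article's Encoder and Decoder are deeper networks described in its Supplementary Materials.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=512)  # a noisy input time series (512 samples)

# Hypothetical single-layer weights, scaled for numerical stability
W_enc = rng.normal(size=(64, 512)) / np.sqrt(512)
W_dec = rng.normal(size=(512, 64)) / np.sqrt(64)

code = np.tanh(W_enc @ x)  # 64-neuron bottleneck representation
recon = W_dec @ code       # decompressed back to 512 samples
```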
Latent-parameters regression is also performed while passing the data through the Encoder. The Encoder output is then passed through a third DNN referred to as the Regressor. The output dimension of the Regressor is adjusted to the number of latent parameters that the Regressor is tasked to detect.
The Autoencoder and Regressor are trained on identical sets of samples. The Regressor's target data consist of the latent parameters, and the Decoder's target data are the noiseless signals. For both, the loss function is the mean squared error (MSE). The optimized architectures, shown in the Supplementary Materials, achieve sufficient performance while keeping the number of trainable parameters under 1 million, so that a typical training session of 12 training sets of 100 000 samples over 17 epochs can be performed on a modern laptop GPU in under 12 hours. Owing to the number and characteristics of the training instances, the asymptotic loss is reached within a small number of epochs. In general, increasing the number of instances in the training set was more beneficial than increasing the number of epochs.
After refining the base Encoder, Regressor, and Decoder, we unify the three architectures into a single DNN such that the Regressor and Decoder share the same Encoder. We find that unification is best achieved by merging them into a single DNN as depicted in Fig. 2. The Encoder output is passed through the Regressor, which predicts the signal's latent parameters. The Decoder input consists of a concatenation of the Regressor and Encoder outputs. The latent-parameter regression and signal-denoising losses are computed simultaneously (MSE_reg and MSE_dec, respectively). The loss used during backpropagation is computed as a weighted sum of MSE_reg and MSE_dec,

MSE_total = β MSE_reg + (1 − β) MSE_dec,   (1)

where the hyperparameter β is the bias adjustment between the two tasks.
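A minimal sketch of this β-weighted sum of the two MSEs (plain NumPy, not tied to any particular DNN framework):

```python
import numpy as np

def total_loss(params_pred, params_true, signal_pred, signal_clean, beta=0.001):
    """beta-weighted sum of the regression MSE and the denoising MSE."""
    mse_reg = np.mean((params_pred - params_true) ** 2)
    mse_dec = np.mean((signal_pred - signal_clean) ** 2)
    return beta * mse_reg + (1 - beta) * mse_dec
```

Setting beta = 0 leaves only the denoising term, and beta = 1 leaves only the regression term, reproducing the two extreme biases discussed below.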
This architecture presents the advantage of enabling bias control via a single hyperparameter. Moreover, both networks are naturally trained at the same time rather than alternatingly, thus accelerating training approximately two-fold and enabling high-momentum gradient optimizers. To illustrate the effect of the bias parameter, we train the unified DNN on identical FM-sine-wave datasets with varying values of β. For this experiment, training is performed using 12 training sets of 100 000 randomly generated samples for 10 epochs. Because the number of synthetic samples is large and the latent parameters are continuous random variables, overfitting (monitored with a validation set, unseen during training) was never an issue.
The performance of the trained DNN is evaluated using a test set of 100 000 randomly generated FM-samples, which were unseen during training. Figure 3 shows the test-sample losses for the denoising (top) and regression (bottom) tasks after training. Setting β = 0 fully biases training towards the denoising task, which then achieves its best performance while the parameter regression yields its worst results; vice versa for β = 1. This behaviour is also observed in Fig. S2 in the Supplementary Materials, which shows the validation losses during training. The training curves show that extreme values of β prevent validation-loss improvement of the negatively-biased task, while middle-range values enable both tasks to be learned simultaneously.
We find that the best values of β are those for which the initial β-weighted regression and denoising losses are of the same order of magnitude. As a result, determining a good value for β is straightforward: a single forward pass is performed to obtain the initial values of MSE_reg and MSE_dec. We then compute β such that β MSE_reg ≈ (1 − β) MSE_dec. Regardless of the type of data (monochromatic, AM-, and FM-samples), DNNs trained with β = 0.001 achieve good overall performance (lowest weighted total loss) and little bias towards either task. This value of β is employed throughout the entire article. For all that follows, training is always performed using 12 training sets of 100 000 randomly generated samples for 17 epochs. This training is always enough to reach the asymptotic loss while exhibiting no noticeable overfitting. Training can be performed on decaying monochromatic, AM-, or FM-sine waves, or a combination of all three processes.
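The balancing rule above amounts to a one-line computation; in this sketch, mse_reg_init and mse_dec_init stand for the two losses obtained from the single forward pass:

```python
def choose_beta(mse_reg_init, mse_dec_init):
    """Solve beta * mse_reg = (1 - beta) * mse_dec for beta."""
    return mse_dec_init / (mse_reg_init + mse_dec_init)
```

A value near β = 0.001 then simply reflects an initial regression loss roughly a thousand times larger than the initial denoising loss.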
To illustrate the architecture's output, we train the DNN on AM-sine waves and show an example of a prediction in Fig. 4 alongside the noisy input signal. The DNN outputs a denoised prediction of the noisy AM-sine wave and a prediction of the latent parameters used to generate the signal.
FIG. 4: Example of DNN prediction using a noisy AM-sine wave input. The DNN was trained only on AM-sine waves samples. Top: Noisy input (blue) and Decoder denoised prediction (orange). Bottom: True latent parameter targets (blue) and Regressor prediction (orange). The phase φ is mapped to two separate latent variables to accommodate for phase periodicity during loss computation.

Performance evaluation
As a first evaluation method, we train the DNN on a random selection of decaying monochromatic sine waves (no modulation). The training, validation, and test samples are generated using random frequencies, phases, coherence times, and noise levels. After training, we evaluate the DNN performance by comparing its prediction error to an LS-fit using the Python SciPy library. When performing the LS-fit, the input data are the noisy signals and the objective function is defined with respect to the noiseless data-generating process. The LS-fit then produces estimates of the true latent parameters. To start the gradient descent, the LS-fit requires latent-parameter initial guesses; the initial guesses used here are the true latent parameters (i.e., the true frequency, phase, and coherence time). After gradient descent, the LS-fit outputs an estimate of the latent parameters, from which we generate a prediction of the noiseless signal by inputting the LS-fit latent-parameter estimates into the data-generating process. The LS-fit and DNN performance are then compared in two ways: (i) the latent-parameter regression loss is the MSE from the true latent parameters for both the LS-fit and the DNN (MSE_reg), and (ii) the denoising error is the MSE from the true noiseless signals for both the LS-fit and the DNN (MSE_dec). Note that this comparison drastically favors the LS-fit, which therefore constitutes a good benchmark: in any practical application the true values of the latent parameters are hidden from the user, and LS-fits are employed precisely to approximate them.
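This benchmark procedure can be sketched with SciPy's `curve_fit`; the parameter values below are illustrative and not taken from the article.

```python
import numpy as np
from scipy.optimize import curve_fit

t = np.arange(512.0)  # 512 s, 1 s sampling

def model(t, F_c, phi, tau):
    # Decaying monochromatic sine wave (no modulation)
    return np.exp(-t / tau) * np.sin(2 * np.pi * F_c * t + phi)

true_params = (0.05, 1.0, 300.0)  # illustrative latent parameters
rng = np.random.default_rng(1)
noisy = model(t, *true_params) + rng.normal(0.0, 0.5, t.size)

# Benchmark setting: the true latent parameters serve as initial guesses.
popt, _ = curve_fit(model, t, noisy, p0=true_params)
denoised = model(t, *popt)  # LS-fit prediction of the noiseless signal
```

Comparing `popt` to `true_params` gives MSE_reg, and comparing `denoised` to the clean model output gives MSE_dec.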
A random selection of 300 noisy signals from the test set is processed using this method. Figure 5 shows the resulting prediction errors for both the DNN and the LS-fit. A similar evaluation is performed using AM-samples. In this experiment, the DNN is specifically trained on AM-samples. The test samples are all generated using identical latent parameters, while the noise level is increased. Examples of such samples are given in Fig. S1 in the Supplementary Materials. Figure 6 shows the prediction errors of all samples, for both the DNN and the LS-fit, sorted by noise level.
For both monochromatic and AM-signals, the DNN generally performs worse than the LS-fit for low-noise signals. However, the DNN reaches the LS-fit performance level once the noise reaches the top half of the allowed range (corresponding to a noise level σ ∈ [1, 2] before normalization), while requiring no initial guesses. The latent-parameter regression follows a similar trend. We note that, in general, the DNN outputs are less sensitive to noise, and their performance is more consistent throughout both datasets.
These results show that our architecture is a good alternative to LS-fits for time-series analysis, as it reaches acceptable performance when benchmarked to standard LS-fits with true guesses, while needing no initial guesses.

DNN-assisted LS-fit
We now wish to apply our DNN in more realistic settings. Fitting oscillating time series using LS-fits is notoriously difficult because the MSE is in general a non-convex function of the latent parameters and possesses numerous local minima. Consequently, the quality of the LS-fit is highly dependent on the initial guesses in addition to the noise. In the previous experiments, LS-fits were only performed as a benchmark, and the initial guesses were the true latent parameters. In any real-world setting, the user must perform some preprocessing work or use prior information to find initial guesses leading to the global minimum. In this section, we propose to employ the DNN as a preprocessing tool to assist the LS-fit when the user possesses no prior information about the initial guesses and wishes to recover the signal's latent parameters. The sine-wave samples from the previous experiment are fitted using the DNN latent predictions as initial guesses. Results of this experiment are shown in Fig. 7 alongside the results of LS-fits with true initial guesses.
Because the DNN predictions are always within the vicinity of the true parameters, almost all DNN-assisted LS-fits converge to optimal solutions. In settings where the initial guesses are unknown or the samples are numerous, the user can initially train the DNN on synthetic data and then use it for DNN-assisted fits. As the latter perform optimally regardless of the noise level, this enables fast and accurate analysis of large datasets by removing the need for initial-guess exploration.
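In code, a DNN-assisted fit simply replaces the initial guesses passed to the optimizer. In this sketch, `dnn_guess` is a hypothetical stand-in for a trained Regressor's output, lying near but not exactly at the true latent parameters.

```python
import numpy as np
from scipy.optimize import curve_fit

t = np.arange(512.0)

def model(t, F_c, phi, tau):
    # Decaying monochromatic sine wave
    return np.exp(-t / tau) * np.sin(2 * np.pi * F_c * t + phi)

true_params = (0.05, 1.0, 300.0)  # illustrative values
rng = np.random.default_rng(2)
noisy = model(t, *true_params) + rng.normal(0.0, 0.5, t.size)

# Hypothetical Regressor prediction standing in for the trained DNN's output
dnn_guess = (0.0502, 0.9, 280.0)

popt, _ = curve_fit(model, t, noisy, p0=dnn_guess)
```

Because the guess lies within the basin of attraction of the global minimum, the fit converges as if it had been seeded with the true parameters.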

Partial information regression and denoising
In the experiments presented above, the data-generating process was assumed to be fully known by the user. The DNN or DNN-assisted LS-fits were employed to recover the signal latent parameters and denoise the signal. We now wish to explore the possibility of employing the DNN in a situation where multiple data-generating processes must be explored and guesses must be made. This is typically the case in "new-physics searches" experiments [13], during which hypothetical, undiscovered particles may cause signals deviating from the null hypothesis (i.e., no new particles). As the hypothetical particles are numerous, they may have many potential effects on the signals. We consider the situation in which a potential external source could modulate a carrier signal produced by the experiment, as is sometimes the case for bosonic dark matter [18].
Specifically, we study the case in which the end-user is aware of the existence of an oscillation in the signal provided by the experimental setup. The user does not know whether the signal is monochromatic, amplitude-, or frequency-modulated. Nonetheless, the user wishes to recover the frequency, phase, and coherence time of the expected oscillation.
In this situation, the typical approach is to test all allowed processes by varying the LS-fit objective functions and exploring the space of initial guesses for each process. This approach presents a new set of challenges, as the exploration is time consuming and sometimes unrealistic if the data volume is too large or if too many processes are to be tested. Moreover, in some situations, all guesses can be wrong.
We show that it is possible to perform the regression and denoising with partial prior information about the physical process producing the data. That is, the DNN is tasked to perform the regression only on the narrow set of latent parameters that exist across all models: frequency, phase, coherence time, and noise level; the DNN ignores any form of modulation. This is done by decreasing the number of neurons in the Regressor's output layer. The DNN is then trained on signals from every explored model (monochromatic, AM, and FM). We refer to this DNN as the partial DNN (ignorant of the existence of any particular modulation type).

FIG. 8: Performance comparison of the specialized DNN (trained on AM-sine waves, tasked to denoise signals and recover all latent parameters of AM-sine waves) and of the partial DNN (trained on monochromatic, AM- and FM-sine waves, tasked to denoise signals and recover only the carrier frequency, phase, coherence time, and noise level). Top: Example of a noisy input AM-signal (blue), alongside the specialized- and partial-DNN denoised and latent predictions. Bottom: Individual latent-parameter and signal-denoising root mean squared errors (RMSE), averaged over the whole AM-sine-wave test set (100 000 samples) for both DNNs.
After training, we compare the performance of the partial DNN to a specialized DNN, trained specifically on AM-signals, which performs a regression of all latent parameters. Figure 8 shows MSE_reg and MSE_dec averaged over the AM-sine-wave test set (100 000 samples) for both the AM-specialized DNN and the partial DNN. The denoising task reaches the same level of precision as that of the specialized DNN. Moreover, the estimation of the carrier frequency, phase, and coherence time reaches a performance similar to that of the specialized DNN.
Using this method, the user's prior information is encoded into the Regressor architecture and training data. The Regressor then captures the expected latent parameters, thus removing the need to iteratively explore models. The Encoder and Decoder remain unchanged and are still able to capture unknown latent parameters by reproducing noiseless signals. This method enables partial prior information to be employed, while leaving space for signal exploration and unexpected discoveries.

CONCLUSION & OUTLOOK
We have presented an efficient DNN that combines the denoising of time series and the regression of their latent parameters. The DNN was trained and evaluated on synthetic monochromatic, frequency- and amplitude-modulated decaying sine waves with Gaussian noise, some of the most prevalent forms of signals acquired in physics.
For high-noise signals, the DNN reaches the same level of precision as an LS-fit with true initial guesses, in spite of the DNN needing no guesses at all. In addition, the architecture requires no hyperparameter fine-tuning to perform consistently. Moreover, because large volumes of synthetic training data can be generated, the DNN is quickly adaptable to a broad range of physical signals. This makes our architecture a good alternative to LS-fits for analysing large volumes of data, when fitting individual signals requires too much computation or user time.
The DNN architecture is flexible and can accommodate various levels of user prior information. First, the DNN was used to assist LS-fits by predicting initial guesses unknown to the user. In this situation, DNN-assisted LS-fits consistently converge to the optimal solutions. Moreover, the regression task can be adapted to accommodate partial prior information about the data-generating process. The known latent parameters are encoded in the Regressor and training data, while the Decoder helps the Encoder to still capture unknown signal features, thus leaving space for data exploration and discoveries.
Because training is done on arbitrarily large volumes of synthetic data, raw performance could be improved by increasing the number of trainable parameters, for example by adding more layers or neurons, without too much concern for overfitting. The architecture itself could be augmented by adding an upstream classifier DNN module, which could identify the type of signal being analyzed. Classified signals could then be processed by specialized versions of our architecture, trained on the corresponding type of signal.
Time-domain oscillations generally appear as peaks or peak multiplets in frequency-domain spectra. Frequency, amplitude, and phase information is then localized to narrow regions of the spectral data. For that reason, we believe further improvements could be attained by making use of frequency-domain information. We suggest using Fourier transforms or power spectra as DNN inputs, in addition to the raw time series.
The proposed DNN architecture can be used to detect and approximate hidden features in time-series data. The Regressor outputs a prediction of the known parameters, but real signals could still contain unknown latent variables. These hidden latent variables can be detected and approximated by our DNN, as it also incorporates an Autoencoder-like structure. As such, the bottleneck layer contains a feature representation of the time series, used by the Decoder to recreate the original signal. This bottleneck layer will be further investigated in order to detect and specify hidden latent parameters.
We remain aware that in physics data analysis, a sole estimate of the latent parameters often provides insufficient information. Standard analysis usually requires a quantitative estimate of the prediction uncertainty, often represented as error bars or confidence intervals. In LS-fits, this uncertainty is naturally obtained by maximizing the fit likelihood under the assumption of Gaussian-distributed latent variables [19]. Despite extensive efforts, DNNs still lack the capacity for reliable uncertainty evaluation [20-22], and more work needs to be performed in this area to further generalize DNN usage in physics signal processing.
Nonetheless, we believe this architecture is readily applicable to existing physics experiments, in particular bosonic dark-matter searches, in which large quantities of data are to be analyzed with partial prior information.

A.G. acknowledges funding from the Emergent AI Center funded by the Carl-Zeiss-Stiftung. Competing interests: All authors have read and contributed to the final form of the manuscript and declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions of this article are present in the article and/or the Supplementary Materials. Additional data and source code related to this article may be requested from the authors.

Data generation procedure
The time series used throughout the article are generated by propagating the time, t, from 0 to 511 (with length T = 512) in 1 s increments, and using the following formula: where F_c and φ are the sine-wave carrier frequency and phase, respectively. F_m and I_m are the modulation frequency and amplitude. The noise N_σ(t) is sampled from a Gaussian distribution with zero mean and standard deviation σ. J_0 and J_1 are the zeroth- and first-order Bessel functions of the first kind, respectively. Before each sample generation, the latent parameters are randomly and uniformly sampled within the following ranges: