
High-resolution raindrop counting via instantaneous frequency sensing on hydrophobic elastic membranes

  • Rytis Paškauskas

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    rytis.paskauskas@ictp.it

    Affiliation Science, Technology and Innovation Unit, The Abdus Salam International Centre for Theoretical Physics (ICTP), Trieste, Italy

Abstract

In this paper, we introduce a novel approach that paves the way for the creation of affordable, high-precision rainfall sensors utilizing microphone data. The cornerstone of this methodology is an innovative algorithm capable of converting audio recordings into distinctive features, which are subsequently processed by a compact machine learning model. Our findings demonstrate that this technique can attain a temporal resolution of 10 milliseconds with an accuracy of 80%, underscoring its potential to overcome the limitations imposed by the necessity for power infrastructure and specialized expertise in traditional rain sensing methods.

1 Introduction

A disdrometer is a precision instrument that counts and measures the drop sizes of natural rain [1, 2]. An acoustic disdrometer works by converting the mechanical energy of an impact on a surface to an equivalent electrical pulse. The well-known shortcomings of a passive measurement method were summarized already in the seminal work introducing the industry standard Joss-Waldvogel Disdrometer (JWD) [1]: “The amplitude of the electrical pulses depends on where the raindrop falls on the membrane; the temporal resolution of approx. 30 ms is insufficient, …and splashes from large drops are counted as small drops (translated from the original German)”. An important technological breakthrough of the JWD improved the temporal resolution by using an electromechanical feedback loop to compensate the force of impact [2]. However, the high cost, infrastructure requirements (such as an AC source), and the need for technical expertise can be barriers to its widespread use.

Simpler, less expensive, piezoelectric transducer-based rainfall measurement instruments rely on amplitude thresholding [3, 4]: a time window where the response exceeds a certain magnitude is ‘locked’ for calculations, during which the subsequent drops are either not counted (locked out) or incorrectly counted [2]. Linear relaxation broadens the acoustic response of a physical impact, limiting the temporal resolution. To reduce the lock-out window, materials with large attenuation coefficients are used (plastic [4], aluminium [5], stainless steel [6], glass [7]). Unfortunately this also reduces sensitivity and may lead to small droplet under-counting [2, 8, 9].

Although potentially more sensitive, elastic materials have not been considered due to their noisy acoustic response (the top pane of Fig 1 illustrates the case in point using an HDPE membrane), material degradation, high risk of damage, and limitation to liquid-only hydrometeors. On the other hand, devices with soft membranes are easy to make or replace with highly available materials. This practical advantage, as well as the simplicity and versatility of microphone-based, contactless sensing that does not require specialized electronic circuitry, would be of value where ultra low cost, short-term campaigns or educational and outreach goals are prioritized. Among the latter we consider TinyML education [10] and affordable weather station projects [11–13] as particularly relevant to our work. In this article, we describe such a device and its data analysis methodology.

Fig 1. Challenges of detection with a soft membrane and the proposed solution.

Top pane: a noisy amplitude profile that poses several challenges to amplitude thresholding. Wind (A) is a rich source of false positives and noise, concealing true events (B). Large lock-out windows (C and inset) ‘bury’ some impacts (the inset expands a 50 ms window containing 3 events). Middle pane: the acoustic feature x is good at discriminating high frequency transients from noise. Vertical bars (cyan): annotations, D: suspected missed annotation (label noise). Bottom pane: convolutional neural networks (CNN) are used to improve the predictions. Vertical bars (yellow): predictions by the cluster mean method.

https://doi.org/10.1371/journal.pone.0311995.g001

Our main contribution is a new proposal to identify impacts via high frequency transients in membranes made of elastic hydrophobic materials. By relying exclusively on frequency detection this method is, in principle, tolerant to low frequency noise and is time limited only by the observed ∼5 millisecond transient time. The short lock-out window and the decoupling from the low frequency spectrum are its main advantages over the amplitude thresholding method. To further emphasize this point, the impact detection is implemented without performing amplitude calculations, achieving a temporal resolution of about 10 milliseconds with the added benefit of high impact sensitivity. The sensing surface construction and the phenomenology are described in Sects. 2 and 3. Our contribution includes a concrete implementation using two original components: a lightweight acoustic feature model to detect instantaneous high frequency vibrations, as described in Sect. 4.4, and a machine learning framework that uses compact, physics-informed convolutional neural networks (CNN) to improve the quality of detection, described in Sect. 4.5. Our implementation targets embedded-ML-ready devices [10] capable of 44.1 kHz audio capture (a wide range of consumer microcontrollers and all smartphones fit the bill) and is intended to run ‘near the source’, on small and resource constrained microcontrollers. This is achieved by design considerations taking into account TFLite Micro framework operator support [14], small CNN models with few parameters, and half precision floating point data. We provide background for the statistical model in Sect. 4.1 and propose a rigorous self-evaluation methodology and criteria in Sect. 4.2. Several CNN architectures are compared from the point of view of high frequency noise suppression, which remains a challenge.

Our long term goal is an ultra low cost TinyML enabled rainfall measurement device. The results so far and suggestions for moving forward are discussed in Sect. 5. The present methodology can be applied to count impact arrivals directly [15]. With the addition of mean drop size calibration, rainfall rate estimation is feasible [16, 17]. An extension of the present methodology will be sought for the estimation of each droplet’s kinetic energy. The last two steps, as well as field testing, are outside the scope of the present work, which focuses on describing a new detection principle. We believe the underlying causes of the high frequency transients to be related to droplet-surface interaction as the former disintegrates upon impact. Despite significant recent advances [18–21], the setting of high velocity droplets impacting elastic hydrophobic surfaces has, to our knowledge, not been studied from the point of view of rainfall metrology. A more detailed understanding of microphysical effects in this context would provide valuable insights into useful material parameter ranges, etc.

Interest in using microphones for rainfall monitoring is growing, with applications in indirect rainfall measurement [22–25], biodiversity monitoring [26–28] (where rainfall is a nuisance parameter) and leveraging existing surveillance data [29, 30]. A few studies applied deep learning, such as convolutional [25, 31], recurrent [32], and attention based neural networks [29]. Our work addresses similar questions. The previous studies target rainfall classification into a few states (e.g. no, light, moderate, or heavy rain) from, typically, unstructured ‘soundscape’ recordings. Our methodology aims at counting each droplet on a purpose made enclosure but does not yet directly estimate the rainfall rate. The previous studies use acoustic deep learning methodologies with generic features (several are explored in [31]) and benefit from well studied feature models and deep learning architectures, but tend to result in memory and computational requirements that are not feasible for small microcontrollers. We use physics informed features and CNN models which may require additional studies, but have so far resulted in extremely frugal algorithms, with most of the examined CNNs having fewer than 1000 weights.

2 Hardware and software

The hardware for an external casing is illustrated in Fig 2, pane 1. We use a standard 100 mm diameter PVC pipe, one end of which is covered with a sturdy plastic wrap to serve as a sensing membrane. We used metalized Mylar (polyethylene terephthalate; PET) and high-density polyethylene (HDPE), both of which are easily available. The advantages of PET (Young’s modulus E ≈ 2.0 GPa, density ρ ≈ 1.35 g/cm³, thickness d ≈ 0.01 mm) are good physical and chemical stability, and humidity and heat protection. HDPE (E ≈ 0.5–1 GPa, ρ ≈ 0.95 g/cm³, d ≈ 0.1 mm) can be obtained from recycled materials, e.g. shopping bags, and be sufficient for short campaigns. The enclosing pipe is cut into segments that are long enough to house a smartphone. The wrap is fastened to the pipe using stationery and duct tapes. The goal is a smooth, taut surface with uniform stress. Taking into account the 97 mm internal diameter, the sensing surface has an area of 73.9 cm². The assembly, excluding the cutting of the pipe, is a simple and safe process that does not require technical equipment.

Fig 2. External casing and acoustic signatures of a raindrop impacting its surface. 1: The external casing. 2: Time series of a representative impact with visible high frequency vibration.

3: STFT logarithmic spectrogram, showing 100 ms on both sides of the impact. 4: Frequency filter banks, averaged over 5 steps (5 ms) centered on (black) and off (red) the impact. Note a 12–18 kHz mode on the impact. 5: Different contrasts averaged over high frequency (black) and low frequency (red) bands. Vertical bars (green): annotated events.

https://doi.org/10.1371/journal.pone.0311995.g002

Data preparation and handling are conducted entirely within Python 3.11 [33]. The SoX [34] and Audacity [35] software were used for signal processing and visualization, respectively. The deep learning networks are developed with TensorFlow v2.15 [14]. The network training was done on NVIDIA GPUs. All other computing was performed on a CPU-only laptop running a Linux OS. Data, code samples and additional instructions are available at https://github.com/rytis-paskauskas/RainDropCounter.

3 Drop microphysics

The subsequent methodology hinges on an observation, made on several hydrophobic soft materials, that a high frequency vibration is concomitant with a physical impact. An example of the acoustic response is illustrated in Fig 2, panes 2–5. In pane 2, a tenuous transient is visible for about 1 ms, superimposed on a ‘slow’ relaxation of a membrane’s mode. Pane 3 illustrates the same impact using short time Fourier transform (STFT) with 128-sample Hamming windows (2.67 ms) and 1 ms temporal steps. The reference (at zero) and several surrounding impacts are identifiable as narrow vertical lines across the spectrum. Pane 4 shows two slices of the spectrogram ‘horizontally averaged’ over 5 time steps: on impact (black line) and off impact (red line, +60 ms offset). All these observations suggest the existence of a broad mode in the 12–18 kHz range, which we found to characterize many impacts.
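The spectrogram parameters above can be reproduced with SciPy. The sketch below is illustrative only: it assumes a 48 kHz rate (for which a 128-sample window spans 2.67 ms) and a synthetic 15 kHz 'impact' burst standing in for the actual recordings, then checks the 12–18 kHz band contrast on and off the burst.

```python
import numpy as np
from scipy.signal import stft

fs = 48_000                       # assumed capture rate: 128 samples = 2.67 ms
t = np.arange(fs) / fs            # 1 s of audio
# synthetic 'impact': a 5 ms, 15 kHz burst on top of a 500 Hz membrane mode
sig = 0.1 * np.sin(2 * np.pi * 500 * t)
burst = (t > 0.5) & (t < 0.505)
sig[burst] += np.sin(2 * np.pi * 15_000 * t[burst])

# 128-sample Hamming windows with a 48-sample hop = 1 ms temporal steps
f, tt, Z = stft(sig, fs=fs, window="hamming", nperseg=128, noverlap=128 - 48)
S = 20 * np.log10(np.abs(Z) + 1e-12)     # logarithmic spectrogram, dB
band = (f >= 12_000) & (f <= 18_000)     # the 12-18 kHz mode of Fig 2, pane 4
col_on = np.argmin(np.abs(tt - 0.5025))  # a column on the burst
col_off = np.argmin(np.abs(tt - 0.2))    # a column off the burst
print(S[band, col_on].mean() - S[band, col_off].mean())  # band contrast, dB
```

The on/off contrast in the high frequency band is what pane 5 of Fig 2 visualizes.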

Lastly, two band averages (red: low frequency, black: 12–18 kHz band), shown in pane 5, suggest that the high frequency band displays a better contrast. In Sect. 4.4 we use this insight to develop a feature model for high frequency detection without the use of Fourier transforms.

The high-frequency transient was observed with both PET and HDPE materials, using pipes of various diameters and different surface tension levels across trials, due to the inability to precisely control surface tension in our DIY setup. These widely varying parameters influenced the resonance frequencies, which stayed within the 1 kHz range, but seemingly had no effect on the high-frequency transient. To our understanding, the time scales and frequencies involved do not fit a conventional picture of the impulse response of an elastic membrane. The presence of high frequencies suggests that scales comparable to the droplet size might be involved. Recently there has been considerable interest in soft surfaces [20, 21]. The most common natural rain drop diameters D vary in the range 0.2–5.0 mm [36]. The terminal velocity is well approximated by V(D) = 9.55(1 − exp(−0.6D)) [37]. Even though the high terminal velocity generally leads to splashing [19], additional effects might be at play due to the hydrophobic and elastic factors. The microphysics of liquid droplets impacting surfaces has been of longstanding interest [18, 19]. In a different context it has been suggested, for example, that soft surfaces impacted by liquid droplets trap more air underneath than their rigid counterparts, increasing the possibility of surface-droplet interactions through capillary waves [21]. It is presently unknown whether a microphysical effect is the cause of the observed high frequency mode. Additional high resolution experiments could help to identify the optimal material properties. However, even without a detailed understanding of the causes, we are able to leverage the apparent acoustic effect and devise a methodology that is based solely on local frequency detection. Its reliance on frequencies increases the robustness to low frequency amplitude noise, whereas the short duration of the transient improves the temporal resolution.

4 Methodology

Raindrop impacts are routinely counted by disdrometers, and databases of aggregated data are available [38]. Our setting has specific requirements: impacts should land on a soft hydrophobic surface (as described in Sect. 2) and high resolution data is required; the sampling frequency should be at least 44,100 Hz. Therefore we first commit a few recordings with annotated impacts to a dataset.

We begin the analysis with an acoustic signal A(t), sampled at fixed time intervals 1/fs with a sampling frequency fs. At this point we diverge from standard practices of machine learning in acoustics [39] (to use raw wave forms or frequency bank averages) and propose a new feature model. Described in detail in Sect. 4.4, it amounts to a nonlinear mapping g(ν) to a pair of time and amplitude sequences using an ‘encoder + denoiser’, (t, a) = g(ν)(A). Here t = (ti) and a = (ai) are, respectively, time and amplitude sequences representing locations of ‘relevant’ acoustic events with a relationship A(ti) = ai; ν = (ν1, ν2) is a control parameter whose components are, respectively, the time and amplitude denoising thresholds. Finally, we use only half of this data through a new variable x,

xi = log(1/(2(ti+1 − ti))) − log(fs/2),  (1)

as a single component feature, representing a local frequency estimate at t(xi) = ti. At this point, we have discarded the amplitude variable a from further considerations.

For convenience, we will drop the explicit ν parameter notation, except where it clarifies the discussion. Now we introduce some required notation for data slicing and for relating the parameterized variable x(ν) and the physical time t. Any sub-sequence of the form [xi, …, xi+L−1] (of length L) is denoted as x[i,i+L). Any sub-sequence x[i,i+L) naturally induces a time interval, I, defined by the boundaries I[i,i+L) = [t(xi), t(xi+L)). To simplify further the notation, we will use xr to specify a sub-sequence with a ‘generic’ range r = [i, i + L), and likewise Ir = I(xr).
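As a minimal illustration of the feature x (assuming, consistently with the pure-signal example of Sect. 4.4, that xi is the log of the local frequency estimate 1/(2(ti+1 − ti)) normalized by the Nyquist frequency fs/2):

```python
import numpy as np

def local_frequency_feature(t, fs):
    """Sketch of the single-component feature x from extrema times t_i.

    Successive extrema of a tone are half a period apart, so the local
    frequency estimate is f_hat_i = 1 / (2 (t_{i+1} - t_i)) and
    x_i = log f_hat_i - log(fs / 2):  x = 0 at Nyquist, x < 0 below it.
    """
    dt = np.diff(t)
    return np.log(1.0 / (2.0 * dt)) - np.log(fs / 2.0)

fs, f = 16_000, 1_000.0
t = np.arange(0, 0.01, 1 / (2 * f))       # ideal extrema of a 1 kHz tone
x = local_frequency_feature(t, fs)
print(x[0], np.log(f) - np.log(fs / 2))   # the two values should agree
```

For a pure tone the feature is constant, in agreement with the example xn = log f − log(fs/2) of Sect. 4.4.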

4.1 The statistical model

Our statistical model is a Bernoulli process (‘coin toss’) over events EI ∈ Ω = {#(impacts in I) = 0, #(impacts in I) > 0} ∼ {0, 1}, with the probability

P(EI = 1 | xr) = σ(h(xr; θ)).  (2)

The log odds is modelled by a convolutional neural network (CNN) h(xr; θ) (defined in Sect. 4.5), and σ(h) = 1/(1 + exp(−h)) is the ‘sigmoid’ function, mapping h to (0, 1), interpreted as a probability. To summarize, the CNN operates on parameterized sequences xr of various lengths (the minimal length is determined by a model’s hyperparameters), and provides the probability that a physical impact occurs somewhere within a corresponding time interval Ir. We will adopt a standard decision rule parameterized by a cutoff value c ∈ (0, 1): namely, ŷ = 1 if σ(h(xr; θ)) > c and ŷ = 0 otherwise.

4.1.1 Datasets from parameterized signal representations.

In the supervised learning context we provide a dataset consisting of pairs (xri, yi), where the ranges ri are of fixed length, |ri| = L. For this work we chose L = 15 as a compromise between the benefit of longer sequences for step identification and the benefit of shorter sequences for greater temporal precision. The feature xri is a sequence of frequency estimates over a (typically short) time interval Iri, and yi ∈ {0, 1} is the label. The ranges are carefully determined to either sample from the gaps between annotated events (with 2 millisecond margins) for each yi = 0, or to make sure that the annotated event is collocated near the center of the corresponding sequence, if yi = 1. In this context, two data augmentations are used: 1) oversampling of y = 1 features by shifting the event location 4 positions around the center, and 2) strided sampling of the ‘gap’ features. These are dataset balancing measures, slightly improving heavily skewed datasets (see Table 2).

The features we are considering in a statistical model come from a parametric representation of the acoustic signal in which ν is an unspecified control parameter. A natural question is what value to use for the dataset. We will assume that our data is exposed to different parametric processes each possibly representing a different distribution of data points. To generate a dataset that explores the range of possible denoising parameters we create a union ∪ν,A,r{yr(ν, A), xr(ν, A)}, where ν is sampled from some distribution for each available recording.

To ensure statistical correctness of the training and validation partitions, an ‘arbitrary’ time interval Ivalid = [10, 15) is reserved for the validation dataset, while the remaining available recording time is used for training. For simplicity, we chose an interval that is contained in every recording and is not near the starting boundary. This approach guarantees that any ‘impact’ or ‘gap’ feature in all parametric representations appears in at most one of the partitions.

4.1.2 Standard performance metrics.

The standard classification evaluation tools for the case where one class is underrepresented are the precision-recall (PR) curve and the receiver operating characteristic (ROC), applied to the validation partition. The PR curve displays the precision, PPV = TP/(TP+FP), against the true positive rate (recall), TPR = TP/(TP+FN), for all available test cutoffs. Similarly, the ROC displays TPR against the false positive rate 1−TNR, where the true negative rate is TNR = TN/(TN+FP). The area under the curve (AUC) provides a useful single measure of overall model performance that is independent of the cutoff. In our case, the ROC describes sensitivity to detecting the events, whereas PR describes the discriminating power. For example, a high ROC AUC combined with low PR values suggests that true positives are mostly identified, but a significant amount of background is misidentified as events.
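These quantities can be computed with scikit-learn; the sketch below uses synthetic, heavily skewed labels (about 2% positives, as in impact detection) and an artificial score, purely to illustrate the PR/ROC AUC computation:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
# heavily skewed labels: ~2% positives, mimicking rare impact events
y = (rng.random(10_000) < 0.02).astype(int)
# a mildly informative score: positives shifted up by 2 standard deviations
score = rng.normal(0, 1, y.size) + 2.0 * y

roc_auc = roc_auc_score(y, score)                 # area under the ROC
pr_auc = average_precision_score(y, score)        # PR AUC (average precision)
print(roc_auc, pr_auc)
```

Note how the PR AUC stays far below the ROC AUC on skewed data, which is why both are reported.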

4.2 Time domain evaluation

Ultimately, we are interested in impact arrival time measurements, performed on contiguous recordings. In the inference mode, a strided split of the recording into (overlapping) samples is used with a fixed stride S. It corresponds to taking ri = [Si, Si + L), i = 0, …, where S and L are the stride and the sequence length, respectively. The nonlinearity of the acoustic feature model and sampling from multiple ν parameterizations complicate the picture, in which fixed length features ri represent variable length time intervals Iri. Therefore, the standard dataset metrics do not directly address temporal precision-related questions.

4.2.1 Cluster mean method: From interval to spike predictions.

The cluster mean method translates interval predictions by Eq (2) into a sequence of impact time predictions and addresses two problems: mapping an interval I to a single time prediction, and dealing with the possibility that multiple overlapping intervals may predict the same physical impact. It is relevant for short length features, when {#(impacts in I) > 1} is negligible for most I.

The first problem cannot be unambiguously resolved due to loss of information. We use a fixed offset approach: choose K such that 0 ≤ K < L and posit t(I[i,i+L)) = t(xi+K). To address the second problem, we leverage the fact that a strided split of the form x[Si,Si+L) can also be viewed as an i-indexed sequence of intervals. Let r = [i1, i2) be a range in which all interval predictions are positive and are surrounded by at least one negative prediction on each side: ŷi = 1 for all i1 ≤ i < i2, and ŷi1−1 = ŷi2 = 0. We identify such a range with an impact, whose event time and probability score are defined as the averages over the cluster,

t(r) = ⟨t(Ii)⟩i∈r,  σ(r) = ⟨σ(h(xri; θ))⟩i∈r.  (3)

The cluster mean is a convenient method of translating interval to event predictions. It allows us to treat both the ground truth events and predictions on equal footing, as point processes (PP), which are ordered sequences of random numbers.
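A minimal sketch of the cluster mean method, under our reading of Eq (3) that each maximal run of positive interval predictions is merged into one event whose time and score are the run averages:

```python
import numpy as np

def cluster_mean_events(times, probs, cutoff=0.5):
    """Merge runs of consecutive positive interval predictions into events.

    times[i] is the representative time t(I_i) of the i-th strided interval,
    probs[i] its predicted probability.  Each maximal run of probs > cutoff
    yields one event whose time and score are the run averages.
    """
    pos = probs > cutoff
    events = []
    i = 0
    while i < len(pos):
        if pos[i]:
            j = i
            while j < len(pos) and pos[j]:   # extend the positive run
                j += 1
            events.append((times[i:j].mean(), probs[i:j].mean()))
            i = j
        else:
            i += 1
    return events

times = np.array([0.00, 0.01, 0.02, 0.03, 0.04, 0.05])
probs = np.array([0.1, 0.9, 0.8, 0.2, 0.7, 0.1])
print(cluster_mean_events(times, probs))
```

Here two runs of positive predictions are merged into two events, so overlapping intervals no longer double-count one physical impact.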

4.2.2 Temporal pattern comparison based on fixed resolution scores.

Precision of predictions is addressed by comparing a pair of given PPs, T and T̂, on an interval I, derived from the respective empirical cumulative distribution functions. Typically one would be the annotations, the other the predicted impact times. We propose to measure their similarity as follows. Let ϵ > 0 be a fixed temporal precision, and let Nϵ(T) be a union of ϵ-neighborhoods centered on each event in T. Consider the fraction of events ‘in the other process’ that fall within Nϵ(T). There are two non-equivalent alternatives:

p1 = #{s ∈ T̂ : s ∈ Nϵ(T)} / #T̂,  (4a)
p2 = #{s ∈ T : s ∈ Nϵ(T̂)} / #T.  (4b)

In practice, T̂ = T̂(ν, c) will be determined by the parameterization ν and the cutoff c, which we have indicated as dependent variables. The temporal resolution ϵ is assumed small in comparison to the mean nearest neighbor distances of either PP. p1 and p2 stand to each other in a similar relation as ‘precision’ stands to ‘recall’: either one, but not both, could be made arbitrarily close to 1 by making one of the PPs cover the interval I with high enough density. Therefore, in analogy to the ‘F score’, we propose their harmonic mean as the objective function to maximize:

R = 2 p1 p2 / (p1 + p2).  (5)

Point processes have a broad range of science and engineering applications; we have considered several popular distances and metrics, including filtering-based metrics with a Laplacian kernel [40, 41] and the co-occurrence metric [42, 43]. They provide dimensionless rankings of predictions that are valuable, but do not directly address temporal resolution estimation, prompting us to propose the above methodology.
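The fractions p1, p2 and the score R can be sketched as follows; the toy event times are hypothetical, and the assignment of p1 to the 'precision'-like direction is our reading of Eqs (4a) and (4b):

```python
import numpy as np

def hit_fraction(a, b, eps):
    """Fraction of events in `a` that fall within an eps-neighborhood
    of some event in `b` (one of the two non-equivalent fractions)."""
    a, b = np.asarray(a), np.asarray(b)
    hits = np.abs(a[:, None] - b[None, :]).min(axis=1) <= eps
    return hits.mean()

def r_score(truth, pred, eps):
    """Harmonic mean of the two fractions, in analogy with the F score."""
    p1 = hit_fraction(pred, truth, eps)   # 'precision'-like
    p2 = hit_fraction(truth, pred, eps)   # 'recall'-like
    return 0.0 if p1 + p2 == 0 else 2 * p1 * p2 / (p1 + p2)

truth = [0.10, 0.50, 0.90]                # annotated impact times, s
pred = [0.102, 0.48, 0.91, 1.40]          # predictions; 1.40 is spurious
print(r_score(truth, pred, eps=0.03))     # 3/4 precision, full recall
```

A prediction that covered the interval densely would drive p1 to 0 even as p2 approaches 1, which is exactly what the harmonic mean penalizes.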

4.3 The main workflow: Training, validation and precision evaluation

The main steps of the workflow can be summarized as follows: pre-training on a dataset sampled from a ‘broad’ parameter distribution Φ0, followed by an optimality study using the scoring functions R(ϵ, ν, c) and p1(ϵ, ν, c), and a final evaluation on a new ‘narrow’ dataset Φ1. These steps will lead to an understanding of the role of ν and temporal precision, and suggest a real time inference strategy.

As a first step, each recording with available annotations is encoded as described in Sect. 4.4, using a sample of ν taken from Φ0, defined as

Φ0: νk ∼ U(Fk−1(0.5), Fk−1(0.99)), k = 1, 2.  (6)

Here U(a, b) denotes a uniform random variable between a and b, and Fk−1(0.5) and Fk−1(0.99) are, respectively, the median and the 99-th percentile with respect to the relevant variable. The latter are the parameter-free encoding deltas, discussed in Sect. 4.4 (see also Fig 3, left pane). Since the dataset size scales with the number of samples, 10 samples are used to generate Φ0. It is probable that none of them are nearly optimal, and some are quite poor.

Fig 3. Acoustic feature model: Differences between encoding and denoising.

Left: The distribution of (δt̃, |δã|) shows a large concentration of noise near the origin. A boundary Bν (thick curve) will be used to define noise. Right: The distribution of δt, |δa| after denoising shows a cleaned up signal and a partially recovered 1 kHz frequency. Middle: a sample of the signal showing the acoustic events. Connected dots: original sampling, crosses: encoded (t̃, ã), large circles: denoised (t, a) = g(ν)(A).

https://doi.org/10.1371/journal.pone.0311995.g003

The negative log likelihood amounts to the standard logistic regression loss function

L(θ) = −⟨ y log σ(h(xr; θ)) + (1 − y) log(1 − σ(h(xr; θ))) ⟩,  (7)

with averaging over the dataset and over ν included. The theoretical goal is

θ* = argminθ L(θ).  (8)

In practice, we search for a minimum of Eq (7) numerically, using stochastic gradient descent (SGD) [14] with several trials and a fixed number of iterations. Model performance is evaluated as described in Sect. 4.1.2 on the validation partition and the best model is selected using an additional score, e.g. PR AUC.
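As a toy illustration of minimizing Eq (7) by SGD, the sketch below substitutes a simple linear model for the CNN h and uses the standard sigmoid convention σ(h) = 1/(1 + exp(−h)); the synthetic data merely stands in for the feature sequences:

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic stand-ins for the L = 15 feature sequences x_r and labels y
X = rng.normal(size=(2000, 15))
w_true = rng.normal(size=15)
y = (X @ w_true + rng.normal(scale=0.5, size=2000) > 0).astype(float)

sigmoid = lambda h: 1.0 / (1.0 + np.exp(-h))
w = np.zeros(15)
for epoch in range(200):
    for i in rng.permutation(2000)[:200]:        # SGD on a random mini-sample
        p = sigmoid(X[i] @ w)
        w -= 0.05 * (p - y[i]) * X[i]            # per-sample gradient of Eq (7)

# negative log likelihood on the whole set after training
p_all = sigmoid(X @ w)
nll = -np.mean(y * np.log(p_all + 1e-12) + (1 - y) * np.log(1 - p_all + 1e-12))
print(nll)
```

After training, the loss should fall well below the chance level log 2 ≈ 0.693, illustrating the fixed-iteration SGD search.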

The hope is to obtain a fault-tolerant model that generalizes to unseen data, but the premise may look ill-conceived: the model is trained on an averaged Φ0 (confirmed by modest training performance) and there is no guarantee that any specific ν should perform better than the average. To investigate this important effect the tools of Sect. 4.2 will be valuable. The high road is to study the optimality of the scores R(ϵ, ν, c) or pi(ϵ, ν, c) with a fixed ϵ as a function of ν, e.g.:

R*(ϵ, ν) = maxc R(ϵ, ν, c),  (9a)
p1*(ϵ, ν) = maxc p1(ϵ, ν, c),  (9b)

since they are informative about the percentages of hits and misses with ϵ precision. This leads to a better understanding of the ‘score topography’ and allows us to locate the best ν for a given score function. However, ν and c have a different standing: ν is a control parameter, whereas c is a result of optimization, so a more robust estimate is of interest. This argument motivates us to consider ν*, defined by

ν* = argmaxν ⟨R(ϵ, ν, c)⟩c,  (10)

which is optimal ‘on average’ over a typical range of cutoffs, and does not require prior knowledge of the optimal cutoff value. Other statistics such as the median or quantiles could be used, and yield similar results.

With the knowledge of a tentative best value ν* we propose to generate a ‘narrow’ dataset Φ1(ν), which is equivalent to ν* plus noise:

Φ1: ν ∼ ν* + noise.  (11)

Although not strictly necessary, the last step shows how to quantify the optimality by comparing Φ0 and Φ1 using dataset-based performance evaluation tools.

4.4 Acoustic feature model

The basic idea of the proposed audio feature model is to track the relevant local peaks and troughs of the signal A(t), recording both the extremal amplitude values, ai, and their temporal locations ti. In the first step, we track all local extrema with a parameter-free encoder g0: A ↦ (t̃, ã), with the implied relations A(t̃j) = ãj and t̃j < t̃j+1. This encoder can be implemented using bit-shift and comparison operations and used as a low latency audio processing component (see Algorithm 1). A quadratic polynomial interpolation based on three adjacent nodes was adopted to refine the locations of the extrema.

Algorithm 1 Encoder pseudo code. ‘t’ and ‘b’ are memory cells, recording local ‘top’ and ‘bottom’ conditions, respectively. Incremental time is recovered from the differential encT[j] and the sampling frequency fs. An optional improvement step uses a quadratic interpolation between the three values straddling a local extremum.

1: function encode(input, encoded)
2:   Initialize(t, b, steps, i, j)
3:   current ← input[i]
4:   while MoreData(input, encoded) do
5:     do
6:       i ← i + 1
7:       next ← input[i]
8:       steps ← steps + 1
9:     while next = current      ⊳ Addressing the ties
10:    t ← ShiftLeft(t, 1) | (current < next)      ⊳ ‘top’ marker
11:    b ← ShiftLeft(b, 1) | (current > next)      ⊳ ‘bottom’ marker
12:    if (t & 0x03 = 0x01) | (b & 0x03 = 0x01) then      ⊳ Extremum found!
13:      encA[j] ← input[i−1]
14:      encT[j] ← steps − 1
15:      Interpolate(encA[j], encT[j] | input[i−2], next)      ⊳ Optional step
16:      encoded[j] ← (encA[j], encT[j])
17:      steps ← 1
18:      j ← j + 1
19:    end if
20:    current ← next
21:  end while
22: end function
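A Python rendering of Algorithm 1 may look as follows; it is a sketch, not the reference implementation: it stores absolute extremum times instead of the differential encT, omits the optional interpolation, and seeds the marker cells to suppress a spurious detection at the first sample (a choice the pseudocode leaves open):

```python
import numpy as np

def encode(signal, fs):
    """Parameter-free extrema encoder g0: returns (t, a), the times and
    amplitudes of the local extrema of `signal` (Algorithm 1 sketch)."""
    t_bits = b_bits = 1          # seeded to suppress a spurious first firing
    times, amps = [], []
    i = 0
    current = signal[0]
    while i + 1 < len(signal):
        i += 1
        nxt = signal[i]
        if nxt == current:       # ties: keep advancing, as in the do-while
            continue
        t_bits = ((t_bits << 1) | (current < nxt)) & 0x03   # 'top' marker
        b_bits = ((b_bits << 1) | (current > nxt)) & 0x03   # 'bottom' marker
        if t_bits == 0x01 or b_bits == 0x01:   # direction change: extremum
            times.append((i - 1) / fs)         # absolute time of input[i-1]
            amps.append(current)
        current = nxt
    return np.array(times), np.array(amps)

fs, f = 16_000, 1_000.0
s = np.sin(2 * np.pi * f * np.arange(fs // 10) / fs)   # 0.1 s of a 1 kHz tone
t, a = encode(s, fs)
print(len(t), np.abs(a).min())
```

For the 1 kHz tone the encoder keeps only the 200 extrema out of 1600 samples, and the spacing of successive extrema recovers the frequency, 1/(2 δt) = 1 kHz.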

A more realistic model requires handling the electronic noise and undesirable environmental sounds that may be mixed into the target signal A(t). We propose a heuristic denoiser g1(ν), aiming to mitigate high to moderate signal-to-noise ratio situations. It is based on the notions of outliers and ‘noisy sequences’ with respect to the delta variables δt̃j = t̃j+1 − t̃j and δãj = ãj+1 − ãj. The main assumption is that the distribution of (δt̃, |δã|) is concentrated near small values, and is mostly noise, whereas the signal of interest is represented by ‘rare’ and large excursions. To approximately subtract the noise from the total distribution we use a heuristic boundary function Bν, parameterized by a denoising control parameter ν = (ν1, ν2). For example, an ‘oval’ boundary and ν are defined so that

(δt̃j/ν1)² + (δãj/ν2)² < 1  (12)

implies that t̃j, ãj are part of a noisy sequence. Data that satisfy such a condition typically form sequences, which we simply eliminate. The full acoustic feature model is a composition g(ν) = g1(ν) ∘ g0.
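A sketch of the denoiser, assuming an elliptical form of the 'oval' boundary, (δt/ν1)² + (δa/ν2)² < 1, and that flagged points are simply dropped (the toy extrema times below are hypothetical):

```python
import numpy as np

def denoise(t, a, nu):
    """Sketch of the heuristic denoiser g1(nu): drop encoded extrema whose
    deltas fall inside the 'oval' noise boundary B_nu.

    The boundary assumed here is (dt/nu1)**2 + (da/nu2)**2 < 1, with
    nu = (nu1, nu2) the time and amplitude denoising thresholds.
    """
    nu1, nu2 = nu
    dt, da = np.diff(t), np.abs(np.diff(a))
    noisy = (dt / nu1) ** 2 + (da / nu2) ** 2 < 1.0
    keep = np.r_[True, ~noisy]     # flag the later point of each noisy pair
    return t[keep], a[keep]

# 1 kHz extrema with one spurious jitter point at t = 0.00052
t = np.array([0.0, 0.0005, 0.00052, 0.0010])
a = np.array([1.0, -1.0, -0.95, 1.0])
td, ad = denoise(t, a, nu=(1e-4, 0.5))
print(td)
```

The jitter point sits at a tiny (δt, |δa|) and falls inside the boundary, while the genuine half-period excursions survive.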

Example: pure signal. Let A(t) = sin(2πft) be sampled at a rate fs. Then successive extrema are spaced half a period apart, δt̃ = 1/(2f), and |ã| = 1. Denoising has no effect: t = t̃ and a = ã; ξ = 2 for a quadratic interpolation, ξ = 1 for no interpolation. Note that the result may have considerably fewer samples than the original. Lastly, the vector x, defined by Eq (1), is constant, xn = log f − log(fs/2).

Example: noisy signal. Consider a sinusoid with f = 1 kHz, fs = 16 kHz, mixed with 5 dB white noise. Fig 3 shows the distribution of the encoded (δt̃, |δã|), the boundary separating noise from signal (left pane), and the distribution after denoising (right pane). The middle pane illustrates the actual ‘events’ that we are focusing on, i.e. the local extrema of the amplitude profile. This example shows that the denoiser g(ν) can approximately recover the frequency of a contaminated signal, making it an effective first defense against high-frequency noise encountered in practice.

There is longstanding acoustic feature research for music, human voice and animal sound classification. To our knowledge, the present model has not been previously proposed. To an extent, it is analogous to the ‘zero crossings’ method, if the crossings were applied to the derivative A′(t) instead of A. The analogy does not extend further in terms of either implementation or use cases: zero crossings are biased toward low frequencies, whereas our acoustic feature model is sensitive to high frequency detail and, in addition, we apply denoising. It is interesting to note that zero crossings have been previously proposed for percussive sound classification in contemporary music [44].

4.5 Compact CNN architectures for improved impact detection

Our goal in using machine learning is to improve the noisy local frequency estimates provided by the feature x. The framework of choice is one-dimensional artificial convolutional neural network (1D-CNN).

Rain drop counting is posed here as an object (event) detection supervised machine learning task, where each physical impact is counted as an event. What sets this problem apart from other acoustic event or scene identification tasks [45, 46] is the very short duration of the events, which are almost instantaneous. A further complication is that the relaxation dynamics conceals the trailing edge of a physical impact, leaving only the leading edge and very little temporal depth to go by for developing a pattern recognition strategy. All considered CNN architectures (illustrated in Fig 4) have a ‘denoiser-detector-resolver’ structure. The ‘resolver’ consists of a global max pooling layer (GM), one or more fully connected layers (FC), and a softmax unit for binary classification. The GM implements a ‘best of’ logic and erases the event location within a sequence, but allows inputs of various lengths to be used. This ‘resolver’ part of the model is fixed (except for the hyperparameter values).
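The key property of the GM+FC resolver, namely that the global max over time erases the event location and tolerates variable-length inputs, can be demonstrated with a plain NumPy stand-in (the weights here are arbitrary, not trained):

```python
import numpy as np

def resolver(channels, W, b):
    """GM + FC resolver sketch: global max over time, then a dense logit.

    channels: (C, L') detector outputs.  The max over the time axis makes
    the result independent of both the event position and the length L'.
    """
    pooled = channels.max(axis=1)       # (C,) 'best of' logic per channel
    return float(W @ pooled + b)        # scalar logit h

rng = np.random.default_rng(0)
W, b = rng.normal(size=4), 0.0
short = rng.normal(size=(4, 10))
# append extra time steps that are everywhere smaller than the channel max;
# the pooled values, and hence the logit, cannot change
pad = short.min(axis=1, keepdims=True) - 1.0 + np.zeros((4, 7))
longer = np.concatenate([short, pad], axis=1)
print(resolver(short, W, b) == resolver(longer, W, b))
```

This is why the same fixed resolver can be reused during strided inference on sequences of various lengths.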

Fig 4. Comparison of 1D-CNN architectures.

Residual, DenseNet, and RDN denoising units explore convolutional topologies for high-frequency noise suppression. Conv: Convolution layer followed by batch normalization and ReLU activations. GM: global max, FC: fully connected layers; Step detector: Conv* and Sep. Conv* (separable convolution with unit depth multiplier). Add: adds ‘Input’ to each remaining channel. Multiple inputs to all Conv layers are concatenated. Plots: samples of a random Conv* initializer and trained model filters.

https://doi.org/10.1371/journal.pone.0311995.g004

4.5.1 Basic detector model.

The basic idea of a ‘step detector’ is to identify the leading edge of an impact with a step within a sequence x[i, i + L), and then to use ‘step filters’ to detect them as sufficiently large peaks. The step filter is implemented as a convolutional layer with C filters, resulting in C output channels. To improve the initial guess, we use a non-standard weight initializer imitating a random step function (see the ‘random’ pane in Fig 4, right column).
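A minimal sketch of such an initializer follows; the exact form used in the paper is not specified, so the zero-mean two-level step below is an assumption for illustration.

```python
import numpy as np

def random_step_init(K, C, rng):
    """Random zero-mean step filters (assumed form): -a on [0, s) and
    +a on [s, K) with random step position s and amplitude a, then
    mean-subtracted so a flat input yields zero response."""
    w = np.empty((K, C))
    for j in range(C):
        s = int(rng.integers(1, K))
        a = float(rng.uniform(0.5, 1.5))
        w[:s, j], w[s:, j] = -a, a
        w[:, j] -= w[:, j].mean()
    return w

rng = np.random.default_rng(1)
w = random_step_init(15, 4, rng)

# A step filter responds to a rising step in the feature sequence x,
# but not to a constant (flat) sequence.
x_step = np.concatenate([np.zeros(8), np.ones(7)])
x_flat = np.ones(15)
resp_step = float(x_step @ w[:, 0])   # positive response
resp_flat = float(x_flat @ w[:, 0])   # zero response
```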

The main reason to investigate more complex models is that the basic model is susceptible to high frequency noise, whose common cause is wind. The detector is still considered a useful component, implemented in the subsequent models as a separable convolutional layer, and marked with an asterisk in Fig 4, to distinguish the non-standard weight initializer. One perceived disadvantage of the basic model is the use of a single input channel. The following models could be interpreted as a study to generate more input to the detector in a way that produces useful context for denoising.

4.5.2 Models with additional denoising layers.

The deep residual network architecture (ResNet) is well known in computer vision [47] and has also been applied to image denoising [48]. In the model with a residual denoiser, we use a single residual block consisting of two convolutional layers with C1 and C2 channels, respectively, adapted to the 1D-CNN setting. The input is added channel-wise to each output channel of the block, so the input to the detector layer is the sum of the input and the j-th block output channel, j = 1, …, C2. The parameter λ is controlled by the weight regularization of the convolutional layers of the block.
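A 1D forward-pass sketch of the residual denoiser in plain NumPy (layer sizes are illustrative, and λ enters only implicitly through the magnitude of the regularized weights):

```python
import numpy as np

def conv1d(x, w):
    """x: (L, Cin), w: (K, Cin, Cout) -> (L, Cout), 'same' zero padding."""
    L = x.shape[0]
    K, _, Cout = w.shape
    pad = K // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((L, Cout))
    for i in range(L):
        out[i] = np.einsum("kc,kco->o", xp[i:i + K], w)
    return out

def residual_denoiser(x, w1, w2):
    """Conv-ReLU-Conv block; the single-channel input is added
    channel-wise to every output channel of the block."""
    h = np.maximum(conv1d(x, w1), 0.0)   # C1 channels
    r = conv1d(h, w2)                    # C2 channels
    return r + x                         # (L, C2) + (L, 1) broadcast

rng = np.random.default_rng(0)
L, K, C1, C2 = 100, 15, 6, 4
x = rng.normal(size=(L, 1))
w1 = 0.1 * rng.normal(size=(K, 1, C1))
w2 = 0.1 * rng.normal(size=(K, C1, C2))
y = residual_denoiser(x, w1, w2)         # shape (L, C2)
```

With all weights zero the block reduces to the identity broadcast over the C2 channels, which is the usual residual-learning starting point.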

In a similar fashion, a model with a DenseNet denoiser uses a single DenseNet block [49], combining two convolutional layers with C1 and C2 channels, respectively, and a DenseNet connectivity pattern. Since the DenseNet block concatenates the input with the outputs of all of its layers, the detector input has 1 + C1 + C2 channels.
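The channel bookkeeping of the DenseNet connectivity is easy to verify (the convolutions are replaced by random stand-ins for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
L, C1, C2 = 100, 6, 4                    # illustrative sizes

x = rng.normal(size=(L, 1))
h1 = np.tanh(rng.normal(size=(L, C1)))   # stand-in for Conv(x)
# DenseNet pattern: the second layer sees concat(x, h1) ...
h2 = np.tanh(rng.normal(size=(L, C2)))   # stand-in for Conv(concat(x, h1))
# ... and the detector sees the concatenation of everything:
detector_in = np.concatenate([x, h1, h2], axis=1)
```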

The Residual DenseNet architecture, which combines the ResNet and DenseNet topologies, has been proposed for image denoising [50]. Similarly, in the model with a Residual DenseNet (RDN) denoiser we employ a single RDN block, consisting of four convolutional layers with Ci channels each, suitably adapted to the 1D-CNN setting.

All model topologies are summarized in Fig 4, and their hyperparameters are shown in Table 1. The kernel size is fixed at K = 15 for all models, which is equal to the dataset sequence length (K = L). All considered models are applicable when L ≥ K. Where applicable, L2 regularization is used. The hyperparameters are chosen so that all models have similar sizes.

Table 1. Summary of 1D-CNN model parameters, complementary to Fig 4.

Denoisers: channels of respective Conv layers; Input and output channels of a step detector; FC: units of a fully connected layer.

https://doi.org/10.1371/journal.pone.0311995.t001

5 Results and discussion

5.1 Contribution of data

We committed several recordings with annotated impacts to a new dataset. The physical impacts are identified with 1 millisecond bounding boxes, annotated in practice to three decimal digits. Higher resolution is possible but not required. Recordings were collected in an urban environment (northern Italy) with a smartphone at controlled gain and a 48 kHz sampling rate, downsampled to 44.1 kHz for further analysis. The devices were not shielded from wind or environmental noise. For example, the recording with ID 1 is marked as containing thunder, ID 2 street noise, and ID 3 strong wind.

Annotating such recordings is a highly laborious task, since ideally each impact should be identified, sometimes in the presence of strong background noise (see Fig 1). The methodology we found to work best is to play them at 5%–10% of the original speed, implementing, essentially, a downconverter from the 20 kHz frequency range to a well audible range where an impact’s ‘thump’ is very clear; for this we used the ‘Audacity’ software [35]. Tentative impacts were cross-examined visually, using the amplitude profile and ‘feature’ plots such as those shown in Fig 1. Generally we followed a conservative policy: ‘do not annotate if in doubt’. The result so far is 1875 unique events from five recordings, summarized in Table 2.
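The slow-playback trick works because playing samples at a fraction r of the original rate relabels every frequency f as r·f. A quick check of the arithmetic (the 18 kHz component is an illustrative value, not from the paper):

```python
import numpy as np

fs = 48_000
speed = 0.05                         # play at 5% of the original speed
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 18_000 * t)   # an impact component near 18 kHz

# Playing the same samples at rate speed*fs relabels every frequency f
# as speed*f: the near-ultrasonic 'thump' lands in the audible range.
fs_play = speed * fs                 # 2400 Hz playback rate
f_heard = speed * 18_000             # 900 Hz, clearly audible
```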

Table 2. Summary by recording: Annotations, training features, and optimal parameters.

1) Recordings: Len—total recording time in seconds, N—number of annotated events, τ—mean inter-arrival time (ms), λ—arrival rate per unit area (s−1 m−2); 2) Φ0-sampled training data: n1—#{y = 1}, n0—#{y = 0}; 3) Optimal estimates ν*—Eq (10), and event counts.

https://doi.org/10.1371/journal.pone.0311995.t002

5.2 Model training and validation on datasets

Models are trained exclusively on the Φ0(ν)-sampled dataset, see Eq (6). The initial part of the training progression is illustrated in the right column of Fig 5. All models are optimized for a class-weighted binary cross entropy using stochastic gradient descent (SGD) with variable learning rates and mini-batches of 512 or 256 samples [51]. The class weighting is necessary because of the highly imbalanced training dataset (see Table 2). Training was restarted multiple times with random initial conditions and run for 150 or 300 epochs. The denoising layer weights (upstream of the respective step detectors) were L2-regularized with a parameter of 0.01 and batch-normalized (with the exception of the basic model, which does not use denoising). No other regularization measures were applied. The precision-recall and receiver operating characteristic areas under the curves (PR AUC and ROC AUC) at the end of each training cycle, and their rms values, are summarized in Table 3. The curves (best PR case) are shown in Fig 5, middle panes. Validation datasets with Φ0 (above) and Φ1 (below) are compared for the same models (which were trained on the Φ0 dataset).
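The class-weighted loss can be written out directly; the inverse-frequency weighting and the 95:5 imbalance below are illustrative, since the paper does not state its exact weighting scheme.

```python
import numpy as np

def weighted_bce(y, p, w_pos, w_neg, eps=1e-12):
    """Class-weighted binary cross entropy: up-weights the rare
    positive (impact) class of an imbalanced dataset."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(w_pos * y * np.log(p) + w_neg * (1 - y) * np.log(1 - p))

# Weights inversely proportional to class frequency (a common choice).
y = np.array([0] * 95 + [1] * 5, dtype=float)
n0, n1 = (y == 0).sum(), (y == 1).sum()
w_pos, w_neg = len(y) / (2 * n1), len(y) / (2 * n0)

# A maximally uncertain classifier (p = 0.5 everywhere) then scores
# exactly log(2), as if the classes were balanced.
p = np.full_like(y, 0.5)
loss = weighted_bce(y, p, w_pos, w_neg)
```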

Fig 5. Training progress on Φ0 dataset and validation performance on Φ0 vs Φ1.

Right column: initial training progress showing multiple sample runs. Left four panels: validation performance comparison on Φ0 and Φ1 datasets for the models, trained on Φ0 dataset.

https://doi.org/10.1371/journal.pone.0311995.g005

Table 3. Summary of model performance.

Complementary to Fig 5: summary of PR AUC and ROC AUC for the Φ0 and Φ1 validation datasets, showing the respective mean and rms (in parentheses).

https://doi.org/10.1371/journal.pone.0311995.t003

The training progressions in Fig 5 suggest that the general tendency is to underfit. A ‘dip’ seen in some early stages of the progression is explained by the initial weight distribution of the detector layer, which presumably identifies both true and false frequency jumps, so that most of the remaining iterations are burdened with reducing the number of false positives.

5.3 Search for optimal control parameters

Next, we evaluate Eqs (9a), (9b) and (10) on individual recordings with ϵ = 0.01, using a Φ0-trained residual model from Sect. 4.5.2 as a case study. Available recordings are parameterized on a 10 × 10 ν-grid, with ν1 (relative to fs/2) and log ν2 spanning, respectively, [1, 10] and [11, 38] dB. Cluster mean predictions, Eq (3), are generated for a range of 70 cutoffs c ∈ [0.2, 0.9), yielding 7000 predictions per recording. Their comparison with the corresponding annotations yields p1(ϵ, ν, c), p2(ϵ, ν, c) and R(ϵ, ν, c). First, the optimal scores are computed by a straightforward application of Eqs (9a) and (9b) and visualized in Fig 6, panes 2–6, as density maps with contour lines overlaid. Then, c ∈ [0.5, 0.8) is used to compute the mean in Eq (10) and to find ν*, which is listed in Table 2. Fig 6, pane 1, shows p1(ϵ, ν*, c) against p2(ϵ, ν*, c) for all available cutoff values c. Finally, Fig 7 illustrates a sample for the optimal ν*: the matched fraction for a range of ϵ thresholds up to 30 ms (left pane), the predictions in the validation range (middle column), and the cumulative probability distribution of the first arrival time, which is an example of the statistics afforded by the present methodology (right column). A considerable fraction of predictions was found to be paired at less than 10 ms. The red line shows the full prediction distribution, whereas the green line shows the distribution of a subset filtered by {ti+1 − ti ≥ 0.01}.

Fig 6. Landscape of scores over a grid of denoising parameters, panes 2–6, for each recording.

Color: score density map (lighter is better). Contour lines: overlaid score contours (higher is better). Pane 1: p1(ϵ, ν*, c) vs p2(ϵ, ν*, c) for all available cutoff values c.

https://doi.org/10.1371/journal.pone.0311995.g006

Fig 7. Selected sample predictions.

Left: fraction of predicted events ϵ-close to the ground truth, Eq (4a), for various ϵ and each recording. Middle: audio signal in the validation range and ground truth (green verticals, above zero) vs predictions (red verticals, below zero). Right: the corresponding CDFs. Red line: raw CDF; green line: predictions with spurious duplicates removed.

https://doi.org/10.1371/journal.pone.0311995.g007

5.4 Discussion

5.4.1 Data considerations.

It is of interest to include additional data covering extreme conditions (large and small drop-size limits, high arrival intensity), as well as to explore material properties and their temporal degradation. Note that the better ROC than PR performance in Fig 5 suggests that improving high-frequency denoising is more relevant than collecting more impacts, for which the acoustic feature model is quite effective.

The first outstanding question is therefore how to efficiently generate more reliable impact annotations. Natural rain recordings are easy to obtain, but the labelling is difficult and possibly prone to bias, because the audible signature (percussive bursts) and the detection principle (steps in local frequency) are of the same origin. Manual precision dispensers could be used to create individual drops of controllable dimensions. Simultaneous recording with a high-speed camera could provide valuable independent cross-validation. Both of these methods are feasible only for short time spans, with the caveat that their own data would have to be properly interpreted.

Notice that our annotation policy is biased toward marking only highly plausible impacts. Almost certainly, we failed to mark a few true impacts with ‘weaker’ signatures (possible causes include high noise, drop-on-drop impacts, and very small droplets). Our data should be viewed from the perspective of label noise with an asymmetric distribution (more unlabelled impacts than incorrectly labelled noise), or as data that contains positive and unlabelled examples (PU learning). Since both are mature branches of machine learning, the most promising direction for efficiently generating new annotations, improving the existing ones, and reducing the label noise may be to adapt an appropriate methodology from the reviews [52, 53].

Lastly, Figs 6 and 7 (left column) identify the recording ID = 5 as an outlier, although it should have been an ‘easy’ case. It is suspected that a defective sensing surface lost tension, possibly following a deformation of the outer casing.

5.4.2 A blueprint for a real time application.

The second outstanding question is how to obtain useful models and denoising parameters for a real-time application. We believe that a blueprint for a solution lies within the previous sections: ample evidence is provided by the Φ0–Φ1 comparison (Fig 5 and Table 3) and by the landscapes of Fig 6 that Φ0-trained models are highly selective. This implies that a suitable pre-trained model will exceed average performance in a broad range of conditions if the denoising parameter is attuned to those conditions. A possible way forward is to turn a larger number of covariates, such as ν*, computed from short time intervals as well as from recordings without rain, into a predictive model. Such a model would perform dynamic calibration by estimating ν* for the current time interval from past data.

5.4.3 Model training lesson learned.

We found that sampling from a broad distribution such as Φ0 is one of the main ingredients for successfully training models that generalize to unseen data, the second being a sufficiently large batch size. A large batch size (e.g. 256) is required to reasonably guarantee that a few positive samples are included in each mini-batch. Our experiments show that the workflow is not iterative: retraining on Φ1 fared worse on the generalization test. We speculate that broad-distribution (Φ0) sampling mitigates the sharp minimizer problem [54] associated with large-batch training.
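The batch-size requirement is simple binomial arithmetic: with a positive-class fraction p, a uniformly sampled mini-batch of size B contains no positive at all with probability (1 − p)^B. The 1% fraction below is illustrative, not taken from Table 2.

```python
def p_no_positive(p_pos, batch):
    """Probability that a uniformly drawn mini-batch contains no
    positive sample, given positive-class fraction p_pos."""
    return (1.0 - p_pos) ** batch

p_pos = 0.01                      # illustrative class imbalance
print(p_no_positive(p_pos, 32))   # ~0.73: small batches often miss
print(p_no_positive(p_pos, 256))  # ~0.08: large batches rarely do
```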

5.4.4 Model differences.

The primary goal of investigating several CNN architectures with denoising layers is to mitigate high-frequency noise, usually caused by wind, for which the basic model seems too simple. The Φ0 vs Φ1 PR and ROC comparison is interesting, suggesting that all models perform well at optimal parameters, but that the denoising models perform better on average. It remains to find substantial arguments in favor of any particular model of Sect. 4.5.2.

5.4.5 Temporal and event count precision.

The left pane of Fig 7 shows, as a function of ϵ, the fraction of predictions that lie within ϵ of the ground truth. It follows that 80–90% of predictions lie within 10 ms in all cases except for the outlier ID = 5. This observation is the basis of our initial accuracy claim.

The event count density is systematically greater than the ground truth, as seen in Table 2, but also indirectly in Fig 6, pane 1, which shows a prevalence of overcounting. As suggested by Fig 1, case D, some extraneous predictions are in fact plausible true events that have not been marked, i.e. they represent corrected label noise. The numerically largest fraction of spurious predictions comes from an artifact most likely due to the cluster mean prediction algorithm: it consists of predicting pairs of events a few milliseconds apart for a single reference event. This is demonstrated using cumulative distribution functions (CDFs) in Fig 7, right column. Filtering out events at < 10 ms yields much better CDFs. It is believed that improvements to the cluster mean method and, possibly, the CNN hyperparameters will solve this issue in the future. The cause of most concern is false positives due to wind, since strong gusts can excite deceptively similar acoustic signatures. Wind as a source of inaccurate readings due to shaking has been identified with traditional impact disdrometers as well. Possible remedies include wind shielding, experimenting with thicker materials, or adopting more effective denoising methodologies.
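The duplicate-pair filter behind the green CDFs can be sketched as a greedy minimum-gap pass; the 10 ms threshold is from the text, while the greedy keep-the-first rule is an assumption about the implementation.

```python
import numpy as np

def drop_close_pairs(times, min_gap=0.010):
    """Keep the first event of any pair closer than min_gap seconds,
    removing spurious duplicates of a single reference event."""
    times = np.sort(np.asarray(times, dtype=float))
    kept = [times[0]]
    for t in times[1:]:
        if t - kept[-1] >= min_gap:
            kept.append(t)
    return np.array(kept)

# Hypothetical prediction times (seconds) with two duplicate pairs.
pred = [0.100, 0.103, 0.350, 0.720, 0.7205, 1.200]
print(drop_close_pairs(pred))  # duplicates at 0.103 and 0.7205 removed
```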

5.4.6 Generalizations and extensions.

An interesting generalization to pursue is an acoustic equivalent of algorithm unrolling. The goal would be to convert the current methodology, composed of several isolated components, into an equivalent, end-to-end deep learning workflow. The paradigm of algorithm unrolling [55] has been successfully used in image denoising and, importantly, it presents a path to embed known fundamental facts in explainable deep learning modelling [56]. Explainable models are easier to generalize to other settings. For example, it is of interest to extend these results to the physics of crumpling thin sheets, which has recently attracted considerable attention [57], with a wealth of experimental and theoretical results available [58, 59].

6 Conclusions

High-precision instrumentation usually implies materials of superior quality and fine manufacturing processes. It is of interest when high precision can be achieved with off-the-shelf materials and coarse ‘manufacturing’. This type of innovation is the main theme of the present work, focusing on precision raindrop counting with an ultra-low-cost rig, a new detection principle, and innovative signal analysis. Our main proposal is based on frequency detection of a high-frequency ‘percussive’ acoustic transient using a drum-like sensor, which is an unfamiliar detection principle for rainfall measurement instrumentation. The origin of this transient is unclear and invites further experimentation with soft hydrophobic materials. In a ‘proof is in the pudding’ approach, we demonstrate feasibility with a concrete numerical implementation and a detailed analysis of the results. From a signal analysis perspective, we seek to identify percussive impacts in a noisy signal, leveraging time-scale separation. Our implementation has two original components: a non-local audio processing unit that we call the ‘acoustic feature model’, and a convolutional neural network architecture optimized to deal with noise and to improve the predictions. This methodology is designed with the explicit goal of running ‘near the source’, on small, resource-constrained microcontrollers or smartphones. It will be of interest to a growing community of embedded engineers and TinyML enthusiasts, because it is compatible with embedded-ML-ready [10] microcontrollers. As a meteorological instrument, the hardware is limited by durability standards.
While there are implementation details to be optimized (such as improving the method of translating interval predictions to event counts) and field testing in real-time conditions to be performed, we nevertheless believe this methodology to be of interest in certain environmental sensing applications, especially where ultra-low budgets and short-term campaigns are involved. We also believe the methodology has broader interest in applications where detection of percussive events matters, from music classification to the crumpling of thin sheets.

Acknowledgments

Marco Zennaro (ICTP) is acknowledged for useful insights. The Information and Communication Technology Section is acknowledged for providing access to the EuroHPC supercomputer LEONARDO [60], hosted by CINECA (Italy).

References

1. Joss J, Waldvogel A. Ein Spektrograph für Niederschlagstropfen mit automatischer Auswertung. Pure and Applied Geophysics. 1967;68(1):240–246.
2. Michaelides S, Levizzani V, Anagnostou E, Bauer P, Kasparis T, Lane JE. Precipitation: Measurement, Remote Sensing, Climatology and Modeling. Atmospheric Research. 2009;94(4):512–533.
3. Nystuen JA, Prony JR, Charles A Lauter, Jr, Bufkin J, Rivero U, Boland M, et al. APL Disdrometer Evaluation. Miami, Florida: NOAA; 1994. AOML-83.
4. Lane J, Kasparis T, McFarquhar G. Acoustic Rain Gauge Array Experiment, Phase I. In: International Conference on Remote Sensing for Marine and Coastal Environments, 4th, Orlando, FL; 1997.
5. de Jong S. Low Cost Disdrometer. TU Delft; 2010.
6. Lu J, Yang Z, Wang J. A Membrane Disdrometer Based on Membrane Vibration. Measurement Science and Technology. 2015;26(11):115103.
7. Antonini A, Melani S, Mazza A, Baldini L, Adirosi E, Ortolani A. Development and Calibration of a Low-Cost, Piezoelectric Rainfall Sensor through Machine Learning. Sensors. 2022;22(17):6638. pmid:36081097
8. Lanza LG, Merlone A, Cauteruccio A, Chinchella E, Stagnaro M, Dobre M, et al. Calibration of Non-catching Precipitation Measurement Instruments: A Review. Meteorological Applications. 2021;28(3):e2002.
9. Tokay A, Bashor PG, Wolff KR. Error Characteristics of Rainfall Measurements by Collocated Joss–Waldvogel Disdrometers. Journal of Atmospheric and Oceanic Technology. 2005;22(5):513–527.
10. Plancher B, Buttrich S, Ellis J, Goveas N, Kazimierski L, Sotelo JL, et al. TinyML4D: Scaling Embedded Machine Learning Education in the Developing World. Proceedings of the AAAI Symposium Series. 2024;3(1):508–515.
11. van de Giesen N, Hut R, Selker J. The Trans-African Hydro-Meteorological Observatory (TAHMO). WIREs Water. 2014;1(4):341–348.
12. Theisen A, Ungar M, Sheridan B, Illston BG. More Science with Less: Evaluation of a 3D-printed Weather Station. Atmospheric Measurement Techniques. 2020;13(9):4699–4713.
13. Muita R, Kucera P, Aura S, Muchemi D, Gikungu D, Mwangi S, et al. Towards Increasing Data Availability for Meteorological Services: Inter-comparison of Meteorological Data from a Synoptic Weather Station and Two Automatic Weather Stations in Kenya. American Journal of Climate Change. 2021;10(3):300–316.
14. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. TensorFlow: Large-scale Machine Learning on Heterogeneous Systems; 2015.
15. de Villiers D, Schleiss M, ten Veldhuis MC, Hut R, van de Giesen N. Something Fishy Going on? Evaluating the Poisson Hypothesis for Rainfall Estimation Using Intervalometers: Results from an Experiment in Tanzania. Atmospheric Measurement Techniques. 2021;14(8):5607–5623.
16. Atlas D, Ulbrich CW. Path- and Area-Integrated Rainfall Measurement by Microwave Attenuation in the 1–3 Cm Band. Journal of Applied Meteorology and Climatology. 1977;16(12):1322–1331.
17. Tokay A, Short DA. Evidence from Tropical Raindrop Spectra of the Origin of Rain from Stratiform versus Convective Clouds. Journal of Applied Meteorology and Climatology. 1996;35(3):355–371.
18. Yarin AL. Drop Impact Dynamics: Splashing, Spreading, Receding, Bouncing…. Annual Review of Fluid Mechanics. 2006;38(1):159–192.
19. Driscoll MM, Nagel SR. Ultrafast Interference Imaging of Air in Splashing Dynamics. Physical Review Letters. 2011;107(15):154502. pmid:22107295
20. Chen L, Bonaccurso E, Deng P, Zhang H. Droplet Impact on Soft Viscoelastic Surfaces. Physical Review E. 2016;94(6):063117. pmid:28085484
21. Mitra S, Vo Q, Tran T. Bouncing-to-Wetting Transition of Water Droplets Impacting Soft Solids. Soft Matter. 2021;17(24):5969–5977. pmid:34047748
22. Trono EM, Guico ML, Libatique NJC, Tangonan GL, Baluyot DNB, Cordero TKR, et al. Rainfall Monitoring Using Acoustic Sensors. In: TENCON 2012 IEEE Region 10 Conference. IEEE; 2012. p. 1–6.
23. Guico ML, Abrajano G, Domer PA, Talusan JP. Design and Development of a Novel Acoustic Rain Sensor with Automated Telemetry; 2018.
24. Ghada W, Estrella N, Menzel A. Machine Learning Approach to Classify Rain Type Based on Thies Disdrometers and Cloud Observations. Atmosphere. 2019;10(5):251.
25. Avanzato R, Beritelli F. An Innovative Acoustic Rain Gauge Based on Convolutional Neural Networks. Information. 2020;11(4):183.
26. Sánchez-Giraldo C, Bedoya CL, Morán-Vásquez RA, Isaza CV, Daza JM. Ecoacoustics in the Rain: Understanding Acoustic Indices under the Most Common Geophonic Source in Tropical Rainforests. Remote Sensing in Ecology and Conservation. 2020;6(3):248–261.
27. Ferroudj M, Truskinger A, Towsey M, Zhang L, Zhang J, Roe P. Detection of Rain in Acoustic Recordings of the Environment. In: Pham DN, Park SB, editors. PRICAI 2014: Trends in Artificial Intelligence. vol. 8862. Cham: Springer International Publishing; 2014. p. 104–116.
28. Bedoya C, Isaza C, Daza JM, López JD. Automatic Identification of Rainfall in Acoustic Recordings. Ecological Indicators. 2017;75:95–100.
29. Chen M, Wang X, Wang M, Liu X, Wu Y, Wang X. Estimating Rainfall from Surveillance Audio Based on Parallel Network with Multi-Scale Fusion and Attention Mechanism. Remote Sensing. 2022;14(22):5750.
30. Wang X, Glade T, Schmaltz E, Liu X. Surveillance Audio-Based Rainfall Observation: An Enhanced Strategy for Extreme Rainfall Observation. Applied Acoustics. 2023;211:109581.
31. Wang X, Wang M, Liu X, Glade T, Chen M, Xie Y, et al. Rainfall Observation Using Surveillance Audio. Applied Acoustics. 2022;186:108478.
32. Sah DK, Cengiz K, Ivković N, Gehlot A, Salah B. Acoustic Signal-Based Indigenous Real-Time Rainfall Monitoring System for Sustainable Environment. Sustainable Energy Technologies and Assessments. 2023;60:103398.
33. Python.Org. https://www.python.org.
34. SoX—Sound eXchange; 2024. https://sourceforge.net/projects/sox/.
35. Audacity® | Free Audio Editor, Recorder, Music Making and More! https://www.audacityteam.org/.
36. Tokay A, Wolff PB, Dursun OK. On the Measurement Errors of the Joss–Waldvogel Disdrometer. 31st International Conference on Radar Meteorology; 2003.
37. Serio MA, Carollo FG, Ferro V. Raindrop Size Distribution and Terminal Velocity for Rainfall Erosivity Studies. A Review. Journal of Hydrology. 2019;576:210–228.
38. Adirosi E, Porcù F, Montopoli M, Baldini L, Bracci A, Capozzi V, et al. Database of the Italian Disdrometer Network. Earth System Science Data. 2023;15(6):2417–2429.
39. Bianco MJ, Gerstoft P, Traer J, Ozanich E, Roch MA, Gannot S, et al. Machine Learning in Acoustics: Theory and Applications. The Journal of the Acoustical Society of America. 2019;146(5):3590–3628. pmid:31795641
40. van Rossum MC. A Novel Spike Distance. Neural Computation. 2001;13(4):751–763. pmid:11255567
41. Schreiber S, Fellous JM, Whitmer D, Tiesinga P, Sejnowski TJ. A New Correlation-Based Measure of Spike Timing Reliability. Neurocomputing. 2003;52:925–931. pmid:20740049
42. Quian Quiroga R, Kreuz T, Grassberger P. Event Synchronization: A Simple and Fast Method to Measure Synchronicity and Time Delay Patterns. Physical Review E. 2002;66(4):041904. pmid:12443232
43. Lyttle D, Fellous JM. A New Similarity Measure for Spike Trains: Sensitivity to Bursts and Periods of Inhibition. Journal of Neuroscience Methods. 2011;199(2):296–309. pmid:21600921
44. Gouyon F, Pachet F, Delerue O. On the Use of Zero-Crossing Rate for an Application of Classification of Percussive Sounds. In: Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-00), Verona, Italy. vol. 5; 2000. p. 16.
45. Serizel R, Turpault N, Shah A, Salamon J. Sound Event Detection in Synthetic Domestic Environments. In: ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Barcelona, Spain: IEEE; 2020. p. 86–90.
46. Nam H, Kim SH, Ko BY, Park YH. Frequency Dynamic Convolution: Frequency-Adaptive Pattern Recognition for Sound Event Detection; 2022.
47. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 770–778.
48. Zhang K, Zuo W, Chen Y, Meng D, Zhang L. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. IEEE Transactions on Image Processing. 2017;26(7):3142–3155. pmid:28166495
49. Huang G, Liu Z, van der Maaten L, Weinberger KQ. Densely Connected Convolutional Networks; 2018.
50. Zhang Y, Tian Y, Kong Y, Zhong B, Fu Y. Residual Dense Network for Image Super-Resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 2472–2481.
51. Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press; 2016.
52. Bekker J, Davis J. Learning from Positive and Unlabeled Data: A Survey. Machine Learning. 2020;109(4):719–760.
53. Frenay B, Verleysen M. Classification in the Presence of Label Noise: A Survey. IEEE Transactions on Neural Networks and Learning Systems. 2014;25(5):845–869. pmid:24808033
54. Keskar NS, Mudigere D, Nocedal J, Smelyanskiy M, Tang PTP. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima; 2017.
55. Gregor K, LeCun Y. Learning Fast Approximations of Sparse Coding. In: Proceedings of the 27th International Conference on International Conference on Machine Learning; 2010. p. 399–406.
56. Elad M, Kawar B, Vaksman G. Image Denoising: The Deep Learning Revolution and Beyond—A Survey Paper. SIAM Journal on Imaging Sciences. 2023;16(3):1594–1654.
57. Lahini Y, Rubinstein SM, Amir A. Crackling Noise during Slow Relaxations in Crumpled Sheets. Physical Review Letters. 2023;130(25):258201. pmid:37418710
58. Steinbock C, Katzav E, Boudaoud A. Structure of Fluctuating Thin Sheets under Random Forcing. Physical Review Research. 2022;4(3):033096.
59. Steinbock C, Katzav E. Dynamics of Fluctuating Thin Sheets under Random Forcing. Physical Review E. 2023;107(2):025002. pmid:36932563
60. Turisini M, Cestari M, Amati G. LEONARDO: A Pan-European Pre-Exascale Supercomputer for HPC and AI Applications. Journal of Large-Scale Research Facilities JLSRF. 2024;9(1).