Cough detection using a non-contact microphone: A nocturnal cough study

An automatic non-contact cough detector designed especially for night audio recordings that can distinguish coughs from snores and other sounds is presented. Two different classifiers were implemented and tested: a Gaussian Mixture Model (GMM) and a Deep Neural Network (DNN). The detected coughs were analyzed and compared in different sleep stages and in terms of severity of Obstructive Sleep Apnea (OSA), along with age, Body Mass Index (BMI), and gender. The database was composed of nocturnal audio signals from 89 subjects recorded during a polysomnography study. The DNN-based system outperformed the GMM-based system, at 99.8% accuracy, with a sensitivity and specificity of 86.1% and 99.9%, respectively (Positive Predictive Value (PPV) of 78.4%). Cough events were significantly more frequent during wakefulness than in the sleep stages (p < 0.0001) and were significantly less frequent during deep sleep than in other sleep stages (p < 0.0001). A positive correlation was found between BMI and the number of nocturnal coughs (R = 0.232, p < 0.05), and between the number of nocturnal coughs and OSA severity in men (R = 0.278, p < 0.05). This non-contact cough detection system may thus be implemented to track the progression of respiratory illnesses and test reactions to different medications even at night when a contact sensor is uncomfortable or infeasible.


Introduction
During sleep, different sounds can be recorded and analyzed. These sleep sounds can be related to physiological characteristics of the recorded individual, such as breathing, snoring, movement, speech, and coughing. Cough is a respiratory reflex considered to have defensive capabilities aimed at removing mucus and foreign particles from the lower airways [1]. It is also associated with many pathological conditions, most of which are related to the respiratory system (e.g., Chronic Obstructive Pulmonary Disease (COPD), asthma, and COVID-19), although others are seasonal, such as allergies or colds.
A typical cough sound contains two or three phases. The first phase is an initial explosive phase with a very sharp increase in energy while air is released [2,3]. The second phase is composed of a laminar airflow characterized by smaller amplitudes. The third phase (not always present) is composed of a turbulent airflow that includes a pitch frequency caused by vocal fold vibration.

Obstructive Sleep Apnea (OSA) severity is quantified by the Apnea-Hypopnea Index (AHI), the number of apnea and hypopnea events per hour. The mean prevalence of OSA was reported to be 22% in men and 17% in women [16]. Daytime sleepiness is the main complaint among patients with OSA. Other common features include loud snoring, restless nocturnal sleep, and choking. Interventions for obstructive sleep apnea, such as Continuous Positive Airway Pressure (CPAP), have recently been shown to lessen the frequency of coughing [11-14]. It is important to know which sleep stages are involved during OSA events. Because REM sleep is associated with greater sympathetic activity, lower vagal tone, and more cardiovascular instability than non-REM sleep, obstructive events during REM sleep may disproportionately lead to hypertension and other adverse cardiovascular outcomes [17]. Since OSA is associated with coughs, much would be gained by investigating the prevalence of nocturnal coughs during different sleep stages.
Previous works have also explored nocturnal coughs and their association with different subject characteristics [18,19]. One study analyzed nocturnal coughs and found a positive correlation between cough rate and age and BMI in Primary Ciliary Dyskinesia (PCD) patients [18], while [19] found that nocturnal cough was more frequent in asthmatic adults who were either overweight or obese.
The evaluation of cough intensity and its frequency of occurrence can provide valuable clinical information [20]. Currently, cough assessment is based on the administration of a patient-completed questionnaire [21] or the doctor's opinion. These subjective measures tend to be biased [22]. By contrast, automatic cough detection, an objective measure of cough evaluation [9], can contribute to the diagnosis, the tracking of the progression of respiratory diseases, and testing for reactions to different medications, in addition to saving manpower.
There is growing interest in developing algorithms to automate the process of detecting cough events from audio signals. In Zigel et al. [9], whole-day (24h) audio recordings of 70 subjects equipped with wearable microphones were analyzed. In that study, the cough detection system was based on the adaptive Gaussian Mixture Model (GMM) and resulted in a sensitivity of 88.5%, a specificity of 95.6%, and a low Positive Predictive Value (PPV) of less than 20%. Another study [1] used the K-Nearest Neighbor (KNN) classifier with local Hu moments as inputs and achieved 88.51% sensitivity with a PPV of 87.51% and a specificity of 99.72%. You et al. [23] employed a Support Vector Machine (SVM) with non-Negative Matrix Factorization (NMF), and achieved an average sensitivity of 80.1%, a specificity of 83.1%, and a PPV of 83.5%. Matos et al. [24] used a Hidden Markov Model (HMM) with Mel-Frequency Cepstral Coefficients (MFCC) and achieved 71% sensitivity. Artificial Neural Networks (ANN) and Deep Neural Networks (DNN) have also been implemented in several studies to detect cough sounds using a variety of features; Drugman et al. [25] extracted 222 features including MFCC and its derivatives, spectral variation, and the spectral centroid. After feature selection, 50 features were fed into an ANN, resulting in classification performance of 89.85% sensitivity and 89.97% specificity (PPV not reported). Liu et al. [26] used a combination of an HMM and a DNN for classification with MFCC features as input and achieved a performance of 83.6% sensitivity and 90.9% specificity. A Convolutional Neural Network (CNN) was used in [27] on a database of 627 cough events, and reported a sensitivity of 86.8% and a specificity of 92.7%. Simou et al. [28] used mel-spectrograms and a Long Short-Term Memory (LSTM) model for cough detection on a database of 4,062 cough events.
They achieved a sensitivity of 87.8%, a specificity of 98.9%, and an Area under the Curve (AUC) of 98.6% (PPV not reported). A comparative study was conducted by [29], which compared different sets of features and a variety of neural networks to classify cough events. The database was based on 3,114 cough events and 4,667 other sound events (such as speech). Their best system achieved an accuracy of 91.2% and an AUC of 0.965 (the sensitivity and PPV performances were not reported).
As suggested above, there are two main challenges in developing cough detection algorithms: (1) the variability of cough sounds between and within individuals, as well as the complexity of different respiratory diseases; and (2) the differentiation of cough sounds from background noises or other sounds produced by the sleeper, such as snoring, throat clearing, speech, sneezing, and other ambient sounds [24]. The former can lower the sensitivity of the algorithm for a particular recording, due to the lack of generalization of the algorithm to the type of cough sounds present in that recording, whereas the latter can lower its specificity and PPV, due to the occurrence of events incorrectly detected as cough events.
Here we present a reliable, simple, automatic, non-invasive cough detector that distinguishes spontaneous nocturnal coughs from snores and other sounds and noises (e.g., speech, manipulation of the device, drinking). The main contributions of this work are: (1) the use of an extensive dataset containing nocturnal audio recordings from 89 patients; (2) the development of dedicated features to best describe a variety of cough sounds; (3) the development of an innovative cough detection system based on a DNN and a comparison to a GMM-based system; and (4) analysis of the correlations between nocturnal cough rate and sleep stages, as well as between cough rate and OSA severity (AHI), age, gender, and Body Mass Index (BMI).

Experimental setup
The data were collected at a university-affiliated Sleep-Wake Disorder center and biomedical signal processing laboratory. These routinely collected data were analyzed anonymously; therefore, informed consent was not required. The Institutional Review Committee of Soroka University Medical Center approved this study protocol (protocol number 10141). The institutional review board waived the need for written informed consent from the participants.
The database was composed of whole-night recordings (mean recording time 7.1 hours) of 89 adults with a variety of health conditions (asthma, bronchitis, sinusitis, etc.). All subjects were typical of our region (Israel), did not sleep with CPAP, and represented a wide range of ages, BMIs (kg·m⁻²), and AHIs (events·hr⁻¹) (Table 1). All the subjects were referred for a polysomnography (PSG) test, the gold standard for sleep stage evaluation (SomnoPro 19 PSG, Deymed Diagnostic, Hronov, Czech Republic), in the sleep-wake disorder unit (Soroka University Medical Center) for a sleep evaluation. PSG scoring, including sleep-wake patterns, was performed by a trained technician and underwent a second scoring by the head of the sleep lab. The scoring included labeling each epoch (30 s) as one of the 5 sleep stages using the PSG signals (for more details see [30]), and the AHI was estimated by a trained technician. Audio signals were recorded from each participant by a digital audio recorder (Edirol R4-pro, Shizuoka, Japan) connected to a condenser microphone (Rode NTG-1, Sydney, Australia) suspended 1 m above the subject's head. The audio signals were digitized at a sampling frequency of 44.1 kHz with 16-bit resolution and were synchronized with the PSG data.
The dataset was randomly divided into three groups: train, development, and test (Table 2), while maintaining the balance of the number of events, AHI, gender, age, and BMI in each dataset. A total of 1533 cough events were detected.

Methods

A cough detection system was developed to detect and analyze cough sound events in nocturnal audio files. Fig 2 presents a block diagram of the system. Since a cough is a transient, energetic sound event, detection of energetic events was performed first; the detected sound events were then divided into three categories: cough, snore, and noise. Each dataset group went through pre-processing, event detection, and feature extraction. All events in the training dataset were manually labeled and used to train the system's three models (cough, snore, and noise). The models were fed a selected set of acoustic features from the time and spectral domains.
This article has online Supporting Information, S1-S4 Datasets.

Pre-processing
The pre-processing stage included DC removal and down-sampling from 44.1 kHz to 16 kHz since most of the relevant spectral information was below 8 kHz.
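As a rough illustration, these two pre-processing steps can be sketched as follows (a minimal numpy sketch; the function name is ours, and linear interpolation stands in for the anti-aliased polyphase resampler a production system would use):

```python
import numpy as np

def preprocess(x, fs_in=44100, fs_out=16000):
    """Remove the DC component and down-sample the signal.

    Linear interpolation is used here as a simple stand-in for a
    proper anti-aliased resampler (e.g., a polyphase filter).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                       # DC removal
    n_out = int(round(len(x) * fs_out / fs_in))
    t_in = np.arange(len(x)) / fs_in       # original sample times
    t_out = np.arange(n_out) / fs_out      # target sample times
    return np.interp(t_out, t_in, x)

# Example: one second of a 100 Hz tone with a DC offset of 0.5
fs = 44100
t = np.arange(fs) / fs
y = preprocess(0.5 + np.sin(2 * np.pi * 100 * t))
```

One second of 44.1 kHz audio becomes 16,000 samples with (approximately) zero mean.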

Event detection
In the event detection step, to identify events suspected of being coughs, the energy was calculated from each frame (20 ms length, 50% overlap) using the following equation: E(j) = Σ_{i=1}^{N} x_j²(i), where j represents the frame number, i represents the sample index in the jth frame, and N is the total number of samples in a frame. Energy and duration thresholds were applied to the energy signal. Three energy thresholds were calculated for each recording using the energy histogram calculated from the whole recording. The first energy threshold, th1, was used to detect the presence of a sound event and was calculated from a binned energy histogram, E_hist, with Eq (2), where E_hist{j} is the histogram value of the jth bin, B was set to 0.08, and A was empirically set such that E{B·max{E_hist}} < th1 < max{E}. The idea was to set an event detection threshold much higher than the background noise (the main histogram peak, see Fig 3). The purpose of the other two thresholds was to detect the beginning and end of each sound event (Fig 4B). The beginning and end thresholds were lower than th1 and were calculated by Eqs (3) and (4), respectively. Thresholds th1-th3 are recording-dependent: their values were calculated for each audio file (subject) using the energy histogram of that file and some global parameters; hence, the thresholds do not depend on one specific recording setup. In addition, the constants (global parameters) that appear in Eqs (2)-(4) were chosen based on the training dataset, which enables repeatability of the method.
An example of an energy histogram of a whole night's audio signal and its energy thresholds is shown in Fig 3. The energy histograms of each event type (cough, snore, noise), and the energy of other sounds (background noises), which did not pass the energy thresholds in the event detection are shown in Fig 5. After detecting the events above the thresholds, the detection algorithm only kept those events with a duration between 0.15s and 1.1s (duration threshold), a common cough length [9].
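A minimal sketch of this energy-based event detector follows. The mean-squared frame energy, the crude median-based threshold, and the function name are our assumptions; the histogram-derived th1 and the th2/th3 onset/offset refinement are omitted:

```python
import numpy as np

def detect_events(x, fs=16000, frame_ms=20, th1=None):
    """Flag frames whose short-time energy exceeds th1 and keep runs
    whose duration falls in the 0.15-1.1 s cough range."""
    N = int(fs * frame_ms / 1000)
    hop = N // 2                                   # 50% overlap
    n_frames = 1 + (len(x) - N) // hop
    E = np.array([np.mean(x[j*hop:j*hop+N]**2) for j in range(n_frames)])
    if th1 is None:
        th1 = 10 * np.median(E)                    # crude data-driven threshold
    above = E > th1
    events, start = [], None
    for j, a in enumerate(above):                  # group consecutive frames
        if a and start is None:
            start = j
        elif not a and start is not None:
            dur = (j - start) * hop / fs
            if 0.15 <= dur <= 1.1:                 # duration gate
                events.append((start * hop / fs, dur))
            start = None
    return events

# A 0.3 s burst embedded in low-level noise should yield one event
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(16000)
x[8000:8000 + 4800] += 0.5 * rng.standard_normal(4800)
evts = detect_events(x)
```

The burst passes both the energy and the duration gates, so exactly one event is returned.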

Manual classification
The manual classification (labeling) step was performed using a Graphical User Interface (GUI) by two listeners, with 93% agreement (Cohen's kappa = 0.875), where most of the disagreements were between snores and noises. Each detected sound event from all the dataset groups was marked manually as a cough, snore, or noise sound. This was done to generate labels for the detected events, which were used later as the gold standard for classification and evaluation. For the labeled data information see S1 Table in the S1 Dataset.

Feature extraction
A feature vector was extracted from each detected sound event. For precise values of the extracted features, see S2-S4 Datasets in the supplementary information. These features were needed to characterize and differentiate cough sounds from other interference (such as snores and other noises). To extract these features, and since a cough sound has two to three parts (see Introduction), we automatically divided each detected sound event into two to three main parts. To define these cough parts, the energy peak, Ep, was first calculated from the first part of the event (between 0 and 0.079 s). The onset of the second part was defined as the first local energy minimum (Emin1, see Fig 4C). A voiced feature (VoicedH; see description below) was calculated to determine the existence of a third part of the event; this feature was calculated from each frame in the event, starting from the onset of the second part (see Fig 4D). The onset of the third part, if present, was determined where VoicedH > 0.45 (Emin2, Fig 4C and 4D). Based on this feature, the ratio of voiced to unvoiced coughs was 2:3. From each detected sound event, 34 features of 12 feature types were extracted (Table 3), as detailed below.
Energy index. The frame (time) index of maximum energy in the first part of the cough sound (Fig 4C).

Mel-Frequency Cepstral Coefficients (MFCC). A widely used set of spectral features in speech processing [29]. Twenty-four cepstral features were obtained: 12 MFCCs from the first part of the event and 12 from the second part.

VoicedH. Since some coughs have a voiced part (the third part), and since the fundamental frequency of the vocal cords is usually between 50 and 255 Hz [31], a voiced feature was determined. This feature was calculated as the amplitude of the autocorrelation peak in the region of these frequencies [32]. It contributed to discriminating between voiced coughs and non-voiced noises.

VoicedL. A voiced feature calculated over a lower frequency range (10-100 Hz). The assumption was that snores are usually more harmonic (semi-periodic) in this low range of frequencies [33]; for this reason, this feature was used to better separate coughs from snores.
LPC variance. A measure of stationarity using spectral content. From each frame, eight LPCs were extracted, and the variance of each coefficient was calculated. The LPC variance feature is defined as LPC_var = (1/M) Σ_{m=1}^{M} σ²(a_m), with σ²(a_m) = (1/K) Σ_{k=1}^{K} (a_m(k) − ā_m)², where K is the number of frames of the detected event, a_m(k) is the mth LPC of the kth frame, ā_m is the mean value of the mth LPC along the event, M is the number of LPCs, and σ² denotes variance.
Energy ratio. Given the exploding phase of the first part of a cough [34], the assumption was that the ratio between the energy peaks of the first part and second part of a cough event would usually have higher values than other sounds (Fig 4C).
Zero-Crossing Rate (ZCR). The mean ZCR value was calculated from all frames in the detected event. It usually results in higher values for noisy and high-frequency signals where sign changes are more frequent [35].
Kurtosis. The kurtosis, which measures how heavily the tails of a probability distribution of the sound differ from the tails of a normal distribution, was calculated on the audio sample values of the detected sound event.
Skewness. The skewness, which measures the asymmetry of the probability distribution of the sound, was calculated on all the sample values of the detected sound event.
Entropy. Entropy was defined as the average rate at which information is produced by a stochastic source of data. Here, the entropy was estimated on the time samples of the detected event, as H = −Σ_i P(x(i)) log₂ P(x(i)), where P is the appearance probability of x(i).

Spectral Centroid (SC). A measure that indicates where the "center of mass" of the spectrum is located. This feature was calculated with the general equation SC = Σ_i f(i)|X(i)| / Σ_i |X(i)|, where f(i) is the frequency of the ith spectral bin and X is the Fourier transform of x.
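Three of the simpler features can be sketched directly from their definitions (the histogram binning in the entropy estimate is an assumed discretization; the exact probability estimator is not stated here):

```python
import numpy as np

def zcr(x):
    """Mean zero-crossing rate: fraction of adjacent-sample sign changes."""
    s = np.sign(x)
    return float(np.mean(s[1:] != s[:-1]))

def spectral_centroid(x, fs):
    """Magnitude-weighted mean frequency of the spectrum."""
    X = np.abs(np.fft.rfft(x))
    f = np.fft.rfftfreq(len(x), d=1/fs)
    return float(np.sum(f * X) / np.sum(X))

def sample_entropy_bits(x, bins=32):
    """Shannon entropy (bits) of the amplitude histogram of the event."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / len(x)
    return float(-np.sum(p * np.log2(p)))

# A pure 1 kHz tone: centroid near 1000 Hz, ZCR near 2000/16000
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
```

A 1 kHz sine crosses zero twice per cycle, so its ZCR is about 2000 changes over 16,000 samples, and its spectral mass sits at the 1000 Hz bin.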

Feature selection
Feature selection is used to find the subset of features that is most relevant to the data. The algorithm here was based on Sequential Forward Selection (SFS) [36] and a criterion, J, calculated from the sensitivity and PPV of the GMM classifier (Eq (8)); the definitions of sensitivity and PPV appear in the Evaluation measures section. For this purpose, the feature selection algorithm used the training and development datasets and their true labels. At each step, the method retains the most informative features by maximizing the criterion, choosing the next feature conditioned on the features already selected. The selected features at each step were stored with their criterion values, and the feature subset that achieved the highest criterion value was then chosen (see Fig 6).
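A sketch of the SFS loop; `score` stands in for the criterion J (some combination of sensitivity and PPV evaluated on the development set, whose exact form is not given here), and the toy criterion below is purely illustrative:

```python
def sfs(features, score):
    """Sequential Forward Selection.

    Greedily adds the feature whose addition maximizes score(subset),
    records the criterion at every step, and returns the best subset
    seen at any step (not necessarily the full ranking).
    """
    selected, best_subset, best_J = [], [], float("-inf")
    remaining = list(features)
    while remaining:
        # candidate that most improves the criterion given current subset
        cand = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(cand)
        remaining.remove(cand)
        J = score(selected)
        if J > best_J:
            best_J, best_subset = J, list(selected)
    return best_subset

# Toy criterion: rewards {"mfcc","zcr"} together, penalizes subset size
def toy_score(s):
    return 2*("mfcc" in s) + ("zcr" in s and "mfcc" in s) - 0.1*len(s)

chosen = sfs(["mfcc", "zcr", "kurtosis"], toy_score)
```

With this toy criterion the loop adds "mfcc", then "zcr", then rejects the third feature because it lowers J.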

Classifiers
Two different classifiers were developed and tested: a GMM and a DNN.

GMM. Training. To determine the GMM order, i.e., the number of Gaussians per model (cough, snore, and noise), multiple combinations of Gaussian numbers were tested (up to 20 Gaussians per model). For feature selection, the models were trained on the training dataset and evaluated on the development dataset to calculate the feature selection criterion (8). The selected feature subset is shown in Table 3 and Fig 6. The test dataset was only used for the final analysis (see Results section).
Classification (testing). This step used the extracted set of features from the detected events and the three trained GMM models to classify the test dataset. To determine whether an event was a cough, each model produced a likelihood score for the event belonging to that model (Eq (9)): p(β|λ_j) = Σ_{i=1}^{m_j} ω_i (2π)^{−D/2} |S_i|^{−1/2} exp(−(1/2)(β−μ_i)ᵀ S_i⁻¹ (β−μ_i)), where β is the feature vector; λ_j is the jth model (j = 1, 2, 3: cough, snore, or noise); m_j is the number of Gaussians in the jth model; ω_i is the weight of the ith Gaussian in the mixture model; μ_i is the mean of the ith component of the model; S_i is the covariance matrix of the ith component of the model; and D is the dimension of the feature vector β.
After calculating the three likelihood scores for each event, a Log-Likelihood Ratio (LLR) score was calculated. In order to differentiate between coughs and other sound events, the likelihood of belonging to the cough class (S1) was divided by the maximum likelihood of belonging to the snore (S2) or noise (S3) class: LLR = log(S1) − log(max{S2, S3}). The decision threshold was set according to the Receiver Operating Characteristic (ROC) curve of sensitivity vs. PPV for different thresholds, as shown in Fig 7; events above this threshold were marked as coughs.
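A toy illustration of the GMM scoring and the LLR decision (diagonal covariances, single-Gaussian models, and the 2-D parameters below are assumptions for demonstration, not the trained models):

```python
import numpy as np

def gmm_likelihood(beta, weights, means, covs):
    """p(beta | lambda_j) = sum_i w_i * N(beta; mu_i, S_i),
    with diagonal covariances assumed for simplicity."""
    beta, D = np.asarray(beta, float), len(beta)
    p = 0.0
    for w, mu, s in zip(weights, means, covs):
        norm = (2 * np.pi) ** (-D / 2) * np.prod(s) ** (-0.5)
        p += w * norm * np.exp(-0.5 * np.sum((beta - mu) ** 2 / s))
    return p

def llr_score(beta, models):
    """LLR = log p(beta|cough) - log max(p(beta|snore), p(beta|noise))."""
    s1 = gmm_likelihood(beta, *models["cough"])
    s2 = gmm_likelihood(beta, *models["snore"])
    s3 = gmm_likelihood(beta, *models["noise"])
    return np.log(s1) - np.log(max(s2, s3))

# Hypothetical 2-D models, one Gaussian each
models = {
    "cough": ([1.0], np.array([[0., 0.]]), np.array([[1., 1.]])),
    "snore": ([1.0], np.array([[4., 4.]]), np.array([[1., 1.]])),
    "noise": ([1.0], np.array([[-4., 4.]]), np.array([[1., 1.]])),
}
```

A feature vector near the cough mean gets a positive LLR; one near the snore mean gets a negative LLR.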
DNN. Training. The model architecture is presented in Fig 8. The model's input was the training dataset containing the same selected features as the previous classifier. Several model parameters and architectures were evaluated for classification. The chosen model was trained using an Adam optimizer (an extension to stochastic gradient descent), with a learning rate of 5×10⁻⁶ and a batch size of 32. The model was a simple DNN, which contained two Fully Connected hidden layers with a 50% dropout in between. The model had 13,502 learnable parameters and the training converged after 250 epochs. To deal with the unbalanced data, class weights were integrated into the model. Although the GMM model was trained with a three-class classification, the best classification results for the DNN were obtained with a model trained on two classes, where the snore and noise events were merged into one class.
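The reported parameter count can be sanity-checked: with the 31 selected features as input and 2 softmax outputs, two fully connected hidden layers of 100 units each give exactly 13,502 weights and biases. The hidden width of 100 is our inference from the count, not stated in the text:

```python
def fc_param_count(layer_sizes):
    """Weights + biases of a fully connected network given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 31 selected features -> two hidden layers -> 2 softmax outputs.
# The 100-unit width is an assumption: it is the width that
# reproduces the reported 13,502 learnable parameters (dropout
# layers add no parameters).
n_params = fc_param_count([31, 100, 100, 2])
```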
Classification (testing). Each detected sound event from the test dataset underwent feature extraction; this feature vector was the input to the DNN classification system. The model output two probability scores (via the softmax output layer), one for each class (S1: cough, S2: non-cough). The LLR score was then calculated as LLR = log(S1) − log(S2). Events whose LLR exceeded a pre-defined threshold (see Results section) were marked as coughs.

Evaluation measures
The evaluation parameters described below were implemented.

Accuracy. The percentage of correct classifications out of the total number of examined events.
Sensitivity (recall). The percentage of coughs that were correctly identified out of the total number of true-labeled coughs.
PPV (precision). The percentage of coughs that were correctly identified out of the events that were classified as coughs.
Specificity. The percentage of non-cough events that were correctly classified out of the total number of true-labeled non-coughs.

NPV (negative predictive value). The percentage of events classified as non-coughs that were truly non-cough events.
Cohen's kappa (κ). This measure is very useful for multiclass problems. Specifically, it takes the unbalanced deviations of the different classes into account [37].
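These measures follow directly from the confusion-matrix counts; the counts below are illustrative, not the paper's results:

```python
def metrics(tp, fp, fn, tn):
    """Binary-classification measures from confusion-matrix counts
    (coughs are the positive class)."""
    total = tp + fp + fn + tn
    p_obs = (tp + tn) / total                      # observed agreement = accuracy
    # chance agreement for Cohen's kappa, from the marginal totals
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return {
        "accuracy":    p_obs,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv":         tp / (tp + fp),
        "npv":         tn / (tn + fn),
        "kappa":       (p_obs - p_exp) / (1 - p_exp),
    }

# Illustrative counts only: 100 true coughs among 10,000 events
m = metrics(tp=86, fp=24, fn=14, tn=9876)
```

Note how, with few positives, accuracy stays above 99% while the PPV is much lower; this is why kappa and PPV are reported alongside accuracy for unbalanced data.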

Cough analysis
After cough detection using the best classifier (DNN in this case, see Results), the detected cough events were analyzed (cough rate, #/hour) as a function of the subjects' characteristics; namely, AHI, BMI, gender, age, and sleep stages. A Wilcoxon rank-sum test (α = 0.05) [38] was used to assess the significance of the results.
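The significance test can be sketched as follows. This is a normal-approximation version of the Wilcoxon rank-sum (Mann-Whitney) test with midranks for ties; the exact-distribution variant and any continuity correction are omitted:

```python
import math

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test, normal approximation."""
    pooled = sorted((v, i) for i, v in enumerate(a + b))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):                 # assign midranks to tied values
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(a), len(b)
    R1 = sum(ranks[:n1])                   # rank sum of sample a
    U = R1 - n1 * (n1 + 1) / 2             # Mann-Whitney U statistic
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (U - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value

# Clearly separated samples should give a small two-sided p-value
p = rank_sum_test([1, 2, 3, 4, 5], [10, 11, 12, 13, 14])
```

Identical samples give z = 0 and p = 1, while well-separated samples fall below the α = 0.05 level.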

Cough detection
The cough detection algorithm was trained on 36 subjects and tested on 26 subjects (Table 1). The feature selection algorithm yielded a set of 31 features (see Fig 6 and Table 3). These features were then used for training and testing both classifiers.

GMM. Different model orders were tested using the training and development datasets; the optimal model orders were one Gaussian for the cough model, one Gaussian for the snore model, and 16 Gaussians for the noise model. The LLR score was calculated for each detected sound event. The distribution of LLR values for each of the classes can be seen in Fig 7; these distributions exhibited a good separation between coughs and non-coughs. The sensitivity, specificity, and PPV were calculated by comparing the LLR score of each event to an LLR decision threshold. Analyzing the Area Under the Curve (AUC) of the ROC curve for the two classifiers on the development dataset yielded an AUC of 0.984 for the GMM and 0.995 for the DNN. The precision-recall AUC was also higher for the DNN (0.86) than the GMM (0.794); therefore, the DNN was chosen as the classifier for further analysis of coughs during sleep. Fig 9 depicts two ROC plots of these performance measures on the test dataset. To find a suitable LLR decision threshold for classification, the performance measures were examined on the development dataset: an LLR threshold of 1.17 was chosen as a reasonable compromise between sensitivity and specificity (sensitivity of 85.06% and specificity of 99.64% on the development dataset).

DNN.
In addition to the GMM classifier, a DNN classifier was tested. The LLR decision threshold of this system was set to −2.22 to maintain the same sensitivity (85.06%) as the GMM classifier. Tables 4 and 5 show the confusion matrices of the test dataset classification for the GMM and DNN models, respectively, in classifying coughs versus non-coughs. Table 4 shows the results after fusion of the classified noise and snore events.
Since the data were unbalanced (fewer coughs than non-cough sounds), Cohen's kappa was calculated as well. The tables show that the classification results of the DNN outperformed the GMM; for similar sensitivity values, Cohen's kappa values were 0.77 and 0.82 for the GMM and the DNN, respectively.
To assess the generalization of the classifiers, a 5-fold cross-validation was conducted for each classifier (see Table 6). It compares a GMM classifier trained to differentiate between coughs, snores, and noises (multiclass classification) to a DNN classifier trained to differentiate between coughs and non-coughs (binary classification).

Analysis of detected coughs during sleep
Based on the participants' PSG results, all the detected coughs (from all datasets) were assigned to their wake/sleep stage (wake/REM/N1/N2/N3; see S2 Table in the S1 Dataset), and the number of coughs was normalized by the sleep stage duration of each subject (cough rate). S3 Table in the S1 Dataset details the duration of each sleep stage for each subject, and S4 Table in the S1 Dataset presents the resulting normalized values for each subject in each corresponding sleep stage. Each subject's audio file was analyzed between lights off and lights on (the time annotations of lights off and on in the subject's room). Fig 10 shows the average cough rate for each wake/sleep stage. It indicates that most of the coughs detected during the night occurred in the wake phase and fewer during the sleep stages (p < 0.0001). In addition, there was a significant difference (p < 0.0001) between deep sleep (N3) and the other sleep stages (N1, N2, and REM), with considerably fewer coughs in the deep sleep stage. There was also a significant difference between deep sleep (N3) and light sleep (N1 and N2) (p < 0.0001), and between the drowsiness stage (N1) and the most frequent stage (N2) (p < 0.0001).
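The per-stage normalization can be sketched as follows; the stage labels and durations below are hypothetical single-subject numbers, not data from the study:

```python
def cough_rate_per_stage(cough_stage_labels, stage_minutes):
    """Normalize cough counts by time spent in each stage (coughs/hour)."""
    rates = {}
    for stage, minutes in stage_minutes.items():
        n = sum(1 for s in cough_stage_labels if s == stage)
        rates[stage] = n / (minutes / 60) if minutes else float("nan")
    return rates

# Hypothetical night: the stage of each detected cough, and the
# minutes the subject spent in each stage
coughs = ["wake", "wake", "N2", "wake", "REM"]
minutes = {"wake": 30, "N1": 20, "N2": 240, "N3": 60, "REM": 76}
rates = cough_rate_per_stage(coughs, minutes)
```

Three coughs in half an hour of wake give 6 coughs/hour, while one cough over four hours of N2 gives only 0.25, which is why raw counts must be normalized before comparing stages.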
The correlations between the subjects' number of coughs (cough rate) and their demographics, including gender, age, AHI, and BMI, are depicted as histograms in Figs 11 and 12. The distribution of coughs by BMI shows that most of the coughs occurred in subjects with higher BMIs (Pearson R = 0.232, p < 0.05). A separate analysis for women and men revealed that women with higher BMIs (obese) had more nocturnal coughs than women with lower BMI values (Pearson R = 0.414, p < 0.05). However, when controlling for age, the partial correlation between the number of coughs and BMI was not significant, probably because BMI and age are themselves correlated (Pearson R = 0.237, p < 0.05). In general, women coughed significantly more than men (p < 0.05), but there was a similar number of coughs in each AHI range, with relatively wide standard errors (Fig 12B). There was a weak positive correlation (Spearman R = 0.278, p < 0.05) between the AHI and the number of coughs in men (no correlation for women).
Notably, age emerged as an important factor. Fig 12A shows that most of the coughs occurred in older subjects (Spearman R = 0.387, p < 0.001). A partial correlation, controlling for BMI and gender, also indicated a significant result, with R > 0.3. Furthermore, there were significantly more coughs in subjects older than 40 (p < 0.01) compared to younger subjects (age thresholds of 50 y and 60 y gave similar results, p < 0.001 and p < 0.01, respectively). There was also a significant difference between the cough rate of subjects under the age of 25 versus subjects over the age of 60 (p < 0.05), as well as between subjects aged 25-40 and subjects over the age of 60 (p < 0.001).

Cough detection
In this paper, an algorithm for automatic cough sound detection using a non-contact microphone was presented. The detection system was specially designed to detect and analyze nocturnal audio recordings, taking nocturnal sound events such as snoring, breathing, and choking into account. The cough sound events were detected from whole-night recordings of 89 subjects referred to the sleep laboratory. After sound event detection, a 34-dimension feature vector was extracted from each detected sound event. These features were defined to provide a good acoustic separation between coughs and other nocturnal sound events. Analysis of the skewness and kurtosis values of coughs versus non-cough events confirmed a significant difference between the two (p < 0.0001). Feature selection was used to reduce the feature space dimensionality. Two types of classifiers were implemented for event classification: GMM and DNN. For comparison purposes, we fed the two classifiers the same feature set. Compared to other studies in the field of cough detection [1,9,27,29], our DNN model obtained the highest accuracy and specificity values (above 99.8%). It also had a higher PPV than most studies, as well as a high sensitivity rate. However, note that each system was evaluated on a different database, in different environments, and with different equipment. The DNN classifier demonstrated better performance, with an accuracy of 99.8% (86.1% sensitivity, 99.8% specificity, and 78.4% PPV). The GMM produced similar accuracy and sensitivity (99.7% and 84.6%, respectively); however, the PPV was much lower (70.3%). Note that it is possible to raise the sensitivity to above 90% using different LLR thresholds, but at the expense of the PPV, which adds more false alarms of detected coughs (e.g., an LLR threshold of −2.8 will result in a sensitivity of 90% and a PPV of 67.1% using the DNN classifier).
In addition, we also examined a GMM classifier trained on two classes and a DNN classifier trained on three classes; however, their performances were inferior. We split the data into train, development (validation), and test sets using different subjects (patients), so there were no common subjects among the three datasets. The model parameters and the decision threshold were determined using the development dataset, and the test dataset was reserved for the final evaluation. In addition, we performed a 5-fold cross-validation to assess the generalization of the system; similar accuracy, sensitivity, and specificity values were achieved, whereas the PPV was somewhat lower. The key advantage of this system is that it is relatively simple and fast. Although it is a DNN-based system, it can run on a PC without a GPU: a processing time (during testing) of ~1 min is required for each hour of audio recording (processor: Intel i7-7700 CPU @ 3.60 GHz).
For comparison, in a pilot study, we implemented a CNN-based cough detection system. We used mel-spectrograms (128×109) of each sound event as input to the CNN model. The network was composed of five convolutional layers, three max-pooling layers, two batch normalization layers, and three dense layers. This network totaled 87,795 learnable parameters, approximately 6 times more than our DNN model, and required 12-fold more processing time. The preliminary results on the test dataset showed a better PPV (94.7%) but lower sensitivity (81.5%). In future work, we plan to further develop and test this system.

Analysis of detected coughs during sleep
The analysis of the clinical data was performed both on the test dataset and on all subjects. However, in order to draw more accurate conclusions and for statistical robustness, we chose to present the analysis of the complete database.
One of the main novelties of this work is the focus on nocturnal audio signals to better understand the manifestation of coughs between lights off and lights on. Correlations were calculated between sleep stages and the relative cough rate. The findings indicated that coughs occurred significantly less often during the sleep stages and more frequently during wakefulness. These results are consistent with previous studies [6,8-10] that reported similar findings; for example, [6,9] analyzed 24-hour recordings and showed that coughs were more frequent during the daytime. Furthermore, our analysis showed that the coughs that occurred during sleep were less frequent in deep sleep (N3) than in the other sleep stages (REM, N1, and N2).
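A relative cough rate of this kind can be sketched by normalizing per-stage cough counts by the time spent in each stage, since raw counts would otherwise favor the longest stages. The function and the numbers below are illustrative assumptions, not values from the study.

```python
# Sketch of a relative cough rate per sleep stage: coughs detected in a
# stage divided by the minutes spent in that stage. All values invented.

def relative_cough_rate(cough_counts, minutes_in_stage):
    """Return coughs per minute for each sleep stage."""
    return {stage: cough_counts.get(stage, 0) / minutes_in_stage[stage]
            for stage in minutes_in_stage}

counts = {"W": 20, "N1": 4, "N2": 6, "N3": 1, "REM": 3}
minutes = {"W": 60, "N1": 30, "N2": 180, "N3": 90, "REM": 90}
print(relative_cough_rate(counts, minutes))
```

In this toy example, wake has by far the highest rate and N3 the lowest, the same qualitative pattern reported above.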
When analyzing the transitions between sleep stages at the moment a cough occurred, we found that when a cough occurred during N2, the next stage was wake 10% of the time; the rest of the time, the subject remained in N2. When a cough occurred during N3, the next stage was N2 6% of the time (otherwise the subject remained in N3). When a cough occurred during REM, there were no transitions to other sleep stages except wake (3% of the time). We also examined the cases where a cough caused an awakening: 92.1% of the time the subject was already awake, 7.7% of the time the cough occurred during N2, and 0.2% of the time during REM. Note that the sleep stages were annotated in 30-second epochs.
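Given the 30-second epoch annotations, a transition analysis of this kind reduces to tallying, for each cough epoch, the stage label of the following epoch. The sketch below is illustrative; the stage sequence and cough positions are made up.

```python
# Illustrative tally of which sleep stage follows a 30-second epoch
# containing a cough. Stage labels and cough epochs are invented.
from collections import Counter

def transition_counts(stages, cough_epochs):
    """Map each cough epoch's stage to a tally of the next epoch's stage."""
    counts = {}
    for i in cough_epochs:
        if i + 1 >= len(stages):
            continue  # no following epoch to transition into
        counts.setdefault(stages[i], Counter())[stages[i + 1]] += 1
    return counts

stages = ["W", "N1", "N2", "N2", "N3", "N3", "N2", "REM", "W"]
print(transition_counts(stages, cough_epochs=[2, 4, 7]))
```

Dividing each tally by the total coughs in that stage would give the transition percentages reported above.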
Further statistical analysis of the subjects' demographics indicated that higher BMI values were associated with a higher cough rate (Fig 11). Similar results were reported in [18], where a positive correlation was found between the nocturnal cough rate and BMI in Primary Ciliary Dyskinesia (PCD) patients. More generally, [7] indicated that the risk of chronic cough is twofold to threefold higher in obese individuals in the general population. This may be attributed to excess fat mass in the airways that promotes cough production [39]. A significant positive correlation between age and the number of nocturnal coughs also emerged here (Fig 12A), as was found in [40]. Finally, the comparison of cough rates by gender revealed that women coughed significantly more than men during the night, as was also found in [40] on a database of 100 subjects (65 women) and in [41] on a database of 933 subjects (52.2% women). One possible explanation for this outcome is that women have more episodes of upper airway resistance, which may cause cough episodes [42].
According to [14,43], there is an association between cough and OSA; specifically, [43] suggested that upper and lower airway inflammation, together with trauma to the upper airways during each apnea-hypopnea episode in OSA, may contribute to chronic cough. Several studies have investigated the prevalence of OSA in patients with chronic cough; for example, [11] found that 44% of 75 such subjects had OSA.
There was no significant correlation between OSA severity and the number of nocturnal coughs in women, but there was a weak positive correlation in men. This finding may be attributed to the fact that OSA is more prevalent in men [42] and that, in our database, the AHI mean and standard deviation were higher in men (18.70 ± 15.43 vs. 14.12 ± 9.68 in women). When the smokers (n = 16) were removed from the entire database, a weak positive correlation emerged (R = 0.304, p = 0.05). These findings are somewhat similar to [44], which was based on a database of non-smokers; they found that OSA patients (AHI = 53.6 ± 24.7 events/h of sleep) had a significantly higher incidence of chronic cough than controls (no OSA).
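The R values reported throughout this section are Pearson correlation coefficients, which can be computed directly from paired samples. The sketch below uses plain Python with invented BMI/cough data purely for illustration; it is not the study's analysis pipeline (which would also compute the p-value).

```python
# Hedged sketch: Pearson correlation coefficient (R) between two paired
# samples, as used in the BMI/age/OSA analyses above. Data are invented.
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

bmi = [22.1, 27.5, 31.0, 24.3, 35.2, 29.8]
coughs = [3, 8, 12, 5, 15, 9]
print(round(pearson_r(bmi, coughs), 3))
```

Values near +1 indicate a strong positive linear association; the weak correlations reported above (e.g., R = 0.232, R = 0.278) sit much closer to zero.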
Our database included subjects with a variety of health conditions who were referred to a polysomnography test. Future research should test larger samples to introduce more variety and draw more pinpointed conclusions.

Conclusion
This article presented a reliable DNN-based automatic nocturnal cough detector using a non-contact microphone. Thirty-one selected features in the time and frequency domains were used to characterize the cough patterns. The classification results confirm that the algorithm can clearly distinguish between coughs and other noises in a sleep environment. The main findings indicated that: 1) cough events were significantly more frequent during wakefulness than during sleep, 2) in deep sleep (N3) the cough rate was significantly lower than in the other sleep stages, 3) the number of nocturnal coughs was significantly higher in women, 4) the number of coughs was positively correlated with age and BMI, and 5) the number of coughs in men was positively correlated with OSA severity.
Future work could implement this system to track the progression of respiratory illnesses and to test responses to different medications, especially in times of a global pandemic.
Supporting information
S1 Dataset. S1 Table: Subjects' characteristics and number of manually labeled events. This table lists AHI, gender, and BMI, where the values 1 and 2 in the gender column indicate men and women, respectively. It provides the total number of events (cough, noise, snore, and coughs between lights off/on separately) detected using manual labeling. NA stands for Not Available. S2