User authentication system based on human exhaled breath physics

This work, in a pioneering approach, attempts to build a biometric system that works purely based on the fluid mechanics governing exhaled breath. We test the hypothesis that the structure of turbulence in exhaled human breath can be exploited to build biometric algorithms. This work relies on the idea that the extrathoracic airway is unique for every individual, making the exhaled breath a biomarker. Methods including classical multi-dimensional hypothesis testing approach and machine learning models are employed in building user authentication algorithms, namely user confirmation and user identification. A user confirmation algorithm tries to verify whether a user is the person they claim to be. A user identification algorithm tries to identify a user’s identity with no prior information available. A dataset of exhaled breath time series samples from 94 human subjects was used to evaluate the performance of these algorithms. The user confirmation algorithms performed exceedingly well for the given dataset with over 97% true confirmation rate. The machine learning based algorithm achieved a good true confirmation rate, reiterating our understanding of why machine learning based algorithms typically outperform classical hypothesis test based algorithms. The user identification algorithm performs reasonably well with the provided dataset with over 50% of the users identified as being within two possible suspects. We show surprisingly unique turbulent signatures in the exhaled breath that have not been discovered before. In addition to discussions on a novel biometric system, we make arguments to utilise this idea as a tool to gain insights into the morphometric variation of extrathoracic airway across individuals. Such tools are expected to have future potential in the area of personalised medicines.


Introduction
Human exhaled breath is largely turbulent.During exhalation, air is forced out of the lung through trachea by the contracting diaphragm.To start with, the Reynolds number associated with flow through trachea is sufficiently high.In addition, as the air passes through the trachea, it interacts with the complex internal structures associated with the upper respiratory tract, leading to complexity in the flow.The upper respiratory tract consists of the larynx, the pharynx, and the oral cavity.Owing to the complexity associated with the interaction between air that is already turbulent with to measure fluid flow in the forced oscillations technique applied to the human respiratory system, as a substitute for the pneumotachograph.Other studies reporting the implementation of CT-HWA for measuring expiratory flow parameters are by [17] and [18].[19] showed that CT-HWA can be used as a flow transducer for spirography.In conclusion, HWA is a robust tool for obtaining time-resolved turbulence signature measurements in flows.Most of the work in the literature has taken advantage of the HWA data for flow rate calculations, effectively using it only as an alternative for spirometry-based studies.We propose to use HWA measurements (the complete time series of instantaneous velocity data) of turbulence in human exhaled breath as input signals for the development of a biometric system.
Behavioural biometrics use a person's gestures, such as gait patterns or breathing gestures.Recent work by [1,20] revealed the prospects of exploiting breathing acoustics for user authentication.They built a new behavioural biometric signature called BreathPrint based on audio features acquired from a microphone sensor in smartphones, wearables and other IoT devices.[1] deployed a conventional machine learning model based on the Gaussian mixture model (GMM), while [20] established the feasibility and performance evaluation of RNN-based deep learning models.A novel WiFi-based breathing estimator UbiBreathe developed by [21] works as a respiratory monitoring system based on the received signal strength (RSS) data from a nearby WiFi-enabled device.A continuous user verification system was developed using this approach by [22] for round-the-clock user verification, built based on user-specific respiratory features derived based on waveform morphology analysis and fuzzy wavelet transformation.A deep learning-based scheme also detects the existence of spoofing attacks.[23] developed a speaker recognition system, BreathID based on breath biometrics.Breath during speech is considered trivial or a noise component.They showed that unique breath features can be formulated by a template matching technique for speaker recognition.
In summary, the use of HWA and, more broadly, breath turbulence measurements as a tool for biometric authentication has not been attempted in the literature.Conventional biometric systems such as voice, face, and fingerprint recognition have their own disadvantages.There is a need to develop more sophisticated biometric systems that could make use of internal physiological features of the human body.We attempt to build a novel user authentication system based on human exhaled breath, using the principles of multidimensional hypothesis testing and machine learning.This system is different from an acoustics-based biometric system, since it does not require vocal data from the human subject and is built solely on the fluid dynamic information contained in the exhaled breath.

The experimental dataset and methodology
A measurement-based study was employed to develop algorithms for biometric authentication.Measurements of the exhaled breath were made using a Dantec Dynamics ® 55P11 hot wire probe.It consists of a 5µm diameter, 1.25mm long platinum-coated tungsten wire, which acts as the sensor.A Dantec Dynamics MiniCTA ® 54T42 module housed the CT-HWA's signal processing and output system.The hot wire probe was calibrated using a standard procedure of simultaneous measurement of the flow velocity and the anemometer voltage.The calibration was performed using a Dantec Dynamics StreamLine Pro ® automatic calibrator, between a velocity range of 0 − 5 m/s.Using this procedure, we were able to determine the calibration constants from an assumed velocity-voltage relation.This relation is a least-square polynomial fit of order-4 in the velocity-voltage space as shown in Fig 1 .In the current study, the raw voltage time series was itself used in all the analysis.This helps us avoid frequent re-calibration of the probe.The initial calibration was performed only to make sure that the voltage and velocity signals are monotonically positively correlated (as can be inferred from the least square fit from Fig 1).Participants 94 participants were recruited to take part in this study, following the ethical approval from the Institutional Ethics Committee (IEC) of the Indian Institute of Technology Madras, Chennai, India (IITM -IEC Protocol No. IEC/2018-03/MP/01).The participants were students of the Indian Institute of Technology Madras.Their age ranged from 21 to 27 years.Data were collected only once (one set of 10 breath samples) per participant.Volunteers with epileptic disorder were excluded from participation.The experimental data collection was carried out between 8th and 17th January, 2019.All volunteers who participated in this study had given their written consent.The recorded time series data were analyzed anonymously.

Data collection and analysis
A schematic of the experimental setup is shown in Fig 2A .It consists of a mouthpiece assembled into an aluminium circular cross-section channel which housed the hot-wire probe aligned to its axis to measure the streamwise component of the turbulent exhaled flow velocity.The human subjects were allowed to exhale through their mouths into the experimental measurement setup.The nose was clipped during data recording to ensure that all the exhaled air passes through the oral cavity before entering the experimental setup.Each human subject was provided with a new disposable plastic mouth-piece to wrap their mouth around, through which the subjects exhaled.The obstruction of the tongue to the flow was avoided by placing the mouth-piece above the tongue.Data were obtained in each exhalation trial lasting about 1.5 seconds, with 10 trials recorded per subject.Each time series was recorded by sampling the voltage response at 10kHz.This effectively gives us 15000 data points in a time series, the relevance of which would be discussed in the following sections

Statistical description of the time series
In general, a statistical description would involve the representation of time series distributions in terms of the central moments.Such representative measures tend to vary within a non-stationary time series.They can be characterized by studying how these moments depend on time intervals within the time series itself, by investigating the scaling properties of the signal.For instance, the Hurst exponent, H ( [24]) parametrizes the effect of the statistics of time intervals on the standard deviation of the time signal.In the context of multifractal analysis, the generalized Hurst exponent H(q) is used for parameterization, where q is the order of the fluctuation function ( [25]).H(q) is also known as the q−order Hurst exponent.In our study, we focus on the multifractal properties of the time series, since interestingly, human exhaled breath has been found to display multifractality, based on our analysis which will be discussed in this section.Fully developed turbulence is known to exhibit multifractality, as described by [26].
The multifractal nature of exhaled breath signals were investigated using the well-known technique called multifractal detrended fluctuation analysis (MFDFA) developed by [25].It helps us to identify multifractal scaling properties as well as to detect long-range correlations in a time series.A detailed explanation of the theory behind this algorithm can be found in the original work by [25].A step-by-step implementation of the MFDFA program using Matlab ® was given by [27].We made use of the recommendations from the [25] and [27] to write a Python program to perform the MFDFA on exhaled breath time signals.Briefly, the algorithm involves dividing the time series data into time intervals of equal length, then applying detrended fluctuation analysis (DFA) ( [28]) to each time interval to remove the trend and then calculating the fluctuation function F. Next, the q−order fluctuation function F(q) is obtained by raising the detrended fluctuation function to the power of q.The q−order Hurst exponent H(q) is obtained from the scaling behavior of F(q).Then, the algorithm involves estimating the q−order mass exponents τ (q) from q−order Hurst exponent H(q), converting them into the q−order singularity exponents α, and then computing the generalized singularity dimensions, also known as the singularity spectrum f (α).
In the context of multifractal analysis, a measure of complexity of a time series is its singularity spectrum f (α), which characterizes the distribution of fractal dimensions or scaling exponents α across different parts of the signal.While conventional DFA ( [28]) quantifies the average correlation properties of a signal purely through the scaling exponent α, MFDFA provides another important measure.The width of the multifractal spectrum ω (see Fig 5), which indicates the richness of multifractality present in the experimental data adds further insight into the data.Third-order polynomial fits were used to detrend data in each time interval.The time interval (window) sizes range between 10 and N/4 data points, where N is the length of the time series.The orders q of fluctuation function ranges from −5 to 5. It is to be noted that the input time series to the analysis was first normalized, which is discussed in a subsequent section.The chosen normalization method does not alter the compact support of the input time series, as it is essential that a time series with compact support is necessary for reliable multifractal analysis.The Kolmogorov-Smirnov test for normality ( [29]) revealed that a large fraction of the available breath signals were non-Gaussian.Deviations from a Gaussian or symmetric distribution may be a sign of multifractality stemming from a broad probability distribution function (PDF) as described by [25].Shuffling the time series helps us in discovering the reason for multifractality in this case.By randomly permuting the order of values in the time series, temporal correlations are disrupted while preserving the PDF.If the multifractality persists in the shuffled or surrogate data, it suggests that the broad PDF is the primary source of multifractality.Conversely, if the multifractality disappears in the shuffled data, it indicates multifractality due to inherent long-range temporal correlations.Fig 3D is a plot of the singularity spectral function f (α) against the singularity strength α, resulting from the MFDFA on the time series from 3A and 3B.The plot consists of two representative multifractal spectra -one for the exhaled breath time series and the other corresponding to the same time series shuffled, which becomes a white noise.The white noise signal was observed to form only a tiny arc clustered around α = 0.5, while the multifractal breath signal forms a well-defined spectrum.This observation is evidence of the presence of long-range correlations in the breath time signal.Any memory of the correlations (strong or weak) within the time series is lost when randomly shuffled.The inset plot in Fig 3D shows a magnified view of the spectrum from the white noise signal.It can be inferred from this observation that the white noise signal does not show any degree of multifractality and also reconfirms that the multifractality of exhaled velocity is defined by its inherent long-range correlation properties, both for short-and long-range fluctuations.The multifractal analysis was made use in the time series segmentation and feature extraction, which will be discussed in the following sections.

Time series segmentation, normalization and selection
Segmentation of time series is a standard practice in many data analysis techniques to obtain dividing points on a signal with or without stationarity.In machine learning problems with limited availability of time series samples, segmentation is of vital importance.By performing an efficient segmentation on the basis of certain statistical measures, we can obtain sufficient number of samples to train and test machine learning models.Fig 2B is a plot showing the instantaneous voltage response from the hot wire probe for 1.5 seconds.It was obtained by sampling at a frequency of 10 kHz, giving us a sufficiently resolved long series to perform segmentation without losing any significant information on the flow physics.
We will now discuss the segmentation process.Each time signal was divided into 19 overlapping segments using a window size equal to 1/10th the length of the signal and a sliding width of half the segment size.Remember the machine learning models may tend to overfit the training data when there are large number of overlapping segments.The purpose of using overlapping windows was to capture the end effects of the time series segments during feature extraction.So, the chosen segment width and sliding width are justified as each part of the time signal appears only in two segments.This effectively gives 1500 data points to each segment making it sufficiently long to reliably extract features using tools discussed this manuscript.As a result, a maximum of 190 representative time blocks become available for the analysis for each user.Each of the time signals were normalized before feature extraction, making the time series comparable across realizations.This would also make all signals independent of the sensor being used for the measurement, since these features only rely on the temporal correlation structure in the series and not on the raw data values.This approach can be termed as being sensor-agnostic.Regardless of whether the time series signal is measured using a hot-wire/film probe, or a laser-based technique, the performance of the algorithm will not be affected, as long as there are sufficient data points to properly capture the temporal structure in the flow.We then build an algorithm which works with these features which are invariant to the absolute value of the time series.z-score normalization which is popularly known as standardization was used to normalize the time series.To perform z-score normalization, the mean of the entire time series is subtracted from each data point in the time series.Then, the resulting values are divided by the standard deviation of the time series.This scales the data so that it has a mean of zero and a standard deviation of one.The resulting normalized time series will have values that represent the number of standard deviations away from the mean.The z-score normalization has the form shown in Eq 1.
where z(i) is the normalized time series, x(i) is the original time series of length N , (µ t ) is the mean of the time series, and (σ) is the standard deviation of the time series.The time series becomes unitless after normalization.MFDFA was performed on all normalised time series, and it revealed that not all spectra exhibit the expected shape.The general shape of a multifractal spectrum is convex or more precisely an inverted parabola, with the peak occurring at the central moment.This convex shape signifies the presence of multifractal scaling, indicating that different parts of the time series exhibit distinct scaling behaviors.Certain time segments were observed to result in a spectrum with folds or distortions.There could be several reasons for the appearance of folds in the multifractal spectrum.(i) They could occur due to irregularities or data artifacts in the time series itself, such as noises, outliers, etc. which may arise due to inconsistent exhalation by the user during data acquisition.For example, during the period of 1.5 seconds, if the user exhales abruptly for the first 1 second of the trial, and then the breath velocity steadily decays for the remaining 0.5 seconds.The segment which falls between these two regions might contain irregularities within it.Such irregularities could introduce inconsistencies in the scaling behaviour.(ii) The spectrum may be affected by the non-stationarity of the time series, which is when the statistical properties change with time, such as due to change in breath velocity.(iii) Spectral folds might even arise due to the finite size of the time segment.Limited number of data points may not capture the scaling properties at different scales.Investigating the type of distortions or the reason behind this behaviour of the spectrum for certain time segments fell outside the scope of this work.Instead, we made use of this behaviour as an indicator to judge whether a segment is valid or not.All segments which showed non-convex singularity spectra were discarded in our analysis.Also, the segments which produce a spectral width less than 0.05 were rejected, since they exhibit a very low degree of multifractality.These two strategies effectively make MFDFA a tool for time series selection, for further feature extraction and analysis.Any time signal which contains significant number of segments with inconsistent scaling behaviour can be rejected using this tool during the data recording step itself.A numerical example discussing how a multifractal singularity spectrum can have non-convex shapes can be found in [30].

Feature extraction
Features were extracted from normalized time signals using various time series feature extraction techniques.Unlike other physiological biometric systems where image-based patterns or features are used as templates to match an individual's identity, our input data is a time series from an individual which requires feature extraction.Several features of the time series were studied in order to develop insights into the data.The multifractal spectral information was incorporated into our analysis by including them in the set of features.The fact that the time series contains information pertaining to the correlation structure becomes relevant to machine learning algorithms.In keeping with this principle, we extract a set of three important features from the spectrum: (i) β, the abscissa corresponding to the spectral maxima, (ii) ω, the width of the spectrum, and (iii) ϵ, the bias or asymmetry parameter of the spectrum.The parameters β, ω and ϵ are dimensionless.These features are visualised on the multifractal spectrum of an exhaled breath time signal in Fig 5 .It was also noted from our analysis that the spectra showed clear differences in their temporal structure; i.e., parameters such as β, ω and ϵ were different for different time signals.Several other multifractal spectral features have also been considered in the literature ( [31], [30], [32]).We chose these three features for simplicity and also they encompass most important descriptions of a multifractal spectrum.Investigating how unique these features behave is of interest to this work.In addition to the use of MFDFA as a feature extraction algorithm, we also use of an automated time series feature extraction algorithm named tsfresh (Time Series FeatuRe Extraction on the basis of Scalable Hypothesis tests) developed by [33].The tool generates over 700 time series features using 63 different time series characterization methods.The following discussion pertains to the preparation of dataset for model building, training and testing of the algorithms.A consolidated pipeline of the algorithm towards model library building including time series normalization, and selection, followed by feature extraction and reduction, is shown in Fig 6.
Features extracted by these algorithms from all available time series are concatenated and passed through a low-variance filter.This was done to remove those feature columns with a variance value below a given threshold, which in our case was 1%.The rationale behind applying this low variance filter was to eliminate features that exhibit very little variation across instances.Such low-variance features may not provide useful insights for classification tasks.Furthermore, highly correlated features were removed from the feature set.A correlation threshold of 80% was chosen for this purpose.Removing features by these techniques reduce the dimensionality, simplifies the model, and potentially improves model performance by focusing on more informative features.All features which were derived from the absolute values of the time series, such as maximum/minimum values, quantile information, etc., were disregarded.For example, inclusion of mean value of a signal will bias the algorithms and allow them to classify on the basis of the mean values itself, which was undesired.It was observed that different human subjects were able to exhale in different velocity bands depending on their lung capacity.The filtered feature matrix thus obtained is a stack of vectors from each time series sample available, and it consisted of approximately 450 time series features.This January 8, 2024 11/31 feature space is high dimensional and may contain redundant features that can be excluded.The reduced feature set will also reduce the computational complexity of the algorithms.We adopted a feature selection method using binary random forest classifiers.Binary classifiers were built on pairwise combinations of the users' feature datasets.The importance of the features can be quantified for every random forest binary classifier by estimating how much the random forest's performance would suffer if a given feature were to be eliminated.This impurity-based feature importance developed by [34] was used for picking the top features.The top 10 most prevalent features among users were chosen as the feature space after computing the top 10 features from each classifier.In the later sections of this manuscript, the methods used for model construction and the physical insights of these features will be described.The reduced feature matrix thus obtained contains features of all the users in the database.

Building of model library
We have formulated the multi-class classification problem into a series of binary classification problems.Several studies have explored the application of pairwise binary classifiers for handling multi-class problems.A description of class binarisation and round robin classification can be found in [35].[36], [37], and [38] are few others who have studied class binarisation for multi-class classification.In order to perform tests with a machine learning based algorithm, it was necessary to build binary classifier models using binary combinations of the training datasets and these models were stored in a model library.Computational simulations were setup to evaluate the performance of the user confirmation and identification algorithms.Let us briefly see how the model library grows with the addition of users to the existing database of users.This is known as enrollment mode of the biometric system.Say, there are n disjointed users U 1 , U 2 , . . .U n in the current state of the users' database.n C 2 binary classifier models can be built, which makes up the complete model library.With the addition of a user, the updated size of the users' database becomes n + 1.Therefore, the size of the model library increases by n and becomes n+1 C 2 .This growth can be expressed as This means that when a new user is added to the users' database, n additional binary classifier models are to be built and stored in the model library.Expectedly, this follows a second-order power-law variation of the form y = ax m with the multiplication factor a ≈ 0.5 and exponent m ≈ 2.

User confirmation algorithms
Two different user confirmation algorithms were built using the extracted feature data.
The first approach was based on statistical hypothesis testing, which involves the testing of a null hypothesis against an alternative hypothesis.The second approach was based on machine learning models.In case of a machine learning based algorithm development, the training data were used to build random forest binary classifier models, thereby creating a library of models.In the case of the hypothesis testing based algorithm, model building process is redundant, and the predictions are made based on the hypothesis test results between a user's test data and available training data, making it an instance-based algorithm.These algorithms will be referred to as UCA.HT (User Confirmation Algorithm -Hypothesis Testing) and UCA.ML (User Confirmation Algorithm -Machine Learning) in later sections.The Hotelling's T 2 test ( [39]) was used in UCA.HT, which is a multidimensional version of the Student's t-test.

Confirmation algorithm based on hypothesis testing
The use of hypothesis testing as an instance-based binary classifier has been attempted in the literature.[40] compared the machine learning approach and the statistical testing based on p−variations; and the idea of instance-based classification by hypothesis testing was investigated by [41].[42] provided a detailed description on how binary decision problems can be formulated as hypothesis testing and/or binary classification.In a system based on hypothesis testing, the library comprises of the training datasets of all the users.Since we are building an algorithm which is intended to work alongside a machine learning algorithm, we formulate the hypothesis test based algorithm to work on binary pairs of users.To be more precise, the library will comprise of training datasets of pairs of users.It will be referred to as user-pair data in further discussions.Fig 7 shows a flow chart of the user confirmation algorithm which is based on hypothesis testing principles.The equality-of-means test was performed between a test data and each training data in pairs present in the library to infer whether the null hypothesis is to be rejected or not, as depicted in Fig 7 .Here, the null hypothesis states that the two samples come from the same distribution (H 0 : µ a = µ b ), and the alternate hypothesis states that the samples come from different distributions (H 1 : µ a ̸ = µ b ).A detailed description on the test statistic and formulation of the Hotelling's T 2 test can be found in the original work by [39].When a test user, say 'User i' was to provide the input, the pairwise Hotelling's T 2 tests are performed between the test user's data and the training data of n − 1 pairs of users which include 'User i', where n is the number of users in the database.Let us look at one of those tests as shown inside dotted box in Fig 7 .By performing a hypothesis test against a user-pair, for example, (1, 2), we get a pair of p−values, (p 1 , p 2 ).The tests were performed with a confidence level of 99.9%, and therefore a p−value of 0.001 or less was sufficient to reject the null hypothesis.At least one of the two p−values need to be above 0.001 for the algorithm to accept the null hypothesis.The predicted user is then the user corresponding to a higher p−value.If both p−values are either equal to or below 0.001, no predictions were made.After the test, the predictions made here are reposed as an answer to the question "Is it User i? (Yes/No)".The pipeline discussed so far becomes the 'User Confirmation Block -HT' for the hypothesis testing based algorithm.The output of this block is a scalar v which is equal to the count of model predictions which says 'Yes'.Here, a threshold of 50% of the predictions was used for defining the minimum confidence of confirmation.This means that HT(i, i) accepts the null hypothesis and HT(i, j) ∀j = 1, 2, . . .n and i ̸ = j rejects the null hypothesis in at least 50% of the cases.Then, the User i is so confirmed.Here, HT(i, j) stands for hypothesis test between a User i and User j.
The equality-of-means test can actually be viewed from two perspectives: (a) Testing the distribution of test data against the distribution of n training data; (b) Testing the distribution of test data against the distribution of training data in pairs as discussed so far.The former strategy produces n test results and the algorithm would face one of three scenarios: (i) If only one test accepts the null hypothesis, the user identity is The user confirmation block will be made use in the user identification algorithm later in this manuscript.An example of the hypothesis test against user-pair is illustrated inside the dotted box, directed from the user confirmation block by the red asterisk.Given a user i, the user confirmation block's output was reposed to answer the question "Are you indeed User i?" based on a threshold.presumed to be of the user corresponding to that particular test; (ii) If more than one tests accept the null hypothesis, the user corresponding to the test which corresponds to highest p-value is presumed to be of the user identity (predicted user).In either case, if the predicted user matches with the test user, the user is confirmed, otherwise not; (iii) If all tests reject or no test rejects the null hypothesis, then the user is not confirmed.Although the former case (procedure (a)) is a computationally simpler formulation, the latter case (procedure (b)) becomes more relevant in our study since we are trying to build a multi-model approach for user identification.It was also noted that the latter approach gave better confirmation results (for UCA.HT) compared to the former approach.

Confirmation algorithm based on machine learning
Following the discussions from subsection titled Building of model library, generating n C 2 binary classifiers is necessary to handle the multiclass problem.Let us have a detailed discussion on the model building procedure and the choice of the binary classifier.We required a detailed analysis since the choice of a classifier depends on the specific characteristics of the dataset and the multiclass problem at hand.The training dataset was used to construct binary classifier models for each user-pair.Decision tree (DT), random forest (RF), support vector machine (SVM), logistic regression (LR), Gaussian naive Bayes (GNB), and multi-layer perceptron (MLP) were chosen as the candidate binary classifier models.The machine learning models employed in our study are discriminative models except for one, the GNB.Discriminative models do not try to identify the distribution that generates the data; instead, they try to find out the features that separate classes from each other ( [43]).
Robustness of model parameters and selection of the best model is very crucial to the performance of a user authentication algorithm.Optimal tuning improves the generalizability of each machine learning model.A generic algorithm for hyperparameter tuning and model selection is illustrated in Fig 8 .The training data for a user-pair (i, j) is normalised initially using the training set's mean (µ ij ) and standard deviation (σ ij ).µ ij and σ ij should be stored in the memory as it is required for scaling the test data when required.Hence, it can be combined into a function called standard scaling function s(µ ij , σ ij ) for later use.This normalised training data is now used for tuning and training the best model.It is generally challenging to know the values of the model parameters for a given machine learning model on a dataset.Therefore, we have employed an iterative search cross-validation scheme to compare different sets of hyperparameter values for each model.A stratified k-fold cross-validation technique with hyperparameter tuning was employed for evaluation and selection of the model parameters.The number (k) of folds was chosen to be 5. Parameters from a hyperparameter search space were fed into the cross-validation algorithm, where the training data was split into k equally sized folds maintaining the same target class distribution in each fold as the original dataset.This will make sure that there is no class imbalance in each of the k folds.The iterative search from the search space was performed either using Bayesian optimisation based search ( [44], [45]) or by grid search depending on the size of the search space of a classifier model.The Bayesian search method employs a probabilistic model of the search space to choose a hyperparameter configuration.By combining exploration (experimenting with new configurations) and exploitation (utilizing knowledge from previous iterations), it effectively explores the hyperparameter space and identifies promising regions in the hyperparameter space, as described by [46].The grid search can be described as an exhaustive exploration method which tests all the combinations of the search space ( [47]).Bayesian search was employed when the search space was large, whereas the grid search was employed for smaller search space where the method of brute force was computationally affordable.During an iteration, for a selected hyperparameter configuration, the model was trained and evaluated k times, each time using a different fold as the validation set and the remaining k − 1 folds as the training set.The set of parameters which yielded the best cross-validation score were selected, and the model was retrained on the normalised training data based on the selected parameters.The standard scaling function and the classifier model are combined into a model pipeline {s(µ ij , σ ij ), m ij }, where m ij is the classifier built corresponding to user-pair (i, j).The pipeline was create to make sure that the test data when introduced should be scaled using the training set's mean and standard deviation before predictions are made.
The above procedure was performed for all the candidate binary classifiers.This procedure helped in effectively reducing the size of the feature set from ≈ 450 Fig 8 .Model library building procedure.Flow chart showing the model library building procedure for a user-pair (i, j), where i = 1, 2, . . .n; j = 1, 2, . . .n; n is the total number of users.Note that model m ij ≡ m ji , and therefore, only model m ij are built and stored in the library.The abbreviations stand for the following: CV -cross validation, DT -decision tree, RF -random forest, SVM -support vector machine, LRlogistic regression, GNB -Gaussian naive Bayes, MLP -multi-layer perceptron.s(µ ij , σ ij ) is the standard scaling function, µ ij and σ ij are the mean and standard deviation respectively of the training data of users i and j combined.dimensions to 10 dimensions using random forest classifiers as briefed in subsection titled Feature extraction.The number of trees/estimators were tuned.More trees are generally required for the purpose of refined variable importance estimations, as noted by [48].The rule of splitting was tuned by controlling the maximum depth of a tree, minimum number of samples required to split an internal node, and minimum number of samples required to be at a leaf node.It was important to Fig out the best model based on their performance.This will allow us to build an efficient library of best estimators for the given training data.Each of the n C 2 models underwent a hyperparameter tuning before being fit to the training data.This way it was ensured that each model is generalized for the corresponding user-pair's data.It is important to note here than any model with a cross-validation score of 60% or less was discarded.This was done to ensure that the models which were saved in the library are far from a random model.Hence, by storing only the models with a cross-validation score above 60%, we make sure that the overall algorithm performs between reasonably well to good, based on the generalisation of the models built.
Selection of the best binary classifier model for a given user-pair can be made by picking the model with the highest cross-validation score.This technique can be called as best-of-all model selection technique.Fig 9A shows the percent proportion of different models in a library which was built using this procedure.We observed that MLPs were the most frequently occurring best classifier based on the highest cross-validation score.The second frequently occurring one being the RF, followed by SVM, LR, DT and GNB.The process of building all 6 models and choosing the best one every time is computationally very expensive, So, one out of the 6 classifiers had to be chosen for testing the algorithms.The information from Fig 9A is not sufficient to make this decision, since a good cross-validation score does not always promise a good performance on test data since the classifiers could also be overfitting the training dataset in certain cases.Pairwise user test data was used to test each candidate classifier and the results are visualised as a box and whiskers plot in Fig 9B .The orange line inside the boxes represents the median of the test score.It it clear that DT and GNB classifiers perform poorer than the others, and, RF, SVM, LR and MLP have very similar performance on the test data.All the models have produced test accuracies ranging from very low values below 0.5 to 1. Looking at the outliers data points, we can see that SVM does very poorly as the accuracy even goes below 0.2, and in fact LR and MLP too have produced accuracies below 0.2.The RF classifiers and GNB have similar lower bound of test accuracy.In order to get a better understanding on how these models fit the training data, we can visualise the decision boundaries in a 2D feature space.Since we are already working on a reduced feature space of 10 dimensions, choosing any 2 dimensions out of it and building models for the purpose of visualisation seems appropriate here.The (β, ω) space was chosen for visualisation.To generate the 2D decision boundaries, a structured synthetic dataset was generated which filled up the two-dimensional feature space within the given bounds.The decision regions are obtained based on the predictions made on each data point from the synthetic dataset.These boundaries are visualised for comparison in Figs 10A−10R for three randomly chosen user-pairs.Given a user i, the user confirmation block's output was reposed to answer the question "Are you indeed User i?" based on a threshold.
data is given as input, say User j, the algorithm runs the user confirmation block by considering all the users in the database as trial users.This effectively is equal to running through all the n C 2 models present in the library, but in batches of trial users, User i, where i = 1, 2, 3, . . .n.The output of this pipeline is a vector V of size (1, n) with each element v i being a result of the corresponding trial confirmation test.The identified user from this algorithm will be the trial user corresponding to the maximum value in the vector V .When more than one confirmation trial results in the maximum prediction value (two elements of V having the maximum value), the algorithm does not identify any user.The user identification algorithm is made generic, which means that any user confirmation algorithm (instance-based or model-based) can be used within this algorithm and the output of this algorithm will be the vector V containing the count of predictions.This allows us to build a multi-modal approach for user identification where multiple identification results can be combined using a weighted sum.This is similar to a classical black board architecture where results from multiple expert units are combined.We will now present a brief discussion on this approach.Let us call the outputs from a hypothesis test based and machine learning based user identification algorithms as V HT and V ML respectively.We can take a weighted sum of these two vectors to get a new vector V ′ which will have the advantages of both the algorithms as shown in Eq 3.
where, w 1 and w 2 are the weights associated with hypothesis test based algorithm and the machine learning based algorithm, respectively.The weights can take values between 0 ≤ w ≤ 1 and sum of the weights should always sum up to 1.This approach can be generalised for a combination of multiple user identification algorithms as shown in Eq 4.

User confirmation system
Confirmation tests were performed for all users (n) available in the database.Each set of confirmation tests were repeated 66 by shuffling training and test data split-up.The results of the algorithm from each of these trials can be interpreted as follows: number of confirmed users denoted by c, and number of unconfirmed users denoted by u.In order to quantify the performance of the algorithms, we define a metric called the true confirmation rate (TCR) which is a ratio of the confirmed users and total number of users as shown in Eq 5.
The confidence of confirmation (η) for a user confirmation algorithm is the percentage prediction of the favourable user during a confirmation test.It directly quantifies how confident the algorithm is while attempting to confirm a user i.It can be defined as, where, v i is the favourable user predictions as seen in Figs 7 and 11, i.e., the total number of model predictions that matches the user that the algorithm is attempting to confirm, and n is the total number of users in the database.The value of η i ∀i = 1, 2, . . .n has to pass a threshold confidence of confirmation, say η t , for a user to be confirmed.A comparison of the histogram of η i is shown in Fig 13 for one trial of n confirmation tests.The study revealed that the machine learning based algorithm performs better than the hypothesis testing based algorithm.This validates the ability of a random forest classifier to capture the decision boundary better, when compared to its hypothesis testing based counterpart.For the UCA.HT, the TCR was 50 ± 9.6%, whereas, for the UCA.ML, the TCR was 97 ± 2.5%.This implies that almost every user was able to pass the threshold of 50% in the machine learning based algorithm.This Comparison of the confidence of confirmation η i .Histograms of confidence of confirmation η i compared between (A) a machine learning based approach (random forest classifiers) and (B) a hypothesis testing based classification approach, for one trial of n confirmation tests.In the example shown here, the predictions from ML classifiers give a range of η i values distributed between ≈ 38% to 100%, whereas the predictions from HT based classifiers produce η i values only close to 0% and 100%.
signifies that the algorithm achieves a greater level of confidence while confirming a user using UCS.ML.
We shall now investigate why the machine learning based classification algorithm performs better in comparison with a hypothesis test based classification.In the case of hypothesis testing, we know that the rejection of null hypothesis is based on the confidence level chosen.The confidence level can be visualized as a demarcating hyper-surface between two n-dimensional normal distributions.For simplicity, let us have a look at the decision boundaries captured by the random forest classifier and the hypothesis test based classifier in a chosen two dimensional feature space.For the purpose of visualisation of a hypothesis test based classifier's decision boundary, z−tests were performed in each dimension separately, for every data point from the synthetic dataset against one of the user's training data.The tests were performed under the null hypothesis that the data point belongs to the distribution of the training data, under a confidence level of 99.9%.The overall null hypothesis is accepted only if the null hypothesis in both the dimensions are accepted.Comparing the decision boundaries captured by a hypothesis test based algorithm and a random forest model for the same pair of users, one can observe that the random forest model has the ability to capture a more complex decision boundary between two user classes.This lets the random forest classifier to achieve a test data accuracy of 90.9%, whereas the hypothesis testing based classifier achieves only 73.9%.Now that we have established that the machine learning based algorithm is better than the hypothesis test based algorithm for user confirmation, we will now investigate how these two algorithms perform for user identification in the following section.

User identification system
The identification algorithm discussed in Fig 12 shows that we obtain a vector V of favourable user predictions.Based on the values of vector V j with j = 1, 2, 3, . . .n, we can obtain the following outcomes: • True positives (t) -Number of users who were identified correctly.
• False positives (f ) -Number of users who were identified incorrectly.
• Not identified (h) -Number of users who the algorithm was unable to identify.
We shall define the following performance metrics to evaluate the user identification algorithm: 1. Precision (P) or Positive Predictive Value (PPV), which quantifies the percentage of users who were identified correctly among all the identified users.
This parameter quantifies the probability of correct predictions given a judgement (identification) by the algorithm.
2. Accuracy (E), which quantifies the percentage of users who were identified correctly among all the users n.
The precision and accuracy values computed using Eq 7 and Eq 8 respectively, were 35 ± 10.5% and 29 ± 9.1% respectively, for the hypothesis test based algorithm.The results reported in this section are in the format 'µ p ± 2σ p ' where µ p and σ p are mean and standard deviation of the performance metrics respectively.For the random forest based algorithm, we were able to observe precision and accuracy values of 26 ± 7.2% and 22 ± 6.4% respectively.These values were computed on the basis of the maximum votes received by a user among n confirmation trials, as described previously in Fig 12 .When we combine the results from both the algorithms using Eq 3 with w 1 = 0.3 and w 2 = 0.7, we get precision and accuracy values of 32 ± 8.5% and 31 ± 8.5% respectively.Note that the values reported here are also influenced by the threshold η t which in this case was set to 55%.The parameters w 1 , w 2 , and η t can be tweaked to make the algorithm behave on both extremes -(i) to be very liberal (low precision, low accuracy); (ii) to be very conservative (high precision, low accuracy).Taking the example of a particular trial with n = 94, for a weights setting of w 1 = 0.3 and w 2 = 0.7, η t = 50% produces the outcomes (t, f, h) = (31,58,5), giving a precision of 34.8% and accuracy of 33.0%.For the same weights, η t = 96% produces the outcomes (t, f, h) = (18,6,70), giving a precision of 75.0% and accuracy of 19.1%.The former case allows for a lot of false positives by making judgements on most of the instances, whereas the latter case of the algorithm makes judgements stringently.
With the right set of hyperparameters (w 1 , w 2 , . . .w r (in the general case from Eq 4) and η t ), a multi-modal approach is expected to improve the robustness of the overall algorithm.If one classifier produces incorrect predictions for certain trials, other classifiers in the ensemble can compensate for it and provide correct predictions.The contribution of each algorithm can be controlled by the weights.This robustness helps in improving the generalization of the ensemble model.The following discussion is based on results produced from this combined algorithm.We know that the highest voted user becomes the identified user from the algorithm.Based on the 66 shuffle trials, we have the following understanding of the user database.21.3% to 42.6% of the users can be correctly identified by them being the highest voted users, 39.4% to 57.4% of the users can be correctly identified as at least the second highest voted users, and 50.0% to 66.0% of the users can be correctly identified as at least the third highest voted users.This is remarkable given that it is the first attempt in the literature to classify and uniquely identify individuals based solely on the fluid physics of the exhaled breath.We believe that this is conclusive evidence that the fluid dynamic structure of the exhaled breath contains uniquely identifiable information.
This algorithm holds tremendous potential for future use in the area of personalised medicine and also as a novel way to store biological data.This can be achieved by careful model selection and generalisation of classifier models.Advanced models such as deep neural networks can be made use to enhance the multi-model approach discussed in this manuscript.

Physical insights: Understanding the defining features
In order to make a physics-based argument for the uniqueness of human exhalation, it is important to investigate the physical significance of the most important features that result in robust classification.These would be the set of features or attributes which inherently differentiate the classes for a given training data.As we have seen, the importance of the features were quantified for every random forest binary classifier for choosing a reduced feature set in subsection titled Feature extraction.These features are to be investigated to understand their physical meaning in the context of the current problem in hand.A description of the most important classifying features (in the decreasing order of importance) are as follows.
1.The singularity strength or Hölder exponent corresponding to the maximum (β) of the multifractal spectrum of the exhaled breath time series: This is a feature extracted using the MFDFA.β explains the long range correlation present in the time series.A low value indicates that the underlying process becomes correlated and loses fine structure, becoming more regular in appearance ( [25]).This, in our case, would relate to the organised motion of vortical structures in the turbulent exhaled air flow.For some subjects the vorticity pattern might be more irregular than the others, which could be attributed to the extrathoracic morphology.
2. The sum over the absolute value of consecutive changes in the velocity time series: This is a measure of similarity between consecutive time blocks, and this metric describes the existence of mean reverse if present.Such a characteristic can infer the existence of vortical structures in the exhaled flow being unique for every individual, allowing them to be classified by the algorithm.
3. Third coefficient of the autoregressive AR(r) model with order parameter r = 10: The parameter r is the maximum lag of the autoregressive process.The AR model generally predicts future behavior based on past data.The importance of the second as well as fourth coefficients show that there is some correlation between successive values in the time series for most of the users.
4. The number of peaks in the time series with a support (s) of at least 1: A peak of support s is defined as a sub-sequence in the time series where a value occurs that is greater than its s neighbors to the left and to the right.When s is set to 1, this feature computes the number of peaks in the time series where a value is greater than its immediate neighbors.This feature can provide insights into the presence or intensity of localised fluctuations in the flow.
5. The number of different continuous wavelet transform (CWT) peaks present in the signal for smoothing width of 1: This feature was extracted from the time series by applying CWT using Ricker wavelet with width, w = 1.This method simultaneously evaluates the signal in the temporal and frequency domains.The number of distinct peaks identified across the width scales considered can be quantified using these features.It can be used to compare the signals based on their peak characteristics.
6.The value of partial autocorrelation function at a lag of 3: The partial autocorrelation is a statistical measure that quantifies the linear relationship between a time series variable and its lagged values.In the context of our exhaled breath flow, the partial autocorrelation can provide insights into the temporal dependence and correlation structure of the breath velocity.This means that this feature can be useful in understanding the persistence or memory of the signal.It suggests that a strong linear relationship between the current flow state and its state 3 time steps ago have been important for the classification of human subjects.
7. Width of the multifractal spectrum (ω) of the exhaled breath time series: ω describes the richness of the multifractality present in the time series, i.e., wider the range of singularity strength, richer the structure of the signal.The spectral width can implicitly represent the intensity or the level of turbulence present in the flow of exhaled breath.Turbulence is characterized by fluctuations in velocity at different scales.A wider range of turbulence scales is reflected by a wider spectral width, indicating a more turbulent flow.This might be attributed to factors such as extrathoracic constriction, or increased turbulence due to specific breath patterns or breath dynamics.
8. Fourth coefficient of the autoregressive AR(r) model with order parameter r = 10.
9. The number of different continuous wavelet transform (CWT) peaks present in the signal for smoothing width of 5.
10. Kurtosis of the velocity time series calculated with the adjusted Fisher-Pearson standardized moment coefficient, g2: We know that Kurtosis is a higher-order statistical attribute of velocity signals.The heaviness of the tails of the probability density functions of normalized time series could be distinct for each user.This feature will help us in assessing the degree of deviation from the Gaussian distribution and provides evidence of skewed behaviour of the time series.

Computational complexity of the algorithm
Run-time of an algorithm is an extremely important factor for a real-time biometric system.It was generally observed that the size of the input feature set affects the amount of computational resources required to run an algorithm.It was observed that the hypothesis test based algorithm performs predictions faster than the machine learning based algorithm which is because the former is an instance-based classifier.Since the user identification algorithm depends on the number of users and in turn the number of models in the model library, the identification time per user was expected to scale up with the size of the library.The identification time was observed to show a linear relationship with the size of the library (of the form y = ax, with slope a ≈ 1) as seen in the Fig 15 .The error bars show 95% confidence interval at every data point.One of the advantages of building an algorithm which uses n C 2 binary classifiers instead of a single multi-class classifier is that it is massively parallelisable.As long as we have sufficient number of cores to run model loading and prediction, the parallelisation is possible.This significantly improves the computational time by several orders.Plot showing the linear relation of user identification time with the growth of model library.This is applicable to the ML based algorithms which include building of binary classifier models (also known as enrollment in the context of biometrics).The error bars show 95% confidence interval at every data point.

Conclusion
We have provided evidence for the feasibility of a novel biometric system that works based on the turbulence information present in human exhaled breath.The use of a hot-wire anemometer for data acquisition allowed us to build a compact working setup.The faster response time of a constant temperature hot wire anemometer and the real-time computation in combination will possibly make the setup implementable as a biometric authentication system.Since the input of the exhaled breath-based biometric system is correlated with the internal morphology of the human body, it is impossible for a hacker to spoof-authenticate a user.This is because it is difficult to reconstruct an original time series and subsequently the binary classifier models that consolidate all the relevant features (biometric traits) of the true user.Preliminary studies carried out and presented in this work based on time series data from 94 human subjects have shown promising results.We recommend the machine learning approach discussed in this work as a procedure to build a working user confirmation system as it produces good accuracy in confirming users.It achieved a true confirmation rate of nearly 100%, which is because of the ability of random forest models to capture complex decision boundaries between the classes.Although the dataset performs really well for a user confirmation algorithm, the real test of a biometric system comes in for the user identification algorithm, where the test user's identity is not revealed a priori.Building such an algorithm comes with more challenges and would require samples from a larger population to be evaluated.We recommend a multi-model approach for the user identification system, as discussed in this manuscript.The results from our study show that a user identification algorithm performs reasonably well with maximum precision and accuracy of ≈ 40% each for optimum parameter settings.39.4% to 57.4% of the users were correctly identified as at least the second highest voted users.
Our study reveals the possibility that a system built solely on the basis of the fluid dynamics of human exhaled breath could be a potential tool to understand the person-to-person variation in turbulent signatures of exhaled breath.This uniqueness in observed signature could potentially be correlated to the morphometric variation present in the extrathoracic airway.To make comments on the intricate structures within the upper respiratory tract, we might need experimental proof on cadaver models, or simultaneous imaging of upper tract along with the HWA data.Such a study would give us insights on how the structures exhibit considerable morphological diversity among individuals.While our study does not involve direct experimentation with throat morphology, it prompts consideration of how these morphological variations could contribute to the surprisingly unique turbulent signatures found in exhaled breath.Further investigation would give us better understanding on the relationship between these morphological traits and the distinct fluid dynamic signatures.For example, it is possible that the turbulence information can be correlated to occlusion in the extrathoracic passage and its nature, which is a major source of deposition of aerosolised therapeutics.Such an understanding will help us delve deeper into the area of personalised medicines.

Fig 1 .
Fig 1. Calibration curve for the hot wire anemometer.A fourth order least square fit of the experimental data (shown as maroon dotted line) becomes the calibration curve for the hot wire anemometer in use.The polynomial equation of the fourth order fit is shown inside the plot.

Fig 2 .
Fig 2. Experimental setup and recorded time series.(A) Depiction of the experimental setup for data collection.It consists of a disposable mouth-piece, a mouth-piece mount housing a hot wire anemometer and a data acquisition system.(B) A typical human exhalation velocity signal measured using a standard hot wire anemometer.The time signals were sampled at 10kHz for 1.5 seconds.

Fig 3
consists of a set of plots showing the effect of random shuffling of the exhaled breath time signal on the multifractal singularity spectrum.Figs 3A and 3B show the original and shuffled time series respectively.The inset plots in each of these plots display the zoomed-in view of the first 1000 data points.It is clearly visible that the existing correlation is destroyed when the data is shuffled.The distribution of the visualised time signal is shown in the form of a histogram in Fig 3C.

Fig 3 .
Fig 3. Comparison of multifractality of time signals.Plots showing the effect of random shuffling of exhaled breath time series acquired using a hot wire anemometer.Signals shown in (A) and (B) correspond to the actual breath data and the shuffled data respectively.Inset plots in (A) and (B) show a zoomed-in view of the first 1000 data points of the signals.Note that the signal has been normalized using its mean and standard deviation.(C) Histogram showing the distribution of all N data points of the breath signal.(D) Multifractal spectra for the original breath signal and the randomly shuffled white noise signal.Random shuffling causes loss of memory within the time series and losses the multifractality.
Fig 4 shows an example of such a distortion.The multifractal spectrum for a time signal and three randomly chosen segments X, Y and Z from the same time series are displayed.Fig 4A shows the entire time signal and the chosen segments.Out of the three segments, X and Z show a typical spectral shape, whereas segment B consists of a fold towards the left hand side of the spectrum (see Fig 4B).

Fig 4 .
Fig 4. Multifractal spectra for different segments of a time signal.The multifractral spectra corresponding to the entire time signal (maroon) and time segments X, Y and Z (black, bounded by gray band) in (A) are shown in (B).It is evident that few segments exhibit an inverted parabola shape and spectrum B has a distortion.

Fig 5 .
Fig 5.The multifractal spectrum.Plot of the spectrum of singularities f (α) against the singularity strength α, computed for an exhalation time series segment.The parameters β, ω and ϵ are the features that characterize a multifractal spectrum.

Fig 6 .
Fig 6.Flow chart of the algorithm.Flow chart showing the algorithm pipeline, including time series normalization, filtering, feature extraction, feature reduction, and data splitting into training and testing.The time signal shown here is one of the segments of the original time series.Note that the representation of blue bar for training dataset and green bar for testing dataset will be consistent in further discussions in this manuscript.The training data of all users were used for building n C 2 binary classifier models, which becomes the process known as enrollment.

Fig 7 .
Fig 7.  User confirmation algorithm based on hypothesis testing.A flow chart of the user confirmation algorithm based on hypothesis testing.The user confirmation block will be made use in the user identification algorithm later in this manuscript.An example of the hypothesis test against user-pair is illustrated inside the dotted box, directed from the user confirmation block by the red asterisk.Given a user i, the user confirmation block's output was reposed to answer the question "Are you indeed User i?" based on a threshold.

Fig 9 .
Fig 9. Comparison of candidate classifier models.(A) Bar chart showing the percent proportion of each model in the library in the case of best-of-all model selection procedure.(B) Box and whiskers plot showing the spread of test accuracy of each classifier.The orange line inside the boxes represent the median.

Fig 10 .
Fig 10.Two dimensional decision boundaries.Comparison of two dimensional decision boundaries in the (β, ω) plane, captured by different models for three randomly chosen user-pairs.The scattered points are the training data points with red and blue labels denoting their true classes respectively.The line separating the two contour regions is the decision boundary.Accuracy of each model against the test data is displayed at the top right corner of their respective plots.The abbreviations stand for the following: DT -decision tree, RF -random forest, SVM -support vector machine, LR -logistic regression, GNB -Gaussian naive Bayes, MLP -multi-layer perceptron.

Fig 11 .
Fig 11.User confirmation algorithm based on machine learning.A flow chart of the user confirmation algorithm based on machine learning.The user confirmation block will be made use in the user identification algorithm later in this manuscript.Given a user i, the user confirmation block's output was reposed to answer the question "Are you indeed User i?" based on a threshold.

Fig 12 .
Fig 12.A generic user identification algorithm.Given a test user j, the algorithm performs n confirmation trials.One confirmation trial is the equivalent to running the user confirmation block (either HT from Fig 7 or ML from Fig 11) for a trial user i.The identified user corresponds to the maximum prediction based on the n confirmation tests.Note that in the case where more than one confirmation trial results in the maximum prediction value, the algorithm does not identify a user.

Fig 13 .
Fig 13.Comparison of the confidence of confirmation η i .Histograms of confidence of confirmation η i compared between (A) a machine learning based approach (random forest classifiers) and (B) a hypothesis testing based classification approach, for one trial of n confirmation tests.In the example shown here, the predictions from ML classifiers give a range of η i values distributed between ≈ 38% to 100%, whereas the predictions from HT based classifiers produce η i values only close to 0% and 100%.
Fig 14  shows a visualisation in the (β, ω) plane for a randomly chosen user-pair.The blue and red markers are the training data points corresponding to two user classes, respectively.The class regions are computed using a structured synthetic dataset in the feature space.

Fig 14 .
Fig 14.Comparison of the decision boundaries in (β, ω) plane.Decision boundaries captured by (A) random forest classifier and (B) hypothesis testing based classifier for a randomly chosen user-pair.The scattered points are the training data points with red and blue labels denoting their true classes respectively.The line separating the two contour regions is the decision boundary.Accuracy of each model against the test data is displayed at the top right corner of their respective plots.The RF classifier captures a complex decision boundary compared to the HT based classifier.

Fig 15 .
Fig 15.Dependence of user identification time on the size of model library.Plot showing the linear relation of user identification time with the growth of model library.This is applicable to the ML based algorithms which include building of binary classifier models (also known as enrollment in the context of biometrics).The error bars show 95% confidence interval at every data point.
For each user, the dataset was split into training (60%) and test (40%) sets.It is important to note that this splitting was done after shuffling between groups of features corresponding to the 19 time blocks for each subject.We know that there were 190 time signals for each user in the database with each set of 19 signals coming from a single recorded time series (see subsection titled Time series segmentation, normalization and selection).Shuffling without grouping would result in the the same information being spread across the training and testing dataset, which was undesired.By doing this we made sure that out of 10 exhaled breath samples, 6 become part of training set and 4 become part of the test set.The training feature set was used to build the model library and the test feature set was used for user confirmation/identification tests.