User authentication system based on human exhaled breath physics

doi:10.1371/journal.pone.0301971

Fig 1.

Calibration curve for the hot wire anemometer.

A fourth-order least-squares fit of the experimental data (shown as a maroon dotted line) serves as the calibration curve for the hot wire anemometer in use. The equation of the fitted fourth-order polynomial is shown inside the plot.
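A fourth-order least-squares calibration of this kind can be sketched with NumPy. This is only an illustration under stated assumptions: the data points are synthetic, and the mapping direction (anemometer voltage to velocity) and the names `voltage`, `velocity`, and `voltage_to_velocity` are hypothetical rather than taken from the study.

```python
import numpy as np

# Synthetic stand-in for calibration measurements (NOT the study's data):
# anemometer output voltage (V) against a reference flow velocity (m/s).
rng = np.random.default_rng(0)
voltage = np.linspace(1.0, 2.0, 25)
assumed_coeffs = [2.0, -3.0, 4.0, 1.0, 0.5]  # arbitrary quartic used to generate data
velocity = np.polyval(assumed_coeffs, voltage) + rng.normal(0.0, 0.01, voltage.size)

# Fourth-order least-squares fit; coefficients are returned highest power first.
coeffs = np.polyfit(voltage, velocity, deg=4)
calibration = np.poly1d(coeffs)


def voltage_to_velocity(v):
    """Apply the fitted calibration curve to raw voltage readings."""
    return calibration(v)


# Largest deviation between the fitted curve and the calibration points.
max_residual = float(np.max(np.abs(voltage_to_velocity(voltage) - velocity)))
```

The fitted `calibration` object can then be applied to every sample of a recorded voltage signal to obtain a velocity time series.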


Fig 2.

Experimental setup and recorded time series.

(A) Depiction of the experimental setup for data collection. It consists of a disposable mouth-piece, a mouth-piece mount housing a hot wire anemometer, and a data acquisition system. (B) A typical human exhalation velocity signal measured using a standard hot wire anemometer. The time signals were sampled at 10 kHz for 1.5 seconds.


Fig 3.

Multifractal spectra for different segments of a time signal.

The multifractal spectra corresponding to the entire time signal (maroon) and time segments X, Y and Z (black, bounded by gray band) in (A) are shown in (B). It is evident that a few segments exhibit an inverted-parabola shape and that spectrum B is distorted.


Fig 4.

The multifractal spectrum.

Plot of the spectrum of singularities f(α) against the singularity strength α, computed for an exhalation time series segment. The parameters β, ω and ϵ are the features that characterize a multifractal spectrum.


Fig 5.

Flow chart of the algorithm.

Flow chart showing the algorithm pipeline, including time series normalization, filtering, feature extraction, feature reduction, and data splitting into training and testing sets. The time signal shown here is one of the segments of the original time series. Note that the blue bar for the training dataset and the green bar for the testing dataset are used consistently in the rest of this manuscript. The training data of all users were used to build ⁿC₂ binary classifier models, a process known as enrollment.
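The ⁿC₂ count comes from building one binary classifier per unordered pair of enrolled users. The following minimal sketch enumerates those pairs; the user labels and the helper name `user_pairs` are hypothetical, and no classifier is actually trained here.

```python
from itertools import combinations
from math import comb


def user_pairs(user_ids):
    """Return every unordered pair of users; one binary model is built per pair."""
    return list(combinations(user_ids, 2))


# Hypothetical enrolled users, for illustration only.
users = ["user_1", "user_2", "user_3", "user_4"]
pairs = user_pairs(users)

# The number of pairwise models equals n choose 2.
n_models = comb(len(users), 2)
```

For n = 4 users this gives 6 pairwise models, and the count grows as n(n − 1)/2 as more users are enrolled.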


Fig 6.

User confirmation algorithm based on hypothesis testing.

A flow chart of the user confirmation algorithm based on hypothesis testing. The user confirmation block is used in the user identification algorithm later in this manuscript. An example of the hypothesis test for a user-pair is illustrated inside the dotted box, which the red asterisk links to the user confirmation block. Given a user i, the output of the user confirmation block is posed as an answer to the question “Are you indeed User i?” based on a threshold.


Fig 7.

User confirmation algorithm based on machine learning.

A flow chart of the user confirmation algorithm based on machine learning. The user confirmation block is used in the user identification algorithm later in this manuscript. Given a user i, the output of the user confirmation block is posed as an answer to the question “Are you indeed User i?” based on a threshold.


Fig 8.

A generic user identification algorithm.

Given a test user j, the algorithm performs n confirmation trials. One confirmation trial is equivalent to running the user confirmation block (either HT from Fig 6 or ML from Fig 7) for a trial user i. The identified user corresponds to the maximum prediction among the n confirmation trials. Note that when more than one confirmation trial yields the maximum prediction value, the algorithm does not identify a user.
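The selection rule in this caption (take the maximum prediction, and decline to identify on a tie) can be sketched as follows. This is an illustrative assumption about the data layout: `predictions` is a hypothetical mapping from each trial user i to the value returned by the confirmation block, and `identify_user` is not a name from the manuscript.

```python
def identify_user(predictions):
    """Pick the trial user with the maximum prediction value.

    `predictions` maps each trial user i to the output of one confirmation
    trial. Returns None when no trials were run or when the maximum value
    is shared by more than one trial user.
    """
    if not predictions:
        return None
    best = max(predictions.values())
    winners = [user for user, value in predictions.items() if value == best]
    return winners[0] if len(winners) == 1 else None


# Hypothetical prediction values from n = 3 confirmation trials.
unique_case = identify_user({"user_1": 0.4, "user_2": 0.9, "user_3": 0.6})
tied_case = identify_user({"user_1": 0.9, "user_2": 0.9, "user_3": 0.6})
```

In the first call a single trial user holds the maximum, so that user is returned; in the second the maximum is shared, so no user is identified.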


Fig 9.

Comparison of the confidence of confirmation η_i.

Histograms of the confidence of confirmation η_i compared between (A) a machine learning based approach (random forest classifiers) and (B) a hypothesis testing based classification approach, for one trial of n confirmation tests. In the example shown here, the predictions from ML classifiers give η_i values distributed between ≈38% and 100%, whereas the predictions from HT based classifiers produce η_i values only close to 0% and 100%.


Fig 10.

Comparison of the decision boundaries in (β, ω) plane.

Decision boundaries captured by (A) a random forest (RF) classifier and (B) a hypothesis testing (HT) based classifier for a randomly chosen user-pair. The scattered points are the training data points, with red and blue labels denoting their true classes. The line separating the two contour regions is the decision boundary. The accuracy of each model on the test data is displayed at the top right corner of its plot. The RF classifier captures a more complex decision boundary than the HT based classifier.


Fig 11.

Dependence of user identification time on the size of model library.

Plot showing that user identification time grows linearly with the size of the model library. This applies to the ML based algorithms, which involve building binary classifier models (also known as enrollment in the context of biometrics). The error bars show the 95% confidence interval at every data point.
