Analyzing Personalized Policies for Online Biometric Verification

Motivated by India’s nationwide biometric program for social inclusion, we analyze verification (i.e., one-to-one matching) in the case where we possess similarity scores for 10 fingerprints and two irises between a resident’s biometric images at enrollment and his biometric images during his first verification. At subsequent verifications, we allow individualized strategies based on these 12 scores: we acquire a subset of the 12 images, get new scores for this subset that quantify the similarity to the corresponding enrollment images, and use the likelihood ratio (i.e., the likelihood of observing these scores if the resident is genuine divided by the corresponding likelihood if the resident is an imposter) to decide whether a resident is genuine or an imposter. We also consider two-stage policies, where additional images are acquired in a second stage if the first-stage results are inconclusive. Using performance data from India’s program, we develop a new probabilistic model for the joint distribution of the 12 similarity scores and find near-optimal individualized strategies that minimize the false reject rate (FRR) subject to constraints on the false accept rate (FAR) and mean verification delay for each resident. Our individualized policies achieve the same FRR as a policy that acquires (and optimally fuses) 12 biometrics for each resident, which represents a five (four, respectively) log reduction in FRR relative to fingerprint (iris, respectively) policies previously proposed for India’s biometric program. The mean delay is sec for our proposed policy, compared to 30 sec for a policy that acquires one fingerprint and 107 sec for a policy that acquires all 12 biometrics. This policy acquires iris scans from 32–41% of residents (depending on the FAR) and acquires an average of 1.3 fingerprints per resident.

Finally, as noted in the main text, 1.87% of residents were excluded from the verification studies due to poor image quality (pp. 23–24 of [1]). Hence, we consider two scenarios. In the exclusion scenario, we assume that the 1.87% of people are omitted from the study and use the 21 FRR values directly. In the inclusion scenario, we assume that the failure-to-acquire (FTA) rate is 0.0187 and that the 21 FRR and FAR values from Figs. 10–11 in [1] are false non-match rates (FNMR) and false match rates (FMR), respectively. We then recalculate the 21 FRR and FAR values via the formulas FRR = FTA + FNMR(1 − FTA) and FAR = FMR(1 − FTA).

Overview of Parameter Estimation Procedure
The experimental set-up that generates the rank-1 and rank-2 PMFs in Fig. 8 of [1] differs from the set-up used to generate the FRR vs. FAR curves in Figs. 10–11 of [1] and the FAR vs. $t$ curve provided separately: the experimental set-up for Fig. 8 uses one very good sensor and includes the 1.87% of people that were unlikely to be verified successfully, while the latter uses the average of 14 good sensors and excludes the 1.87% of people with poor image quality. Consequently, we do not jointly estimate all of our parameter values, but rather use a two-stage approach, where we first estimate $(c_1, \ldots, c_{10})$ from the PMFs in Fig. 8 of [1] and then estimate the remaining parameter values using the FRR vs. FAR curves in Figs. 10–11 of [1] and the FAR vs. $t$ relationship.
More specifically, let $p_i^{(1)}$ and $p_i^{(2)}$ for $i = 1, \ldots, 10$ be the predicted rank-1 and rank-2 probabilities corresponding to the PMFs in Fig. 8 of [1]; we derive expressions for these probabilities in terms of the model parameters in §1.3. Because the PMFs in Fig. 8 of [1] clearly provide the best data for estimating $(c_1, \ldots, c_{10})$, in the first stage we solve the least squares problem (1)–(2), which minimizes over $(c_1, \ldots, c_{10}, \mu, \tau, \sigma, s)$ the sum of squared errors between the predicted and observed rank-1 and rank-2 probabilities; we retain the values of $(c_1, \ldots, c_{10})$ from the solution to (1)–(2) and discard the $\mu$, $\tau$, $\sigma$ and $s$ values. As we see in §1.3, the parameter $\delta$ does not feature in this optimization, and the constraint (2) fixes the scaling of $(c_1, \ldots, c_{10})$.
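To make the two-stage procedure concrete, the sketch below mimics the stage-1 fit under stated assumptions: the rank-1/rank-2 probabilities are computed by a crude Monte Carlo stand-in (rather than the Gauss–Hermite expressions of §1.3), the scaling constraint (2) is emulated by fixing $c_1 = 1$ (the exact form of (2) is not reproduced here), and the target PMFs are synthetic placeholders rather than the data from Fig. 8 of [1].

```python
import numpy as np
from scipy.optimize import least_squares

def rank_probs(c, mu, tau, sigma, s, delta=0.0, n=20000):
    """Crude Monte Carlo stand-in for the rank-1/rank-2 probabilities:
    simulate Y~_i = c_i*theta + X-noise + max of three attempt noises,
    then tabulate which finger ranks first and second."""
    rng = np.random.default_rng(12345)       # common random numbers across calls
    theta = rng.normal(mu, tau, size=(n, 1))
    x = c * theta + rng.normal(0.0, sigma, size=(n, 10))
    eps = rng.normal(delta, s, size=(n, 10, 3)).max(axis=2)
    order = np.argsort(-(x + eps), axis=1)
    p1 = np.bincount(order[:, 0], minlength=10) / n
    p2 = np.bincount(order[:, 1], minlength=10) / n
    return p1, p2

def residuals(params, p1_obs, p2_obs):
    c = np.concatenate([[1.0], params[:9]])  # fixing c_1 = 1 emulates (2)
    mu, tau, sigma, s = params[9:]
    p1, p2 = rank_probs(c, mu, tau, sigma, s)
    return np.concatenate([p1 - p1_obs, p2 - p2_obs])

# Synthetic target PMFs for illustration only (NOT the data from [1]).
p1_obs, p2_obs = rank_probs(np.linspace(1.0, 0.7, 10), 4.0, 0.3, 0.2, 0.1)
x0 = np.concatenate([np.full(9, 0.9), [4.0, 0.3, 0.2, 0.1]])
fit = least_squares(residuals, x0, args=(p1_obs, p2_obs), diff_step=1e-2)
c_hat = np.concatenate([[1.0], fit.x[:9]])   # retain c; discard mu, tau, sigma, s
print(np.round(c_hat, 3))
```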
In the second stage, we estimate the remaining parameters using the three FRR vs. FAR curves in Figs. 10–11 of [1] and the FAR vs. $t$ relationship. All of these FAR values were calculated using a single imposter probability distribution (using BFD fingers) that does not vary by finger $i$ or by image quality. Let $G_1(t)$ be the cumulative distribution function (CDF) and $\bar G_1(t)$ be the complementary CDF of a lognormal distribution with unknown parameters $(\mu_G, \sigma_G^2)$. Let $G_2(t)$ and $\bar G_2(t)$ be the CDF and complementary CDF of the distribution that is the convolution of two lognormal $(\mu_G, \sigma_G^2)$ distributions.
These two complementary CDFs are associated with the imposter log similarity score (i.e., of two different people, one at enrollment and one at verification) for one finger and for the fusion (i.e., sum) of two fingers, respectively. Because of the difficulties in simultaneously estimating the measurement noise for the genuine and imposter distributions, we do not attempt to capture the measurement noise for imposters, and hence assume that $\bar G_1(t)$ and $\bar G_2(t)$ hold even if multiple attempts are taken.
To estimate the parameters $\mu_G$ and $\sigma_G$, we use the data $(FAR_l, \bar t_l)$ for $l = 3, 4, 5, 6$. We take the FAR values as hard constraints, thereby implicitly fixing the thresholds $t_l$ in terms of $\mu_G$ and $\sigma_G$, and minimize the sum of squared errors between the logarithms of the predicted and observed thresholds (the presence of the $\ln$ function in (3) is explained in §1.7):
$$\min_{\mu_G, \sigma_G} \sum_{l=3}^{6} \left(\ln t_l - \ln \bar t_l\right)^2 \quad (3)$$
$$\text{subject to } \bar G_1(t_l) = FAR_l \text{ for } l = 3, 4, 5, 6. \quad (4)$$
To estimate the remaining parameters, let $t_{1k}$ and $t_{2k}$ for $k = 1, \ldots, 7$ be the unknown thresholds used to make accept/reject decisions, which generated the points on the FRR vs. FAR curves for the one-finger policies in Fig. 10 of [1] and the fusion policy in Fig. 11 of [1], respectively. Because $\bar G_1(t)$ is assumed to apply in the one-finger case regardless of the number of attempts, and the FAR values provided in the blue (single-attempt rank-1 finger) and red (rank-1 finger with up to three attempts) curves in Fig. 10 of [1] are the same, it follows that the thresholds are the same in the two rank-1 finger cases. Let $F_1(t)$, $F_{1m}(t)$ and $F_{2m}(t)$ be the CDFs corresponding to the similarity scores for the single-attempt rank-1 finger, the rank-1 finger with up to three attempts, and the fusion of rank-1 and rank-2 with up to three attempts, respectively, so that the predicted FRRs in these three cases are $F_1(t_{1k})$, $F_{1m}(t_{1k})$ and $F_{2m}(t_{2k})$. Expressions for these three CDFs in terms of the model parameters are derived in §1.4–1.6. We take the FAR values as hard constraints, thereby fixing the thresholds $t_{1k}$ and $t_{2k}$ using the estimates of $\mu_G$ and $\sigma_G$, and choose the remaining five parameters $(\mu, \tau, \sigma, \delta, s)$ to minimize the sum of squared relative errors between the predicted and observed FRR values (5) (we use relative errors because the various FRR values in Figs. 10–11 of [1] are of different orders of magnitude), subject to $\bar G_1(t_{1k}) = FAR_{1k}$ for $k = 1, \ldots, 7$ (6) and $\bar G_2(t_{2k}) = FAR_{2mk}$ for $k = 1, \ldots, 7$ (7).
As explained in the main text, our model finesses some of the details of the BFD and verification processes because we do not have the raw similarity score data from [1]. More specifically, we approximate the color-coded ranking system by simply assuming that fingers are ranked solely by their $Y_i$ values (e.g., the rank-1 finger is $\arg\max_i Y_i$). We also approximate the "up to three attempts" verification process in the derivation of $F_{1m}(t)$ and $F_{2m}(t)$ by simply assuming that three attempts are always made and the maximum score of the three attempts is used (the subscript $m$ is mnemonic for maximum). Because fingerprints with image quality 1 or 2 should be easy to verify, this simplifying assumption should not introduce much error.

Derivation of $p_i^{(1)}$ and $p_i^{(2)}$
In this section, we derive $p_i^{(1)}$ and $p_i^{(2)}$, which are used in equation (1). For finger $i = 1, \ldots, 10$, the true log similarity score is modeled as $\tilde X_i = \ln X_i \sim N(c_i\theta, \sigma^2)$ given $\theta$, where $X_i$ is the true similarity score for finger $i$ and $\theta \sim N(\mu, \tau^2)$. Let $Y_i$ be the observed similarity score during BFD for finger $i$, and let $\tilde Y_i = \ln Y_i$. Then $\tilde Y_i = \tilde X_i + \max_{k=1,2,3} \tilde\epsilon_{ik}$, where the $\tilde\epsilon_{ik} \sim N(\delta, s^2)$ are independent for $k = 1, 2, 3$.
We use Clark's method [7], which constructs an accurate approximation for the maximum of several normal random variables, to approximate $\max_{k=1,2,3} \tilde\epsilon_{ik}$ by a normal random variable $\tilde\epsilon_{mi}$ with mean $\delta + \mu_3$ and variance $\sigma_3^2$, where $\mu_3$ and $\sigma_3^2$ are given by Clark's formulas in (8)–(9). Thus $\tilde Y_i = \tilde X_i + \tilde\epsilon_{mi}$, and using the independence of $\tilde X_i$ and $\tilde\epsilon_{mi}$, we get that $\tilde Y_i \sim N(c_i\theta + \delta + \mu_3, \alpha^2)$ given $\theta$, where $\theta \sim N(\mu, \tau^2)$ and $\alpha^2 \equiv \sigma^2 + \sigma_3^2$. Let $\tilde Y_{mi} = \max_{j \neq i} \tilde Y_j$. Then the probability that finger $i$ is chosen as the best finger equals $P(\tilde Y_i > \tilde Y_{mi})$. There are different ways to derive this quantity, and we use an approach that is amenable to numerical integration: conditioning on $\theta$ and $\tilde Y_i$ yields the double integral (10). The change of variable in (11) converts the integral in (10) into a form where we can apply Gauss–Hermite quadrature, which is well-suited for functions that require integration against the normal density ([4], pg. 129). Gauss–Hermite quadrature approximates $\int_{-\infty}^{\infty} e^{-v^2} f(v)\,dv \approx \sum_{i=1}^{N} w_i f(v_i)$, where $N$ is the number of sample points for the approximation, and the points $v_i$ and the associated weights $w_i$ are fixed once $N$ is chosen. This procedure yields the approximation (12)–(13). We perform similar calculations to derive $p_i^{(2)}$: if we define $\tilde Y_{mij} = \max_{k \neq i,j} \tilde Y_k$, then $p_i^{(2)}$ can be expressed as the double integral (14). Approximating the double integral in (14) with Gauss–Hermite quadrature with $N$ sample points $v_i$ and associated weights $w_i$, we get (15), where $h_2(\theta, y_i)$ is defined in (16). Note from (12) and (16) that neither $h_1(\theta, y_i)$ nor $h_2(\theta, y_i)$ depends on $\delta$, and hence $p_i^{(1)}$ and $p_i^{(2)}$ do not depend on $\delta$ for $i = 1, \ldots, 10$. Further, we can always scale the parameters $(c_1, \ldots, c_{10})$, $\sigma$ and $s$ to obtain scaled values of $(\tilde Y_1, \ldots, \tilde Y_{10})$ that preserve the probabilities $p_i^{(1)}$ and $p_i^{(2)}$ for $i = 1, \ldots, 10$. This justifies constraint (2), which fixes the scaling.
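A minimal sketch of the Gauss–Hermite evaluation of $p_i^{(1)}$ under the model just described: conditioning on $\theta$ and $\tilde Y_i$ makes the events $\{\tilde Y_i > \tilde Y_j\}$ independent, so the double integral is a product of normal CDFs integrated against two normal densities. Parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def p_rank1(i, c, mu, tau, sigma, sigma3, N=30):
    """p_i^(1) = P(Y~_i > max_{j != i} Y~_j) via nested Gauss-Hermite
    quadrature; the attempt-noise mean delta + mu_3 cancels and is omitted."""
    v, w = np.polynomial.hermite.hermgauss(N)  # nodes/weights for e^{-v^2}
    alpha = np.sqrt(sigma**2 + sigma3**2)
    cj = np.delete(c, i)
    total = 0.0
    for vt, wt in zip(v, w):                   # outer integral over theta
        theta = mu + np.sqrt(2.0) * tau * vt
        for vu, wu in zip(v, w):               # inner integral over Y~_i
            yi = c[i] * theta + np.sqrt(2.0) * alpha * vu
            total += wt * wu * norm.cdf((yi - cj * theta) / alpha).prod()
    return total / np.pi                       # two (1/sqrt(pi)) factors

c = np.linspace(1.0, 0.7, 10)                  # illustrative c_i values
p1 = [p_rank1(i, c, mu=4.0, tau=0.3, sigma=0.2, sigma3=0.1) for i in range(10)]
print(round(sum(p1), 4))                       # the ten values sum to ~1
```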

Derivation of $F_1(t_{1k})$
As explained in §1.3, $\tilde X_i$ is the true log genuine similarity score for finger $i$ and $\tilde Y_i = \tilde X_i + \tilde\epsilon_{mi}$ is the log genuine similarity score observed during the BFD process, so that $\tilde Y_i \sim N(c_i\theta + \delta + \mu_3, \alpha^2)$ given $\theta$. Let the subscripts $(i)$ be defined by $Y_{(1)} \geq \cdots \geq Y_{(10)}$, regardless of which random variables the subscripts appear on. Let $\tilde Z_i = \tilde X_i + \tilde\delta_i$ be the log genuine similarity score during verification, where $\tilde\delta_i \sim N(\delta, s^2)$ is independent of $\tilde\epsilon_{mi}$. It follows that $\tilde Z_i = \ln Z_i \sim N(c_i\theta + \delta, \sigma^2 + s^2)$ given $\theta$. Recalling that $F_1(t)$ is the CDF of the genuine similarity score in the single-attempt rank-1 finger scenario, we have that the FRR for this case is $F_1(t_{1k}) = P(Z_{(1)} < t_{1k})$. Our goal in this section is to derive the CDF of $\tilde Z_{(1)}$.
Recalling that $\tilde Y_{mi} = \max_{j \neq i} \tilde Y_j$, we have that finger $i$ is the rank-1 finger exactly when $\tilde Y_i > \tilde Y_{mi}$. We use the tower property in conjunction with conditioning on variables that (to ease the computational complexity) make the events within the probability conditionally independent. In particular, we condition on $\theta$ and $\tilde Y_i$ because, given $(\theta, \tilde Y_i)$, the events $\{\tilde Y_j < \tilde Y_i\}$ for $j \neq i$ and the event $\{\tilde X_i + \tilde\delta_i < t\}$ are conditionally independent. To evaluate $P(\tilde X_i + \tilde\delta_i < t \mid \theta, \tilde Y_i)$ in (19), we apply Prop. 3.13 on page 116 of [5], which gives the distribution of one normal random vector conditioned on another (possibly correlated) normal random vector.
This result implies that, given $(\theta, \tilde Y_i)$, $\tilde X_i$ is normally distributed with conditional mean $\mu_{i|\tilde Y_i}$ and conditional variance $\hat\sigma^2$ given in (21). It follows that the CDF of $\tilde Z_{(1)}$ can be written as the expression in (22). Finally, we employ a change of measure in conjunction with numerical integration to reduce the number of $\Phi(\cdot)$ evaluations needed at each approximation point in the double-integral version of (22).
As in §1.5, we approximate $\tilde\delta_{mj}$ in (30) by a normal random variable with mean $\delta + \mu_3$ and variance $\sigma_3^2$ given in (8)–(9). Thus, given $\theta$, $\tilde Z_{mj}$ is approximately distributed as $N(c_j\theta + \delta + \mu_3, \sigma^2 + \sigma_3^2)$, where we use the independence of $\tilde X_j$ and $\tilde\delta_{mj}$ in calculating the variance. In (30), we are interested in calculating the conditional distribution of $\tilde X_i$ given $\tilde Z_{mi}$ and $\theta$. Noting that $\tilde Z_{mi} = \tilde X_i + \tilde\delta_{mi}$, and using Prop. 3.13 in [5], we find that $\tilde X_i$ given $(\theta, \tilde Z_{mi})$ is normally distributed with conditional mean and variance given in (31)–(32), where $\hat\sigma^2$ is given in (21). Referring again to (30), we already know $\tilde X_j \mid \tilde Y_j \sim N(\mu_{j|\tilde Y_j}, \hat\sigma^2)$ from §1.4. Based on this fact and (29)–(32), we calculate the first two probabilities inside the expectation in (30) to obtain the expression (33) for $P(e^{\tilde Z_{m(1)}} + e^{\tilde Z_{m(2)}} < t)$. Turning to the change of measure, we take advantage of the fact that, given $\theta$, $\tilde Y_j$ and $\tilde Z_{mi}$ are independent for $i \neq j$. Given $\theta$, we change the measure of $(\tilde Y_j, \tilde Z_{mi})$ in (33) to $\tilde Y \sim N(c\theta + \delta + \mu_3, \alpha^2)$ and $\tilde Z_m \sim N(c\theta + \mu_3, \alpha^2)$, and compensate by multiplying by the corresponding likelihood-ratio terms to get (36). We note that this change of measure is more efficient than applying Clark's approximation to the last probability in (33) and then invoking a change of measure.
Because the expression in (36) is difficult to compute, we perform one final step, which uses a monomial-based integration scheme called cubature [3] combined with the Gauss–Laguerre scheme [4].
Suppose we need to calculate, for any function $f(\cdot)$ on $\mathbb{R}^3$, an integral of $f$ against a trivariate normal density, as in (37). Consider the substitution $(\theta, y, z) = H_1(\hat\theta, \hat y, \hat z)$ defined in (38)–(40), which standardizes the three variables. Making this substitution into (37) yields (41). We now convert to spherical coordinates as follows: let $(\hat\theta, \hat y, \hat z) = H_2(r, u) \equiv (ru_1, ru_2, ru_3)$, and let $U_3$ represent the surface of the unit sphere in three dimensions, with $dA(u)$ representing its infinitesimal area element. Substituting into (41), we get (42)–(43), where the inner surface integral is denoted $S(\cdot)$. Applying to (43) the change of variable $x = r^2/2$, we get (44), where (44) holds with weights and sample points obtained using the Gauss–Laguerre scheme, which can approximate integrals of the form $\int_0^\infty e^{-x} f(x)\,dx \approx \sum_{i=1}^{N} w_i f(x_i)$ ([4], pg. 130). The function $S(\cdot)$ in (42), which is a surface integral, is approximated by choosing points (and corresponding weights) on the surface of the unit sphere. This technique of evaluating multi-dimensional surface integrals as a weighted sum of the function evaluated at points on the surface is called cubature, and has been applied to functions integrated against a Gaussian density [3]. That is, we approximate $S(\cdot)$ as in (45), and substituting (45) into (43) yields (46). Equation (46) provides a numerical integration scheme for any function $f(\cdot)$ on $\mathbb{R}^3$ introduced in (37); we need to apply (46) to the function in (36). Before doing that, we write the CDF of $e^{\tilde Z_{m(1)}} + e^{\tilde Z_{m(2)}}$ in a format that is amenable to numerical computation (more specifically, we take the exponent of the logarithms in obtaining (48) from (47) because the $a_i \tilde Z_m$ and $a_j \tilde Y$ terms can be very large), and, for ease of reference, we define in (49) all quantities required to compute this CDF. Applying (46) to (48), we get (50), where the superscript $kl$ is used to emphasize that the quantity depends on $k$ and $l$ through $(\theta^{kl}, \tilde Y^{kl}, \tilde Z_m^{kl})$.
The Gauss–Laguerre points and the cubature points $x_{Cub,l}$, mapped through $H_1$ and $H_2$, upon expanding give the triplets in (51). In summary, we calculate the fusion probability by using (50), with all of its variables evaluated via (49) at the values of the triplet $(\theta^{kl}, \tilde Y^{kl}, \tilde Z_m^{kl})$ described by (51).
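The following sketch illustrates the radial-times-spherical scheme of (42)–(46) on a standard trivariate Gaussian, using Gauss–Laguerre quadrature in the radial direction after the substitution $x = r^2/2$ and a simple degree-3 antipodal cubature rule on the sphere (a stand-in for the specific rule of [3]).

```python
import numpy as np

def gauss_laguerre_sphere(f, n_radial=32):
    """Approximate int_{R^3} f(x) exp(-|x|^2/2) dx by Gauss-Laguerre in the
    radius (substitution x = r^2/2, so r^2 e^{-r^2/2} dr = sqrt(2x) e^{-x} dx)
    and a 6-point antipodal cubature rule on the unit sphere."""
    x, w = np.polynomial.laguerre.laggauss(n_radial)
    pts = np.vstack([np.eye(3), -np.eye(3)])   # +/- e_i on the sphere
    wts = np.full(6, 4.0 * np.pi / 6.0)        # equal weights, total area 4*pi
    total = 0.0
    for xi, wi in zip(x, w):
        r = np.sqrt(2.0 * xi)
        surf = sum(wk * f(r * uk) for uk, wk in zip(pts, wts))
        total += wi * np.sqrt(2.0 * xi) * surf
    return total

# Sanity check: f = 1 integrates the Gaussian kernel to (2*pi)^{3/2}.
print(gauss_laguerre_sphere(lambda v: 1.0), (2.0 * np.pi) ** 1.5)
```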
Recalling that $\bar G_1(t)$ is the complementary CDF of a lognormal with parameters $(\mu_G, \sigma_G^2)$, we see that the left side of (4) is equal to $1 - \Phi((\ln t_l - \mu_G)/\sigma_G)$. Equating this expression to $FAR_l$ yields
$$\ln t_l = \mu_G + \sigma_G \Phi^{-1}(1 - FAR_l), \quad (52)$$
where the values of the parameters $\mu_G$ and $\sigma_G$ need to be estimated. Thus (52) provides theoretical estimates of the log thresholds, and because we already know the observed log thresholds $\ln \bar t_l$, we use linear regression to solve (3) and obtain estimates of $\mu_G$ and $\sigma_G$.
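Because (52) is linear in $\Phi^{-1}(1 - FAR_l)$, the regression in (3) is an ordinary least squares fit. A minimal sketch, with placeholder FAR and threshold values rather than the data of [1]:

```python
import numpy as np
from scipy.stats import norm

far = np.array([1e-2, 1e-3, 1e-4, 1e-5])      # FAR_l (placeholders)
t_obs = np.array([18.0, 24.0, 29.0, 33.0])    # observed thresholds (placeholders)

z = norm.ppf(1.0 - far)                       # Phi^{-1}(1 - FAR_l)
sigma_G, mu_G = np.polyfit(z, np.log(t_obs), 1)  # slope, intercept of (52)
print(mu_G, sigma_G)
```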
Constraint (6) can be rearranged to solve for $t_{1k}$ in closed form, exactly as in (52). However, because there is no simple expression for the distribution of the sum of two iid lognormals, it is hard to directly solve for $t_{2k}$ in constraint (7). The approximation in [8] is not sufficiently accurate for our purposes, and so we use simulation to generate $10^6$ fused imposter similarity scores $\{G_i\}_{i=1}^{10^6}$, where each $G_i$ is the sum of two independent lognormal $(\mu_G, \sigma_G^2)$ draws, and for each $k$ we let $t_{2k}$ be the $1 - FAR_{2k}$ quantile of the empirical distribution of $\{G_i\}_{i=1}^{10^6}$. Finally, note that solving the optimization problem (5)–(7) requires an initial solution $(\mu_0, \tau_0, \sigma_0, \delta_0, s_0)$. Because the similarity scores are normalized to a 100-point scale, we set the lognormal median $e^{\mu_0}$ equal to 50 to obtain $\mu_0 = \ln 50$. For $(\tau_0, \sigma_0, s_0)$, we use the estimates of $(\tau, \sigma, s)$ from the solution of (1)–(2). As noted before, although the experimental set-up used to estimate $(c_1, \ldots, c_{10})$ in (1)–(2) differs from that used to estimate all other parameters, we expect these values to provide a good starting point. Also, recall from §1.3 that the solution of (1)–(2) provides no information about $\delta$.
We expect $\delta$ to be negative but close to zero because its magnitude should be much smaller than $\mu$. Therefore, we set $\delta_0$ equal to 0. Finally, the entire stage 1 and stage 2 procedure is repeated multiple times, with each run (except the first) using the optimized values from the previous run as the initial solution.
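Returning to constraint (7), a minimal sketch of the simulated threshold inversion, with illustrative (not fitted) values of $\mu_G$ and $\sigma_G$ and placeholder $FAR_{2k}$ values:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_G, sigma_G = 2.5, 0.6                      # illustrative, not fitted values
fused = (rng.lognormal(mu_G, sigma_G, 10**6)  # sum of two iid lognormal scores
         + rng.lognormal(mu_G, sigma_G, 10**6))

far_2k = np.array([1e-2, 1e-3, 1e-4, 1e-5])   # placeholder FAR_2k values
t_2k = np.quantile(fused, 1.0 - far_2k)       # (1 - FAR_2k) empirical quantiles
print(t_2k)
```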
Iris Parameter Estimation

The data are described in §2.1, mathematical expressions for FRR and FAR are derived in §2.2, and the iris parameters are estimated in §2.3.

Data
The data for our iris parameters are taken from [9]. Although single- and dual-eye cameras are tested in [9], we restrict ourselves to the dual-eye experiments because they achieved better performance and smaller delays than the single-eye experiments. The relevant data are in Table 8 and Fig. 13 of [9]: Fig. 13 gives four points on the FRR vs. FAR curve, denoted by $(FRR_{2j}, FAR_{2j})$, $j = 1, \ldots, 4$, which consider two attempts of two (left and right) irises, and Table 8 fixes $FAR_1 = 10^{-6}$ and provides the value $FRR_1$ for one attempt of both irises. When multiple attempts are taken of an iris, the maximum similarity score across attempts is used. In addition, the fusion of two irises is the maximum of the two similarity scores; i.e., although fingers are fused with the sum, irises are fused with the maximum in the UIDAI experiments. Each attempt in [9] corresponds to using the first image that meets the quality threshold or the best among three images, whichever happens first. However, because this process is used during enrollment and during each verification, we do not attempt to model the details within an attempt. These experiments use four types of dual-eye cameras, discard one type as not being good, and hence report results on the average of the three good camera types. For two attempts of two irises (i.e., Fig. 13 of [9]), we also know the threshold values for $l = 4, 5, 6$; for example, $\bar t_4 = 27$ and $\bar t_5 = 36$. These experiments exclude the 0.33% of people whose iris images could not be acquired, defined as having the image not be usable for at least four of the eight single-eye and dual-eye cameras tested [9].
As in the fingerprint case, we consider two scenarios: in the exclusion scenario, the 0.33% are omitted and the five FRR values are used directly; in the inclusion scenario, we assume a failure-to-acquire rate of 0.0033 and use the relationships FRR = FTA + FNMR(1 − FTA) and FAR = FMR(1 − FTA).

Analysis
We now derive expressions for $FRR_1$, $FAR_1$, $FRR_{2j}$ and $FAR_{2j}$. By construction, the genuine log similarity scores of the left and right irises are jointly normal. Because iris fusion is performed via the maximum, it follows that for the unknown threshold $t_1$ used to generate $(FRR_1, FAR_1)$, $FRR_1$ is given by the bivariate normal expression (56), where $\Phi_2(x, y; \rho)$ is the CDF of two standard bivariate normal variables with correlation coefficient $\rho$.
The independence of the imposter similarity scores for the left and right irises implies that $FAR_1$ is given by (57). When there are two attempts of each iris, we let $\tilde Z_{im} = \tilde X_i + \max_{k=1,2} \tilde\gamma_{ik}$ and let $t_{2j}$ be the unknown threshold corresponding to $(FRR_{2j}, FAR_{2j})$ for $j = 1, \ldots, 4$. We use Clark's method to approximate $\max_{k=1,2} \tilde\gamma_{ik}$ by a normal random variable with mean $\psi_2 \equiv \psi + \frac{1}{\sqrt{\pi}}\beta$ and variance $\beta_2^2 \equiv \frac{\pi-1}{\pi}\beta^2$, which implies that $FRR_{2j}$ is given by (58). Because we ignore imposter measurement noise, the $FAR_{2j}$ values are also given by (57).
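A minimal sketch of the two-attempt iris FRR computation, under our reading of the model: after Clark's approximation, the fused (max) genuine scores are bivariate normal, and the per-iris variance and cross-correlation below are our derived stand-ins for the quantities in (58). Parameter values are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def iris_frr_two_attempts(t, mu11, sigma11, rho, psi, beta):
    psi2 = psi + beta / np.sqrt(np.pi)            # Clark mean of max of 2
    beta2_sq = (np.pi - 1.0) / np.pi * beta**2    # Clark variance of max of 2
    var = sigma11**2 + beta2_sq                   # per-iris fused-score variance
    rho_z = rho * sigma11**2 / var                # correlation of fused scores
    z = (t - (mu11 + psi2)) / np.sqrt(var)        # standardized threshold
    # FRR_2j = P(max(Z~_1m, Z~_2m) < t) = Phi_2(z, z; rho_z)
    return multivariate_normal(mean=[0.0, 0.0],
                               cov=[[1.0, rho_z], [rho_z, 1.0]]).cdf([z, z])

print(iris_frr_two_attempts(t=3.2, mu11=4.0, sigma11=0.25,
                            rho=0.6, psi=0.0, beta=0.1))
```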

Parameter Estimation
We have seven parameters to estimate: $\mu_{11}$, $\sigma_{11}$, $\rho$, $\psi$, $\beta$, $\mu_{GI}$, $\sigma_{GI}$. The parameters $\mu_{11}$ and $\psi$ appear in equations (56) and (58) only through the sum $\mu_{11} + \psi$, and hence cannot be individually determined. We arbitrarily set $\psi = 0$, leaving us with six parameters. We initially estimate the imposter parameters $\mu_{GI}$ and $\sigma_{GI}$ using the three threshold values for three specific FAR values, which yields a negative mean log score ($\mu_{GI} = -1.23$, $\sigma_{GI} = 0.53$); these values in turn generate extremely large estimates of $\mu_{11}$, $\sigma_{11}$ and $\beta$. Consequently, we also estimate $\mu_{GI}$ and $\sigma_{GI}$ from Hamming distance data in [10], where the best camera has a Hamming distance distribution with mean 0.456 and standard deviation 0.0214. Assuming that the similarity score equals 100 times (1 minus the Hamming distance), we obtain $\mu_{GI} = 4.00$ and $\sigma_{GI} = 0.039$. Although both pairs of estimates lead to comparable fits to the FRR vs. FAR curves, and we are only interested in the right tail of the imposter distribution (and the left tail of the genuine distribution), we nonetheless use the latter approach so that the parameter values make more intuitive sense.
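A worked check of this conversion, assuming (as we read the computation) that the log-score moments follow from the delta method, $E[\ln S] \approx \ln E[S]$ and $SD[\ln S] \approx SD[S]/E[S]$:

```python
import numpy as np

mean_h, sd_h = 0.456, 0.0214                    # Hamming distance moments from [10]
mean_s, sd_s = 100 * (1 - mean_h), 100 * sd_h   # score S = 100*(1 - H)
print(round(np.log(mean_s), 2), round(sd_s / mean_s, 3))   # -> 4.0 0.039
```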
Solving equations (57) and (59) for the thresholds, and substituting these thresholds into (56) and (58), yields a least squares problem (60) for the remaining genuine parameters. However, the solution to (60) turns out to be indeterminate: many combinations of $\sigma_{11}$, $\beta$ and $\rho$ can give the same result. Fig. 6 of [11] allows us to roughly estimate $\rho$ to be 0.6, and solving (60) for the three remaining parameters leads to our final estimates (the optimal solution is essentially unchanged across 1000 randomly generated starting points).

Proposed Policies
We consider single-stage policies in §3.1 and two-stage policies in §3.2.

Single-stage Policies
A single-stage policy takes as input a resident's observed log similarity scores during BFD and BID, $\tilde Y = (\tilde Y_1, \ldots, \tilde Y_{12})$, and chooses to acquire a subset $S$ of the 10 fingers, along with neither or both irises (due to the use of a dual-eye camera). We then observe the new similarity scores $\tilde Z$ for the acquired subset and make an accept/reject decision (i.e., deem the resident genuine or an imposter). Our overall goal is to minimize the FRR subject to constraints on the FAR and average delay.
Because this problem is difficult to solve, we make several simplifying assumptions. First, we solve this problem for each individual resident because we cannot easily compute the expectation over the distribution of $(\tilde Y, \tilde Z)$. In particular, we require that each resident's FAR equal a specified value $p$. Second, it is much easier to consider a delay penalty than a delay constraint, and so we add $\bar\lambda$ times the delay to the objective function, yielding the problem (61)–(62) of minimizing the FRR plus the delay penalty subject to FAR $= p$.
Overview. We describe our approach to analyzing (61)–(62) in five steps. Let $H_0$ denote the null hypothesis that the resident is an imposter and $H_1$ the alternative hypothesis that the resident is genuine. In step 1, we follow [13] and construct the likelihood ratio $L$ (i.e., the likelihood of the observed scores under $H_1$ divided by their likelihood under $H_0$); the Neyman–Pearson lemma states that the form of the optimal policy is to accept the resident as genuine (i.e., reject $H_0$) if $L > t$ for some threshold $t$ and reject the resident (i.e., accept $H_0$) if $L < t$. In step 2, we solve for the threshold $t$ by equating the FAR, which is $P(L > t \mid H_0)$, to the pre-specified value $p$.
In step 3, we calculate the FRR, which is $P(L < t \mid H_1)$. In step 4, we observe that the 10 fingers can be ranked according to a simple rule, which greatly simplifies the analysis. In step 5, we solve (61)–(62).
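The sketch below illustrates steps 1–3 on a stripped-down problem: a subset of fingers with independent normal log scores, no iris terms, and Monte Carlo calibration of the threshold in place of the chi-square approximations developed below. All parameter values are illustrative stand-ins.

```python
import numpy as np
from scipy.stats import norm

mu_G, sigma_G = 2.5, 0.6            # imposter log-score parameters (illustrative)
mu_i = np.array([3.8, 3.6, 3.5])    # genuine log-score means for S, |S| = 3
sigma_1 = 0.45                      # genuine log-score standard deviation

def log_lr(z):
    """ln L for observed log scores z: normal H1 density over H0 density."""
    return (norm.logpdf(z, mu_i, sigma_1) - norm.logpdf(z, mu_G, sigma_G)).sum(axis=1)

rng = np.random.default_rng(0)
p = 1e-3                                          # target FAR
z0 = rng.normal(mu_G, sigma_G, size=(200000, 3))  # imposter scores (H0)
ln_t = np.quantile(log_lr(z0), 1.0 - p)           # step 2: P(ln L > ln t | H0) = p
z1 = rng.normal(mu_i, sigma_1, size=(200000, 3))  # genuine scores (H1)
print("FRR estimate:", (log_lr(z1) < ln_t).mean())  # step 3
```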
Constructing the Likelihood Ratio. Let $e^T = (1\;1)$ and let $I$ be the $2 \times 2$ identity matrix. Under the hypothesis $H_0$ that the resident is an imposter, the $\tilde Z_i$ are iid $N(\mu_G, \sigma_G^2)$ for $i = 1, \ldots, |S|$ and $\tilde Z_I \sim N(e\mu_{GI}, \Sigma_{GI})$, where $\Sigma_{GI} \equiv \sigma_{GI}^2 I$. Define the centered variables $\tilde Z^c_{GI} = \tilde Z_I - e\mu_{GI}$. It follows that the denominator of the likelihood ratio is given by (76)–(77). Define the vector $\bar\mu \equiv \mu_I - \mu_{GI}$, so that $\tilde Z^c_{GI} = \tilde Z^c_I + \bar\mu$. By (76)–(77), the log likelihood ratio (i.e., $\ln L$) is given by (78)–(79).

Choosing the Threshold $t$. The optimal decision rule dictates that we declare the resident to be genuine if and only if $L > t$. We need the threshold $t$ to satisfy FAR $= P(L > t \mid H_0) = p$, which by (78) can be written as (81). Recall from (77) that under $H_0$, $\tilde Z_i \sim N(\mu_G, \sigma_G^2)$ iid for $i = 1, \ldots, |S|$. Letting $N_i \equiv (\tilde Z_i - \mu_G)/\sigma_G \sim N(0, 1)$ and simplifying the $i$th term inside the summation in (81), we obtain (82), where $\beta_i$ is defined in (83). Turning to the irises part of (81), we use the fact that the distribution of a definite quadratic form of Gaussian variables, $x^T Q x$ where $Q \succ 0$ and $x \sim N_r(\mu, \Sigma)$, can be expressed as a positive linear combination of independent non-central chi-square random variables [14]. More specifically, $x^T Q x = \sum_{j=1}^{r} \lambda_j y_j^2$, where $y \sim N_r(P^T A^{-1}\mu, I)$ has independent components, $A$ is defined by the Cholesky decomposition $\Sigma = AA^T$, and $\lambda = (\lambda_1, \ldots, \lambda_r) > 0$ and the orthogonal matrix $P$ are the eigenvalues and eigenvectors of $A^T Q A$. To be able to apply this technique to the irises part of (81), we must satisfy the conditions (87)–(89). Substituting (91)–(92) into (87), (88) and (89), and from our discussion below (86), we get (93)–(95). Before applying this result, we check the conditions for $Q_1$ to be definite. Because its off-diagonal entries are positive, $-Q_1 \succ 0$ is ruled out. We can only have $Q_1 \succ 0$, which from (93)–(95) reduces to a single condition for $Q_1$ to be (positive) definite, and this condition holds in our case.
Recall that under $H_0$, $\tilde Z_I \sim N(e\mu_{GI}, \Sigma_{GI})$. We are now in a position to use (86) to express the irises part of (81) as (97), which by (84) equals (98), where, from the discussion below (84), we have that $\hat y = (\hat y_1\ \hat y_2)^T \sim N(\hat\mu_y, I)$ (where $\hat\mu_y$ is defined below), $\hat A$ is obtained from the Cholesky decomposition $\Sigma_{GI} = \hat A \hat A^T$, and $\hat\lambda = (\hat\lambda_1\ \hat\lambda_2)^T$ and $\hat P$ are the eigenvalues and eigenvectors of $\hat A^T Q_0 \hat A$. Explicitly evaluating these quantities, we have $\hat A = \sigma_{GI} I$, with $\hat P$ and the final parameters of interest given in (101). By (101) we can re-express (98) as (102), where $N_{11}$ and $N_{12}$ are iid $N(0, 1)$.
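A minimal sketch of the quadratic-form decomposition from [14] used above, verified by Monte Carlo with arbitrary illustrative inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
Q = np.array([[2.0, 0.3], [0.3, 1.0]])        # positive definite Q
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
mu = np.array([0.5, -0.2])

A = np.linalg.cholesky(Sigma)                 # Sigma = A A^T
lam, P = np.linalg.eigh(A.T @ Q @ A)          # eigenvalues and eigenvectors
a = P.T @ np.linalg.solve(A, mu)              # mean of y ~ N(P^T A^{-1} mu, I)

x = rng.multivariate_normal(mu, Sigma, 10**6)
lhs = np.einsum('ij,jk,ik->i', x, Q, x)       # x^T Q x
y = rng.standard_normal((10**6, 2)) + a
rhs = (lam * y**2).sum(axis=1)                # sum_j lambda_j y_j^2
print(lhs.mean(), rhs.mean())                 # the two means agree
```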

Substituting (82) and (102) into (81) and rearranging yields (103), where $\tilde t \equiv \ln t - |S| \ln \omega_1 + \omega_2$. The quantity on the left side of the inequality in (103) is a positive linear combination of independent non-central chi-squared random variables. We analyze this quantity using a fast and accurate approximation from [15], which generalizes a result for a single non-central chi-squared random variable in [16]. The simpler result in [16] can be used directly if $I_{11} = 0$; here we apply the result in [15] to the general case.
We did not include the next higher order terms of the approximation in [16] because they did not improve the accuracy in the right tail in our numerical calculations. Applying this approximation to (103) yields (105)–(106).

Calculating the FRR. Our analysis of the FRR follows the same sequence of steps as our analysis of the FAR.

The FRR is $P(L < t \mid H_1)$, which by (78) can be written as (108). Under $H_1$, the finger terms in (108) simplify analogously to (82), yielding (109).
Recall that under hypothesis $H_1$, $\tilde Z^c_I \sim N(0, \Sigma_I)$. Repeating the steps used to derive the FAR, we use (85) to express the irises part of (108) as (110), which by (84) equals (111), where, from the discussion below (84), we have that $\bar y = (\bar y_1\ \bar y_2)^T \sim N(\bar\mu_y, I)$, $\bar A$ is obtained from the Cholesky decomposition $\Sigma_I = \bar A \bar A^T$, and $\bar\lambda = (\bar\lambda_1\ \bar\lambda_2)^T$ and $\bar P$ are the eigenvalues and eigenvectors of $\bar A^T Q_1 \bar A$. Explicitly evaluating these quantities yields (112)–(114) and the final parameters of interest in (115). By (115), we can re-express (111) as (116), where $M_{11}$ and $M_{12}$ are iid $N(0, 1)$.
Substituting (109) and (116) into (108) and rearranging yields (117), where $\tilde t \equiv \ln t - |S| \ln \omega_1 + \omega_2$. As in (105)–(106), we use the approximation in [15], but we now include the higher order terms in [16] to improve the accuracy in the left tail. As before, noting that the random variable inside the probability in (117) is of the form $\tilde Q_k = \sum_{j=1}^{k} c_j (x_j + a_j)^2$, where the $x_j \sim N(0, 1)$ are iid for $1 \leq j \leq k$ and $c_j > 0$, and defining $\tilde\theta$, $\tilde s$ and $\tilde h$ analogously, we obtain the corresponding approximation. Applying this approximation directly to compute the probability in (117), however, proves inaccurate for our application. The quality of this approximation deteriorates with increasing variation among the $c_j$'s [15], and the $c_j$'s corresponding to the iris terms, $\bar\lambda_1$ and $\bar\lambda_2$, which are both of the same order, are significantly larger than the common coefficient on the finger terms. To circumvent this problem, we split $\tilde Q_k$ as $\tilde Q^F_k + \tilde Q^I_k$, where $\tilde Q^F_k$ ($\tilde Q^I_k$) consists of the finger (iris) terms exclusively and is thus approximated accurately using the scheme described above. Computing FRR $= P(\tilde Q^F_k + \tilde Q^I_k < \tilde t)$ from the approximated independent marginals of $\tilde Q^F_k$ and $\tilde Q^I_k$ is analytically intractable, so we instead use the trapezoidal approximation with $\tilde N$ terms in (119), where the equality follows from the independence of $\tilde Q^F_k$ and $\tilde Q^I_k$. Each term in (119) may be computed based on our approximation, as was done in (105)–(106). We chose $\tilde N = 20$ in our numerical computations.
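A minimal sketch of the split-and-convolve step (119), with gamma distributions as hypothetical stand-ins for the approximated marginals of $\tilde Q^F_k$ and $\tilde Q^I_k$ from [15]–[16], checked against Monte Carlo:

```python
import numpy as np
from scipy.stats import gamma
from scipy.integrate import trapezoid

cdf_QF = gamma(a=3.0, scale=1.2).cdf     # stand-in for P(Q_F < x)
dist_QI = gamma(a=2.0, scale=4.0)        # stand-in for the Q_I marginal

def frr_trapezoid(t, n_terms=20):
    # P(Q_F + Q_I < t) = E[P(Q_F < t - Q_I)]; discretize Q_I on [0, t].
    q = np.linspace(0.0, t, n_terms + 1)
    return trapezoid(cdf_QF(t - q) * dist_QI.pdf(q), q)

t = 15.0
print(frr_trapezoid(t))
# Monte Carlo check of the same probability:
rng = np.random.default_rng(0)
s = (gamma(a=3.0, scale=1.2).rvs(10**6, random_state=rng)
     + dist_QI.rvs(10**6, random_state=rng))
print((s < t).mean())
```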
Ranking the Fingers. Until now, our analysis has ignored the decision of which $|S|$ fingers to acquire. Rather than evaluating all $\binom{10}{|S|}$ possibilities, it turns out that there is an optimal ranking of the fingers, regardless of the value of $|S|$. To derive this ranking, first fix the number of fingers $|S|$ that may be scanned.
Next, note that the distribution of the first random variable in (117) is non-central chi-squared with $|S|$ degrees of freedom and non-centrality parameter $\sum_{i=1}^{|S|} \beta_i^2$. This quantity $\sum_{i=1}^{|S|} \beta_i^2$ is the scaled Euclidean distance between the means of the log similarity scores of the fingers under the two hypotheses $H_0$ and $H_1$. Therefore, holding all else fixed, a higher $\sum_{i=1}^{|S|} \beta_i^2$ implies a greater difference, and hence a greater ability to distinguish, between the two hypotheses; this should be reflected in a lower FRR at the same level of FAR. It is easy to prove that this argument is always true when the standard deviations under the hypotheses $H_0$ and $H_1$ are identical, and we are close to this case, as the ratio of these standard deviations is 0.87 for the exclusion scenario and 0.75 for the inclusion scenario. Further, even when this ratio is away from 1, the conclusion still holds at sufficiently small levels of FAR, such as those in our case.
It follows that if we are to acquire exactly $|S|$ fingers to minimize the FRR, it is nearly optimal to choose those with the $|S|$ highest values of $\beta_i^2$, or equivalently $|\beta_i|$. This argument does not depend on the value of $|S|$, and hence the ranking based on $|\beta_i|$ does not depend on $|S|$.
Because the Neyman–Pearson lemma considers a deviation of the log similarity score away from $\mu_G$, whether positive or negative, as a departure from the null hypothesis, it allows for $\beta_i < 0$. However, it will typically be the case that $\beta_i > 0$ because the expected value of the log similarity score should be higher when the match is genuine than when the match is between two different residents, which implies that $\mu_i + \delta > \mu_G$. Hence, for practical purposes, we choose to rank the fingers by $\beta_i$ rather than $|\beta_i|$, which by (83) is equivalent to ranking by $\mu_i$; by (65), this ranking reduces to ranking by a weighted combination of $c_i$ and $\tilde Y_i$, and hence depends on both the population-wide quality parameter $c_i$ of finger $i$ and the resident's observed similarity score $\tilde Y_i$ for finger $i$.
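A tiny sketch of the resulting ranking rule; the weight $w$ on $c_i$ is a hypothetical stand-in for the coefficient implied by (65), and all values are illustrative:

```python
import numpy as np

c = np.array([1.0, 0.95, 0.9, 0.88, 0.85, 0.84, 0.8, 0.78, 0.75, 0.7])
y_tilde = np.array([3.9, 4.1, 3.7, 4.0, 3.5, 3.6, 3.8, 3.2, 3.4, 3.3])
w = 0.5                                   # hypothetical weight on c_i from (65)

ranking = np.argsort(-(w * c + y_tilde))  # best finger first
print(ranking[:3])                        # e.g., top-3 fingers to acquire
```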

Two-stage Policies
In general, a two-stage policy takes as input a resident's observed log similarity scores during BFD and BID, $\tilde Y = (\tilde Y_1, \ldots, \tilde Y_{12})$, and chooses to acquire in the first stage a subset $S_1$ of the 10 fingers, along with neither or both irises (due to the use of a dual-eye camera). We then observe the new similarity scores $\tilde Z^{(1)}$ for the acquired subset and either accept the resident, reject the resident, or continue to the second stage to acquire more biometrics. If the second stage is reached, the policy chooses to acquire a subset of the 10 fingers and both irises that have not already been used in the first stage. We then observe the new similarity scores $\tilde Z^{(2)}$ for the second-stage subset and make an accept/reject decision.
Because this problem is difficult to solve, we only consider three restrictive classes of policies. In the first class, the iris-finger policy, we acquire both irises and no fingers in the first stage, and no irises and a subset of the 10 fingers in the second stage. Next we consider the finger-iris policy, in which, analogously, we may acquire only fingers in the first stage and only both irises in the second stage. Finally, the least restrictive class we consider is the either-other policy, which permits the use of any one mode of biometrics, either fingers or irises, in the first stage and the use of only the other mode in the second stage.
We also make several simplifying assumptions. As with the single-stage policies, we impose the same FAR constraint for each resident and account for the constraint on average delay using an additive delay penalty with Lagrange multiplier $\bar\lambda$. In addition, if the policy proceeds to the second stage, we impose that the second-stage FAR be independent of the observed similarity scores $\tilde Z^{(1)}$ from the first stage. Note that the second-stage FAR may still differ across residents. This assumption, while suboptimal, considerably simplifies the analysis of the policy and makes it possible to express the optimal policy in a compact form.

As before, our overall goal is to minimize the sum of the FRR and the delay penalty subject to a constraint on the FAR. In contrast to the single-stage policies, the delay in the two-stage case is variable, depending on whether or not the first-stage similarity scores are inconclusive for making an accept/reject decision. Therefore, we now use $\bar\lambda$ times the expected delay as the delay penalty in our objective function, where the expectation is taken under the hypothesis that the resident is genuine.
Overview. In our analysis of two-stage policies, we re-use much of the notation from §3.1 but add a subscript or superscript to denote the stage. Let the biometric acquisition decisions be given by $S_i$ and $I^{(i)}_{11}$ for stage $i = 1, 2$, let $\tilde Z^{(i)}$ be the set of similarity scores for the biometrics acquired in stage $i$, and let $L_i$ be the likelihood ratio based on the similarity scores acquired only in stage $i$. We change notation for the thresholds (Fig. 1 in the main text) and let $t_U$ and $t_L$ be the upper and lower thresholds in stage 1 and let $t_2$ be the threshold for stage 2. Thus, if $L_1 < t_L$ we reject the resident as an imposter, if $L_1 > t_U$ we accept the resident as genuine, and we otherwise go to stage 2. In stage 2, we compare $L_2$ with $t_2$ to make the accept/reject decision (we prove in our analysis that it is optimal under our assumptions to discard $L_1$ when making the stage 2 decision). Finally, we define $D_i(S_i, I^{(i)}_{11})$ to be the delay due to the biometrics acquired in stage $i$, as given in Table 2 of the main text.
As mentioned earlier, we analyze three classes of two-stage policies, which differ in the restrictions that they place on the set of feasible biometrics $(S_1, I^{(1)}_{11}, S_2, I^{(2)}_{11})$ that may be acquired. We define these feasible sets as
$$\mathcal{C}^{iris\text{-}finger} = \{(S_1, I^{(1)}_{11}, S_2, I^{(2)}_{11}) : |S_1| = 0,\ I^{(1)}_{11} = 1,\ S_2 \subseteq \{1, \ldots, 10\},\ I^{(2)}_{11} = 0\},$$
$$\mathcal{C}^{finger\text{-}iris} = \{(S_1, I^{(1)}_{11}, S_2, I^{(2)}_{11}) : S_1 \subseteq \{1, \ldots, 10\},\ I^{(1)}_{11} = 0,\ |S_2| = 0,\ I^{(2)}_{11} = 1\},$$
$$\mathcal{C}^{either\text{-}other} = \mathcal{C}^{iris\text{-}finger} \cup \mathcal{C}^{finger\text{-}iris}. \quad (121)$$
Note that for each of these policies, by construction, the mode of biometrics acquired in stage 2 differs from that in stage 1. By our assumption of independence between iris and finger scores, this implies that $\tilde Z^{(1)}$ and $\tilde Z^{(2)}$ (and hence $L_1$ and $L_2$) are independent, a fact that will be harnessed in our analysis.
All classes of two-stage policies are now analyzed under a unified framework in which we fix the policy class and refer to the corresponding feasible set in (121) by $\mathcal{C}$. Our strategy to find the optimal policy is divided into two steps: (i) fix the set of biometrics $(S_1, I^{(1)}_{11}, S_2, I^{(2)}_{11})$ acquired in each stage and find the optimal policy parameters $(t_L, t_U, t_2)$ as well as the optimal objective value, and (ii) determine the optimal set of biometrics to acquire by comparing objective values across feasible sets. By ranking the fingers by $\beta_i$ as in the single-stage policies, we need to consider only 10, 10 and 20 different feasible sets for the iris-finger, finger-iris and either-other policies, respectively.
We also provide implementation details that facilitated effective computation of these policies.
Optimal Policy for a Fixed Set of Biometrics. In the first step, we fix the set of biometrics $(S_1, I^{(1)}_{11}, S_2, I^{(2)}_{11})$ to be acquired in stages 1 and 2, and note an elementary but important fact: the second-stage decision $t_2$ will, in general, depend on the realized similarity scores $\tilde Z^{(1)}$ from stage 1 as well as on the choice of $t_L$ and $t_U$. Letting $F = (t_L, t_U, \tilde Z^{(1)})$ denote the information at the beginning of stage 2 and $\mathcal{F}$ be the set of all possible values of $F$, we have that $t_2$ is a real function on $\mathcal{F}$; i.e., $t_2(F) \in \mathbb{R}$ for all $F \in \mathcal{F}$. However, we shall prove shortly that under our restriction that the FAR in stage 2 be constant for all values of $\tilde Z^{(1)}$, it follows that $t_2(F)$ is constant for all $F$ with fixed $t_L, t_U$; this allows any policy to be expressed in compact form as a triplet $(t_L, t_U, t_2) \in \mathbb{R}^3$.
We now write the expressions for FAR, FRR and the expected delay (D) in terms of (t L , t U , t 2 ).
Defining $\mathcal{S}(\mathcal{F})$ to be the space of all real functions on $\mathcal{F}$, we place no restrictions on $t_2 \in \mathcal{S}(\mathcal{F})$. Let $t_2$ be any real-valued function on $\mathcal{F}$ and define the function $t'_2$ such that $t'_2(F) = t_2(F)/L_1$ for all $F \in \mathcal{F}$ (note that $t'_2$ is well-defined since $L_1$ is a function of $\tilde Z^{(1)}$, which is an element of $F$). We know from the Neyman–Pearson lemma that it is optimal to make the decisions in stage 1 based on $L_1$ and those in stage 2 based on $L_1 L_2$. Thus, we have the expressions (122)–(124) for the FRR, FAR and expected delay. Our goal is to solve the problem (125) of minimizing the FRR plus the delay penalty subject to FAR $= p$. Substituting (122)–(124) into (125) yields the problem (126) of minimizing, over $t_L, t_U \in \mathbb{R}$ and $t_2 \in \mathcal{S}(\mathcal{F})$, the first-stage term $P(L_1 < t_L \mid H_1) + \bar\lambda D_1(S_1, I^{(1)}_{11})$ plus the second-stage FRR and delay terms, subject to the FAR constraint. Let us define the second-stage FAR $p'$ by (127), which is only a function of $t_L$ and $t_U$. Next, in (126), we move the minimization over $t_2$ inside to equivalently obtain (128). Note that in the inner optimization problem, while the value of $t_2$ may be optimized path by path for each realized value of $F$, the constraint only applies in an average sense (i.e., on the expected value over all realizations of $F \in \mathcal{F}$). This makes it hard to solve for the optimal $t_2 \in \mathcal{S}(\mathcal{F})$, and even if one did, it would still be hard to evaluate the expectation in (128) in order to solve for the optimal $t_L$ and $t_U$ in the outer optimization. For this reason, we impose that the constraint in (128) holds path by path, i.e., $P(L_2 > t_2(F) \mid F, H_0) = p'$ for all $F \in \mathcal{F}$. This is our aforementioned assumption that the FAR in stage 2 be the same for all realizations of the stage 1 similarity scores $\tilde Z^{(1)}$. Further, the optimal threshold $t_2(F)$ now depends only on $p'$, which is only a function of $t_L$ and $t_U$; as a consequence, the optimal $t_2$ does not depend on the stage 1 similarity scores $\tilde Z^{(1)}$, and the optimal policy is characterized by the triplet $(t_L, t_U, t_2) \in \mathbb{R}^3$. Henceforth, we simply assume that $t_2 \in \mathbb{R}$.
We next exploit the independence between $L_2$ (which is a function of $\tilde Z^{(2)}$) and $\tilde Z^{(1)}$, as discussed below (121). Using the fact that $t_2$ is now just a policy constant, we can greatly simplify the expressions for FRR and FAR in (123)–(124) to get (129)–(130). Using (127) and (130), we express the constraint FAR $= p$ as $P(L_2 > t_2 \mid H_0) = p'$. Substituting this and (129) into (125) yields the optimization problem that we finally solve, (131). To solve (131), we perform a grid search over $(t_L, t_U)$. At any estimate of this pair, the value of $p'$ is implicitly defined, and $t_2$ may then be obtained using the constraint.
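A sketch of the grid search under stated assumptions: the overall FAR is decomposed as $P(L_1 > t_U \mid H_0) + P(L_1 \in [t_L, t_U] \mid H_0)\,p'$ (our reading of (127)), and the $L_1$, $L_2$ distribution functions are hypothetical lognormal stand-ins rather than the chi-square machinery of §3.1.

```python
import numpy as np
from scipy.stats import lognorm

def two_stage_objective(tL, tU, p, lam, D1, D2, frr1, far1, cont, inv_far2, frr2):
    mass_h0 = far1(tL) - far1(tU)        # P(L1 in [tL, tU] | H0)
    if mass_h0 <= 0.0:
        return np.inf
    p2 = (p - far1(tU)) / mass_h0        # implied second-stage FAR p'
    if not 0.0 < p2 <= 1.0:
        return np.inf                    # infeasible (tL, tU) pair
    frr = frr1(tL) + cont(tL, tU) * frr2(inv_far2(p2))
    return frr + lam * (D1 + cont(tL, tU) * D2)   # FRR + expected-delay penalty

def grid_search(grid, **kw):
    best = (np.inf, None)
    for tL in grid:
        for tU in grid[grid >= tL]:
            val = two_stage_objective(tL, tU, **kw)
            if val < best[0]:
                best = (val, (tL, tU))
    return best

# Demo with crude lognormal stand-ins for the L1 and L2 distributions:
h0_1, h1_1 = lognorm(1.0, scale=1.0), lognorm(1.0, scale=8.0)
h0_2, h1_2 = lognorm(1.0, scale=1.0), lognorm(1.0, scale=12.0)
kw = dict(p=1e-3, lam=1e-3, D1=5.0, D2=30.0,
          frr1=h1_1.cdf, far1=h0_1.sf,
          cont=lambda a, b: h1_1.cdf(b) - h1_1.cdf(a),
          inv_far2=h0_2.isf, frr2=h1_2.cdf)
print(grid_search(np.geomspace(0.1, 100.0, 40), **kw))
```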
Before closing this discussion, we describe how to compute the probabilities in the objective function of (131), as well as the procedure to invert the probability in the constraint to obtain $t_2$. Recall that the log likelihood ratio, $\ln L_i$, is just the expression (78) with $S$, $I_{11}$ and $\tilde Z$ replaced by $S_i$, $I^{(i)}_{11}$ and $\tilde Z^{(i)}$, for $i = 1, 2$. This enables us to use the machinery already developed in §3.1 to compute and invert all the probabilities involved. Thus, to invert the constraint in (131) for a given $p'$, we note that it is identical to (81) with $S$, $I_{11}$, $\tilde Z$, $t$ and $p$ replaced by $S_2$, $I^{(2)}_{11}$, $\tilde Z^{(2)}$, $t_2$ and $p'$, and we follow the same procedure to obtain $t_2$. Next, by writing $P(L_1 \in [t_L, t_U] \mid H_1)$ as $P(L_1 < t_U \mid H_1) - P(L_1 < t_L \mid H_1)$, we express all probabilities in the objective function of (131) as $P(L_i < t \mid H_1)$ for some $t \in \mathbb{R}$ and $i = 1, 2$.
Any probability of the form $P(L_i < t \mid H_1)$ is identical to the right side of (108) with $S$, $I_{11}$ and $\tilde Z$ replaced by $S_i$, $I^{(i)}_{11}$ and $\tilde Z^{(i)}$, and it can be computed using the procedure for computing the FRR laid out there.
Optimal Set of Biometrics. Let $(S_1, I^{(1)}_{11}, S_2, I^{(2)}_{11}) \in \mathcal{C}$ be the set of biometrics we choose to acquire for a fixed two-stage policy. As we know how to compute the optimal parameters $(t_L, t_U, t_2)$ for a fixed set of biometrics, computing the optimal objective value in (131) is straightforward. By choosing the set $(S_1, I^{(1)}_{11}, S_2, I^{(2)}_{11}) \in \mathcal{C}$ with the minimum optimal objective value, we obtain the optimal set of biometrics to acquire. However, for each two-stage policy there are more than 1,000 feasible sets in $\mathcal{C}$, which makes it prohibitively expensive to compute the optimal parameters for each set. Fortunately, a major reduction is possible by following the same line of argument as for ranking fingers in §3.1.
Turning to the iris-finger policy, in which fingers may be used in stage 2 only and the similarity scores of fingers appear only in $L_2$, we consider all sets of fingers $S_2$ with $|S_2|$ fingers. Next, we fix the values of $t_L, t_U$ in (131) so that $p'$ is now fixed. As noted earlier, $\ln L_2$ is just the expression (78) with $S$, $I_{11}$ and $\tilde Z$ replaced by $S_2$, $I^{(2)}_{11}$ and $\tilde Z^{(2)}$. We already know from §3.1 that for a fixed FAR $= p' = P(L_2 > t_2 \mid H_0)$, increasing $\sum_{i=1}^{|S_2|} \beta_i^2$ decreases the FRR $= P(L_2 < t_2 \mid H_1)$, which in turn decreases the objective function in (131). Therefore, for this value of $t_L$ and $t_U$, it is optimal to rank the fingers based on $\beta_i$ as before and to choose the set $S_2$ consisting of the top $|S_2|$ fingers. Because the ranking based on $\beta_i$ does not depend on the values of $t_L$ and $t_U$, the set $S_2$ of the top $|S_2|$ fingers provides the lowest objective value in (131) among all sets with $|S_2|$ fingers. Because $1 \leq |S_2| \leq 10$, there are only 10 different sets $S_2$ that we need to evaluate to arrive at the optimal set of biometrics to acquire.
For the finger-iris policy, we make a more fundamental argument because a direct comparison with the single-stage case (as we did for the iris-finger policy) does not work due to the presence of the confounding terms $P(L_1 \in [t_L, t_U] \mid H_i)$ in the FRR and FAR. Instead, we first note that the inference in the second stage (using only irises) is independent of the choice of fingers in the first stage because the similarity scores $\tilde Z^{(1)}$ are never used in stage 2. Hence, improving the inference in stage 1 by using the optimal set of fingers can only lower the overall FRR for a given FAR. Next, following the line of argument for the single-stage case, we know that heuristically a higher $\sum_{i=1}^{|S_1|} \beta_i^2$ provides a greater difference, and hence a greater ability to discern, between the two hypotheses $H_0$ and $H_1$. Therefore, we rank the fingers based on $\beta_i$ as before and choose the set $S_1$ consisting of the top $|S_1|$ fingers, for fixed $1 \leq |S_1| \leq 10$, to improve the inference in the first stage. As claimed earlier, among the sets with $|S_1|$ fingers, using $S_1$ lowers the overall FRR for the given FAR, and this fact allows us to make comparisons across just 10 sets to arrive at the optimal set of biometrics to acquire.
Finally, it follows from the definition of $\mathcal{C}^{either\text{-}other}$ in (121) that we only need to compare across 20 sets to obtain the optimal set of biometrics for the either-other policy: the best set $S_2$ for the iris-finger policy and the best set $S_1$ for the finger-iris policy containing $n$ fingers, for each $1 \leq n \leq 10$.
Implementation Details. The computation of the optimal two-stage policies consumes significantly more time than that of the single-stage policies. In contrast to an average computation time of 0.002 seconds for determining the optimal parameters of a single-stage policy, the same step is roughly 100 times slower for the two-stage policies due to the numerical optimization over $t_L$ and $t_U$. Hence, it is important to choose a good initial solution. We do this by first solving for the threshold $t_1$ under the assumption that it is never optimal to proceed to the second stage (i.e., $t_L = t_U$), which amounts to computing the optimal single-stage policy. We then use as the initial solution $t_L = t_1 - \epsilon$ and $t_U = t_1 + \epsilon$ for a small value $\epsilon > 0$, which typically yields smooth convergence. In addition, for those residents for whom the single-stage policy is already good enough (e.g., it guarantees an FRR $< 10^{-9}$), we skip the optimization step altogether and simply treat the single-stage policy as optimal. This significantly reduces the computation time and results in no change in our simulated results.
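A minimal sketch of this warm-start-and-skip logic; `solve_single_stage` and `optimize_two_stage` are hypothetical stand-ins for the routines described above:

```python
def solve_single_stage(resident):
    # Hypothetical stand-in: returns (threshold t_1, FRR at t_1).
    return 10.0, resident.get("frr", 1e-6)

def optimize_two_stage(resident, tL0, tU0):
    # Hypothetical stand-in for the numerical search seeded at (tL0, tU0).
    return tL0, tU0, 8.0

def plan_resident(resident, eps=1e-2, frr_skip=1e-9):
    t1, frr_single = solve_single_stage(resident)
    if frr_single < frr_skip:                       # single stage already good
        return {"policy": "single-stage", "t": t1}  # skip the costly search
    tL, tU, t2 = optimize_two_stage(resident, tL0=t1 - eps, tU0=t1 + eps)
    return {"policy": "two-stage", "tL": tL, "tU": tU, "t2": t2}

print(plan_resident({"frr": 1e-6}))
```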