Towards minimizing efforts for Morphing Attacks—Deep embeddings for morphing pair selection and improved Morphing Attack Detection

Face Morphing Attacks pose a threat to the security of identity documents, especially with respect to a subsequent access control process, because they allow both involved individuals to use the same document. Several algorithms are currently being developed to detect Morphing Attacks, often requiring large data sets of morphed face images for training. In the present study, face embeddings are used for two different purposes: first, to pre-select images for the subsequent large-scale generation of Morphing Attacks, and second, to detect potential Morphing Attacks. Previous studies have demonstrated the power of embeddings in both use cases. However, we aim to build on these studies by adding the more powerful MagFace model to both use cases, and by performing comprehensive analyses of the role of embeddings in pre-selection and attack detection in terms of the vulnerability of face recognition systems and attack detection algorithms. In particular, we use recent developments to assess the attack potential, but also investigate the influence of morphing algorithms. For the first objective, an algorithm is developed that pairs individuals based on the similarity of their face embeddings. Different state-of-the-art face recognition systems are used to extract embeddings in order to pre-select the face images and different morphing algorithms are used to fuse the face images. The attack potential of the differently generated morphed face images will be quantified to compare the usability of the embeddings for automatically generating a large number of successful Morphing Attacks. For the second objective, we compare the performance of the embeddings of two state-of-the-art face recognition systems with respect to their ability to detect morphed face images. Our results demonstrate that ArcFace and MagFace provide valuable face embeddings for image pre-selection. 
Various open-source and commercial-off-the-shelf face recognition systems are vulnerable to the generated Morphing Attacks, and their vulnerability increases when image pre-selection is based on embeddings compared to random pairing. In particular, landmark-based closed-source morphing algorithms generate attacks that pose a high risk to any tested face recognition system. Remarkably, more accurate face recognition systems show a higher vulnerability to Morphing Attacks. Among the systems tested, commercial-off-the-shelf systems were the most vulnerable to Morphing Attacks. In addition, MagFace embeddings stand out as a robust alternative for detecting morphed face images compared to the previously used ArcFace embeddings. The results endorse the benefits of face embeddings for more effective image pre-selection for face morphing and for more accurate detection of morphed face images, as demonstrated by extensive analysis of various designed attacks. The MagFace model is a powerful alternative to the often-used ArcFace model in detecting attacks and can increase performance depending on the use case. It also highlights the usability of embeddings to generate large-scale morphed face databases for various purposes, such as training Morphing Attack Detection algorithms as a countermeasure against attacks.


Introduction
Automated face recognition plays an integral role in access control, criminal investigation, and surveillance settings [1]. In particular, for automated border control, the observation and analysis of facial characteristics is becoming increasingly important for identity verification [2,3]. For example, to assist immigration officers at borders or airports, automated Facial Recognition Systems (FRS) can increase traveler throughput and reduce costs.
In a typical identity verification process, a biometric reference image, i.e., a passport photograph of a subject, is compared to one or multiple probe images, i.e., trusted live photographs captured at the border. A similarity score is then calculated between the reference and probe images, and the subject is allowed to cross the border if the similarity score exceeds a predetermined threshold.
The operation of such an automated FRS must be secure and robust. However, so-called Morphing Attacks can compromise the security of FRSs [4,5]. In a Morphing Attack, an attacker combines the face images of two or more subjects in order to form a morphed face image (see e.g., Fig 1). This morphed face image is stored in the passport and, when read on request, is presented to the FRS as a (manipulated) reference. If the calculated similarity score between the morphed reference image and one or more bona fide probe images exceeds the predetermined decision threshold τ of the FRS, each attacker's identity is falsely verified. As a result, two or more individuals may use the same passport to cross the border, and the unique link between a passport and an individual is broken. Real-world Morphing Attack cases have already been reported (e.g., [7]). High-ranking governmental bodies, such as EU DG HOME and the Ministries of Interior of the G7 states, have now taken action to address the topic of Morphing Attack Detection.
In recent years, Morphing Attack Detection (MAD) algorithms have been proposed to fend off such attacks [8]. Several MAD algorithms are based on machine learning and therefore require a large amount of data for training (e.g., [9,10]). However, generating such a large data source with high quality morphs is often hampered by the need for manual post-processing to reduce image artifacts [4,11]. It is therefore important to develop criteria that allow an informed but automated selection of two (or more) individuals suitable for producing a high quality morph image [12] without relying heavily on manual intervention. These criteria can then be used to find a large number of possible pairs of suitable source images from which morphs can be automatically generated, and a database of morphed images can be created for future research on MAD.
Previous research has shown that an adequate pre-selection of possible morph pairs has two effects: (i) the choice of the applied morphing algorithm becomes less relevant [12], and (ii) the amount of artifacts produced by an automated morphing algorithm is reduced, making an FRS more vulnerable to the Morphing Attack [12]. A large database of morphed images not only allows for better training and testing of MAD algorithms. It also allows for statistical analysis of the performance of FRSs, and may ultimately lead to a better understanding of the image properties that predict the success of a Morphing Attack. We claim that our analysis will contribute to the creation of large-scale training data sets to make MAD approaches more robust.
We further note that manual image pre-selection relies on some heuristic criteria as employed in previous works [11,13,14]. For instance, soft biometric characteristics have been used to morph only subjects of similar age, same gender, or same ethnicity [11,13,14]. In a complementary direction, other characteristics such as the shape of the hair, skin tone, differences in landmark position, and Euclidean distance between face embeddings extracted from the OpenFace model [15] have shown positive effects on the attack potential of a morph [12].
Deep learning-based FRSs provide feature embeddings, which are low-dimensional representations of high-dimensional face images [16]. In the context of face recognition, feature embeddings are termed face embeddings; they are point representations in latent space learned during the training of a face recognition neural network [17]. Computing a simple distance in latent space between two face embeddings, such as the cosine distance, can be effective in quantifying the similarity of two faces [18]. Motivated by the superior performance of models that use such embeddings for face recognition (e.g., [15,19,20]), we hypothesize that feature embeddings from deeply learnt face models can provide rich enough data to automate image pre-selection for morphing simply by analyzing the embeddings.
We take advantage of the power of embeddings in determining similarity by presenting them as auxiliary data for image pre-selection in morphing. The general assumption in our work is that a small distance between the face embeddings of two subjects corresponds to a high similarity (structural and perceptual) of the facial features of the two subjects. Thus, selecting pairs of face images based on high similarity scores between them can help generate more realistic morphs compared to selecting two face images that do not look particularly similar. Automating the pair selection process (i.e., pre-selection) makes it tractable, reproducible, scalable, and less subjective than manual approaches.
An attacker could also use embeddings by comparing a number of candidate image embeddings to find a suitable morphing partner, for example in a database of possible accomplices. This could improve the success of an attack by allowing more quantifiable parameters to be used in deciding which morphing partner to choose, rather than just soft biometrics and subjective facial similarity. From a theoretical standpoint, it is obvious that pre-selection based on face similarity can increase the attack potential of resulting morphs [11,21], and previous research has demonstrated the increased attack potential of pre-selected morphs [12] using OpenFace [15] embeddings. However, a detailed analysis of morphing pairs pre-selected on embeddings of different state-of-the-art FRSs such as ArcFace and MagFace is still lacking. Insights on the suitability of these contemporary models for image pre-selection are, however, crucial to guide future attempts to create large-scale databases of morphed face images, which are especially important for the research context of MAD.
We evaluate the pre-selections by quantifying the attack potential of the created morphs on different FRSs. In addition to previously deployed metrics, we also use the recently introduced Morphing Attack Potential (MAP) [22,23] and a few of its derivatives. MAP compensates for some drawbacks of earlier metrics. For example, the MMPMR [4] tends to represent the upper bound of attack potential, since an attack is considered successful when only one (of several) bona fide images is positively verified against the morphed reference. On the other hand, the FMMPMR [24] represents the lower bound of the attack potential, since an attack is considered a success only if all bona fide face images are positively verified. However, the bona fide face images in a real-world attack are often very similar among themselves, as they are all captured in short time intervals just before the verification process. In contrast, the face images used in the present study were taken with large time intervals between them, and are therefore considerably more heterogeneous, so FMMPMR is not a pertinent measure of attack potential. MAP, on the other hand, tests attack potential across several different FRSs and thus provides a more generalizable picture of the actual attack potential. In real-life scenarios, attackers often do not know which system is being used for verification. Therefore, different attacks are launched on each system. If the bona fide face images are as heterogeneous as in the present study, this measure is also characterized by greater robustness.
We further make use of embeddings learnt with a particular category of losses, under which the magnitude of an embedding can measure the quality of the given face image. We use these embeddings for creating a robust MAD algorithm in a Differential-MAD (D-MAD) setting. Our D-MAD algorithm is further based on the idea that the magnitude of the feature embedding is highly correlated with, and monotonically increasing in, image quality if the pair of images is chosen using FRSs with adaptively learnt intra-class and inter-class feature distributions [20].
For this D-MAD algorithm, we deliberately build upon the concept of a previously published D-MAD algorithm, which used ArcFace embeddings [25]. However, we take advantage of the better face recognition performance of MagFace [20] and therefore train a D-MAD classifier on the differential face embeddings of this FRS, while building an identical D-MAD algorithm using ArcFace embeddings for comparison.
The present study makes the following contributions:
• We first examine face embeddings produced by several well-known FRSs for automated image pre-selection to produce morphed face images. We demonstrate that a large data set of morphed face images can be easily constructed by analyzing the distance between the embeddings. We empirically validate the effectiveness of our developed selection criteria by systematically studying the susceptibility of deep learning-based FRSs and two commercial-off-the-shelf (COTS) FRSs on our generated database. We validate our pre-selection approach against a control data set consisting of randomly paired face images. Our experiments show that pre-selection can produce better morphs and can compromise FRSs and MAD classifiers to a high degree, regardless of which particular embeddings were used in the pre-selection. However, the model used for pre-selection has a significant impact on the attack potential and on how effectively the morphs avoid detection by MAD classifiers.
• Furthermore, motivated by the limited performance of MAD algorithms in detecting Morphing Attacks generated by our pipeline, we present a newly designed MAD algorithm using MagFace over ArcFace differential embeddings for training, improving the detection capability.
In the rest of the paper, we first present our proposed approach for morph pair pre-selection and provide details about the data sets and models used to provide embeddings, morph face images, and validate the resulting morphs. We then illustrate the results of the Morphing Attacks by showing how the attacks generated by our pipeline are able to fool FRSs. Finally, we construct and benchmark a new MAD algorithm on this data set and discuss our results.

Methods
Proposed approach for morph pair pre-selection
Our proposed approach consists of generic deep learning-based FRSs to extract embeddings, followed by a similarity-based pair-selection module. The selected pairs were then provided to different face morphing algorithms. The generated data set was then used to study the vulnerability of FRSs and to develop a MAD algorithm. Fig 2 illustrates our proposed approach.

Embeddings from Face Recognition Systems
In our proposed architecture, different state-of-the-art implementations of FRSs were used to extract face embeddings for image pre-selection. Based on the results reported in recent work, we selected four different architectures to obtain the embeddings in our pre-selection pipeline. We chose ArcFace [26], VGG-Face [27], DeepFace [28], and MagFace [20]. For ArcFace, VGG-Face, and DeepFace (Facebook), TensorFlow implementations of the respective models were used, which were included in the software distribution of the LightFace repository [29]. For MagFace, the official repository was used [20]. Each of these FRSs provides an embedding vector representing a face image. The vector differs in length depending on which FRS was used, as shown in Table 1. For the sake of completeness of the experiments in this paper, we also used the same set of FRSs to verify the resulting morphed faces, in addition to two COTS FRSs.

Pre-selection algorithm
Our proposed pre-selection criterion is based on a measure of similarity of embeddings, which typically contain rich identity-preserving information [30]. Given two equally sized embedding vectors, we employed the Cosine distance (Eq 1) to determine the similarity between the underlying faces. For a pair of embeddings corresponding to two face images, the Cosine distance [30] can be defined as:

\[ d_{\cos}(E_1, E_2) = 1 - \frac{E_1 \cdot E_2}{\lVert E_1 \rVert \, \lVert E_2 \rVert} \tag{1} \]

with $E_1$ and $E_2$ the D-dimensional embedding vectors of the images. For a given data subject and a chosen FRS, we computed the Cosine distances between the subject and the remaining N subjects in the chosen database. The procedure was repeated N times for each subject in the database. We further enforced a demographic consistency check. Specifically, we required that gender and ethnicity of the individuals of a potential pair correspond. Further, we allowed for a maximum age difference of 5 years between individuals of a potential pair. The labels provided with the face image data set were used for these checks (see below). Based on the computed similarity score matrix across all subjects, we retained only the upper triangle of the score matrix owing to the symmetric nature of the Cosine distance (i.e., $d_{\cos}(E_1, E_2) = d_{\cos}(E_2, E_1)$). The face images of unique subjects fulfilling the criteria were then chosen for morphing. The details of our pre-selection criteria are summarized in Algorithm 1. It should be mentioned that, by this procedure, each data subject was used for the creation of at most one morph pair. However, each data subject can be used in all runs of the algorithm, i.e., when the algorithm was applied to the embeddings of another FRS.
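The pair selection described above can be sketched as follows. This is a minimal illustration, not Algorithm 1 itself: the greedy strategy (closest admissible pairs first), the point at which the demographic check is applied, the `Subject` record, and all function names are assumptions made for the example. The pair counts reported later suggest that the authors' procedure may apply the demographic check after matching, so the numbers of pairs produced by this sketch need not agree with theirs.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Subject:
    # Hypothetical record; the field names are assumptions for illustration.
    sid: str
    gender: str
    ethnicity: str
    age: int
    embedding: list


def cosine_distance(e1, e2):
    """Return 1 minus the cosine similarity of two equally sized vectors."""
    dot = sum(a * b for a, b in zip(e1, e2))
    norm1 = math.sqrt(sum(a * a for a in e1))
    norm2 = math.sqrt(sum(b * b for b in e2))
    return 1.0 - dot / (norm1 * norm2)


def demographics_match(a, b, max_age_gap=5):
    """Same gender, same ethnicity, and at most max_age_gap years apart."""
    return (a.gender == b.gender and a.ethnicity == b.ethnicity
            and abs(a.age - b.age) <= max_age_gap)


def select_pairs(subjects):
    """Greedy pairing: smallest distances first, each subject used at most once."""
    # combinations() visits every unordered pair once, which corresponds to
    # keeping only the upper triangle of the symmetric distance matrix.
    candidates = sorted(
        (cosine_distance(a.embedding, b.embedding), a.sid, b.sid)
        for a, b in combinations(subjects, 2)
        if demographics_match(a, b)
    )
    used, pairs = set(), []
    for dist, sid_a, sid_b in candidates:
        if sid_a not in used and sid_b not in used:
            pairs.append((sid_a, sid_b, dist))
            used.update((sid_a, sid_b))
    return pairs
```

Running `select_pairs` once per FRS, each time on that system's embeddings, mirrors the statement that a subject may appear in one pair per run but in several runs overall.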
Pair selection according to this algorithm was carried out using four different FRSs. The algorithm was reapplied to create a baseline comparison data set without taking into account the similarity between the embeddings. For the baseline data set, only the face images of subjects with matching demographics were randomly morphed.

Creation of the morphed face data set
We validate our proposed approach for image pre-selection using the academic version of the UNCW-MORPH face data set distributed by the face ageing group of the University of North Carolina Wilmington (UNCW) [6,31]. Contrary to what its name suggests, UNCW-MORPH contains bona fide face images only and no morphed images. It comprises over 55,000 face images of more than 13,000 data subjects, captured between 2003 and 2007. The facial images were captured in frontal poses with largely neutral expressions, making the data set suitable for face morphing. The face images had resolutions between 200 px × 240 px and 400 px × 480 px, with each image labeled with exact age, gender, and ethnicity.
In order to employ the data for our experiments, we conducted a curation process with a number of pre-processing steps. First, all samples were checked for neutral facial expressions and any images not conforming to neutral expressions were eliminated. We specifically used an emotion detection model from the LightFace package [29] to verify neutral expression. All samples for which neutral was not the emotion with the highest probability, and all samples for which the emotion model failed, were excluded from morphing.
Since we needed multiple samples to study the susceptibility of FRSs to the resulting morphs, any subject with fewer than five samples was discarded from further analysis. For the remaining subjects, the first sample (in chronological order) was used for morphing, while the remaining samples were used for validation. Of the 55,134 samples from 13,618 subjects in the raw data set, 22,992 samples from 3,337 subjects remained in the data set.
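The curation steps above can be sketched as a small filter. The input format (tuples of subject ID, ISO-formatted capture date, and dominant emotion, with `None` standing for a failed emotion detection) is a hypothetical choice for this example, and applying the five-sample minimum after the expression filter is an assumption based on the order in which the steps are described.

```python
from collections import defaultdict


def curate(samples, min_samples=5):
    """Split each retained subject's samples into one morph source and probes.

    samples: iterable of (subject_id, capture_date, dominant_emotion);
    capture_date is an ISO string, so sorting is chronological.
    """
    by_subject = defaultdict(list)
    for sid, date, emotion in samples:
        # Keep only samples whose most probable emotion is neutral;
        # failed detections (emotion is None) are dropped as well.
        if emotion == "neutral":
            by_subject[sid].append(date)

    split = {}
    for sid, dates in by_subject.items():
        if len(dates) >= min_samples:
            dates.sort()
            # First capture is morphed, later captures serve as probes.
            split[sid] = {"morph": dates[0], "probes": dates[1:]}
    return split
```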

Morphing algorithms
To perform the morphing, we chose four different morphing algorithms. Three of them were landmark-based (Alyssaq morpher [32], NTNU morpher [13,14], and UBO morpher [5,33-35]). In these landmark-based algorithms, morphing was based on averaging the landmark coordinates of the two morph candidate images. The 68 face landmarks were extracted using the OpenCV dlib library [36], with an ensemble of regression trees used to estimate the coordinates. As a fourth morphing algorithm, a deep learning-based algorithm, Identity Prior driven Generative Adversarial Networks (MIPGAN) [37,38], was used. Unlike the landmark-based algorithms, MIPGAN used the latent space of two samples to generate morphed images. A morphing factor (alpha) of 0.5 was used for all morphing algorithms. No image pre-processing or post-processing was done other than the steps included in the respective morphing packages. However, rescaling and cropping steps were performed for the face recognition steps in the vulnerability analysis (see below).
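As a small illustration of the landmark averaging step only, the sketch below interpolates two lists of corresponding landmark coordinates with a morphing factor alpha. The function name and the input format are assumptions; landmark detection, triangulation, warping, and pixel blending are handled inside the respective morphing packages and are not shown.

```python
def blend_landmarks(landmarks_a, landmarks_b, alpha=0.5):
    """Linearly interpolate corresponding (x, y) landmarks of two faces.

    alpha = 0.5 yields the plain average used for all morphs in this study.
    """
    if len(landmarks_a) != len(landmarks_b):
        raise ValueError("landmark lists must have equal length")
    return [
        ((1 - alpha) * xa + alpha * xb, (1 - alpha) * ya + alpha * yb)
        for (xa, ya), (xb, yb) in zip(landmarks_a, landmarks_b)
    ]
```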

Properties of the morphed images
Morphing was then performed based on several criteria, such as the similarity of facial embedding vectors and demographic consistency. Since several different FRSs were used for the pre-selection algorithm, the pairings naturally differed between the approaches used. Furthermore, there were differences in the absolute number of pairs found. This is due to the fact that the demographic check eliminates many matched pairs if the demographic properties are too different. Thus, we obtained 452 pairs when pre-selecting with ArcFace, 511 for DeepFace, 632 for VGG-Face, and 639 for MagFace. When random pairing was performed, 819 pairs were found.

Vulnerability analysis
We investigated the vulnerability of various FRSs to morphs generated by our proposed architecture. Using a subset of bona fide images from the UNCW database and the morphed images generated by our architecture, we investigated the vulnerability of four different open-source FRSs and two COTS systems. While an open-source FRS can illustrate the applicability of the proposed approach, the evaluation of COTS systems will indicate a higher relevance for security considerations regarding Morphing Attacks in operational scenarios.

Calibration of the decision thresholds for verification
Since each of the open-source FRSs operates with a unique decision threshold, we first determined their respective thresholds specifically on the UNCW data set. We calibrated these thresholds for the four open-source FRSs which we used for face verification (which were the same as those used for pre-selection). To determine the respective thresholds, a subset of 500 data subjects was sampled and all possible one-to-one combinations of mated pairs were obtained using each FRS. Similarly, all possible combinations of non-mated comparison scores were computed. As the total number of possible non-mated comparisons far outnumbered the number of possible mated comparisons, a uniform sampling from all possible non-mated comparisons was performed to obtain an equal number. Detection Error Trade-off (DET) curves were calculated for each FRS using the respective mated and non-mated distributions. The decision thresholds τ for False Match Rates (FMRs) of 0.1% were empirically determined for each FRS [39] based on the FRONTEX recommendation [40]. The thresholds, along with the corresponding False Non-Match Rates (FNMRs), are shown in Table 2.
For face verification, we further deployed two COTS FRSs. For these, a default threshold was used to achieve an FMR of 0.1% as recommended by the respective COTS vendors.
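The empirical calibration for the open-source systems can be sketched as follows, assuming distance scores (a comparison matches if its distance is below τ). The function names are illustrative, and picking τ as an order statistic of the non-mated distances is one simple way to hit a target FMR; the authors' exact procedure may differ.

```python
def calibrate_threshold(non_mated, target_fmr=0.001):
    """Return a threshold tau with FMR(tau) <= target_fmr for distance scores."""
    ordered = sorted(non_mated)
    k = int(target_fmr * len(ordered))  # number of false matches tolerated
    # At most k non-mated distances are strictly smaller than ordered[k].
    return ordered[k]


def fmr(non_mated, tau):
    """Fraction of non-mated comparisons wrongly accepted (distance < tau)."""
    return sum(d < tau for d in non_mated) / len(non_mated)


def fnmr(mated, tau):
    """Fraction of mated comparisons wrongly rejected (distance >= tau)."""
    return sum(d >= tau for d in mated) / len(mated)
```

With τ fixed this way, `fnmr` evaluated on the mated distances gives the kind of value reported alongside each threshold in Table 2.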

Vulnerability analysis metrics
After determining the threshold, we analyzed the newly created morphed images for their attack potential using three different metrics: Product Average Mated Morph Presentation Match Rate (prodAvgMMPMR) (Eq 3), Relative Morph Match Rate (RMMR) (Eq 4), and Morphing Attack Potential (MAP). While prodAvgMMPMR and RMMR give the attack potential with respect to a single FRS, MAP gives the attack potential of the newly created data set across multiple FRSs. In this study, all rates are reported as decimal fractions and are therefore distributed in the interval [0, 1].

MMPMR The Mated Morph Presentation Match Rate (MMPMR) [4] is defined for distance scores (Eq 2),

\[ \mathrm{MMPMR}(\tau) = \frac{1}{M} \sum_{m=1}^{M} \left\{ \left[ \max_{n=1,\ldots,N_m} D_m^n \right] < \tau \right\} \tag{2} \]

with $M$ being the total number of morphed images, $D_m^n$ the mated morph comparison score (here: distance score) of subject $n$ at morph $m$, $N_m$ the total number of subjects contributing to morph $m$, $\tau$ the decision threshold, and $\{\cdot\}$ equal to 1 if the enclosed condition holds and 0 otherwise.
prodAvgMMPMR The Product Average Mated Morph Presentation Match Rate (prodAvgMMPMR) [11] is a variant of MMPMR that allows a more probabilistic interpretation of the success of Morphing Attacks (Eq 3),

\[ \mathrm{prodAvgMMPMR}(\tau) = \frac{1}{M} \sum_{m=1}^{M} \prod_{n=1}^{N_m} \left[ \frac{1}{I_m^n} \sum_{i=1}^{I_m^n} \left\{ D_m^{n,i} < \tau \right\} \right] \tag{3} \]

in which, in addition to the above, $I_m^n$ is the number of samples of subject $n$ within morph $m$, and $D_m^{n,i}$ the mated morph comparison score of sample $i$ of subject $n$ at morph $m$.
An example: One morphed image was evaluated. Two data subjects contributed to the morph with one image each. Three bona fide samples per subject were tested against the morph. For one data subject, 2/3 of the comparisons were accepted at the threshold τ. For the other data subject, 3/3 of the comparisons were accepted at the threshold τ. The prodAvgMMPMR is then simply the product of 2/3 and 3/3, i.e., 2/3.
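The worked example above can be reproduced with a few lines of Python. The sketch assumes distance scores, so a comparison counts as a match when its distance is below τ, and it assumes that MMPMR requires the largest distance over all contributing subjects to fall below τ. The nested-list data layout and the function names are illustrative choices, not part of the original study.

```python
def mmpmr(scores, tau):
    """MMPMR for distance scores.

    scores[m] holds one distance per contributing subject of morph m; the
    morph counts as successful only if every subject matches (max < tau).
    """
    hits = sum(max(per_subject) < tau for per_subject in scores)
    return hits / len(scores)


def prod_avg_mmpmr(scores, tau):
    """prodAvgMMPMR for distance scores.

    scores[m][n] is the list of distances of all probe samples of subject n
    against morph m.
    """
    total = 0.0
    for morph in scores:
        product = 1.0
        for samples in morph:
            # Fraction of this subject's probe samples accepted at tau.
            product *= sum(d < tau for d in samples) / len(samples)
        total += product
    return total / len(scores)


# One morph, two subjects, three probe samples each, tau = 0.5:
# subject 1 is accepted in 2 of 3 comparisons, subject 2 in 3 of 3.
example = [[[0.2, 0.3, 0.6], [0.1, 0.2, 0.3]]]
rate = prod_avg_mmpmr(example, tau=0.5)  # (2/3) * (3/3)
```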
RMMR The Relative Morph Match Rate (RMMR) metric [11], on the other hand, takes the FNMR of a biometric system into account. Different biometric systems, calibrated at a particular FMR, can have different FNMRs. For instance, the FNMRs of the open-source FRSs varied greatly after calibration of the decision threshold (see Table 2). If the FNMR is high, the system is less suited for operation in a particular scenario, e.g., access control. Consequently, it might produce a low MMPMR or prodAvgMMPMR, and therefore be less vulnerable to Morphing Attacks, but at the same time reject a large proportion of mated verification attempts. Therefore the RMMR relates the MMPMR to the FNMR (Eq 4):

\[ \mathrm{RMMR}(\tau) = 1 + \left( \mathrm{MMPMR}(\tau) - \left[ 1 - \mathrm{FNMR}(\tau) \right] \right) \tag{4} \]
MMPMR and FNMR (and therefore RMMR) are specific to the chosen decision threshold τ. If the MMPMR is high, i.e., the morphs fool the FRS at τ, and the FRS performs well by having a low FNMR, the RMMR levels off around 1. If the potential of the attack is low (low MMPMR) and the FRS also performs poorly by having a high FNMR, the RMMR still levels off at around 1. Most interestingly, if the potential of the attack is poor (i.e., low MMPMR) and the FRS performs well by having a low FNMR, the RMMR is around 0. For the sake of completeness: if the attack is of high quality (high MMPMR) and the FRS performs poorly (high FNMR), the RMMR could theoretically level off at 2. However, that would require the morphed comparison distances to be smaller than the mated comparison distances.
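The four limiting cases can be checked numerically. The sketch assumes the RMMR takes the form RMMR(τ) = 1 + MMPMR(τ) − (1 − FNMR(τ)), which is consistent with every case described above; the function name is illustrative.

```python
def rmmr(mmpmr, fnmr):
    """Relative Morph Match Rate from MMPMR and FNMR at the same threshold."""
    true_match_rate = 1.0 - fnmr
    return 1.0 + mmpmr - true_match_rate


corner_cases = {
    "strong attack, accurate FRS": rmmr(1.0, 0.0),  # levels off at 1
    "weak attack, poor FRS": rmmr(0.0, 1.0),        # also 1
    "weak attack, accurate FRS": rmmr(0.0, 0.0),    # 0
    "strong attack, poor FRS": rmmr(1.0, 1.0),      # theoretical maximum 2
}
```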
MAP Recently, the Morphing Attack Potential (MAP) has been proposed to report the attack potential of a data set D of morphed images in a combined manner across different FRSs [22,23]. All FRSs (in our case 6 different systems) verified the same number of different bona fide images (e.g., 4) of each subject against the respective morph. $\mathrm{MAP}_{D}^{4,6}$ then represents the 4 × 6 matrix in which the element (i, j) indicates the decimal fraction of morphed images for which at least i verification attempts were successful with respect to both contributing subjects and at least j FRSs (Fig 3). As outlined earlier, MAP values are characterized by higher generalizability and robustness compared to many other metrics (cf. Introduction). The MAP is now adopted in the ISO/IEC CD2 20059 standard [23]. The standard further suggests the MAPavg metric, a weighted average which reduces the MAP matrix to one scalar. Cells further down and further to the right of the MAP matrix receive higher weights [23]. We averaged the MAPs for different morphing algorithms before calculating one MAPavg per pre-selection method. Additionally, we evaluated the attack potential using the recently proposed G-MAP metric, which similarly aims at reducing MAP to one single scalar [41]. We calculated G-MAP for each pre-selection method separately, across morphing algorithms and verification systems. The calculations of MAPavg and G-MAP differ in several aspects, e.g., weighting is performed in MAPavg, but a different number of attacks per morph is allowed in G-MAP.
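One possible reading of the MAP definition above is sketched below: an FRS counts as fooled at level i if every contributing subject achieved at least i successful attempts on it, and a morph contributes to cell (i, j) if at least j FRSs are fooled at that level. The input layout and function name are assumptions for illustration; the normative definition is the one in the standard [23], and the weights of MAPavg and G-MAP are not reproduced here.

```python
def map_matrix(success, n_attempts, n_frs):
    """Compute an n_attempts x n_frs MAP matrix.

    success[m][f] lists, for morph m and FRS f, the number of successful
    verification attempts of each contributing subject.
    """
    n_morphs = len(success)
    matrix = []
    for r in range(1, n_attempts + 1):
        row = []
        for c in range(1, n_frs + 1):
            count = 0
            for morph in success:
                # FRSs on which every contributing subject reached at
                # least r successful attempts.
                fooled_frs = sum(min(per_subject) >= r for per_subject in morph)
                count += fooled_frs >= c
            row.append(count / n_morphs)
        matrix.append(row)
    return matrix


# Two morphs, two FRSs, two attempts per subject (toy numbers).
toy = [
    [[2, 2], [1, 2]],  # morph A: FRS 1 -> (2, 2) successes, FRS 2 -> (1, 2)
    [[0, 2], [0, 0]],  # morph B: one subject never matches
]
toy_map = map_matrix(toy, n_attempts=2, n_frs=2)
```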

Morphing Attack Detection
In the previous parts, we described the methodology for morphing and evaluating the vulnerability of FRSs. In addition, the study at hand uses embeddings to detect Morphing Attacks. In particular, we draw inspiration from Differential Image Morphing Attack Detection (D-MAD) approaches, which compare a presented image with a trusted bona fide image to evaluate the nature of the presented image. We used a D-MAD approach proposed by Scherhag et al. [25] as our baseline. The chosen D-MAD approach performs a differential analysis of ArcFace embedding vectors to train a binary Support Vector Machine (SVM) classifier using radial basis functions and otherwise default parameters as implemented in sklearn (v.0.24.2). Specifically, the ArcFace embeddings were extracted from the suspicious images to be analyzed. ArcFace embeddings were further extracted from bona fide probe images of one of the participating morph candidates. These bona fide images are comparable to trusted live captures of an attacker. The procedure is outlined in Fig 4. The two embedding vectors were then subtracted from each other. The resulting difference vectors of length 512 represent the samples of morphed (differential) images. As samples of bona fide (differential) images, the same procedure was carried out by subtracting the embeddings of two different bona fide captures of the same data subject. The resulting difference vectors were standardized to zero mean and unit variance (µ = 0 and σ = 1) and then handed over as features to the SVM.
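A minimal sketch of such a differential classifier with scikit-learn is given below. It follows the steps named above (embedding difference, standardization, RBF-kernel SVM with otherwise default parameters), but it runs on synthetic stand-in vectors rather than real ArcFace or MagFace embeddings; the helper names, the class offset, and the even/odd train/test split are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def differential_features(reference_emb, probe_emb):
    """Element-wise difference between reference and trusted probe embeddings."""
    return np.asarray(reference_emb) - np.asarray(probe_emb)


def train_dmad(diff_vectors, labels):
    """Standardize each feature, then fit an RBF SVM (1 = morph, 0 = bona fide)."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return model.fit(diff_vectors, labels)


# Synthetic stand-in data, NOT real embeddings: bona fide references differ
# from their probes by small noise, morphed references by noise plus an offset.
rng = np.random.default_rng(0)
dim = 512
probes = rng.normal(size=(200, dim))
bona_fide_refs = probes[:100] + rng.normal(scale=0.1, size=(100, dim))
morph_refs = probes[100:] + rng.normal(scale=0.1, size=(100, dim)) + 0.5

X = differential_features(np.vstack([bona_fide_refs, morph_refs]), probes)
y = np.array([0] * 100 + [1] * 100)

model = train_dmad(X[::2], y[::2])        # even rows for training
accuracy = model.score(X[1::2], y[1::2])  # odd rows for testing
```

Placing the scaler inside the pipeline ensures that the standardization parameters are estimated on the training vectors only and reused unchanged at test time.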
While we found a decent performance of the previously proposed D-MAD approach, we note that using the embeddings of MagFace instead of ArcFace could further improve accuracy. The loss function of MagFace is designed in such a way that it not only arranges the samples of a class (a subject) adjacent to each other in the multidimensional space. It is further designed so that samples with higher quality, or samples for which the certainty of class membership is high, are closer to the center of the class [20]. Thus, the distances in the embedding space between two samples of the same class that are of high quality, or, conversely, are certain to belong together, are very small. On the other hand, the distance in the multidimensional space between two samples is quite large when the membership estimate of one of the samples is less accurate due to low image quality. The magnitude of MagFace's embedding vector increases monotonically with image quality. This results in a larger difference between the two embedding vectors when the quality of a face image is low. Using MagFace instead of ArcFace embeddings could not only carry over the reported superior performance of MagFace in face recognition. It could also combine the strengths of an embedding-based D-MAD approach such as that of Scherhag et al. [25] with approaches based on image quality analysis, such as the approach of Venkatesh et al. [42].
In the present study, the procedure closely followed the approach described by Scherhag et al. [25] using ArcFace embeddings. However, the same approach was then repeated in an analogous fashion using MagFace embeddings (Fig 4).

Training protocol
Only a subset of 80% of the generated control data set was used for training. This control data set consisted of morphed images created without pre-selection based on embeddings; instead, face images were randomly paired for morphing after demographic consistency checks (see above). For training, morphed images from all morphing algorithms were used together. Thus, for each pair of faces selected (randomly) for morphing, the morphs of all four morphing algorithms were placed together in either the training or the test set. A subset of 80% of all non-morphed subjects (with at least 2 face samples, which was about 10,000 subjects) was used for training on the bona fide differential embeddings, and correspondingly 20% for testing. More importantly, testing was also performed on all morphs that were generated based on pre-selection using our proposed architecture, namely using distances between embeddings from different FRSs, such as ArcFace, DeepFace, VGG-Face, and MagFace.
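The pair-wise 80/20 split described above can be sketched as a group split in which the pair, not the individual morph, is the unit of assignment. The record layout (pair ID, algorithm name, file path), the function names, and the fixed random seed are hypothetical choices for this example.

```python
import random


def split_pairs(pair_ids, train_fraction=0.8, seed=0):
    """Randomly assign each morph pair to the training or the test split."""
    ids = sorted(set(pair_ids))
    random.Random(seed).shuffle(ids)
    cut = int(train_fraction * len(ids))
    return set(ids[:cut]), set(ids[cut:])


def assign_morphs(morphs, train_ids):
    """morphs: iterable of (pair_id, algorithm, path) records.

    All morphs created from one pair land in the same split, so the
    classifier never sees a test pair rendered by another algorithm.
    """
    train = [m for m in morphs if m[0] in train_ids]
    test = [m for m in morphs if m[0] not in train_ids]
    return train, test
```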

Testing metrics
To evaluate the MAD algorithms, ISO/IEC 30107-3 and ISO/IEC CD2 20059 [23,43] propose calculating the Morph Attack Classification Error Rate (MACER) and the Bona fide Presentation Classification Error Rate (BPCER). MACER was formerly named APCER (Attack Presentation Classification Error Rate) but was renamed in the context of Morphing Attacks [23]. As with the metrics used in the previous analyses, all rates are reported as decimal fractions in the range [0; 1].
MACER serves as a security measure, i.e., the proportion of attack presentations incorrectly classified as bona fide presentations must be small for a secure biometric system. In contrast, BPCER serves as a convenience measure, i.e., the proportion of bona fide presentations incorrectly classified as attacks should be low in an operational biometric system. The BPCER10 is also often reported [25]. BPCER10 is the BPCER at the system threshold at which the MACER is 10%, i.e., 0.1 [25]. BPCER10 can thus serve as a convenience metric at a given security level.
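As an illustration of how BPCER10 follows from MACER and BPCER, the sketch below computes the BPCER at a target MACER from classifier scores. It assumes that higher scores indicate a morph and that a sample is classified as an attack when its score reaches the threshold; the function names and the toy scores are hypothetical and not taken from the study.

```python
def macer(morph_scores, threshold):
    """Proportion of morphs classified as bona fide (score below threshold)."""
    return sum(s < threshold for s in morph_scores) / len(morph_scores)

def bpcer(bona_fide_scores, threshold):
    """Proportion of bona fide samples classified as attacks."""
    return sum(s >= threshold for s in bona_fide_scores) / len(bona_fide_scores)

def bpcer_at_macer(bona_fide_scores, morph_scores, target=0.10):
    """Lowest BPCER among observed thresholds whose MACER stays within target."""
    candidates = sorted(set(bona_fide_scores) | set(morph_scores))
    feasible = [t for t in candidates if macer(morph_scores, t) <= target]
    return min(bpcer(bona_fide_scores, t) for t in feasible)

# Toy scores, invented for the example.
bona_fide = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.55, 0.60]
morphs = [0.35, 0.50, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 0.99]

bpcer10 = bpcer_at_macer(bona_fide, morphs, target=0.10)  # 0.2 for these scores
```

The smallest observed score is always a feasible threshold (no morph falls below it), so the search never returns an empty set.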

Mated morph comparison performance
We studied the vulnerability of different FRSs when attacked with the data set we created. Specifically, we selected two FRSs, namely ArcFace [26] and MagFace [20], to illustrate their vulnerability to face images generated by our proposed architecture. The corresponding success rates were measured using prodAvgMMPMR and are shown in Fig 5. Moreover, the proposed approach was investigated using four different morphing algorithms.
As shown in Fig 5, image pre-selection increased the attack potential compared to random pairing when the resulting morphs were verified using ArcFace or MagFace. The attack potential increased when MagFace was used as the verification system, followed by ArcFace (Fig 5). While we also evaluated two other FRSs based on VGG-Face and DeepFace, the attack potential did not increase for these, as the FRSs themselves were relatively low performing (S2 Fig, Table 2). See below for a detailed analysis of this behavior.
We further note a link between the FRS used for pre-selection and the FRS used for the assessment of vulnerability. If the pre-selection was based on the same FRS that was also used to assess vulnerability, the attack potential of the database was higher. This does not come as a surprise, since the embeddings are treated the same in both cases. However, when COTS FRSs were used, the attack potential was still increased compared to random pairing, although it was not biased by using the same FRS twice within the analysis pipeline. The vulnerability to the morph attacks for the two COTS FRSs tested is illustrated in S1 Fig. Importantly, the increased attack potential obtained with the proposed pre-selection approach can be clearly observed regardless of the morphing algorithm used. However, there was a noticeable difference in the success of Morphing Attacks between morphing algorithms. The NTNU morpher and the UBO morpher produced the best Morphing Attacks, followed by the Alyssaq morpher and lastly MIPGAN (Fig 5 & S1 Fig).
The MAP has recently been introduced as a general measure of the success of Morphing Attacks across different verifying FRSs [22,23]. Briefly, the elements of a MAP matrix contain the proportions of successful Morphing Attacks (with both data subjects involved) that fool a given number of FRSs for a given number of attack attempts (Fig 3). The higher the values, and the further the high values spread to the lower right of the matrix, the more effective the attacks were on the tested data set.
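The structure of a MAP matrix can be sketched in a few lines. The sketch assumes cumulative counting, i.e., cell (r, c) holds the proportion of morphs that fooled at least c FRSs in at least r attempts each; the function name `map_matrix`, the input layout, and the toy data are assumptions for illustration only.

```python
def map_matrix(success, n_attempts, n_frs):
    """Build a MAP matrix from per-morph verification outcomes.

    success[m][f][a] is True if morph m was accepted by FRS f in attempt a
    for both contributing subjects. Rows index attempts, columns index FRSs.
    """
    n_morphs = len(success)
    matrix = []
    for r in range(1, n_attempts + 1):
        row = []
        for c in range(1, n_frs + 1):
            hits = 0
            for morph in success:
                # Number of FRSs fooled in at least r attempts by this morph.
                fooled = sum(1 for attempts in morph if sum(attempts) >= r)
                hits += fooled >= c
            row.append(hits / n_morphs)
        matrix.append(row)
    return matrix

# Toy example: 2 morphs, 2 FRSs, 2 attempts each.
outcomes = [
    [[True, True], [True, False]],    # morph 0
    [[False, False], [True, False]],  # morph 1
]
toy_map = map_matrix(outcomes, n_attempts=2, n_frs=2)
# toy_map == [[1.0, 0.5], [0.5, 0.0]]
```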
Fig 6 shows MAPs for morphs created by the UBO morpher. Again, using pre-selection generally increased the MAPs. All non-random pre-selection methods successfully outwitted at least four (out of six) FRSs in about 70% to 90% of cases with at least one attack attempt. In contrast, random morphs only exceeded 47%. In about 17% to 47% of cases, all four attack attempts were able to fool four different FRSs when pre-selection was performed. However, only single-digit percentages of morphs were able to fool four FRSs with all four attack attempts. Therefore, we further calculated MAPavg for each pre-selection method [23], which aggregates the MAP matrix into a single scalar by a weighted average, with higher values corresponding to stronger attack potential. The weights were defined by the positions of the cells, i.e., morphs with more successful attack attempts were weighted higher, as were attacks that fooled more FRSs. Values were calculated as decimal fractions, with one MAPavg value per pre-selection method. When verifying with six FRSs (four open-source and two COTS), the MAPavg values were 0.173 (no pre-selection), 0.247 (DeepFace), 0.322 (VGG-Face), 0.339 (ArcFace), and 0.317 (MagFace). Additionally, we removed DeepFace and VGG-Face from the verifying FRSs of the MAPavg analyses, as they showed high FNMRs and would therefore not be used in real-world scenarios (see next section for more details). The resulting MAPavg values were 0.347 (no pre-selection), 0.487 (DeepFace), 0.580 (VGG-Face), 0.662 (ArcFace), and 0.614 (MagFace).
The results confirm that pre-selection was better than none, and that MagFace and ArcFace generated the strongest attacks (depending on the metric used for evaluation), followed by VGG-Face and lastly DeepFace.

Relative mated morph comparison performance
To further examine the performance of data sets using the different pre-selection methods, as well as the behavior of the verification algorithms, the distributions of the raw distance scores of mated comparisons, non-mated comparisons, and mated morph comparisons were visualized as Empirical Cumulative Distribution Functions (ECDFs) in Fig 7, using morphs created with the UBO morpher as an example. Across all four open-source verification FRSs, the mated morph comparison scores were distributed between mated scores and non-mated scores. However, they were more closely aligned with the mated scores than with the non-mated scores, even for morph pairs without pre-selection (i.e., random assignment). Importantly, morphs pre-selected with our proposed architecture performed better than morphs from random pre-selection. As before, each verification FRS was biased toward morphs pre-selected using its own embeddings prior to morphing.
The comparison decisions varied strongly between the verification FRSs. DeepFace incorrectly accepted only a very small number of morphs, followed by VGG-Face, whereas ArcFace and, most notably, MagFace incorrectly accepted nearly all morphs as mated comparisons. At the same time, at the calibrated threshold of FMR = 0.1%, DeepFace and, to a lesser extent, VGG-Face both showed high FNMRs (Table 2), thus incorrectly rejecting a large proportion of mated verification attempts. In contrast, ArcFace and, more importantly, MagFace had very low FNMRs at the given FMR (Table 2). This led to a higher susceptibility of better FRSs (in the sense of a low FNMR at a given FMR) to Morphing Attacks.
We call this phenomenon the morphing attack paradox. The better the FRS, and therefore the lower its FNMR at a preset threshold, the more tolerant the FRS is to mated presentations. The more tolerant the FRS is to mated presentations, the more susceptible it is to Morphing Attacks. As a result, more accurate FRSs are more susceptible to Morphing Attacks. S6 Fig, S7 Fig, & S8 Fig show the corresponding distance distributions for the other morphing algorithms. While the distance distributions of mated morph comparisons with NTNU morphs closely resembled those of morphs created with the UBO morpher, morphs created with Alyssaq and MIPGAN showed higher relative distances, resulting in a higher number of rejections of mated morphs at the given decision thresholds.
Since mated morph distances of more accurate FRSs, such as ArcFace and MagFace, were distributed between mated distances and non-mated distances, Fig 7 indicates that there is a chance of separating mated morph comparisons from mated comparisons by adjusting the decision threshold of the FRS. Such an adjustment could dramatically reduce the vulnerability of MagFace, for which the distance distributions of mated comparisons and mated morph comparisons showed only a slight overlap. With ArcFace used for verification, the overlap between the distributions was larger, so a threshold adjustment would lead to significantly higher FNMRs. In contrast, the distributions of mated morph distances of less accurate FRSs such as DeepFace and VGG-Face closely aligned with the distribution of the mated distances (Fig 7). In the case of DeepFace, especially when both image pre-selection and verification were performed with the same FRS, the mated morph distances were even smaller than the mated distances.
Similar to the open-source FRSs, the distributions of the mated morph comparisons for the COTS FRSs (Fig 8) shifted toward the distributions of the mated comparisons when pre-selection according to our architecture was applied. A hierarchy between the different pre-selection methods can be seen: morphs derived from pre-selection using MagFace embeddings produced the highest similarity scores, followed by ArcFace, VGG-Face, and finally DeepFace.
To further account for the performance of the individual FRSs, the RMMR was calculated using the open-source FRSs for verification. The RMMR corrects the MMPMR for the FNMR (Eq 4). Thus, the strong inflation of the mated morph comparison values of the previous section can be corrected, especially for FRSs with high FNMRs. Table 3 shows the RMMR values for differently pre-selected, morphed, and verified images. A similar pattern as before can be seen. When the same FRS is used for pre-selection and verification, the RMMR is highest in most cases. However, the second highest RMMR is often obtained from pre-selection with MagFace, followed by ArcFace and VGG-Face. Higher RMMRs can also be observed for morphs created by the UBO morpher and the NTNU morpher compared to the other morphers.
Table 3 can be summarized as follows. To get an idea of how well the individual pre-selection FRSs performed across morphing algorithms and open-source verification FRSs, using RMMR as a metric, each row of Table 3 was converted to ranks (1 to 5). A rank of 5 indicated the FRS for pre-selection (columns) that had the highest RMMR value compared to the other elements, and a rank of 1 indicated the FRS with the lowest.

Table 3. Relative Morph Match Rates (RMMRs). Images were morphed using different morphing algorithms, pre-selected using embeddings of different FRSs or, alternatively, randomly pre-selected, and verified using different FRSs. The RMMR corrects the MMPMR by the FNMR of the verification FRS (see Eq 4). The highest values row-wise are highlighted in bold, leaving out the quasi-diagonal elements, i.e., cases in which the pre-selection and verification FRSs coincided. Note that RMMR was calculated as a decimal fraction within the range [0

The BPCER10 values on the test data sets were the lowest for randomly pre-selected pairs for morphing. The BPCER10 values were higher when the test set contained morphs of pairs that were pre-selected according to our proposed architecture. This trend was more pronounced for morphs generated by the NTNU morpher and even more so for morphs generated by the UBO morpher. On the other hand, morphs generated by the Alyssaq morpher or MIPGAN did not lead to a pronounced increase in BPCER10 values.
A high BPCER10 value renders the MAD system inconvenient for practical purposes. The BPCER10 was increased by pre-selection (i.e., MagFace and ArcFace) and by the morphing algorithm used (i.e., UBO morpher and NTNU morpher). The trend is illustrated in more detail in Fig 10. Higher BPCER and MACER values were produced by the respective FRSs if pre-selection was performed according to our proposed architecture, and especially if it was performed using ArcFace or MagFace embeddings. This was consistent across different morphing algorithms.
While we have already shown the superiority of pre-selection over random pairing, we also observe large differences in MAD depending on which FRS is used to extract embeddings for training and testing the D-MAD classifiers. BPCER10 values were about half as large when MagFace was used for D-MAD, regardless of which FRS was used to extract embeddings for image pre-selection (Fig 9). On the other hand, the advantage of attacks morphed by the UBO morpher over those morphed by the NTNU morpher disappeared when MagFace was used for D-MAD instead of ArcFace (Fig 9). The same can be seen in more detail in the DET curves (Fig 10). The MACER and BPCER values were generally smaller, indicating a better performance of the D-MAD algorithm.
Interestingly, in some cases in Fig 10, it can be observed that there was not a consistent bias of the D-MAD algorithms towards being fooled by morphs pre-selected

Comparison of face recognition models for pre-selection
Regarding the FRS for extracting embeddings for image pre-selection, several models were evaluated. The results showed that the recently published MagFace algorithm performed best in most cases (depending on which metric was used to quantify vulnerability), closely followed by ArcFace. VGG-Face and in particular DeepFace showed relatively weak performance for morphing pre-selection. However, all pre-selection methods improved the success of the Morphing Attacks compared to random pairing (Figs 5, 6, 7 & 8, S1 Fig, Tables 3 & 4). In addition, a bias was observed such that if the same FRS was used for pre-selection and verification, the FRS was more susceptible to the resulting morphs (Figs 5 & 7). However, when two COTS FRSs were used, this bias was mitigated and the hierarchical ranking between the pre-selection methods was still the same (Fig 8 & S1 Fig).
An observation that is counterintuitive at first glance can be made when comparing Fig 5 and S2 Fig: while more accurate FRSs such as MagFace and ArcFace were quite vulnerable to Morphing Attacks, less accurate FRSs such as VGG-Face or DeepFace showed little vulnerability, since the prodAvgMMPMRs when verified with these FRSs mostly accumulated around 0. This trend suggests that as FRSs generally improve, so that the FNMR becomes lower after calibration to an FMR of, say, 0.1%, these FRSs, which are more accurate in terms of recognition, become more vulnerable to Morphing Attacks. Earlier we called this phenomenon the morphing attack paradox, and the effect is also nicely illustrated in [44].
The key element is the decision threshold, located somewhere between the distributions of the mated comparison distances and the non-mated comparison distances (Fig 7). As long as a considerable proportion of the mated morph comparison distances is located below the threshold, toward the mated comparison distances, the FRS will be quite vulnerable. Adjusting the decision threshold toward the distribution of the mated comparison distances would reduce this vulnerability. Such an adjustment would work best with MagFace as the verification model, as the mated and morphed distributions show only a small overlap (Fig 7). However, for a model as good as ArcFace, as well as for the two COTS FRSs, the distributions overlapped significantly, impeding a simple solution via adjustment. Furthermore, adjusting the decision threshold in the direction of the mated comparison distances would decrease the FMR, which generally makes the system more secure, even against zero-effort impostor attacks. This in turn would inevitably increase the FNMR of the system, making it less convenient for particular practical purposes. At this point, it should be recalled that the morphs used in this study were generated in an automated fashion. A real-world attacker would be able to invest time and resources into creating one single high-quality morph through manual intervention and various image post-processing steps. Comparison scores of such manually created morphs would be even more challenging to distinguish from mated comparisons, even when using MagFace for verification.
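The trade-off described above can be made concrete with a small sketch that calibrates a distance threshold at a target FMR and then reports the FNMR and the proportion of accepted morph comparisons. The function names and all distance values are invented for the example; a comparison is assumed to be accepted when its distance lies strictly below the threshold.

```python
def threshold_at_fmr(non_mated, fmr=0.001):
    """Distance threshold whose false match rate does not exceed fmr."""
    ordered = sorted(non_mated)
    allowed = int(fmr * len(ordered) + 1e-9)  # non-mated accepts tolerated
    return ordered[allowed]

def rate_below(distances, threshold):
    """Proportion of comparisons accepted (distance below threshold)."""
    return sum(d < threshold for d in distances) / len(distances)

def rates(mated, mated_morph, threshold):
    """Return (FNMR, proportion of accepted mated morph comparisons)."""
    return 1.0 - rate_below(mated, threshold), rate_below(mated_morph, threshold)

# Toy distances, invented for the example.
non_mated = [1.0 + 0.001 * i for i in range(1000)]
mated = [0.2, 0.4, 0.6, 0.8]
mated_morph = [0.5, 0.7, 0.9, 1.1]

calibrated = threshold_at_fmr(non_mated, fmr=0.001)
loose = rates(mated, mated_morph, calibrated)  # (0.0, 0.75)
strict = rates(mated, mated_morph, 0.65)       # (0.25, 0.25)
```

In this toy case, moving the threshold toward the mated distances lowers the share of accepted morphs from 0.75 to 0.25, at the cost of an FNMR of 0.25.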
Furthermore, from the distribution of the prodAvgMMPMRs in Fig 5 and S2 Fig (the high vulnerability of MagFace and ArcFace and the low vulnerability of VGG-Face and DeepFace), some conclusions can be drawn on the results of the MAPs (Fig 6). In particular, the high values in the four leftmost columns of each MAP matrix are likely to derive from the more vulnerable MagFace and ArcFace FRSs and the two highly vulnerable COTS FRSs. Analogously, the rather low values in the two rightmost columns are likely to be driven by the less vulnerable DeepFace and VGG-Face FRSs. This can also be observed in the MAPavg and G-MAP analyses. When these two verifying FRSs, which showed high FNMRs (Table 2), were removed, the MAPavg and G-MAP values also increased substantially.
When correcting the mated morph rates for the FNMR of a verification FRS, as was done using the RMMR metric (Eq 4, Table 3), the general pattern persisted that a verification FRS was most vulnerable to morphs from image pairs pre-selected with the embeddings of the identical FRS. However, ranking the RMMRs row-wise and averaging across pre-selection methods and morphing algorithms (Table 4) revealed that MagFace was best suited for pre-selection among the FRSs tested. ArcFace followed MagFace, then VGG-Face, and lastly DeepFace. The poorest performance was consistently seen with randomly pre-selected morphs.
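The row-wise ranking and averaging can be sketched as follows. The values below are invented placeholders, not entries of Table 3, and the helper names are hypothetical. Ties are broken by column position in this sketch, which may differ from the handling used in the study.

```python
def row_ranks(row):
    """Rank values within a row: 1 = lowest, len(row) = highest."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0] * len(row)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def mean_ranks(table):
    """Average the row-wise ranks per column (pre-selection method)."""
    ranked = [row_ranks(row) for row in table]
    return [sum(column) / len(ranked) for column in zip(*ranked)]

# Columns: random, DeepFace, VGG-Face, ArcFace, MagFace (placeholder values).
placeholder_rmmr = [
    [0.10, 0.20, 0.30, 0.50, 0.40],
    [0.05, 0.15, 0.25, 0.35, 0.45],
]
average_rank = mean_ranks(placeholder_rmmr)
# average_rank == [1.0, 2.0, 3.0, 4.5, 4.5]
```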
Furthermore, we want to emphasize that we morphed all images in the database. If only a subset of pairs (e.g., the 20% with the smallest distances) were selected for morphing, the vulnerability rates would be higher. Therefore, the reported vulnerabilities are expected to represent a lower bound.
In a preliminary analysis of a different face data set, we also investigated the potential of different distance (or similarity) metrics for use in the proposed pre-selection architecture. Morphed face images pre-selected based on Cosine distance yielded superior results (S14 Fig). This is not surprising, as the algorithms are typically designed in such a way [20,26]. Therefore, we used Cosine distance for pre-selection in the present study.
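To illustrate pairing by Cosine distance, the sketch below pairs subjects greedily, closest pair first, with each subject used at most once. This greedy strategy is an assumption made for the example and is not necessarily identical to Algorithm 1; the demographic consistency checks are omitted, and the two-dimensional toy vectors merely stand in for real face embeddings.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """1 - cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def greedy_pairs(embeddings):
    """Pair subjects by ascending Cosine distance, each subject used once."""
    candidates = sorted(
        combinations(sorted(embeddings), 2),
        key=lambda pair: cosine_distance(embeddings[pair[0]], embeddings[pair[1]]),
    )
    used, pairs = set(), []
    for a, b in candidates:
        if a not in used and b not in used:
            pairs.append((a, b))
            used.update((a, b))
    return pairs

# Toy embeddings, invented for the example.
toy_embeddings = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.1],
    "C": [0.0, 1.0],
    "D": [0.1, 0.9],
}
pairs = greedy_pairs(toy_embeddings)  # A pairs with B, C pairs with D
```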

Evaluation of morphing algorithms
A clear performance gap between the morphing algorithms is a common thread that runs through all of the analyses. Morphed images created by the UBO morpher, closely followed by those created by the NTNU morpher, performed best in fooling both the FRSs (Fig 5 and Table 3) and the D-MAD classifiers (Figs 9 & 10). Morphs created by the Alyssaq morpher and MIPGAN, however, performed worse in the current analyses.
Figs 9 & 10 show that the morphing algorithm deployed had a higher impact on the success of the D-MAD algorithm than the pre-selection. Similarly, the success in terms of fooling the verification FRSs can be seen in Fig 5, and by comparing the MAPs between morphers (Fig 6, S3 Fig, S4 Fig, & S5 Fig). The Alyssaq morpher returned morphs that were cropped at the face edges in a non-rectangular fashion (Fig 1) and not projected back onto one of the original images' backgrounds. This has probably helped the D-MAD algorithm in its decision during both training and testing, although the classification was not performed on the raw images but on the extracted face embeddings. Real-world attackers would not use such a morph, e.g., in a passport fraud scenario. Furthermore, MIPGAN produced rather blurry images (Fig 1). In the original implementation of MIPGAN [37,38], the morphs were of higher quality, but the original images used for morphing were also of higher quality than those of the database used in the present study. Thus, the morphing in latent space in the present case may have dropped many facial characteristics that could have been helpful in facilitating a Morphing Attack.

MagFace improved Morphing Attack Detection
Instead of adjusting decision thresholds to counter Morphing Attacks, MAD algorithms could be inserted into a face verification process. The D-MAD algorithm used in the present study was introduced by [25]. It learns to distinguish between the distribution of the differences between two bona fide images and the distribution of the differences between morphs and bona fide images (Fig 4), all in the embedding space.
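The feature construction behind this idea can be sketched as follows: embeddings are subtracted to obtain differential vectors, which are then standardized per dimension before being passed to a classifier. The three-dimensional toy vectors stand in for real 512-dimensional embeddings, the function names are hypothetical, and the classifier itself is left out of this sketch.

```python
import statistics

def differential(reference, probe):
    """Differential embedding: reference embedding minus probe embedding."""
    return [r - p for r, p in zip(reference, probe)]

def standardize(rows):
    """Re-scale every feature dimension to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Toy embeddings, invented for the example.
bona_fide_diffs = [
    differential([0.9, 0.1, 0.0], [0.8, 0.2, 0.1]),  # same subject, two images
    differential([0.1, 0.9, 0.2], [0.2, 0.8, 0.1]),
]
morph_diffs = [
    differential([0.5, 0.5, 0.1], [0.9, 0.1, 0.0]),  # morph vs. bona fide probe
    differential([0.5, 0.5, 0.3], [0.2, 0.8, 0.1]),
]

features = standardize(bona_fide_diffs + morph_diffs)
labels = [0, 0, 1, 1]  # 0 = bona fide pair, 1 = morph involved
```

The `features` and `labels` lists would then serve as training input for a binary classifier.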
Testing on morph images derived from random pairing produced the lowest BPCER10 values, indicating the highest accuracy and therefore the lowest vulnerability of the D-MAD algorithm toward these morphs (Fig 9). Testing on the other morphs, with pre-selection applied according to our proposed architecture, increased the BPCER10 values. The greatest vulnerability of the D-MAD classifier was seen for morphs pre-selected by MagFace, followed by ArcFace, VGG-Face, and finally DeepFace. This was true regardless of whether the D-MAD classifier was trained and tested with ArcFace embeddings or with MagFace embeddings.
In fact, the D-MAD algorithm trained with MagFace embeddings showed considerably lower BPCER10 values, regardless of the type of pre-selection. Therefore, using MagFace instead of ArcFace could be a significant improvement to the D-MAD classifier proposed by Scherhag et al. [25]. Note that only the embeddings of the MagFace algorithm were used, not any additional quality metrics returned by the MagFace model. However, the quality of an image was still incorporated into the embeddings by the way the loss function was constructed. In MagFace's loss function, high-quality samples of an individual are pulled toward the center of the multidimensional distribution, while low-quality samples are pushed toward its boundaries [20]. In other words, during the training of MagFace, the magnitude of the face embeddings was made proportional to the Cosine distance to the respective class (i.e., individual) centers [45]. Therefore, having different image qualities for the bona fide images and the morphed images results in an easier separation of the two groups by the classifier, since their positions in the 512-dimensional embedding space are farther apart than the positions of two high-quality bona fide images.
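The geometric intuition can be written out with a short worked identity. The notation is introduced here for illustration only and is not taken from the study: let the two embeddings be $a\,u_1$ and $b\,u_2$, with unit vectors $u_1, u_2$, magnitudes $a, b > 0$, and angle $\theta$ between them.

```latex
\| a\,u_1 - b\,u_2 \|^2
  = a^2 + b^2 - 2ab\cos\theta
  = (a - b)^2 + 2ab\,(1 - \cos\theta)
```

The second term depends on the angle between the embeddings, whereas the first term grows with the mismatch in magnitude. Under the assumption that magnitude tracks image quality, a quality difference between two images therefore adds to the length of the differential vector even when the angle is small.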
The proposed D-MAD classifier based on MagFace embeddings was also submitted to the Face Recognition Vendor Test (FRVT) [46] and achieved good results in detecting high-quality morphed images (S12 Fig). Moreover, it outperformed the similar algorithm based on ArcFace embeddings instead of MagFace embeddings (hdaarcface) for MACER/APCER values from 0 to 0.1 (decimal fraction), an area of relevant security settings regarding Morphing Attacks. However, on low-quality images (i.e., Fig 4 in [46]), our classifier only outperformed the ArcFace classifier below a MACER/APCER of 0.02 (decimal fraction). Whether our classifier outperformed or underperformed the ArcFace classifier depended on the face data set used for evaluation. In general, the ArcFace classifier performed better above MACER/APCER values of 0.1 (decimal fraction). However, at lower MACER/APCER values, the MagFace classifier achieved lower BPCER values on several face data sets throughout the different processing tiers, i.e., morphed face image data sets of different quality, such as the Visa Border or TWENTE data sets [46]. A detailed analysis of why the classifier performed better on some data sets and for specific MACER/APCER and BPCER values is beyond the scope of the current study.
We further submitted the MagFace D-MAD algorithm to FVC-onGoing: on-line evaluation of fingerprint recognition algorithms [47], in the section for Differential Morph Attack Detection [14]. Among all algorithms tested on the DMAD-SOTAMD P&S-1.0 benchmark, the D-MAD algorithm based on MagFace achieved the lowest BPCER10 values (0.84%) and the second lowest BPCER20 values (4.39%, S13 Fig). However, it showed high BPCER100 values (i.e., 100%). The benchmark contained high-quality images of faces that were printed and scanned and captured with a frontal pose, natural expressions, and good lighting. See [48] for more details on how the algorithm performed on morphs from data subjects of different age groups or ethnicities and on morphs produced by different morphing algorithms, post-processing pipelines, and so forth.
One aim of large-scale image pre-selection based on embeddings was to evaluate a method for providing sufficiently large data sets of morphed face images for training MAD algorithms. Interestingly, a recent study showed that image pre-selection for training MAD algorithms could be done in the opposite way to the present study [49]. It was shown that training on morphing pairs with low similarity can improve the performance of the MAD algorithm [49].
In the present study, separate D-MAD algorithms were trained on either ArcFace or MagFace embeddings. However, a fusion of the two may have constructive effects. In particular, the combination of both D-MAD algorithms may perform better than the D-MAD algorithm based on MagFace embeddings alone.

Conclusion
This study analyzed the use of face embeddings in image pre-selection and Morphing Attack Detection. MagFace and ArcFace embeddings were found to be effective for image pre-selection, as the resulting attacks posed a significant threat to modern FRSs, especially COTS systems. MagFace outperformed ArcFace in many scenarios. Face embeddings from these models are highly suitable for pre-selection, for instance for automatically generating large databases of morphed faces. Similarly, morphed images pre-selected by MagFace, closely followed by ArcFace, posed a considerable threat to MAD algorithms by escaping detection. Lastly, MagFace differential embeddings were found to be particularly useful for attack detection, as they can improve the performance of a D-MAD algorithm. Taken together, the results underline the dual benefit of embeddings for both pre-selection and MAD, i.e., for the attacker and the defender.

Fig 1. Illustration of morphed face images created using different morphing approaches. The images on the left and on the right represent the corresponding two bona fide face images. Face images are republished from [6] under a CC BY license, with permission from Prof. Karl Ricanek Jr, University of North Carolina at Wilmington, original copyright 2006.

Fig 2. General workflow of our proposed pipeline for image pre-selection. Embeddings were extracted from one sample of each subject. Distances between embeddings were calculated. Faces were paired based on a low distance between embeddings. Pairs were then morphed, and morphed images were verified against bona fide probe images. Furthermore, Morphing Attack Detection was conducted. The image pre-selection steps are further illustrated in Algorithm 1. The processing steps were performed using different FRSs and different morphing algorithms. Face images are republished from [6] under a CC BY license, with permission from Prof. Karl Ricanek Jr, University of North Carolina at Wilmington, original copyright 2006.
Fig 1 illustrates exemplary morphed images created by all four approaches.

Fig 3. Morphing Attack Potential (MAP). The MAP is a matrix describing the success of a data set of morphed images in fooling a set of FRSs using multiple attack attempts. Several FRSs (x-axis) are attacked with several mated Morphing Attack attempts (y-axis). Each element of a MAP matrix describes the proportion of successful verifications of both attackers (i.e., both contributing subjects of each morph) at a given number of attempts (i.e., number of different bona fide images for both subjects) and with a particular number of fooled FRSs. Note that MAP was calculated as a decimal fraction within the range [0; 1].

Fig 4. D-MAD pipeline. ArcFace or MagFace embeddings were extracted from bona fide images and morphed images. Differential embeddings were created by subtracting either the embedding of a bona fide image from that of a morphed image, or the embedding of a bona fide image from that of a different bona fide image of the same data subject. The differential vectors were re-scaled to N(0, 1). A classifier was trained (on ArcFace and MagFace differential embeddings, separately) to differentiate between bona fide images and morphed images. Face images are republished from [6] under a CC BY license, with permission from Prof. Karl Ricanek Jr, University of North Carolina at Wilmington, original copyright 2006.
For the COTS FRSs, the prodAvgMMPMR mostly accumulated around 1, indicating an extremely high vulnerability even for morphs based on random pre-selection (S1 Fig). In addition, the morphs created with MIPGAN and verified with COTS FRSs again illustrate the benefit of image pre-selection (S1 Fig).

Fig 5. Mated morph comparison success rates for different image pre-selection embeddings. prodAvgMMPMRs (y-axes) are plotted for different pre-selection methods (x-axis & color-coded). Density is plotted in the horizontal direction. Median values are illustrated by horizontal black bars. The same pairs were morphed by different morphing methods (rows). Random assignments of the morphing pairs are displayed in the left-most column. All morphs were verified using ArcFace and MagFace (columns). See S2 Fig for verifications using DeepFace and VGG-Face. Note that prodAvgMMPMR was calculated as a decimal fraction within the range [0; 1].

Fig 6. Morphing Attack Potential (MAP) of morphs generated by the UBO morpher. Different FRSs were used for image pre-selection, i.e., ArcFace, DeepFace, VGG-Face, or MagFace (different heatmaps). Alternatively, pairs were randomly assigned (bottom heatmap). For each FRS used for pre-selection, the resulting morphs were verified against four bona fide images of each subject. The ratio of successful attempts for both subjects is shown on the y-axis of each plot. In addition, different FRSs were used to verify the paired morphs, four open-source FRSs and two COTS FRSs. The percentage of successful attacks across multiple FRSs is plotted on each x-axis. The MAP is shown and color-coded in each cell and describes the proportion of successful verifications for a given number of attempts (y-axes) and FRSs (x-axes). Note that the MAP was calculated as a decimal fraction in the range [0; 1].
S6 Fig, S7 Fig, & S8 Fig illustrate the respective distributions of distance values for morphs created with the other morphing algorithms. The general patterns were the same as in Fig 7.

Fig 8 further illustrates the ECDFs of the similarity scores using COTS FRSs and the UBO morpher (see S9 Fig, S10 Fig, & S11 Fig for the morphs created by the other morphers). Since the COTS FRSs were not used for pre-selection, the results are less biased with respect to the pre-selection algorithm. Even with random pre-selection, all types of morphs were likely to be successfully verified by the COTS FRSs.

Fig 7. ECDFs for distance scores of the open-source FRSs. Mated, non-mated, and mated morph comparisons were performed. Morphs were created using the UBO morpher. The distance values for the comparisons are shown on the x-axis. The (cumulative) proportion of positive verifications at a certain distance score is plotted on the y-axes. Different FRSs were used for verification (rows). The different types of comparisons are color-coded, i.e., mated, non-mated, or mated morph comparisons, including morphs pre-selected with the help of face embeddings of the different FRSs. The dotted vertical lines indicate the 0.1% FMR threshold for each FRS used for verification.

Fig 8. ECDFs for similarity scores of the COTS FRSs. Mated, non-mated, and mated morph comparisons were performed. Morphs were generated using the UBO morpher. The different similarity scores for the comparisons are displayed on the x-axis. The (cumulative) proportion of successful verifications at a particular similarity score is plotted on the y-axes. Note that because similarities instead of distances were used, the interpretation of the x-axes must be flipped compared to Fig 7. Different COTS FRSs were used for verification (rows). The different types of comparisons are color-coded, i.e., mated, non-mated, or mated morph comparisons, with morphs pre-selected with the help of face embeddings of certain FRSs. The dotted vertical lines indicate the 0.1% FMR threshold for each FRS used for verification.

Fig 9 shows the corresponding BPCER10 values of the MAD classifiers, tested on morphs with different pre-selection applied and on the corresponding bona fide images. The BPCER10 operating-point values were lower for the D-MAD classifier trained with MagFace differential embeddings than for the one trained with ArcFace differential embeddings. Furthermore, the BPCER10 values on the test data sets were the lowest

Fig 9. D-MAD algorithm performances. BPCER10 values of the classifiers tested on differently morphed and differently pre-selected data sets are shown. Left: Metrics from a D-MAD algorithm trained on ArcFace embeddings. Right: Metrics from a D-MAD algorithm trained on MagFace embeddings. The images morphed by different morphing algorithms are shown in different colors. The pre-selection methods to generate the pairs for morphing are distributed along the x-axes. Note that BPCER10 was calculated as a decimal fraction within the range [0; 1].
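BPCER10 is commonly read as the BPCER at the operating point where the MACER (APCER) is at most 10%. The sketch below illustrates that reading under two assumptions that are not stated in the source: a higher classifier score means "more likely a morph", and a sample is classified as an attack when its score is at or above the threshold. The function name and the example scores are hypothetical.

```python
import numpy as np

def bpcer_at_macer(bona_fide_scores, attack_scores, target_macer=0.10):
    """Lowest BPCER among all thresholds whose MACER does not exceed the target.

    Assumption: a sample is classified as an attack when score >= threshold.
    """
    bona = np.asarray(bona_fide_scores, dtype=float)
    att = np.asarray(attack_scores, dtype=float)
    best = 1.0
    for t in np.unique(np.concatenate([bona, att])):
        macer = np.mean(att < t)     # attacks wrongly accepted as bona fide
        bpcer = np.mean(bona >= t)   # bona fide samples wrongly flagged
        if macer <= target_macer:
            best = min(best, float(bpcer))
    return best

# Hypothetical scores, for illustration only
bona_fide = [0.1, 0.2, 0.3, 0.8]
attacks = [0.25, 0.55, 0.60, 0.65, 0.70, 0.75, 0.85, 0.90, 0.90, 0.95]
bpcer10 = bpcer_at_macer(bona_fide, attacks, target_macer=0.10)
bpcer_zero = bpcer_at_macer(bona_fide, attacks, target_macer=0.0)
```

Both rates are returned as decimal fractions in [0; 1], matching the convention used in the figure.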

Fig 10. DET curves of the D-MAD approaches. Left column: D-MAD approach using ArcFace embeddings (original version). Right column: D-MAD approach using MagFace embeddings. Morphs of the different morphing algorithms are separated by rows. Data subsets of differently pre-selected morph pairs are color-coded. The BPCER is plotted against the MACER. Dotted lines indicate the positions where BPCER or MACER are 0.1 (i.e., 10%) and 0.05 (i.e., 5%). Note that both rates were calculated as decimal fractions within the range [0; 1].
Figs 5, 6, 7 & 8, S1 Fig, Tables Fig 5 and S2 Fig the high vulnerability of MagFace and ArcFace and the low vulnerability of VGG-Face and DeepFace - some conclusions can be drawn on the results of the MAPs (Fig 6).
Fig 5, and by comparing the MAPs between morphers (Fig 6, S3 Fig, S4 Fig, & S5 Fig). Alyssaq and MIPGAN morphers performed rather poorly at fooling the D-MAD algorithm, even with pre-selection applied. For the Alyssaq morpher, one reason might be the shape of the resulting morph (Fig 1) Fig). The FRVT MORPH report was created shortly before the initial submission of our manuscript. The D-MAD algorithms in S12 Fig were evaluated on high-quality morphed images created with commercial tools. The illustrated DET curve shows particularly low BPCER values for the MagFace D-MAD algorithm (hdamag) in the region of low APCER (i.e., MACER) values.
and validate it on several FRSs using multiple attack attempts. In particular, we compute the recently proposed Morphing Attack Potential (MAP) metric on the resulting data set of morphed images, which provides a more generalizable and robust measure of attack potential.
• We generalize our proposed pre-selection approach for morph generation across different morphing algorithms
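As an illustration of how a MAP-style matrix over multiple attempts and several FRSs could be tabulated, the sketch below uses one simplified reading of the metric: entry (r, c) is the share of morphs that were accepted in at least r attempts by at least c FRSs. The input layout `success[m, a, f]`, the function name, and the example data are assumptions made for illustration; the sketch does not reproduce the exact protocol used in the study.

```python
import numpy as np

def morphing_attack_potential(success):
    """Tabulate a MAP-style matrix from a boolean array.

    success[m, a, f] is True if attempt a with morph m was accepted by FRS f
    (hypothetical layout). The result has shape (n_attempts, n_frs), and
    result[r - 1, c - 1] is the share of morphs accepted in at least r
    attempts by at least c FRSs.
    """
    s = np.asarray(success, dtype=bool)
    n_morphs, n_attempts, n_frs = s.shape
    hits = s.sum(axis=1)  # accepted attempts per morph and FRS
    result = np.zeros((n_attempts, n_frs))
    for r in range(1, n_attempts + 1):
        frs_fooled = (hits >= r).sum(axis=1)  # FRSs fooled at level r, per morph
        for c in range(1, n_frs + 1):
            result[r - 1, c - 1] = np.mean(frs_fooled >= c)
    return result

# Two hypothetical morphs, two attempts each, two FRSs
success = [
    [[True, True], [True, False]],
    [[False, True], [False, False]],
]
map_matrix = morphing_attack_potential(success)
```

Larger values towards the bottom-right of the matrix would indicate morphs that succeed robustly across attempts and systems.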

Table 1. The number of embeddings per FRS.

Table 2. The verification thresholds on the unnormalized Cosine distances for each open-source FRS. Thresholds were calculated on the UNCW data set. The corresponding FNMRs are listed next to the thresholds as decimal fractions.
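The relation between a distance threshold, the FMR, and the FNMR can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes that the cosine distance is 1 minus the cosine similarity, that a comparison is accepted when its distance is strictly below the threshold, and that the threshold is taken from the sorted non-mated distances. All names and example values are hypothetical.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance, taken here as 1 - cosine similarity (range [0, 2])."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def threshold_at_fmr(non_mated_distances, fmr=0.001):
    """Threshold t such that the share of non-mated distances < t is at most fmr."""
    d = np.sort(np.asarray(non_mated_distances, dtype=float))
    k = int(np.floor(fmr * d.size))  # number of tolerated false matches
    return float(d[min(k, d.size - 1)])

def fnmr(mated_distances, threshold):
    """Share of mated comparisons rejected (distance >= threshold)."""
    return float(np.mean(np.asarray(mated_distances, dtype=float) >= threshold))

# Hypothetical distances; a 10% FMR is used only to keep the example small
non_mated = [0.9, 0.8, 0.7, 0.95, 0.6, 0.85, 0.75, 0.65, 0.99, 0.3]
mated = [0.1, 0.2, 0.7, 0.5]
t = threshold_at_fmr(non_mated, fmr=0.1)
miss_rate = fnmr(mated, t)
```

With a realistic number of non-mated comparisons, `fmr=0.001` would correspond to the 0.1% FMR operating point referred to in the figures.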

RMMR value, respectively. If equal values coincided in a row, decimal numbers were used. The ranks were then averaged across rows, and therefore across morphing algorithms and the attacked FRSs. Table 4 shows the average ranks for the different pre-selection methods. Pairs based on MagFace embeddings generated the highest RMMR values, followed by ArcFace, VGG-Face, and finally, DeepFace. Randomly pre-selected pairs performed the worst across different morphing algorithms and verification systems.

Table 4. Average ranks for RMMR values for the different pre-selection methods. Pre-selection was either performed using random assignment of pairs or based on embeddings of four different FRSs.
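The ranking procedure described above can be sketched as follows. The sketch assumes that the highest RMMR value in a row receives rank 1 and that tied values share the mean of their ranks (which yields decimal ranks); the direction of the ranking, the function names, and the example values are assumptions for illustration.

```python
import numpy as np

def average_ranks_desc(values):
    """Rank one row so that the highest value gets rank 1; ties share the mean rank."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(-v, kind="stable")
    ranks = np.empty(v.size, dtype=float)
    ranks[order] = np.arange(1, v.size + 1)
    for val in np.unique(v):
        tied = v == val
        ranks[tied] = ranks[tied].mean()  # decimal rank for tied values
    return ranks

def mean_rank_per_method(table):
    """Rows: (morphing algorithm, attacked FRS) combinations.
    Columns: pre-selection methods. Returns the average rank per column."""
    return np.vstack([average_ranks_desc(row) for row in table]).mean(axis=0)

# Hypothetical RMMR values (in %) for three pre-selection methods
rmmr = [
    [10.0, 30.0, 30.0],
    [5.0, 20.0, 40.0],
]
avg_ranks = mean_rank_per_method(rmmr)
```

Under this convention, a lower average rank indicates a pre-selection method that tends to produce higher RMMR values.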