Face recognition is used to prove identity across a wide variety of settings. Despite this, research consistently shows that people are typically rather poor at matching faces to photos. Some professional groups, such as police and passport officers, have been shown to perform just as poorly as the general public on standard tests of face recognition. However, face recognition skills are subject to wide individual variation, with some people showing exceptional ability—a group that has come to be known as ‘super-recognisers’. The Metropolitan Police Force (London) recruits ‘super-recognisers’ from within its ranks, for deployment on various identification tasks. Here we test four working super-recognisers from within this police force, and ask whether they are really able to perform at levels above control groups. We consistently find that the police ‘super-recognisers’ perform at well above normal levels on tests of unfamiliar and familiar face matching, with degraded as well as high quality images. Recruiting employees with high levels of skill in these areas, and allocating them to relevant tasks, is an efficient way to overcome some of the known difficulties associated with unfamiliar face recognition.
Citation: Robertson DJ, Noyes E, Dowsett AJ, Jenkins R, Burton AM (2016) Face Recognition by Metropolitan Police Super-Recognisers. PLoS ONE 11(2): e0150036. doi:10.1371/journal.pone.0150036
Editor: Kun Guo, University of Lincoln, UNITED KINGDOM
Received: November 10, 2015; Accepted: February 8, 2016; Published: February 26, 2016
Copyright: © 2016 Robertson et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n.323262, and from the Economic and Social Research Council, UK (ES/J022950/1). http://erc.europa.eu/; http://www.esrc.ac.uk/. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Face recognition is a fundamental part of our everyday lives. We can recognise familiar faces over a very wide range of settings, including visually difficult conditions such as poor lighting or degraded images . However, our ability to recognise familiar people does not generalise well to unfamiliar faces. It has been known for many years that eyewitness identification is highly error-prone (see  for a review). More recently, it has become clear that face matching is also very difficult for unfamiliar viewers. Even with no requirement to remember anything, viewers find it very difficult to match two photos of the same person even when these are taken on the same day, in good lighting, and when there are no time-restrictions [3–7]. The difficulty of unfamiliar face matching extends into real life settings; in modern society, we are often asked to prove our identity by presenting photo-ID. However, viewers find it very difficult to match a live face to a photo or video, making many errors [8–10].
Face recognition, and particularly unfamiliar face matching, is also a key part of forensic and security operations. To cross national borders we often have to present a passport (with photo), and if we are ever accused of a crime, CCTV evidence may be used. The frequency of this task, combined with its difficulty, raises the question of whether professionals comparing face images (e.g. passport officers, the police or security agencies) can do so effectively. While there has been movement towards automation of facial recognition in some settings, automatic face recognition is neither universal nor infallible, and human operators almost always have the final say in identification judgements . The face recognition accuracy of trained professionals is therefore critical to the accuracy of the system as a whole.
The results of studies on face experts have been somewhat mixed. White et al  tested a group of 49 passport issuing officers and demonstrated that they were no better on standard measures of face matching than a group of untrained students. Similarly, Burton et al  showed that a group of police officers were no better than untrained students at matching poor quality CCTV images to facial photos. Furthermore, the work of forensic experts, providing specialist evidence on face matching to courts has been challenged as unreliable [13–14]. More recently, some studies of professionals specifically trained to make forensic image comparisons have demonstrated that they can out-perform untrained students [15–16]. In the case of passport officers, a sub-group with particular responsibility for facial comparison performed better than other officers .
Interestingly, any differences that are observed between professional and novice groups are typically small by comparison to within-group differences. For example, in the study on passport officers , performance on a live-face-to-photo matching task ranged between 100% and 70% within the group (where chance level was 50%). Indeed there is growing appreciation of the large individual variation in face recognition abilities within the population [17–18]. Furthermore, the extreme ends of the distribution of face recognition ability have drawn interest in their own right. Congenital prosopagnosia refers to the phenomenon by which some members of the general population are extremely bad at face recognition (in the clinical range) despite having no known pathology [19–20]. At the top end of the distribution are super-recognisers who display very high levels of ability on a range of face recognition tasks .
Importantly for applied face recognition, these very large individual differences appear to be highly stable over time. Among passport officers, White et al  found no correlation between years of service (range 0–21 years) and task performance (range 58–95%), implying that occupational experience does nothing to improve accuracy. Moreover, the lack of any group difference between the passport officers, who had received professional training in facial image comparison, and undergraduate students, who had not, implies that the occupational training had no effect on accuracy either. The absence of any training benefit in this context is consistent with laboratory studies, where generalizable training benefits have typically been small or absent [22–24].
Given the scale of the individual differences in face recognition ability, and the limited impact of experience and training, it seems natural to propose personnel selection as an approach to optimising performance: simply recruiting individuals with a talent for face recognition should improve the accuracy of the system . To this end, the Metropolitan Police, the force responsible for London, have recently set up a ‘super-recogniser team’ . Members of the team are recruited from within the police force, on the basis of a particular interest in face recognition, and an undisclosed facial memory test. They perform a wide variety of tasks, including identification of suspects live and from security cameras. They often work with low-quality images and with people who are attempting to hide their identity.
In this study we ask a simple question: are the police super-recognisers actually better than the normal population on tests of face recognition? Since they give evidence in court based on their expertise, and the recognition decisions they make are (by definition) hard for other viewers to confirm, we sought to test their ability over several face recognition tasks, including standardised tasks and a comparison to regular police officers. To anticipate the findings, we did establish that the ‘super-recognisers’ we tested had face recognition skills which far exceed the general population.
Method and Results
This study was approved by the Ethics Committee of the Department of Psychology, University of York. All participants gave written informed consent. The individuals shown in Fig 1 have given written informed consent (as outlined in PLOS consent form) to publish these images.
The top pair shows two instances of the same person, the bottom pair shows two different people. The individuals shown in Fig 1 have given written informed consent (as outlined in PLOS consent form) to publish these images.
Participants and Testing
We studied four Metropolitan Police staff who were part of the super-recogniser (SR) team at New Scotland Yard Central Forensic Image Unit, London, at the time of testing (summer 2015). All were male, and the mean age was 40 (range 33–47). The SRs were tested during a normal working day, in a shared office—their usual working environment.
We carried out three tests of face recognition. 1) The Glasgow Face Matching Test (GFMT) , which is a standardised test of unfamiliar face matching; 2) an unfamiliar face matching test that was designed to be particularly difficult, the Models Face Matching Test (MFMT) ; and 3) the Pixelated Lookalikes Test (PLT)—a new test of familiar face recognition based on very poor quality images.
Task 1: Glasgow Face Matching Test
The GFMT (short version) consists of 40 pairs of unfamiliar faces, half of which are same identity pairs and half of which are different identity pairs (see Fig 1 for examples). Each face image is front facing, neutral in emotional expression, and standardised to a width of 350 pixels (greyscale, 72ppi resolution). Photos were taken a few minutes apart using different cameras. Fig 1 shows examples from the test. For more details see .
Printed paper booklets were used to present the face pairs (one pair per page). Participants were instructed to decide whether the face pairs depicted the same person or two different people. They were required to make their responses by circling the word ‘SAME’ or ‘DIFFERENT’ printed in bold text below the face images. Same face pairs and different face pairs were randomly intermixed. No time constraints were placed on the participants, with individuals working through the booklet at their own pace.
Super-recogniser performance on the GFMT was compared with data from 194 police trainees from a UK training college, who had been used to provide original norms for this version of the test . This was a younger group, on average, than the super-recognisers (mean 26 years, range 18–46), though previous research has demonstrated no association between age and performance on this test .
The super-recognisers were found to outperform a large group of police trainees. While mean accuracy on this test, as derived from the normative police trainee data, is 81.3% (SD = 9.4%), mean accuracy in the super-recogniser group was found to be 95.8% (SD = 4.3%). Fig 2 shows individual scores for each of the SRs, demonstrating very high levels of performance, including one participant scoring 100% accuracy on this test.
Performance of super-recognisers (SR1–4; black) and comparison viewers (white) on three different tests of face recognition—the GFMT (left column), the MFMT (middle column), and the PLT (right column). Vertical lines indicate the range of scores for comparison groups, the deleted portion of the line shows the standard deviation, and the horizontal notch shows the mean. In all three tasks, chance performance is 50%.
Matching Task 2: Models Face Matching Test
The Models Face Matching Test (MFMT, ) is a paired face-matching test with the same format as the GFMT, but designed to be more difficult. It consists of 90 pairs of faces, half of which show the same identity, and half of which show different identities. The images depict male models, and were collected from a single portfolio website for professional models. The models carry out photo-shoots with wide variation in clothes, hairstyles, lighting conditions and cameras. For this test, images were presented in full colour and were cropped to display the face only and were standardised to a size of 300 x 420 pixels (for more details see ; example images cannot be reproduced here for copyright reasons).
The procedure for the models matching task was identical to that descried for the GFMT (Matching Task 1) above. Test items were presented in a printed booklet, and participants were asked to make same/different judgements for each pair, with no time limits.
On this task, super-recogniser performance was compared to a normative control group reported in . Participants were 54 undergraduate university students (25 male) with a mean age of 23 years (SD = 4 years, range = 18–40 years).
Fig 2 shows performance for super-recognisers and controls. Performance was lower overall than for the GFMT, confirming that this is a harder test, as designed. More strikingly, the super-recognisers once again consistently out-performed the controls. The mean accuracy for the control group, derived from university undergraduates, was 73.6% (SD = 10.9%). Mean accuracy for the super-recognisers was considerably higher at 90.3% (SD = 1.9%), and there was consistently high performance across this group.
Matching Task 3: Pixelated Lookalike Test
We have seen that the police super recognisers consistently outperform control groups on two different tests of unfamiliar face matching. For the final test, we consider familiar face recognition. As this arises in some policing contexts (for example recognition of previous offenders in security video), it is important to establish whether the high levels of performance we have reported so far extend to the familiar case. Because familiar face recognition is generally so accurate , we developed a test that makes it more challenging than normal. The test materials are pixelated images of celebrities and celebrity lookalikes, designed to emulate the poor quality images which are often reviewed in forensic practice.
The pixelated lookalike test (PLT) was designed to be presented in the same way as the unfamiliar tests above, i.e. as a same/different matching pairs task. We chose 30 famous celebrities, and selected three images for each of these people from an internet search. One criterion for selection was the ready availability of professional lookalikes for these celebrities, and we chose one representative image of a lookalike for each celebrity. We were therefore able to create two trials per celebrity: a match trial comprising two images of that person, and a mismatch trial comprising an image of the celebrity and an image of the corresponding lookalike. Each image was only used once across match or mismatch trials.
To emulate poor quality security surveillance we presented all images in low resolution (30 pixels wide x 45 pixels high). This also served the purpose of reducing matching images to the level that we expect to avoid ceiling effects in the matching trials. Examples of the stimuli are shown in Fig 3.
Images on the left show different identities (with the imposter face on the right). Images on the right show the same identity.
The procedure was similar to that used in the unfamiliar tests. Super-recognisers were presented with a printed booklet containing 60 trials, half of which showed the same identity and half of which showed different identities. They were asked to make a same/different judgement for each trial. We were keen to ensure that this was genuinely a test of familiar face matching, and so following the face-matching task participants used a numerical scale to indicate their level of familiarity with each celebrity whose face had been viewed in the task (from 1 [completely unfamiliar] to 10 [extremely familiar]).
30 students (11 male) were recruited from the University of York to act as a comparison group for this study. Mean age was 19.7 (range 18–28, SD = 2.8). All took part in return for course credit. As the comparison group were generally younger than the super-recognisers, we took care to ensure that any differences in performance could not arise through some celebrities being more familiar to one group or the other, see below.
We are concerned here primarily with familiar face matching, and so we considered only those faces which participants rated as highly familiar in the post-test (8–10 on the 10-point scale). Fig 2 shows the overall pattern of performance. Once again, the police super-recognisers consistently performed with far greater accuracy than student participants. Overall, the controls scored just 73% correct, whereas the police super-recognisers scored with 93% accuracy in the lookalike test. All super-recognisers performed much better than the control mean, with one super-recogniser performing perfectly.
The preceding analysis was based on the subset of faces that viewers rated as highly familiar. Interestingly, super-recognisers gave high familiarity ratings (8–10) to a very high proportion of faces compared with controls (70% of faces for super-recognisers; 37% of faces for controls). To better understand the super-recogniser advantage, we conducted two further analyses. First, we compared accuracy of super-recognisers and controls on faces that they rated as less familiar (0–7 on the 10-point scale; i.e. those not included in the above analysis). Police super-recognisers outperformed controls on these faces too (80% accuracy for super-recognisers; 67% accuracy for controls), implying that their performance advantage holds across the whole familiarity continuum. Second, we analysed the control participants’ data for an association between (i) the proportion of faces that were given high familiarity ratings and (ii) the level of accuracy on those highly familiar faces. We found a significant positive correlation between these two measures [r(28) = 0.39, p < .05], such that the highest performing controls were qualitatively similar to the super-recognisers. These analyses support the original comparison, i.e. our SRs are performing at well above the levels of controls, even when group differences in famous face familiarity are taken into account.
We have now seen consistently high performance of the police super-recognisers across three different tasks. Interestingly, their strong performance on the tests of unfamiliar face matching (GFMT and MFMT), are accompanied by strong performance on familiar face matching too (PLT). These are all components of everyday policing, but familiar and unfamiliar face processing abilities are not always reported to be associated .
The selection criteria for the Metropolitan Police super recogniser team are not made public—as one might expect. However, they appear to be successful, in that the SRs tested here all performed at well-above normal levels. Furthermore, in the case where we have police controls, the SRs performed well above these too. There is also accumulating evidence that people who self-identify as super-recognisers in the general population perform well on forensically-relevant tasks similar to those used here .
It is beginning to be clear that simple membership of a professional group dealing with faces is not sufficient to confer high levels of performance [1,12]. For example, previous work with police officers and passport officers shows that, on average, both professional groups perform face tasks with the same levels as the general public. However, the wide levels of individual variation in the general population mean that large organisations have the opportunity to sample from within their ranks, and allocate those with appropriate skills to relevant jobs. The evidence to date suggests that training for unfamiliar face recognition is typically not highly effective. Recruitment is therefore key to improving overall levels of performance. The tests reported here provide a consistent indicator of expertise and are quick and easy to administer. We therefore suggest that they may be useful in future recruitment for other agencies. All three tests are available from the authors.
S1 Table. Scores for super recognisers and comparison participants on each of the three tests.
S1 Text. Figure permissions.
We are very grateful to Detective Chief Inspector Mick Neville of the Metropolitan Police for allowing us access to his team. We also thank the four police Super Recognisers who took part in the studies reported here.
Conceived and designed the experiments: DJR EN AJD RJ AMB. Performed the experiments: DJR. Analyzed the data: DJR EN AJD RJ AMB. Contributed reagents/materials/analysis tools: DJR EN AJD. Wrote the paper: DJR EN AJD RJ AMB.
- 1. Burton AM, Wilson S, Cowan M, Bruce V. Face recognition in poor-quality video: Evidence from security surveillance. Psychol Sci. 1999;10(3):243–8. doi: 10.1111/1467-9280.00144
- 2. Lindsay RCL, Mansour JK, Kalmet N, Bertrand MI, Melsom L. Face perception and recognition in eyewitness memory. In: Calder AJ, Rhodes G, Johnson MH, Haxby JV, Keane J. editors. The Oxford handbook of face perception. Oxford: Oxford University Press; 2011. pp. 307–328.
- 3. Bruce V, Henderson Z, Greenwood K, Hancock PJB, Burton AM, Miller P. Verification of face identities from images captured on video. J Exp Psychol Appl. 1999;5(4):339–60. doi: 10.1037/1076-898x.5.4.339
- 4. Bruce V, Henderson Z, Newman C, Burton AM. Matching identities of familiar and unfamiliar faces caught on CCTV images. J Exp Psychol Appl. 2001;7(3):207–18. pmid:11676099 doi: 10.1037/1076-898x.7.3.207
- 5. Burton AM, Jenkins R. Unfamiliar face perception. In: Calder AJ, Rhodes G, Johnson MH, Haxby JV, Keane J. editors. Oxford Handbook of Face Perception. Oxford: Oxford University Press; 2011. pp. 287–306.
- 6. Clutterbuck R, Johnston RA. Exploring levels of face familiarity by using an indirect face-matching measure. Perception. 2002;31(8):985–94. pmid:12269591 doi: 10.1068/p3335
- 7. Clutterbuck R, Johnston RA. Matching as an index of face familiarity. Vis cogn. 2004;11(7):857–69. doi: 10.1080/13506280444000021
- 8. Kemp RI, Towell N, Pike G. When seeing should not be believing: Photographs, credit cards and fraud. Appl Cogn Psychol. 1997;11(3):211–22. doi: 10.1002/(sici)1099-0720(199706)11:3<211::aid-acp430>3.3.co;2-f
- 9. Megreya AM, Burton AM. Matching faces to photographs: Poor performance in eyewitness memory (without the memory). J Exp Psychol Appl. 2008;14(4):364–72. doi: 10.1037/a0013464. pmid:19102619
- 10. Davis JP, Valentine T. CCTV on trial: Matching video images with the defendant in the dock. Appl Cogn Psychol. 2009;23(4):482–505. doi: 10.1002/acp.1490
- 11. White D, Dunn JD, Schmid AC, Kemp RI. Error Rates in Users of Automatic Face Recognition Software. PLoS One. 2015;10(10):e0139827. doi: 10.1371/journal.pone.0139827. pmid:26465631
- 12. White D, Kemp RI, Jenkins R, Matheson M, Burton AM. Passport Officers’ Errors in Face Matching. PLoS One. 2014;9(8):e103510. doi: 10.1371/journal.pone.0103510. pmid:25133682
- 13. Edmond G, Kemp RI, Porter G, Hamer D, Burton AM, Biber K, et al. Atkins v The Emperor: the “cautious” use of unreliable “expert” opinion. Int J Evid Proof. 2010 Apr 1;14(2):146–66. doi: 10.1350/ijep.2010.14.2.349
- 14. Kleinberg KF, Vanezis P, Burton AM. Failure of Anthropometry as a Facial Identification Technique Using High-Quality Photographs. J Forensic Sci. 2007;52(4):779–83. pmid:17524059 doi: 10.1111/j.1556-4029.2007.00458.x
- 15. Norell K, Läthén KB, Bergström P, Rice A, Natu V, O’Toole AJ. The Effect of Image Quality and Forensic Expertise in Facial Image Comparisons. J Forensic Sci. 2015;60(2):331–40. doi: 10.1111/1556-4029.12660. pmid:25537273
- 16. White D, Phillips PJ, Hahn CA, Hill M, O’Toole AJ. Perceptual expertise in forensic facial image comparison. Proc R Soc B Biol Sci. 2015;282(20151291). doi: 10.1098/rspb.2015.1292
- 17. Megreya AM, Burton AM. Unfamiliar faces are not faces: Evidence from a matching task. Mem Cognit. 2006;34(4):865–76. pmid:17063917 doi: 10.3758/bf03193433
- 18. Yovel G, Wilmer JB, Duchaine BC. What can individual differences reveal about face processing? Front Hum Neurosci. 2014;8(August):1–9. doi: 10.3389/fnhum.2014.00562
- 19. Behrmann M, Avidan G. Congenital prosopagnosia: face-blind from birth. Trends Cogn Sci. 2005;9(4):180–7. pmid:15808500 doi: 10.1016/j.tics.2005.02.011
- 20. Duchaine B, Nakayama K. Dissociations of face and object recognition in developmental prosopagnosia. J Cogn Neurosci. 2005;17(2):249–61. pmid:15811237 doi: 10.1162/0898929053124857
- 21. Russell R, Duchaine BC, Nakayama K. Super-recognizers: People with extraordinary face recognition ability. Psychon Bull Rev. 2009;16(2):252–7. doi: 10.3758/PBR.16.2.252. pmid:19293090
- 22. Alenezi HM, Bindemann M. The effect of feedback on face-matching accuracy. Appl Cogn Psychol. 2013;27(6):735–53. doi: 10.1002/acp.2968
- 23. Hussain Z, Sekuler AB, Bennett PJ. How much practice is needed to produce perceptual learning? Vision Res. 2009;49(21):2624–34. doi: 10.1016/j.visres.2009.08.022. pmid:19715714
- 24. White D, Kemp RI, Jenkins R, Burton AM. Feedback training for facial image comparison. Psychon Bull Rev. 2014;21(1):100–6. doi: 10.3758/s13423-013-0475-3. pmid:23835616
- 25. Potts L. The police super-recognisers putting names to faces. BBC News. Available: http://www.bbc.co.uk/news/uk-england-34544199. Accessed 10 November 2015.
- 26. Burton AM, White D, McNeill A. The Glasgow Face Matching Test. Behav Res Methods. 2010;42(1):286–91. doi: 10.3758/BRM.42.1.286. pmid:20160307
- 27. Dowsett AJ, Burton AM. Unfamiliar face matching: Pairs out-perform individuals and provide a route to training. Br J Psychol. 2015;106:433–45. doi: 10.1111/bjop.12103. pmid:25393594
- 28. Bobak AK, Hancock PJB, Bate S. Super-recognisers in Action: Evidence from Face-matching and Face Memory Tasks. Appl Cogn Psychol. 2016;30:81–91. doi: 10.1002/acp.3170