Defining Seropositivity Thresholds for Use in Trachoma Elimination Studies

Background
Efforts are underway to eliminate trachoma as a public health problem by 2020. Programmatic guidelines are based on clinical signs that correlate poorly with Chlamydia trachomatis (Ct) infection in post-treatment and low-endemicity settings. Age-specific seroprevalence of anti-Ct Pgp3 antibodies has been proposed as an alternative indicator of the need for intervention. To standardise the use of these tools, it is necessary to develop an analytical approach that performs reproducibly both within and between studies.

Methodology
Dried blood spots were collected in 2014 from children aged 1–9 years in Laos (n = 952) and Uganda (n = 2700) and from people aged 1–90 years in The Gambia (n = 1868). Anti-Pgp3 antibodies were detected by ELISA. A number of visual and statistical analytical approaches for defining serological status were compared.

Principal Findings
Seroprevalence was estimated at 11.3% (Laos), 13.4% (Uganda) and 29.3% (The Gambia) by visual inspection of the inflection point. The expectation-maximisation algorithm estimated seroprevalence at 10.4% (Laos), 24.3% (Uganda) and 29.3% (The Gambia). Finite mixture model estimates were 15.6% (Laos), 17.1% (Uganda) and 26.2% (The Gambia). Receiver operating characteristic (ROC) curve analysis using a threshold calibrated against external reference specimens estimated the seroprevalence at 6.7% (Laos), 6.8% (Uganda) and 20.9% (The Gambia) when the threshold was set to optimise Youden’s J index. The ROC curve analysis was found to estimate seroprevalence at lower levels than estimates based on thresholds established using internal reference data. Thresholds defined using internal reference threshold methods did not vary substantially between population samples.

Conclusions
Internally calibrated approaches to threshold specification are reproducible and consistent and thus have advantages over methods that require external calibrators. We propose that future serological analyses in trachoma use a finite mixture model or expectation-maximisation algorithm as a means of setting the threshold for ELISA data. This will facilitate standardisation and harmonisation between studies and eliminate the need to establish and maintain a global calibration standard.

Author Summary
Trachoma is caused by the bacterium Chlamydia trachomatis (Ct). Individuals who have previously been infected with Ct carry specific antibodies in their blood. Recent studies have suggested that these antibodies may be a good way to estimate the intensity of transmission of this bacterium in a population. Among people who do have antibodies (seropositives) there is variation in the amount that is detectable in their blood. Some people have such low levels that differentiating them from those who don't have antibodies (seronegatives) is challenging. We used a new test for Ct antibodies on blood specimens from three countries. Our test worked extremely well, giving reproducible results when we tested the same samples multiple times. We compared four different methods for setting the position of the threshold line between seronegatives and seropositives. The estimated transmission intensity in each country varied depending on the threshold method used, but two methods that used statistical modelling algorithms to define the two groups performed consistently across all three countries' samples. We recommend that future studies should consider adopting the statistical modelling approaches, as they are objective tests that require no reference material and allow for standardisation between studies.

Introduction
Trachoma is caused by ocular infection with the bacterium Chlamydia trachomatis (Ct) [1]. It is the leading infectious cause of blindness worldwide [2]. The World Health Organization (WHO) estimates that over 200 million people in 42 countries are at risk from trachoma blindness [3], that 1.4 million people experience moderate to severe visual impairment because of the disease and that of these, around 450,000 have been irreversibly blinded [4].
The most commonly used system for estimating the prevalence of trachoma uses the WHO simplified grading system [5] of clinical signs of trachoma. These include trachomatous inflammation-follicular (TF), trachomatous inflammation-intense (TI) and trachomatous trichiasis (TT), which is the rubbing of the eyelashes against the globe of the eye. WHO guidelines recommend the SAFE strategy to combat trachoma: Surgery to treat trichiasis, annual mass-drug administration (MDA) of Antibiotics to treat Ct infection and Facial cleanliness and Environmental improvement to reduce transmission. Implementation of the SAFE strategy and cessation of MDA depends on the prevalence of TF in children aged 1-9 years. Concerns have been raised about the appropriateness of having treatment guidelines based on clinical signs such as TF and TI. In some low endemicity [6,7] and post-MDA settings [8,9], both TF and TI correlate poorly with the prevalence of Ct infection and both clinical signs are sometimes associated with bacteria other than Ct [10,11].
Tests for infection have been suggested as possible tools for trachoma control programmes. Numerous nucleic acid-amplification tests (NAATs) have been developed, including the adapted use of commercial kits originally designed for diagnosing genitourinary Ct infections [12–16]. NAATs have been shown to be cost-effective in some settings [17] but concerns have been raised that the per-sample cost of NAATs can be too high for national eye health programmes in countries where trachoma remains a problem [18]. The cost of specialist devices and platforms for deploying NAATs can also be prohibitive.
Serology has been suggested as a possible alternative to clinical signs and infection testing, as it indicates the cumulative exposure to Ct [19,20], with the potential to assess the impact of intervention efforts [21]. By monitoring the exposure to Ct of the youngest age groups, born after implementation of MDA, serology may prove useful for confirming that transmission has been interrupted [22].
Serology has recently been used in several studies [19,20,22–24], three of which have taken place in districts that have completed three or more rounds of MDA [22–24]. These studies have used the multiplex bead array platform (Bio-Rad, Hercules, California) to detect antibodies against Pgp3 and CT694, antigens thought to be highly immunogenic [25]. Because this platform is costly, technically complex and unlikely to be found in most laboratories in resource-limited regions, alternative, simpler methods of antibody detection have been proposed [22,26].
To make serological testing more widely accessible, the Pgp3/CT694 assay used in previous studies [19,20,22–24] has been adapted for use in a simple Pgp3-specific enzyme-linked immunosorbent assay (ELISA). Pgp3 is a Ct-specific 84 kDa heterotrimeric protein [27] and is recognised by specific IgG [28]. It is thought to be the most immunodominant of the proteins encoded by the Ct plasmid that is unique to Ct [29].
ELISAs are routinely used to detect specific IgG in dried blood spots [30–34]. ELISA data, measured as optical density (OD), are quantitative and continuous. It is desirable to be able to assign a classification (seronegative, seropositive) to each sample, but this can be challenging because the distributions of OD values in the negative and positive populations may overlap to a greater or lesser extent [34]. The aim of this study was to determine the most appropriate method for setting the threshold for positivity as well as to determine the usefulness of an anti-Pgp3-specific ELISA for identifying communities in which the transmission of ocular Ct has been interrupted. We tested dried blood spots collected as part of trachoma surveys in three countries: Laos, Uganda and The Gambia. We evaluated the age-specific seroprevalence using four methods and compared the resulting estimates of prevalence of seropositivity based on six possible thresholds. We discuss the merits of the different methods in the context of programmes seeking to monitor the elimination of trachoma as a public health problem.
In all countries, a local health official explained the study to each head of household, answered any questions and explained the written consent form before requesting their agreement and signature. Written (thumbprint or signature) consent was obtained from each participant or the parent or guardian of each child under 18 who participated; assent was sought from children aged 12-17.

Clinical assessment
Trachoma graders were trained according to the Global Trachoma Mapping Project (GTMP) protocols and were required to score a minimum kappa of 0.7 for the diagnosis of TF in an inter-grader agreement test with 50 eyes of 50 children [35,36].
The samples in Laos were collected in November 2014 as part of a follow-up study to the GTMP work completed there. Three districts in three regions were selected based on baseline trachoma survey findings that indicated potential 'hot spots' [37]. From these three regions, all children aged 1-9 in selected villages were invited to participate. Trachoma elimination programmes have never been undertaken in Laos.
In Uganda, samples were collected as part of a trachoma impact survey in May 2014, following three years (2010-2012) of implementation of the A, F and E components of the SAFE strategy in two regions (Pader and Agogo). Prior to MDA, trachoma was considered highly endemic in these regions, although no data are publicly available. This study was a population-based prevalence survey, which used a two-stage sampling strategy; villages were selected with probability proportional to size, and households were randomly selected within each selected village based on a household list produced by the village chief and local health officials. All children aged 1-9 years in the selected households were invited to participate.
In The Gambia, a population-based prevalence survey using a two-stage sampling strategy was undertaken in February-March 2014; villages were selected with probability proportional to size, and households were randomly selected within each selected village based on a household list produced by the village chief and local health officials. One region, Lower River Region (LRR), had undergone three rounds of annual (2007-2009) MDA for trachoma, while the other, Upper River Region (URR), has never had trachoma elimination activities because trachoma has not been of a sufficiently high prevalence to justify implementation. All members of randomly selected households were invited to participate, regardless of age.
After informed consent was obtained, a trachoma grader examined both eyes for signs of trachoma using a binocular loupe (2.5×) and a torch. The grader changed gloves between each participant to minimise the risk of carry-over contamination. Antibiotics were provided to individuals with evidence of active trachoma and/or the affected household, according to each country's national policy.

Blood collection
Each participant had a finger-prick blood sample collected onto filter paper (Trop-Bio Pty, Townsville, Australia), using a sterile single-use lancet (BD Microtainer, Dublin, Ireland). Each filter paper had six extensions, each calibrated to absorb 10 μL of blood. Samples were air-dried for approximately five hours and then stored in individual Whirl-Pak plastic bags (Nasco, Modesto, California) with desiccant sachets (Whatman, Little Chalfont, UK) before being stored at -20˚C.
All samples were shipped to LSHTM for testing.

ELISA analysis of anti-Ct-Pgp3 antibodies
Dried blood spots (DBS) were tested for antibodies against Pgp3. One whole filter paper extension per sample was eluted in 250 μL PBS + 0.3% v/v Tween-20 (PBSTw) (Sigma-Aldrich, Dorset, UK) + 5% w/v non-fat milk powder (PBSTw-milk) (AppliChem, Maryland Heights, USA) overnight at 4˚C. Immulon 2HB 96-well plates (VWR International, Lutterworth, UK) were coated with recombinant Pgp3 protein [19] overnight at 4˚C (25 ng per well in 0.1M sodium carbonate buffer, pH 9.6). Plates were washed with PBSTw to remove unbound protein, blocked with 100 μL PBSTw for 1 hour at 4˚C and washed twice. Control sera with known ratios of Pgp3 antibodies (1000 units, 500 units, 200 units, 50 units and negative control serum) and a blank consisting of PBSTw-milk were run on every plate. All samples and controls were tested in triplicate at a 1:50 dilution in PBSTw-milk. After 2 hours of incubation on an orbital shaker at room temperature, wells were washed 5 times and 50 μL of an HRP-labelled mouse anti-human IgG(Fc) (Southern Biotech, Birmingham, USA) diluted 1:32,000 was added. Plates were incubated for 1 hour on an orbital plate shaker at room temperature then washed 5 times to remove unbound antibody. Fifty microlitres of TMB (KPL, Gaithersburg, USA) was added and the mixture was incubated in the dark for 9 minutes at room temperature. The reaction was stopped with 50 μL 1N H2SO4 and optical density was read at 450 nm (OD450) on a Spectramax M3 plate reader (Molecular Devices, Wokingham, UK). Readings were corrected for background by subtracting the average absorbance of three blank wells containing no serum, using Softmax Pro5 software (Molecular Devices, Wokingham, UK).

Data analysis
Blanked OD450 values for samples and controls were normalised by dividing the mean of the three replicate wells by the mean of the 200-unit control included on the same plate. This was done separately for each plate. Data analysis for ELISA was performed separately from, and masked to, the demographic and clinical information. Statistical analysis was carried out using R [38].
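As a minimal sketch of the per-plate normalisation described above, the following Python function divides the mean of each sample's blanked triplicate by the mean of the blanked 200-unit control from the same plate. The authors performed their analysis in R; the function name, data layout and example readings here are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean

def normalise_plate(sample_wells, control_200_wells):
    """Normalise blanked OD450 readings from a single plate.

    sample_wells: dict of sample ID -> list of three blanked OD450 readings
    control_200_wells: blanked OD450 readings of the 200-unit control wells
    """
    control = mean(control_200_wells)
    if control <= 0:
        raise ValueError("200-unit control mean must be positive")
    # Mean of the triplicate divided by the mean of the plate's 200-unit control
    return {sid: mean(wells) / control for sid, wells in sample_wells.items()}

# Hypothetical readings, for illustration only
plate = {"S001": [0.25, 0.27, 0.26], "S002": [1.05, 1.09, 1.07]}
normalised = normalise_plate(plate, [0.80, 0.82, 0.81])
```

Because every plate is scaled to its own 200-unit control, values from different plates are expressed on a common scale before any threshold is applied.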

Defining seropositivity
We used four different methods for establishing a threshold for seropositivity: visual inspection of the inflection point (VIP), a finite mixture model (FMM) [39], the expectation-maximisation algorithm (EM) [40] and a receiver operating characteristic (ROC) curve based on previously assayed dried blood spots from children in Tanzania [19]. There are as yet no accepted guidelines as to what level of sensitivity or specificity is required of a serological test; thus we referred to a previously published template [18] and established three possible thresholds from the ROC curve: one maximising specificity, one with a sensitivity greater than 80% [18] and one optimising the balance between sensitivity and specificity, by maximising Youden's J-index [41].

Visual inflection point (VIP)
We asked 12 arbitrarily selected non-laboratory staff and students at LSHTM to visually examine a simple plot of the sorted OD450 data curves and determine the inflection point for each sample set. For this exercise, we defined the inflection point as the point on the data curve where the curve changes from predominantly horizontal to predominantly vertical. The 12 values were then averaged to determine the threshold and standard deviations (SDs) were calculated.

Finite mixture model (FMM)
A finite mixture model [42] was used to classify the samples as seropositive or seronegative based on normalised OD450 values. The data were fitted using maximum likelihood methods, estimating the distribution parameters for each classification group (seropositive or seronegative) as well as the proportion of samples in each category to fit the overall distribution of results [34,43,44]. The threshold for seropositivity was then defined as the mean of the Gaussian distribution of the seronegative population plus three SDs of the seronegative population [44,45]. FMM was performed on each set of samples, based on country of origin.
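The threshold rule above (seronegative mean plus three SDs) can be illustrated with a short, dependency-free Python sketch. It is not the authors' R code: the two-component Gaussian mixture is fitted here by plain EM iterations as a stand-in for the maximum likelihood routine, the starting values come from an assumed split at the midpoint of the data range, and the data are simulated rather than taken from the study.

```python
import math
import random
from statistics import mean, pstdev

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component Gaussian mixture (unequal variances) by EM iterations."""
    xs = list(values)
    mid = (min(xs) + max(xs)) / 2  # assumed starting split; needs spread-out data
    groups = [[x for x in xs if x <= mid], [x for x in xs if x > mid]]
    mu = [mean(g) for g in groups]
    sd = [max(pstdev(g), 1e-3) for g in groups]
    w = [len(g) / len(xs) for g in groups]
    for _ in range(n_iter):
        # E-step: probability that each value belongs to the upper component
        resp = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * normal_pdf(x, mu[1], sd[1])
            resp.append(p1 / (p0 + p1 + 1e-300))
        # M-step: update weight, mean and SD of each component
        for k, r in enumerate(([1 - g for g in resp], resp)):
            n_k = max(sum(r), 1e-12)
            mu[k] = sum(g * x for g, x in zip(r, xs)) / n_k
            var = sum(g * (x - mu[k]) ** 2 for g, x in zip(r, xs)) / n_k
            sd[k] = max(math.sqrt(var), 1e-3)
            w[k] = n_k / len(xs)
    return mu, sd, w

def fmm_threshold(values, n_sd=3):
    """Seronegative mean plus n_sd seronegative SDs."""
    mu, sd, _ = fit_two_gaussians(values)
    neg = 0 if mu[0] <= mu[1] else 1  # lower-mean component = seronegative
    return mu[neg] + n_sd * sd[neg]

# Simulated normalised OD450 values: a narrow low-OD group and a broad high-OD group
random.seed(1)
data = ([random.gauss(0.25, 0.10) for _ in range(850)]
        + [random.gauss(1.50, 0.50) for _ in range(150)])
threshold = fmm_threshold(data)
```

Exposing `n_sd` as a parameter makes explicit that the multiplier is an analyst's choice and can be changed without refitting the model.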

Expectation-maximisation algorithm (EM)
The expectation-maximisation algorithm is similar to FMM in that it classifies samples based on population parameters. It relies on the Bayesian information criterion to select an appropriate model. EM is an iterative optimisation method for estimating unknown parameters [40], in this case the threshold between seropositive and seronegative, given the number of clusters and the normalised OD450 values. EM estimates where to set the threshold while maximising the likelihood of the fitted parameters given the observed samples [40]. Using the 'mclust' package in R, parameters were set to specify a univariate model with equal variance between 2 clusters [46].
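For a two-cluster univariate model with equal variance, one natural reading of the resulting threshold is the OD value at which a sample is equally likely to belong to either cluster. The Python sketch below computes that boundary from already-fitted parameters. Whether the authors derived their threshold in exactly this way is not stated, so this is an assumption, and the parameter values shown are purely illustrative.

```python
import math

def equal_variance_boundary(mu_neg, mu_pos, sd, w_pos):
    """OD value where the posterior probabilities of the two clusters are equal.

    Derived by equating w_neg * N(x; mu_neg, sd) and w_pos * N(x; mu_pos, sd).
    """
    w_neg = 1 - w_pos
    return (mu_neg + mu_pos) / 2 + sd ** 2 * math.log(w_neg / w_pos) / (mu_pos - mu_neg)

def posterior_positive(x, mu_neg, mu_pos, sd, w_pos):
    """Posterior probability that a sample with value x is in the positive cluster."""
    a = w_pos * math.exp(-0.5 * ((x - mu_pos) / sd) ** 2)
    b = (1 - w_pos) * math.exp(-0.5 * ((x - mu_neg) / sd) ** 2)
    return a / (a + b)

# Illustrative parameters only (not estimates from the study)
params = dict(mu_neg=0.25, mu_pos=1.45, sd=0.30, w_pos=0.20)
boundary = equal_variance_boundary(**params)
```

With a minority seropositive cluster, the boundary sits above the midpoint of the two means, because the prior weight of the seronegative cluster pulls borderline samples towards it.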

Receiver operating characteristic (ROC) curve
Serum samples from 122 children from the United States and blood spots from 11 Ct-specific PCR-positive children from Tanzania were used to make the original ROC curve [19]. A second set of 124 Tanzanian dried blood spots was assayed using the multiplex bead array and dichotomised based on the original threshold. These samples were then re-tested with the ELISA and the data from this assay were used to generate the ROC curve used in this manuscript. The R package 'Epi' [47] was used to generate three different thresholds: the first maximises Youden's J-index to balance sensitivity and specificity [41], while the second and third were set for high sensitivity (minimum 80%) and high specificity (minimum 98%), respectively.
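The three ROC-derived thresholds can be sketched in Python as follows. This does not reproduce the 'Epi' package: the tie-breaking rules (the highest cut-off that still reaches the minimum sensitivity, the lowest cut-off that reaches the minimum specificity) and the rule that a value at or above the cut-off counts as positive are assumptions, and the reference values are invented for illustration.

```python
def roc_thresholds(values, is_positive, min_sens=0.80, min_spec=0.98):
    """Candidate cut-offs from labelled reference data (value >= cut-off is positive)."""
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    rows = []
    for cut in sorted(set(values)):
        tp = sum(1 for v, p in zip(values, is_positive) if p and v >= cut)
        tn = sum(1 for v, p in zip(values, is_positive) if not p and v < cut)
        rows.append((cut, tp / n_pos, tn / n_neg))  # (cut-off, sensitivity, specificity)
    # Youden's J = sensitivity + specificity - 1
    youden = max(rows, key=lambda r: r[1] + r[2] - 1)[0]
    high_sens = max(c for c, se, sp in rows if se >= min_sens)
    high_spec = min(c for c, se, sp in rows if sp >= min_spec)
    return {"youden": youden, "high_sensitivity": high_sens, "high_specificity": high_spec}

# Invented reference panel: eight negatives followed by five positives
ref_values = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.70,
              0.50, 0.80, 1.00, 1.20, 1.50]
ref_labels = [False] * 8 + [True] * 5
cuts = roc_thresholds(ref_values, ref_labels)
```

Every cut-off produced this way depends entirely on the reference panel, which is why a change in reference material shifts all three thresholds at once.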

Statistical analysis
The prevalence of signs of trachoma and the exact binomial confidence intervals were calculated using the R 'Stats' package [38]. Due to the low prevalence of clinical signs, Fisher's exact test was used to test for association [48].
Seroprevalence in each population was calculated using each of six thresholds. We also examined the relationship between the clinical data and serological data. Due to the low prevalence of clinical signs, data for clinical signs were pooled across all three studies.
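A hedged Python sketch of a seroprevalence estimate with an exact (Clopper-Pearson) binomial confidence interval is given below. The interval bounds are found by bisection on the binomial distribution function using only the standard library. Classifying a value at or above the threshold as seropositive is an assumption, and the example values are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k < 0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1 - p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k positives out of n."""
    def solve(k_cdf, target):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(k_cdf, n, mid) > target:
                lo = mid  # the CDF falls as p rises, so p must be larger
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if k == 0 else solve(k - 1, 1 - alpha / 2)
    upper = 1.0 if k == n else solve(k, alpha / 2)
    return lower, upper

def seroprevalence(norm_ods, threshold):
    """Proportion of samples at or above the threshold, with exact 95% CI."""
    k = sum(1 for od in norm_ods if od >= threshold)
    n = len(norm_ods)
    return k / n, exact_ci(k, n)

# Hypothetical normalised OD450 values
prev, (ci_low, ci_high) = seroprevalence([0.2, 0.3, 0.25, 0.9, 1.4, 0.1, 0.22, 0.31], 0.67)
```

Applying `seroprevalence` once per threshold to the same list of normalised values gives directly comparable estimates for each threshold specification.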
Observed frequencies of clinical signs of trachoma in the various samples are summarised in Table 1. A more detailed description, including prevalence by age and gender, is presented in Supplementary S1, S2 and S3 Tables.

Serological analysis
The five serum controls were tested in triplicate and the mean values for each plate were tracked across each sample set. The coefficient of variation was less than 10% in each of the three replicates of each control specimen. Inter-plate variation of controls was less than 15% across all plates in each sample set as shown in Table 2. A plate was permitted to have no more than one control with >15% variation from the sample set mean for that control; if a plate had two or more controls with values more than 15% greater or lesser than the sample set mean, the plate was re-run. Fewer than 5% of plates were re-run for this reason. Table 2 shows the mean values and the accepted 15% range for the five controls. The sample set for each country was tested separately. Each plate showed a large but narrowly distributed proportion of low-OD specimens, with a smaller proportion of higher-OD specimens. Fig 1 shows typical results from an ELISA plate. In all three sample sets, density data peak around 0.25 OD450; this can be seen in centre panels B in Figs 2, 3 and 4.
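The plate acceptance rule described above can be written as a small check. The Python sketch below is illustrative only: the control names, the sample set means and the plate values are hypothetical, and the coefficient of variation uses the sample standard deviation, which is an assumption.

```python
from statistics import mean, stdev

def coefficient_of_variation(wells):
    """Percent CV of replicate wells (sample SD divided by the mean)."""
    return 100 * stdev(wells) / mean(wells)

def plate_needs_rerun(plate_controls, set_means, tolerance=0.15, max_outliers=1):
    """True if more than max_outliers controls deviate from the sample set mean
    by more than the tolerance (15% by default)."""
    outliers = [
        name for name, value in plate_controls.items()
        if abs(value - set_means[name]) > tolerance * abs(set_means[name])
    ]
    return len(outliers) > max_outliers

# Hypothetical sample set means and two hypothetical plates
set_means = {"1000": 2.40, "500": 1.60, "200": 1.00, "50": 0.40, "neg": 0.10}
plate_a = {"1000": 2.50, "500": 1.55, "200": 1.00, "50": 0.48, "neg": 0.10}  # one control out
plate_b = {"1000": 2.90, "500": 1.55, "200": 1.00, "50": 0.48, "neg": 0.10}  # two controls out

rerun_a = plate_needs_rerun(plate_a, set_means)
rerun_b = plate_needs_rerun(plate_b, set_means)
cv = coefficient_of_variation([0.50, 0.52, 0.51])
```

Here `plate_a` would be accepted because only its 50-unit control falls outside the 15% range, whereas `plate_b` would be re-run.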

Visual inflection point (VIP)
The leftmost panels of Figs 2A, 3A and 4A were shown to 12 people, each of whom was asked to determine each graph's point of inflection. The mean of the inflection points was calculated for each sample set and the SD and range were calculated. For Laos, the mean threshold was

Expectation-maximisation algorithm (EM)
An EM model was fitted to all three sample sets, specifying parameters for a univariate model with equal variance between 2 clusters [45]. The thresholds were set at 0. 65  were reasonably conformant and appeared to favour threshold placements that were substantially lower than those set by the ROC, which is calibrated with Tanzanian specimens, even when a higher sensitivity (i.e., lower threshold value) test was specified in the ROC analysis. As a consequence of this, the seroprevalence estimates that were determined by VIP, EM and FMM were similar to one another, while the seroprevalence estimates set by any of the ROC curve thresholds were much lower in all three populations (Table 3).
Seroprevalence estimates for each sample set were calculated using the six different thresholds, along with 95% confidence intervals. As the threshold increases in value, fewer specimens are classified as being seropositive, decreasing the seroprevalence. The seroprevalence for each sample set at each threshold is presented in Table 3. Seroprevalence for each country by sex, region and age is provided in Supplementary S4, S5 and S6 Tables. Table 4 presents the proportion of seropositive samples by clinical grade, as estimated by each threshold specification. Due to the relatively low prevalence of all clinical signs, prevalence values have been pooled across all three studies.

Discussion
Several previous studies have used anti-Pgp3-specific ELISAs to test for genital chlamydial infection [21,51–54] but only one [55] has used the method for the detection of antibodies against ocular chlamydial infection. In this study, we used an ELISA test to detect IgG antibodies specific to the Ct protein Pgp3 in studies with large sample sizes from three countries. To date, this is the largest study to measure antibodies to Ct in trachoma-endemic populations and the first to look at populations from more than one country, including East Africa, West Africa and Southeast Asia. We have shown that within and between runs there is a low coefficient of variation in the assay and that the bimodal data distribution of normalised OD450 values in those samples reflects that which would be expected in populations where a minority of individuals are seropositive and where there is a broad range of antibody titres in the seropositive sub-population. This is best observed in the data from The Gambia (Fig 4), where we included adults in the sample and where the more substantial seropositive sub-population can be accounted for by both sexually transmitted Ct infection and the formerly high level of endemicity of trachoma in The Gambia. Clinical specimens without any Ct-specific IgG still have some degree of baseline reactivity in ELISA tests because of non-specific binding of irrelevant antibodies. There is also substantial between-specimen variation in seropositives, which reflects natural variation in the antibody titre. The potential for there being substantial overlap between the seronegative specimens with high baselines and the seropositives with low anti-Pgp3 antibody titres means that it can be difficult to differentiate between the two groups.
Table 3. Seroprevalence by country, as estimated using alternate threshold specification methods.
There is very little published information on the prevalence of trachoma in Laos and Uganda [56], but on the evidence of our analysis, clinical signs of disease are rare and the levels of seropositivity appear to be comparable to those in The Gambia, where elimination has been declared. We have no data on the prevalence of Ct infection in the communities in Laos and Uganda, nor are there any longitudinal data to monitor changes in antibody levels following documented infection. Numerous studies have looked at the prevalence of ocular Ct infection in The Gambia and shown it to be negligible [7,57,58]. All the populations we studied have received MDA and we did not screen a population with higher prevalence levels. Further research in meso- and hyper-endemic populations will be needed in order to assess the utility of this method in other settings.
We have shown how the method that is selected for the statistical interpretation of ELISA data (with particular regard to the method of threshold specification) can greatly change the population prevalence estimates that are derived. Methods that indicate the use of a higher threshold value are likely to be more specific and have a higher positive predictive value (PPV), but they do incur a penalty in the form of reduced sensitivity. In the context of post-MDA trachoma control, a test with high PPV is more desirable as over-diagnosis might lead to the inappropriate continuation of MDA interventions. Meanwhile a lower sensitivity test, applied in a low prevalence setting such as the post-MDA population of The Gambia, is likely to have a high negative predictive value (NPV) and the clinical impact of the false negative rate is likely to be modest as long as the sensitivity does not fall too far. In our hands, the ROC analysis supported the use of higher thresholds than did the other methods. Unfortunately the reference material was not sampled from any natural population and so the estimated sensitivity and specificity of the test based on ROC were unlikely to reflect the true performance in the populations that were sampled in this study [59]. We explored three internally calibrated thresholding methods (i.e. using only data generated during the study), all of which specified thresholds at approximately the same OD450 value. This was true across sample sets from all three countries. It is perhaps unsurprising that similar estimates emerged from FMM and EM, as there are methodological similarities in the two approaches.
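The trade-off between threshold height and predictive value can be made concrete with a short worked example in Python. The sensitivity, specificity and prevalence figures below are illustrative assumptions, not estimates from this study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given true prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Illustrative only: a lower threshold versus a higher threshold at 5% true prevalence
ppv_low_cut, npv_low_cut = predictive_values(0.90, 0.90, 0.05)
ppv_high_cut, npv_high_cut = predictive_values(0.80, 0.98, 0.05)
```

Under these assumed figures, raising the threshold roughly doubles the PPV (from about 0.32 to about 0.68) while the NPV stays close to 0.99 in both cases, which mirrors the argument that a loss of sensitivity costs little in a low-prevalence setting.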
At face value the VIP method might seem arbitrary and crude, but the human brain can outperform computers in some aspects of pattern recognition. By obtaining a threshold estimate that closely matches that of EM and FMM, our data indicate that the results of a conditionally independent method (VIP) correlate closely with the computational approaches and are able to successfully determine where the most obvious bimodal split in the data occurs. What gives FMM and EM the edge over VIP is that they are more replicable and that the different requirements for higher or lower specificity and sensitivity in different clinical settings can be controlled by changing the number of SDs that the algorithm uses to determine the cut point. For instance, an increasingly specific test could be implemented by setting the threshold at four, five or six SDs of the negative population, rather than the three SDs we used here. None of the populations that we surveyed would be expected (based on clinical signs) to have a high level of Ct seropositivity and it may be that the data in Tables 3 and 4 (and Supplementary Data S4, S5 and S6 Tables) reflect a high false positive rate and a low positive predictive value. By adjusting the parameters of the algorithms we might achieve a prevalence estimate that is more accurate, but without any gold standard we can never truly assess how accurate our estimates are. In the Gambian data, using 4 or 5 SDs would have led to cut points at OD = 0.81 and OD = 0.95, respectively, values much closer to the cut-points recommended by the ROC analysis.
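The effect of the SD multiplier can be worked through from the two Gambian cut points quoted above. The Python sketch below back-calculates the implied seronegative mean and SD from those two rounded figures, so the results are approximations and are not values reported by the study.

```python
# Cut points quoted in the text for the Gambian data
cut_4sd, cut_5sd = 0.81, 0.95

sd_neg = cut_5sd - cut_4sd        # one additional SD shifts the cut point by one SD
mean_neg = cut_4sd - 4 * sd_neg   # implied mean of the seronegative distribution

def cut_point(n_sd):
    """Threshold at the seronegative mean plus n_sd SDs."""
    return mean_neg + n_sd * sd_neg

implied = {n: round(cut_point(n), 2) for n in (3, 4, 5, 6)}
```

The implied seronegative mean of about 0.25 agrees with the density peak reported for all three sample sets, and each extra SD raises the cut point by about 0.14 OD units.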
For programmatic purposes, the absolute value and accuracy of the prevalence estimate is actually somewhat less important than the precision of that estimate and the longitudinal change in repeat measures from the same population across the lifetime of the intervention and monitoring programme. This is because the absolute estimate is clearly highly variable given quite arbitrary choices made during data analysis, whilst percentage changes in population seroprevalence across time (regardless of the actual number values) can be indicative of the effectiveness of MDA. As long as the method is fixed and replicable, then both longitudinal and between-population comparisons are appropriate and will have a fixed level of error, even though the absolute accuracy will remain unknown. The real value of using an internally controlled method such as FMM or EM is that it is possible to use an algorithmic approach that is simple to apply to any data set and which requires no additional testing of external specimens or controls. In this study, we generated a ROC curve based on specimens that had previously been calibrated against the original reference standards described by Goodhew et al [19]. There is no gold standard for serological testing of chlamydia, and mis-classification in the reference standards is likely to have introduced error in the reference panel. Goodhew described how one PCR-positive DBS tested negative for antibodies against Pgp3, while three samples that were in the negative reference group tested positive for antibodies against Pgp3 [19]. As these original reference standards were no longer available, we have had to rely on a second set of standards that were tested against the original standards. 
Problems relating to the ROC reference specimens could be solved by the establishment of a fully maintained and quality-controlled international standard, but this is unlikely to happen as it would be very difficult to identify a reliable source of large volumes of seropositive plasma.
FMM has been used in numerous serological studies [34,39,43,45,60–66] and we propose that it, or the closely related EM, should be considered as the method of choice when performing data analysis for trachoma serology data. In trachoma control programmes, the SD parameter should be adjusted to favour high specificity and a larger number of SDs than used here would seem appropriate. One attractive option would be to use data from a post-elimination country (i.e. The Gambia) to subtract out the background positivity and by doing so calibrate or normalise the test for use in populations where elimination has not yet been reached and prevalence is unknown.
Variability and error are inherent to any diagnostic test and with every change in reference standard and assay technique, variability and error increase over and above any variation that may be inherent in a test due to inter-or intra-centre and user variation. Thus, we believe that an alternate approach to assay design, reference selection and threshold specification should be considered.
For all the sample sets included in this study, the density data peak around 0.25 OD450 (Figs 2A, 3A and 4A), suggesting that a comparison of seroprevalence levels between populations is possible. Unlike thresholds derived from a ROC curve, internally referenced thresholds inherently account for differing background levels in each population. When a ROC-derived threshold is used, differences in background that are not accounted for may result in an under- or over-estimation of seroprevalence. Accounting for background in this way will facilitate the programmatic use of seroprevalence estimates based on thresholds set by the finite mixture model or expectation-maximisation algorithm if serology is to be adopted as an alternative monitoring method.

Conclusion
The ELISA presented in this paper is easy to use, affordable in terms of both reagents and equipment required, and can potentially be deployed in low- and middle-income countries. The unit cost per sample was less than £4.00; this includes all materials required for sample collection and DBS testing, including reagents, ELISA plates and sterile gloves. Our results show that the technological aspects of the assay are robust and that there is low variation both between replicate samples and plates and between populations, making it possible to compare seroprevalence levels between countries. Internally calibrated thresholding methods, such as the finite mixture model or the expectation-maximisation algorithm, are more appropriate than thresholds set by a ROC curve, but for programmatic surveillance, they may require calibration using data from countries where trachoma has been declared as having been eliminated.