A novel comprehensive metric to assess effectiveness of COVID-19 testing: Inter-country comparison and association with geography, government, and policy response

Testing and case identification are key strategies in controlling the COVID-19 pandemic. Contact tracing and isolation are only possible if cases have been identified. The effectiveness of testing should be assessed, but a single comprehensive metric is not available to assess testing effectiveness, and no timely estimates of case detection rate are available globally, making inter-country comparisons difficult. The purpose of this paper was to propose a single, comprehensive metric, called the COVID-19 Testing Index (CovTI) scaled from 0 to 100, derived from epidemiological indicators of testing, and to identify factors associated with this outcome. The index was based on case-fatality rate, test positivity rate, active cases, and an estimate of the detection rate. It used parsimonious modeling to estimate the true total number of COVID-19 cases based on deaths, testing, health system capacity, and government transparency. Publicly reported data from 165 countries and territories that had reported at least 100 confirmed cases by June 3, 2020 were included in the index. Estimates of detection rates aligned satisfactorily with previous estimates in the literature (R² = 0.44). As of June 3, 2020, the states with the highest CovTI included Hong Kong (93.7), Australia (93.5), Iceland (91.8), Cambodia (91.3), New Zealand (90.6), Vietnam (90.2), and Taiwan (89.9). Bivariate analyses showed the mean CovTI in countries with open public testing policies (66.9, 95% CI 61.0–72.8) was significantly higher than in countries with no testing policy (29.7, 95% CI 17.6–41.9) (p<0.0001). A multiple linear regression model assessed the association of independent grouping variables with CovTI. Open public testing and extensive contact tracing were shown to significantly increase CovTI, after adjusting for extrinsic factors, including geographic isolation and centralized forms of government.
The correlation of testing and contact tracing policies with improved outcomes demonstrates the validity of this model to assess testing effectiveness and also suggests these policies were effective at improving health outcomes. This tool can be combined with other databases to identify other factors or may be useful as a standalone tool to help inform policymakers.

Introduction
Therefore, a single comprehensive metric that assesses testing effectiveness by incorporating an estimate of the true total number of COVID-19 infections in the population was developed using publicly reported data universally accessible across nearly all countries and territories. Model estimates of true period prevalence and detection rate were validated against comparable estimates in the literature. The metric was then used to assess factors associated with COVID-19 testing outcomes. We aimed to create a new tool for policymakers and researchers to comprehensively assess COVID-19 testing outcomes and identify effective policies.

Data input
Data on COVID-19 were collected from the Worldometer website, which collects data directly from government communication channels and is managed by an international team of developers, researchers, and volunteers [30]. The data collected from Worldometer included total cases (C), total deaths (D), total recovered (R), active cases (A), population (P) (in millions), and total tests (T). These data have been reported explicitly or implicitly by the website since at least April 6, 2020. Prior to this date, subsets of these data were available.
Two other inputs were the Global Health Security Index Detection and Reporting sub-index (I sys ) [31] and the Economist Intelligence Unit Democracy Index (I dem ) [32]. The I sys assesses a health system's capacity for early detection and reporting during epidemics of potential international concern. This index is available for 195 countries and has a scale from 0 to 100, where 100 indicates perfect detection and reporting. I sys values were not available for Hong Kong and Taiwan. The value for Hong Kong, I sys = 78, was imputed as the average of South Korea (I sys = 92.1) and Singapore (I sys = 64.5), which were assumed to have comparable health systems. Similarly, the value for Taiwan, I sys = 81, was imputed as the average of South Korea (I sys = 92.1) and Japan (I sys = 70.1). All other states without a value (n = 10) were imputed as 42, which was the global average. I dem is a projected measure of the degree of democracy; it is calculated for 167 countries and states and has a scale from 0 to 10, where 10 indicates the highest degree of democracy. This value was assumed to act as a proxy for transparency in data reporting. All states without a value (n = 14) were imputed to be 5.4, the global average.
Countries and territories were included in the index calculations if at least 100 cases had been reported and the population was greater than or equal to 100,000. The threshold of 100 confirmed cases was arbitrarily chosen to exclude locations where an outbreak had not yet occurred, in which case an analysis of testing policy was not relevant. Data were accessed daily; however, this report presents the results for data accessed as of 00:00 GMT on June 3, 2020 (n = 165).
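The imputation and inclusion rules described above can be illustrated with a minimal Python sketch. The function and constant names are hypothetical and are not taken from the original analysis; the numeric values are those quoted in the text, and population is expressed in millions as in the Worldometer input.

```python
from statistics import mean

# Global averages quoted in the text, used when a state has no index value.
GLOBAL_MEAN_I_SYS = 42.0
GLOBAL_MEAN_I_DEM = 5.4


def impute_from_comparable_systems(values):
    """Impute a missing index as the mean of health systems assumed comparable."""
    return mean(values)


def is_included(total_cases, population_millions):
    """Inclusion rule: at least 100 reported cases and a population of at least 100,000."""
    return total_cases >= 100 and population_millions >= 0.1


# Hong Kong: average of South Korea (92.1) and Singapore (64.5).
hong_kong_i_sys = impute_from_comparable_systems([92.1, 64.5])
# Taiwan: average of South Korea (92.1) and Japan (70.1).
taiwan_i_sys = impute_from_comparable_systems([92.1, 70.1])
```

Rounded to the nearest integer, these averages reproduce the imputed values of 78 and 81 reported above.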

Definition of key indicators
Several key indicators were computed from the input data, representing important epidemiological indicators used in the analysis.
the relatively small number of closed cases [33]. In these situations, the CFR is either incomputable or artificially high. Therefore, an alternative estimate of CFR was necessary. The ratio of deaths, D, to cases, C, which can be computed for any country, was used. Logically, the ratio of D:C is lower than the closed-case definition of CFR, D:(D+R), because C includes unresolved cases with unknown outcomes. These two ratios are related, however. A scatterplot of these ratios revealed a positive relationship (R² = 0.76, n = 157), in which the closed-case definition of CFR was between one and two times the ratio of D:C in 68.5% of countries. However, in some countries, the closed-case definition of CFR was substantially inflated relative to D:C and deviated from the linear relationship. Two scenarios would likely explain this: either case resolution (count of R) was not tracked in real time, or a recent outbreak occurred, in which a large proportion of cases were recently identified and had not yet resolved. Thus, the CFR, as computed using the reported closed-case definition, was used in our further analysis but was capped to be no greater than 2 times the ratio of D:C (Eq 1), in order to exclude artificially inflated CFRs.
Test Positivity Rate (TPR). TPR was computed as the ratio of cases, C, to tests, T (Eq 2).
While the reported number of tests from specific countries or territories may represent multiple tests conducted on a single individual or even number of specimens, no adjustment was attempted to account for such heterogeneity in the various definitions of the TPR. In some countries, T was not available and thus TPR was not calculated.
Active Cases (AC). AC was computed as the ratio of active cases, A, to cases, C (Eq 3).

$$\mathrm{AC} = \frac{A}{C} \qquad (3)$$
In some countries, A was not available and thus AC was not calculated.

Estimating true number of infections and detection rate
It can be assumed that the reported number of cases, C, in a country represents a subset of the true number of infections. Some infections will go undetected, but as detection of cases increases, C will approach the true number of infections. Thus, the true number of infections (Inf) is some factor, f, higher than the cases that have been identified (Eq 4).
We conceptualized this factor, f, to be a function of the level of testing, the approach to testing (e.g., whether testing focused on symptomatic, hospitalized, or general populations), and the quality and completeness of the data. Two variations of f (f 1 and f 2 ) were constructed to formulate a numerical value for f, where f 1 was derived from CFR, I sys , I dem and f 2 used TPR as described below.
Factor 1 (f 1 ). Deaths have been used as an indicator of the true prevalence of an infectious disease and a means to track underreporting in real time [19,34]. The proportion of Inf that resolve to death is the infection fatality rate (IFR). In the first four months of the pandemic, the IFR for COVID-19 was around 1% or less [20]. That is, for every recorded death, at least 100 infections likely occurred. Thus, if fewer than 100 cases were recorded per death attributed to COVID-19 (i.e., the ratio of 100D:C exceeded 1), it was an indication of under-detection of infections. The primary factor, f 1 , was computed so as to create a multiple that scales Inf to represent at least 100 infections occurring per attributed death (Eq 5).
However, it was also assumed that health system capacity and government transparency affect the completeness of data, including reported deaths, as represented by the factor m demsys in Eq 5. The multiplier was intended to adjust for health system capacity using I sys , an indicator of a health system's ability to detect and report deaths and/or cases, and I dem , a proxy for government transparency in reporting data. The relationship between I dem , I sys , and m demsys (Eq 6) was determined by fitting the data to estimates of the true number of infections in 35 countries, spanning a wide range of populations and regions, for which an estimate was available from an existing model that used machine learning to estimate the number of infections [35]. In short, the existing model's estimates were used to determine what the m demsys multiplier would have needed to be to reproduce them, and these values were then fitted using multiple linear regression. A linear model was chosen since it matched the relationship better than power and other non-linear models.
The mathematical relationship between the two indices and the multiplier in this equation followed a declining relationship, whereby increased health system capacity (I sys ) or increased government transparency (I dem ) reduced the multiplier, representing less underreporting.
Factor 2 (f 2 ). The primary factor, f 1 , relied on deaths attributed to COVID-19, but in locations with inadequate or low testing levels, death data may also be underreported, making deaths a less representative indicator. Therefore, an alternative factor incorporated data on testing (Eq 7). This factor was based on TPR, an indicator of adequate testing relative to disease prevalence.
The World Health Organization (WHO) has suggested testing capacity is adequate when TPR is less than 5% [36]. If TPR was greater than 5%, it was inferred that increasingly more cases were undetected and that the factor f 2 would be equal to the ratio of the TPR to that 5% benchmark (Eq 7).
True number of infections. The two factors f 1 and f 2 provided two different ways to assess what the factor f might be for a given country. Generally, a high TPR is indicative of testing limited to symptomatic or severely ill patients, which will result in underestimation of infections within reported cases [17]. Under such a testing strategy, where only severe infections that are more likely to result in death were counted as cases, the ratio of D:C or the CFR was also likely to be higher, as reflected in f 1 . However, if the D:C ratio was low but TPR was high, these two indicators provided conflicting information. Such a case could occur early in an outbreak when only limited deaths have occurred or where deaths were not being fairly attributed to COVID-19. In either case, the high TPR suggests many infections were not being detected, despite the low number of deaths relative to cases. In most cases the two factors correlated: the factor f 1 , based on the reliable indicator of deaths, was approximately equal to or greater than the TPR factor, f 2 . However, in cases where f 2 gave conflicting information to f 1 (i.e., f 2 > f 1 ), f 2 was used. Finally, the true number of infections, Inf, was estimated as the product of C and the larger of the two factors (Eq 8).
When possible, Inf was compared against other empirical country-level estimates in the literature in order to validate the model.
Detection Rate (DR). With an estimate of the true number of infections, it was possible to estimate what percentage of the infections (including asymptomatic) have been detected. The total number of cases, C, was divided by Inf to estimate the DR in each country or territory. Estimates of the DR were compared to DRs published in the literature.
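The logic of Eqs 7 and 8 and of the detection rate can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The function names are hypothetical, f 1 is taken as an input because Eqs 5 and 6 are not reproduced here, and setting f 2 to 1 when TPR is unavailable or at or below the 5% benchmark is an assumption of this sketch.

```python
from typing import Optional

TPR_BENCHMARK = 0.05  # WHO adequacy benchmark cited in the text (5%)


def factor_2(tpr: Optional[float]) -> float:
    """f2: ratio of TPR to the 5% benchmark when TPR exceeds it (Eq 7)."""
    if tpr is None or tpr <= TPR_BENCHMARK:
        return 1.0  # assumption: no upward adjustment in this case
    return tpr / TPR_BENCHMARK


def true_infections(cases: float, f1: float, tpr: Optional[float]) -> float:
    """Inf = C * max(f1, f2) (Eq 8); f1 is supplied by the caller."""
    return cases * max(f1, factor_2(tpr))


def detection_rate(cases: float, infections: float) -> float:
    """DR = C / Inf."""
    return cases / infections


# Hypothetical country: 1,000 cases, f1 = 4, TPR = 25%, so f2 (about 5) is used.
infections = true_infections(1000, 4.0, 0.25)
dr = detection_rate(1000, infections)
```

In this hypothetical case the two factors conflict (f 2 > f 1 ), so the TPR-based factor determines the estimate and roughly one in five infections is counted as detected.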

Calculation of the COVID-19 Testing Index
The COVID-19 Testing Index (CovTI) consists of four sub-indices, each based on a single key indicator. The key indicators used were the DR, TPR, CFR, and AC (Fig 1). These indicators were chosen since they are commonly used as metrics and are computable with the methods described above. The relationship between each sub-index and its indicator was built on two principles. First, each sub-index was scaled from 0 to 100 with 0 representing the worst indicator value and 100 the best indicator value. Second, each sub-index was computed from a mathematical relationship that parsimoniously approximated the ranked percentile of the country according to the underlying indicator. Previous indices used for ecologic studies have computed the ranked percentile for the ecological units (here, countries) for use as the index value [37,38]. We modified this approach, however, since ranking provides only relative information (e.g., how well one country is doing relative to another). We also wanted the index to have the property that an ideal value exists (e.g., 100% for DR or 0% for TPR or CFR). Thus, the countries were ranked and a function for computing the sub-index was chosen that mimicked the distribution of the countries but still retained the ability to have 100 represent the ideal value and 0 represent the worst value. These sub-indices were combined in a weighted average to compute CovTI. In the absence of a rationale to assign weighting to the sub-indices, equal weighting was used as the starting point [37], with the one exception detailed below.

PLOS ONE

Detection Rate sub-index (DR si ). If infections are undiagnosed, those individuals can actively spread the disease unknowingly. Without timely diagnosis, effective contact tracing cannot occur. Therefore, undiagnosed infections critically contribute to unchecked spread of COVID-19 and subsequently represent inadequate response [39]. DR is also most directly representative of testing effectiveness. Thus, DR si was given 40% weighting (double weighting compared to other metrics) in computing the CovTI (Eq 13). Ranking (Fig 2A) showed a relationship, in which as the detection rate increased, percentile increased asymptotically towards
Test Positivity sub-index (TP si ). The TPR is a commonly used testing metric for COVID-19 [40] and has been recommended to guide decisions regarding introducing or relaxing public health measures [41]. A high TPR represents a reactive, rather than proactive, approach to testing. Ranking showed an exponential decay relationship (Fig 2B), which was incorporated into the TP si function (Eq 10). If TPR was not available, a dummy value of 20 was used. TP si was given 20% weighting in computing the CovTI.
Case-Fatality sub-index (CF si ). The CFR should approach the IFR of 1 percent (or less) if testing is adequate to detect the majority of infections. If CFR is higher, it is likely only severely infected patients are being diagnosed. Ranking showed an exponential decay relationship (Fig 2C), which was used to define the CF si function (Eq 11). CF si was given 20% weighting. If the CFR was 0% (i.e., no deaths had yet been recorded as attributable to COVID-19), a dummy value of 50 was used.
Active Case sub-index (AC si ). Finally, a fourth sub-index accounted for the activeness of the epidemic in a country. If an outbreak is active in a country, it is less likely the testing is adequate, with the increase possibly reflecting inadequate case identification. This sub-index also provides a metric to reflect progress as cases resolve. A lower AC is indicative that the epidemic is not increasing exponentially. Ranking showed a linear relationship (Fig 2D), which was used to develop the AC si (Eq 12). It was given a weight of 20%. In locations where AC was not computable, a dummy value of 50 was used.
COVID-19 Testing Index (CovTI). The CovTI was calculated as the weighted average of the four sub-indices (Eq 13) described above with a heavier weighting given to the DR si due to the importance of undetected cases in driving infections and because it most directly represents testing effectiveness (the primary objective of the index), whereas the other sub-indices are secondary indicators of testing effectiveness. CovTI was computed for the countries and territories meeting the inclusion criteria (n = 165).
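The weighted average described for Eq 13 can be illustrated with a short Python sketch. The function name, argument names, and dictionary keys are hypothetical; the weights (40% for the detection rate sub-index and 20% for each of the others) are those stated in the text. The sub-index values themselves are taken as inputs, since their defining functions (Eqs 9 to 12) are not reproduced here.

```python
# Weights stated in the text: DR sub-index double-weighted relative to the others.
DEFAULT_WEIGHTS = {"DR": 0.4, "TP": 0.2, "CF": 0.2, "AC": 0.2}


def covti(dr_si, tp_si, cf_si, ac_si, weights=DEFAULT_WEIGHTS):
    """Weighted average of the four sub-indices, each on a 0 to 100 scale."""
    weighted_sum = (
        weights["DR"] * dr_si
        + weights["TP"] * tp_si
        + weights["CF"] * cf_si
        + weights["AC"] * ac_si
    )
    # Dividing by the weight total keeps the result on the 0 to 100 scale
    # even if the supplied weights do not sum exactly to 1.
    return weighted_sum / sum(weights.values())


# Hypothetical sub-index values for one country.
example_score = covti(dr_si=50, tp_si=20, cf_si=50, ac_si=50)
```

Passing a different `weights` mapping, for example 0.25 for each sub-index, shows how sensitive a country's score is to the weighting choice.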

Statistical analyses
Five independent grouping variables were assessed for their relationship with COVID-19 testing effectiveness by analyzing their association with CovTI (Table 1). Testing and contact tracing policy status were accessed from the Oxford COVID-19 Government Response Tracker [42] for May 13, 2020, which is three weeks prior to June 3, 2020, approximately the average time from symptom onset to death [22]. Islands were defined as any country that is an island, is part of an island (co-island), has limited land connections (limited land), or is an archipelago (see details in S1 Table). Any location for which at least one independent variable was not defined was excluded from analysis, creating an analysis data subset (n = 147). Crude bivariate analyses using two-tailed two-sample t-tests and one-way analysis of variance (ANOVA) were used to test whether the means between groups were different. A multiple linear regression (MLR) model was developed by using forced entry of all factors. Factors were removed by backwards stepwise method (p >0.05) with Bayesian Information Criterion (BIC) used to assess model fit and overparameterization. Analyses were performed in Stata 14 [43].

Sensitivity analysis
We conducted a sensitivity analysis to assess how robust the model was in response to uncertainty in chosen model parameters. Eight scenarios were evaluated in the sensitivity analysis. Two scenarios (A1 and A2) assessed the cap on computed CFR (Eq 1); two scenarios (B1 and B2) assessed the assumed IFR (Eq 5); two scenarios (C1 and C2) assessed the m demsys multiplier (Eq 6); one scenario (D1) assessed the TPR threshold (Eq 7); and one scenario (E1) assessed the weighting of the sub-indices (Eq 13).

True number of infections and detection rate
Globally, the model estimated that approximately 65.7 million people had been infected with SARS-CoV-2 in the period prior to June 3, 2020, compared to the reported 6.47 million cases (mean multiplier factor f = 10.2, range = 1.5–85). In other words, for every reported case it was estimated that 9.5 infections had gone unreported in that period. The global DR was estimated to be 9.8% (range = 1.2–66.8%) in the period prior to June 3, 2020. The countries estimated to have had the highest DRs over that time period were Australia (66.8%) and Iceland (60.3%) (see S1 Table for full results).

Comparison to previous estimates
This model's estimates were compared against historical estimates in the literature (Table 2). The results showed that this model's estimates were similar to previous estimates at comparable time periods (R² = 0.44). In many cases the estimates of true number of infections and DR closely matched previous estimates, and in most cases the estimates were within the 95% confidence interval of previous estimates.

COVID-19 Testing Index
Comparison between countries. Countries in the top quartile of CovTI had lower TPR, lower CFR, lower proportion of active cases, and higher DR compared to other quartiles (Fig 3). The top 15 countries according to the index included many island nations and states that are effectively islands (e.g., Hong Kong, Iceland, Australia) (Table 3). Southeast Asian countries including Cambodia, Vietnam, Malaysia, Brunei and Thailand also had high CovTI. Full results are reported in Supporting Information (S1 and S2 Tables).

Variable analysis. Bivariate analyses showed that testing policy and contact tracing policy were significantly associated (p<0.0001) with CovTI (Table 4), with increasing levels of testing and contact tracing associated with higher CovTI. Islands had significantly higher CovTI than non-islands (p = 0.004). Unitary states had higher CovTI compared to federations, but the differences were not significant (p = 0.26). The CovTI values among OECD members and nonmembers were nearly equal.
All factors were entered into the initial MLR model. The final model included all factors except economic development (OECD member or non-member). The MLR showed that testing policy had the largest effect on testing outcomes, whereby widespread open testing was associated with a 31.1-point increase in CovTI compared to no testing policy (Table 5). Contact tracing, centralized governments, and islands were also associated with improved CovTI values. However, the difference in CovTI associated with type of government was not statistically significant (p = 0.20).

Model robustness
The results of the sensitivity analysis showed that possible uncertainty in the model's assumptions did not affect the outcomes of the study. The scenarios that caused the largest change in the F-statistic were scenarios B1 (IFR of 0.5%) and E1 (equal weighting to each of the four CovTI sub-indices). In all scenarios, however, the F-statistic and adjusted R² from the MLR were within 10% of the respective values for the final model. Additionally, the interpretation of the results (e.g., which factors were statistically significant) did not differ from the final model. Full results are reported in Supporting Information S1 Appendix.

Discussion
We developed a novel comprehensive metric, entitled CovTI, that attempted to measure effectiveness of testing during the first four months of the COVID-19 pandemic at the country level and was derived from key epidemiological indicators computable from data available across nearly all countries, as reported on the Worldometer website [30]. Previous research has assessed government response [26] and suggested specific indicators to facilitate inter-country comparisons [27]. However, this is the first published metric to comprehensively assess testing outcomes with a focus on detection/underreporting.

Key strengths and limitations
Mitigation efforts to contain the spread of SARS-CoV-2 should include testing, contact tracing, and isolation of cases, alongside social distancing and face masks [4,5]. National policy decisions and individual choices must be informed by an assessment of risk that is supported by data [46]. CovTI aimed to provide an additional data point that holistically combined four important epidemiological indicators into a single metric. As such, it facilitated inter-country comparisons that elucidated extrinsic factors associated with improved testing outcomes. Another advantage to using CovTI is that it incorporated a parsimonious empirical model to estimate the period prevalence and detection rate. Despite this parsimony, the results were comparable to more complex modeling approaches [44,45]. The findings suggested that nearly 90% of global infections were unreported in the first four months of the pandemic, which was consistent with previous estimates showing that the true number of infections is many times higher than reported cases [10–12,17]. In addition, the true number of infections as a percentage of the population has been modeled in various European countries. For example, period prevalence in Italy was estimated at 4 percent in April [47] and 4.4 percent in France in May [48]. Several of the hardest affected countries early in the pandemic had an estimated period prevalence between 3 and 7.5 percent [49]. These estimates were generally consistent with our estimates. Seroprevalence is another way to assess the period prevalence of COVID-19. Nationwide seroprevalence studies, such as one in Spain, provided period prevalence estimates (5.0% through May 11, 2020) consistent with this and other models [50]. However, several factors can substantially affect the accuracy of seroprevalence studies, including low specificity, cross-reactivity with other viruses, high false positive rates in low prevalence environments, and study biases [51–54]. Therefore, models and parsimonious estimates may continue to play an important role in estimating the true number of infections.
Table 5. Multiple linear regression analysis [F(7, 139) = 7.07 (p<0.0001), adjusted R² = 0.23] of factors associated with COVID-19 Testing Index from final model (n = 147) (data from 00:00 GMT June 3, 2020).
CovTI has several limitations due to its inherent assumptions. In order to calculate DR, the model assumed specific and universal relationships between deaths and total number of infections, implying an inherent IFR. Advances in therapeutics and differences in health system capacity will influence this rate, however [55]. In addition, several factors including age, sex, hypertension, diabetes, and blood groups are known to affect mortality and hospitalization rates [56–63]. These factors were not accounted for in this model. Furthermore, death data, which contributed substantially to the computation of CovTI, are not equitably comparable across all countries. Excess all-cause mortality (i.e., total mortality in excess of seasonal averages) is substantially higher than the reported COVID-19 deaths in many locations, including Brazil, Jakarta, New York City, and Ecuador [64]. These excess deaths suggest such locations may not have accurately captured deaths caused by COVID-19 in official figures or are experiencing increased mortality related to the pandemic but not necessarily due to infections [65,66]. Different definitions of attributable deaths substantially affect death data. For example, Russia had previously used a very limited definition for inclusion, determined via autopsy, that did not count many deaths even if the patient previously tested positive for SARS-CoV-2 [67]. On the other hand, some countries have opted to exhaustively include any death presumptively attributed to COVID-19 in official data. Belgium has included unconfirmed deaths within its COVID-19 death total [68]. In Belgium, COVID-19 deaths were greater than excess deaths [64]. These examples of limited or conservative death definitions impact this model's estimates. Adjusting death data based on excess mortality data could improve the model; however, such data are limited [69].
The model also assumed specific relationships between proxy indicators, such as the Global Health Security Index and Democracy Index, and data outcomes. While regression showed they were associated with the level of underreporting, these factors are indirect proxies for a more complex set of variables that determine underreporting. Furthermore, this model aimed to be parsimonious (i.e., not introducing excessive parameters or uncertainty) and is, by nature, deterministic. The decision to include a minimum number of variables and data was strategic and aimed to avoid overparameterization, but a stochastic approach could better illustrate the uncertainty and sensitivity to the above assumptions. The model also reported estimates of total number of cases and detection rates. These values should be used cautiously as a comparative tool, rather than exact values, alongside other indicators.

Policy implications
High CovTI values were found in countries that have been recognized for their success in responding to the COVID-19 pandemic, including New Zealand, Taiwan, Australia, Iceland, and South Korea. These countries have had high testing rates, comprehensive contact tracing programs, and relative success in mitigating the health impacts of the virus [70–73]. Thus, this analysis provided quantitative evidence supporting policy recommendations to facilitate strong national leadership, expand diagnostic capacity, rapidly enact comprehensive contact tracing, and proactively test for SARS-CoV-2 [70,71].
Further supporting this notion, aggressive contact tracing and inclusive testing policies were found to be independently associated with increased testing effectiveness, as indicated by CovTI. This result both validates the model's ability to track COVID-19 testing outcomes and provides further confirmation that countries that prioritize policies and dedicate resources specifically to testing and contact tracing have effectively reduced underreporting and improved testing-related health outcome metrics. Previous studies have found an association between expansive testing early in the pandemic and improved mortality data in Germany, South Korea, and Iceland [74]. On the other hand, socioeconomic status had no significant association with the outcome in this study. Previous research has shown that policy decisions are more important than socioeconomic status [75].
Increased testing capacity is not a panacea, though. More testing is not necessarily better, especially if accuracy is overestimated or the testing is poorly targeted [76]. In the first four months of the pandemic, COVID-19 diagnoses were routinely confirmed through testing that employed the reverse transcriptase polymerase chain reaction (RT-PCR) technique [77]. The heavy reliance on this method has been questioned [78]. Alternative molecular detection techniques may be needed, especially if they can return results more quickly [79].
Interestingly, this research found a significant relationship between island nations and higher CovTI. Geographical isolation is an obvious advantage to controlling infectious disease [80]. Many island nations have chosen a strategy of eradication [81]. Geographic isolation and easier enactment of border closures have benefited island nations in responding to the pandemic [82].
Finally, although the estimates from this model showed substantial underreporting of infections, the overall period prevalence remained less than 10 percent in nearly all countries. These proportions are much higher than the officially reported cases, but such a low share of the population presumably with antibodies is far from conferring herd immunity, which may inhibit future disease transmission and is considered important to fully reopening society. Herd immunity depends on the effective reproductive number (R e ) [83], which varies with the effectiveness of interventions; some estimates specify a threshold of 50 to 60 percent seroprevalence to achieve herd immunity [84], while others, accounting for differential susceptibility, estimate the threshold may be as low as 20 percent [85]. Nevertheless, these estimates suggest that herd immunity had not been reached in the first four months of the pandemic at the national level in the countries analyzed. Therefore, achieving herd immunity through natural infection may be costly or unachievable [86].

Future research
We encourage other researchers to build on this analysis by combining this metric with other databases that can account for other possible factors, such as trust in government institutions, demographics, or urban/rural distribution. Such analyses could further elucidate other extrinsic factors related to COVID-19 testing outcomes. Research could also be done to understand how association with other factors changes over time, as the pandemic progresses through different stages. Sub-national analyses may also be possible using the mathematical relationships defined in this index.

Conclusion
This report described a novel comprehensive metric (COVID-19 Testing Index, CovTI) that evaluates the overall effectiveness of COVID-19 testing in the current pandemic using real-time publicly reported data among 165 countries and territories. The metric incorporated case-fatality rate, test positivity rate, proportion of active cases, and an estimate of detection rate based upon reported death data by adjusting for heterogeneity in testing levels, health system capacity, and government transparency. The estimated detection rate of COVID-19 aligned satisfactorily with previous empirical and epidemiological models. National policies that facilitated open public testing and extensive contact tracing were significantly associated with higher values of CovTI, which reflected improvements in the estimated detection rate. Extrinsic factors, including geographic isolation and centralized forms of government, were also shown to be associated with improved COVID-19 testing outcomes. Countries should commit to expanding policies on testing and contact tracing in order to reduce levels of undetected infections and reduce disease transmission. Applications of this metric include combining it with different databases to identify other factors that affect testing outcomes or using it to temporally track a holistic measure of testing outcomes at the national or sub-national level.
Supporting information
S1 Table. Raw data inputs and computed values for the COVID-19 Testing Index (CovTI) on June 3, 2020. Raw data inputs, key epidemiological indicators, multipliers, factors, and sub-indices used to compute CovTI on June 3, 2020 among included countries and territories (n = 165). Further details of each variable are described in the text. C = cases, D = total deaths, R = total recovered, A = active cases, P = population (in millions), T = total tests, CFR = case fatality rate, TPR = test positivity rate, m demsys = multiplier to account for health system capacity and democratic transparency, f 1 = factor 1, f 2 = factor 2, Inf = true number of infections, Prev = Estimated Period Prevalence = Inf /P, Act = proportion active cases = A/C, DR = detection rate, DRsi = Detection Rate sub-index, TPsi = Test Positivity sub-index, CFsi = Case-Fatality sub-index, ACsi = Active Case sub-index, CovTI = COVID-19 Testing Index. OECD = Organisation for Economic Co-operation and Development member, BRIC = Brazil, Russia, India, and China. (DOCX)
S2 Table. Dataset for bivariate and multiple linear regression analyses. CovTI as of June 3, 2020; island status; form of government; OECD membership; COVID-19 testing policy as of May 13, 2020; and COVID-19 contact tracing policy as of May 13, 2020 among included countries and territories with complete data (n = 147). Further details of each variable are described in the text. (DOCX)
S1 Appendix. Sensitivity analysis using alternative scenarios of model construction. Eight alternative scenarios were considered. In each scenario, CovTI (and all subsequent inputs) was computed. Then, the MLR was run. MLR results were compared to the final model's results. (DOCX)