The Accuracy of Diagnostic Tests for Lyme Disease in Humans, A Systematic Review and Meta-Analysis of North American Research

There has been an increasing incidence of Lyme disease (LD) in Canada and the United States corresponding to the expanding range of the Ixodes tick vector and Lyme disease agent (Borrelia burgdorferi sensu stricto). There are many diagnostic tests for LD available in North America, all of which have some performance issues, and physicians are concerned about the appropriate use and interpretation of these tests. The objective of this systematic review is to summarize the North American evidence on the accuracy of diagnostic tests and test regimes at various stages of LD. Included in the review are 48 studies on diagnostic tests used in North America published since 1995. Thirteen studies examined a two-tier serological test protocol vs. clinical diagnosis, 24 studies examined single assays vs. clinical diagnosis, 9 studies examined single immunoblot vs. clinical diagnosis, 7 studies compared culture or PCR direct detection methods vs. clinical diagnosis, 22 studies compared two or more tests with each other and 8 studies compared a two-tiered serological test protocol to another test. Recent studies examining the sensitivity and specificity of various test protocols noted that the Immunetics® C6 B. burgdorferi ELISA™ and the two tier approach have superior specificity compared to proposed replacements, and the CDC recommended western blot algorithm has equivalent or superior specificity over other proposed test algorithms. There is a dramatic increase in test sensitivity with progression of B. burgdorferi infection from early to late LD. Direct detection methods, culture and PCR of tissue or blood samples were not as sensitive or timely compared to serological testing. It was also noted that there are a large number of both commercial (n = 42) and in-house developed tests used by private laboratories which have not been evaluated in the primary literature.


Background
Lyme disease (LD) is the most common tick-borne infection in Canada and much of the United States (Telford 1997;Ogden 2009). It was first recognized in North America in 1975 in the towns of Lyme and Old Lyme, Connecticut, as a result of an investigation into 51 cases (39 children) with a similar form of arthritis (Steere 1977). However, the history of Borrelia appears to go back at least 5,300 years, as the bacterium was identified in the mummified remains of the Tyrolean Iceman discovered in 1991 in the Italian Alps (Keller 2012).
In North America Lyme disease is caused by Borrelia burgdorferi sensu stricto, while B. afzelii, B. garinii, B. burgdorferi, B. spielmanii and B. bavariensis cause disease in Europe, with a wider variety of symptoms than reported in North America; B. garinii is predominant in Asia.
Black-legged ticks of the genus Ixodes transmit the spirochete through their bite. I. scapularis is the main vector in northeastern and upper midwestern USA and Canada while I. pacificus is the major vector in western USA (Gray 1998;Nelder 2014). The primary vectors of LD in Europe and Asia are I. ricinus and I. persulcatus respectively. The principal hosts of Ixodes ticks in North America include rodents, small mammals, birds and white-tailed deer.
Since first recognized in 1975, LD cases have increased progressively as the tick vectors have expanded their geographic range from the New England states into Canada and across some northern U.S. states (Hamer et al. 2010;Ogden et al. 2009) aided by migratory birds and terrestrial hosts (Leighton et al. 2012). There is increasing evidence that climate change will result in a further expansion of the tick vector range in Canada, resulting in increased future risk of LD among Canadians (Brownstein et al., 2005;Ogden et al., 2006a).
In North America early symptoms of infection may include a rash (characteristically a bull's-eye rash), fever, headache and lethargy. If untreated, the disease can progress to chronic symptoms including arthritis, numbness or tingling in hands and feet and memory issues. The diagnostic tests available to confirm Lyme disease in humans are not perfect and have variable sensitivity and specificity depending on the stage of infection (Lindsay et al. 2014). There have also been concerns raised about the use of non-validated tests and test protocols (Lindsay et al. 2014; Nelson 2014). The goal of this systematic review is to summarize the global evidence on the sensitivity and specificity of diagnostic tests and test regimes at various stages of Lyme disease.
Currently in Canada and the U.S. a two-tiered serology protocol is an accepted and validated test for disseminated Lyme disease diagnosis (Lindsay et al. 2014; Nelson 2014). The first tier is typically an enzyme immunoassay (EIA) to detect IgM or IgG antibodies in serum against Borrelia burgdorferi. There are a number of commercial ELISA kits available that use whole cell preparations of B. burgdorferi and/or recombinant antigens. This variation in target is likely a source of some heterogeneity. If a sample is positive or indeterminate by EIA, then a Western blot test with better specificity is used to detect antibodies in serum against Borrelia and to confirm whether seroconversion from IgM to IgG has occurred. Lindsay et al. (2014) summarize some of the strengths and weaknesses of these tests.

Scoping Review - Identification of Relevant Studies
A scoping review was conducted by Greig et al. (2015) to identify, classify and characterise the main features of the Lyme disease literature published up to September 2013.
The PICO scoping review question (The Cochrane Collaboration, 2011) was: "What is the current state of scientific knowledge on surveillance methods, prevention and control strategies, risk factors, and societal attitudes and perceptions towards LD in humans and Borrelia spp. in tick vectors and vertebrate reservoirs?" Several systematic reviews were prioritized from the scoping study, including an evaluation of the performance of Lyme disease diagnostic tests / test regimes for humans. The full paper was used to confirm each paper's relevance to the Lyme disease issue and to describe the purpose, study design, location of the study, Borrelia sp., host species investigated, and vector species investigated. We also collected information on the sampling dates, the diagnostic tests used, and which data in the paper are extractable and which are not.
The scoping review included an advisory group that helped define the scope, provided background information and validated the interpretation of the results.
Scoping review search strategy: A pretested search strategy, adapted to the specific requirements of each database, was implemented in the following bibliographic databases: BIOSIS (via Web of Knowledge), CAB Abstracts, Scopus, PubMed, PsycINFO, APA PsycNet, Sociological Abstracts, and EconLit during September 2013. There was no limitation on year of publication. To achieve an effective balance of sensitivity and specificity for identification of potentially relevant citations, the search was pre-tested in Scopus. The search strategy consisted of a targeted combination of specific terms designed to address the research question: (lyme OR borrelia) AND ("host" OR sentinel OR landscaping OR "vector" OR "vectors" OR "monitor" OR "monitoring" OR surveillance OR reservoir OR reservoirs OR prevalence OR educate OR education OR barrier OR barriers OR intervene OR intervention OR incidence OR rate OR prevent OR prevention OR control OR risk OR risks OR attitude OR attitudes OR perception OR perceptions OR detection). The capacity of the electronic search to identify all relevant primary research was confirmed by hand-searching reference lists from two primary research papers (Connally et al. 2009;Beaujean 2013). A search for grey literature on the websites of government and research organizations worldwide was conducted in February 2014 to complement the electronic database search; this resulted in the addition of 102 articles to the review (the full list of articles is available as supplementary material, Appendix 2). Only the following grey literature sources were considered for inclusion in the review: formal government and research reports; journal news, commentary, or editorial articles; and theses and dissertations.
Results: Of 16,516 records screened for relevance, 1843 relevant articles were analysed and categorized as follows: surveillance methods 722 articles, diagnostic tests 660, risk factors 452, efficacy of mitigation strategies 153, public knowledge, attitudes, or risk perceptions in North America 172, and economic burden of Lyme disease and/or cost-benefit of potential prevention/control strategies 57 articles.
Of the 660 diagnostic test papers, 492 focused on diagnosis of Lyme disease in humans. These papers moved to the systematic review for further assessment and data extraction. The following is a summary of characteristics of studies identified to have evaluated a diagnostic test for Lyme disease in humans. These data will be used to inform the data extraction form for this SR and to confirm consistency in data.
Publication period and number of studies: 1980-1984, 4; 1984-1985, 36; 1990-1994, 122; 1995-1999, 118; 2000-2004, 90; 2005-2009.
Borrelia species: B. andersonii, B. americana, B. parkeri, B. hermsii, B. turicatae, B. lonestari, B. anserina, B. coriaceae, B. japonica, B. recurrentis.
- Early disseminated stage (days to weeks post tick bite): Initial period where the infection spreads to other parts of the body. Symptoms include: facial palsy (loss of muscle tone on the face), severe headache and neck stiffness due to meningitis, pain and swelling in joints, shooting pains, heart palpitations and dizziness. Without treatment many of these symptoms will resolve, but there is a greater risk of further complications.
- Late disseminated stage (months to years post tick bite): Approximately 60% of untreated infections may lead to prolonged malaise including intermittent bouts of arthritis, severe joint pain and swelling. Up to 5% of untreated patients develop neurological symptoms including shooting pain, numbness or tingling in hands and feet and problems with short term memory. IgG reaction should be detectable and will remain detectable for months to years. EIA or other assays only need to target IgG at this point. 80-90% of EM positive patients will be ELISA positive.
- Post treatment Lyme syndrome: It is estimated that 10-20% of patients treated for Lyme infection still have symptoms that last months to years. These include muscle and joint pain, cognitive defects, sleep disturbance, and fatigue. There is no evidence that this is due to a persistent Borrelia infection; it is thought to be an autoimmune reaction, and continuing antibiotic treatment does not improve the condition. Serological tests cannot differentiate a new Lyme infection from previous positivity.
- Chronic Lyme disease: has been used to describe patients whose symptoms fit Lyme disease but in whom no evidence of current or past infection with Borrelia has been detected. There has been a lot of variation in the use of this term and its use is not well supported (Infect Dis Clin N Am 22:341-60, 2008, New Engl J Med 357:1422-30, 2008).

Samples
- Serology samples are typically blood serum or biopsy plasma. These would be the most common sample taken for diagnostic tests.
- Synovial fluid (joint involvement), cerebrospinal fluid (neurological symptoms) and serology + ECG (cardiac symptoms) are used to test for disseminated Lyme disease depending on symptoms. NAAT (Nucleic Acid Amplification Test) to identify Borrelia DNA in a sample such as cerebrospinal fluid is possible, but is rarely used as the concentration is often below the detection limits of the PCR.

Tests
All standardized and approved tests for Lyme disease are based on serology and are designed to detect an immune response, particularly IgG and IgM antibodies, to antigens of Borrelia burgdorferi sensu stricto.

Two-Tier Methods (Index test):
Canada/USA (since 1995) approved diagnostic testing sequence: When clinical symptoms such as rash, fatigue, headache, joint pain and/or neurological symptoms of Lyme disease are present (>1 week after an erythema migrans (EM) rash has appeared) and there is likely tick exposure (geography, time and activity history), then use two-tier serological testing: EIA, typically an ELISA (positive or equivocal), followed by Western blot (WB). List of approved tests from FDA and HC in separate pdfs.
- Patient criteria: A patient must have symptoms of Lyme disease, e.g. bull's-eye rash, a history of being in a positive geographic region and possible or self-reported tick exposure. If yes and infection started >2 weeks prior, test with the two-tier method; repeat after 4 weeks if negative.
- First tier is an EIA (ELISA = current method or IFA = 'old method') that is quite sensitive. This test must be positive or borderline to indicate a second tier test. These tests commonly use whole cell antigens grown in vitro; tests based on VlsE, an immunodominant antigen, and on C6, a 26 amino acid peptide target within that antigen (commercial name Immunetics), are also approved for commercial use.
- Second tier: standardized immunoblotting (Western blot OR blots striped with diagnostically important purified antigens) that is quite specific. IgG positive is positive; IgM positive is positive but only for early disease (post EM, 1-2 wks, to <1 month or up to 6 weeks). How to score immunoblots has been standardized (lyme book) in N. America.
- Test conclusion: If the specimen is positive on both tests, the patient specimen is considered positive. This has an average specificity of 99% or higher at reference centers (specificity for chronic Lyme = 97-100% and acute Lyme = 80-100%). High sensitivity has been reported, but with few values or estimates.
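The decision sequence described above can be sketched in a few lines of Python. This is an illustrative reading of the protocol text only, not a validated laboratory algorithm: the names `Tier1` and `two_tier_positive` are hypothetical, and the 4-week default for the IgM window is an assumption, since the text gives the window only loosely (<1 month or up to 6 weeks).

```python
from enum import Enum


class Tier1(Enum):
    """Possible first-tier (EIA/ELISA) results; names are illustrative."""
    NEGATIVE = "negative"
    EQUIVOCAL = "equivocal"
    POSITIVE = "positive"


def two_tier_positive(tier1, igg_blot_positive, igm_blot_positive,
                      weeks_since_onset, igm_window_weeks=4):
    """Return True if a specimen would be called positive under this sketch.

    igm_window_weeks is an assumed cut-off for 'early disease'.
    """
    # A negative first-tier result ends testing: no immunoblot is indicated.
    if tier1 is Tier1.NEGATIVE:
        return False
    # Second tier: an IgG-positive immunoblot counts at any stage.
    if igg_blot_positive:
        return True
    # An IgM-positive immunoblot counts only within the early-disease window.
    return bool(igm_blot_positive and weeks_since_onset <= igm_window_weeks)
```

For example, an equivocal EIA followed by an IgM-only positive blot at 10 weeks would be reported as negative under these assumptions, whereas the same blot at 2 weeks would be positive.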

IFA immunofluorescence assay
The immunofluorescence assay (IFA; older tests) utilizes fluorescent-labeled antibodies to detect specific target antigens. An antibody is a protein complex produced by B cells that initiates an immune response against a target antigen. In this case, a fluorophore-labeled primary antibody directed against the suspected antigen is used to detect the presence or absence of the organism.
Some commercial tests were previously approved. These are less used as they require a skilled microbiologist and cannot be scored objectively.
Target IgG or IgM antibodies. Sensitive.

CLIA chemiluminescent immunoassay
Qualitative presumptive detection of IgG and IgM antibodies. Intended to be the first tier in a two-tier test.

Immunoblot Tests
Separate the bacterial antigens spatially on a solid support so that the specificity and complexity of the antibody response are revealed. The evaluation of a result is subjective in that the interpreter is looking for the existence of certain "bands". Qualitative tests. Sp=92%.

WB Western blot (aka protein immunoblot)
Detects antibodies in a sample by the separation and detection of proteins (antigens, recombinant antigens or recombinant peptides to Borrelia) of a certain length by electrophoresis.

Dotblot
A mixture containing the molecule to be detected is applied directly onto a membrane or paper substrate as a dot, spotted through circular templates. Uses proteins (antigens, recombinant antigens or recombinant peptides to Borrelia). (Striped blot is a subset of this.)

SDS-PAGE SDS-polyacrylamide gel -electrophoresis
Immunoblot with antigen targets to Borrelia burgdorferi (strain B31 in Canada) from serum.

Antigen Capture Assays
Not validated for urine.

Complement fixation test
Detects antibody or antigen in serum.

Multiplex immunoassay
Any assay that simultaneously measures multiple analytes (a dozen or more) in a single run/cycle. Likely antibody protein arrays in this project.
Commercial: Multiplex microsphere assay (aka AtheNA Multi-Lyte test system) on the Luminex diagnostic platform. Approved first tier test, uses defined peptides.

IP Immunoprecipitation
Is the technique of precipitating a protein antigen out of solution using an antibody that specifically binds to that particular protein.

ACIF anticomplement indirect immunofluorescence assay
Indirect immunofluorescence utilizes a two-step technique, in which a primary, unlabeled antibody binds to the target, after which a fluorophore-labeled second antibody (directed against the Fc portion of the primary antibody) is used to detect the first antibody. This technique is more complicated and time consuming than direct immunofluorescence (because it requires a second incubation period); however, it is more sensitive because more than one secondary antibody can bind to each primary antibody, which amplifies the fluorescence signal.

A surface plasmon resonance (SPR) sensor
has been used for the direct detection of Lyme borreliosis specific antibodies in blood serum.

MAT microscopic agglutination test
Is a serologic test that measures the ability of the patient's serum to agglutinate live spirochetes (agglutinins usually appear after >5 days of infection). Uses live organisms, thus only performed in reference labs.

IEM Immune (sorbent) electron microscopy
Immunoprobes (usually an antibody to Borrelia) are used to identify antigens in the sample.

Tests for active infection via T-cell activity
Tests for chronic infection via CD3-/CD57+ (NK-cell) levels, which decrease with chronic infection.
Can be used to detect Borrelia from lesions or from cerebrospinal fluid in neurological Lyme cases; however, both suffer from low Sn and are not recommended.
Commercial: Sequence detection system and LightCycler are commercially available rt-PCR systems.

Southern blot
Is a method used in molecular biology for detection of a specific DNA sequence in DNA samples.

Objectives of the SR:
Evaluation of the sensitivity and specificity of diagnostic test regimes for diagnosis of Lyme disease in humans, a systematic review and meta-analysis of the evidence.
1. Compile a list of published Lyme disease diagnostic tests for humans, from the scoping review.
2. Extract or calculate sensitivity and specificity information reported for all stages and types of disease and for all types of Borrelia, for individual or combined tests.
3. Compare the appropriateness of the current 2-tier recommendations in Canada to the performance of other diagnostic tests, both approved and not currently approved for testing.
4. Evaluate the cost-benefit of tests that appear to perform better than the standard two-tier method approved for use.

Review methods:
- Studies will be confirmed relevant to this SR (see relevance confirmation tool). Those with insufficient data to extract or insufficient detail (e.g. conference abstracts) will be excluded from further evaluation and summary.
- Studies will be evaluated with the QUADAS-2 tool (Whiting 2008; Whiting 2011; Whiting 2013) for risk of bias and other methodological quality domains, to assess the extent to which the results of each study or group of studies can be believed (see Risk of bias and quality assessment tool). Chapter 9 of the Cochrane Diagnostic Test Accuracy Handbook (Deeks 2009) and the more recent QUADAS-2 tool (2011), which has been updated from the version the Cochrane chapter was based on, were used to construct this tool. A study is judged "at risk of bias" or as having "concerns regarding applicability" when 1 or more domains indicate "high" or "unclear" deficiencies. Extensions of the QUADAS tool included a domain for comparison tests.
- The data extraction form will capture pertinent outcome information so that sensitivity and specificity can be calculated post hoc where not directly reported in the paper (see data extraction tool).
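As a minimal sketch of the overall judgement rule stated above (a study is flagged when one or more domains is rated "high" or "unclear"), the following Python helper may be useful when tabulating appraisals. The function name and the dictionary layout are assumptions made for illustration; they are not part of QUADAS-2 or of the review's forms.

```python
def overall_judgement(domain_ratings):
    """Summarise per-domain ratings into an overall flag.

    domain_ratings: dict mapping a domain name to 'low', 'unclear' or 'high'.
    Returns (flagged, domains) where flagged is True if any domain is
    'high' or 'unclear', and domains lists the domains responsible.
    """
    allowed = {"low", "unclear", "high"}
    flagged_domains = []
    for domain, rating in domain_ratings.items():
        value = rating.strip().lower()
        if value not in allowed:
            raise ValueError(f"unexpected rating {rating!r} for {domain!r}")
        if value in {"unclear", "high"}:
            flagged_domains.append(domain)
    return bool(flagged_domains), flagged_domains


# Hypothetical appraisal of one study, for illustration only.
example = {
    "patient selection": "low",
    "index test": "unclear",
    "reference standard": "low",
    "flow and timing": "low",
    "comparison test": "low",
}
at_risk, reasons = overall_judgement(example)
```

The same helper can be applied separately to the risk of bias ratings and to the applicability ratings, since the rule is stated for both.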

Study Design
- We expect these to be mainly diagnostic test accuracy studies, which are observational in nature and defined below. They were classified as diagnostic test studies at classification. No restriction on study design will be made at this point. The data will be grouped according to test, population/stage of disease and study design for analysis.
- Typical diagnostic test study: patients receive the index test, one or more other tests (optional) and the clinical reference standard (the gold standard, i.e. what is used to diagnose patients).
- Diagnostic test accuracy studies are typically cross-sectional studies. At inclusion in the study all patients are usually known to have or not have the condition of interest and there is usually not a lot of uncertainty about the status of the included individuals.
- Delayed cross-sectional studies occur when verification of the index test result is based on information that will only be available in follow-up after inclusion in the study.
- Cohort type accuracy studies / single gate studies are still cross-sectional in design. These studies employ a single set of inclusion criteria, e.g. enroll everyone that presents to a clinic with symptoms of Lyme disease.
- Case-control accuracy studies / two-gate studies are still cross-sectional in design. These studies employ different criteria for those with and without the target condition (Lyme disease). For example, patients with Lyme disease and patients without Lyme disease but with another condition may be recruited from the same sampling base, e.g. a clinic/hospital. These studies are prone to bias as they often include only patients with severe forms of the disease of interest instead of a spectrum that reflects the disease in the population (they should be identified in study appraisal and perhaps omitted, or a sensitivity analysis run with and without them). Limited generalizability may prevent these studies from addressing the clinical question.

Comparisons of Tests
- Head to head design: this is the strongest comparison, as it directly evaluates the tests against each other. Studies can be fully paired, where all participants received all tests AND the clinical reference standard.
- Randomized direct comparison: study participants are randomly allocated to receive the index test or the comparator AND all participants received the clinical reference standard test. This is the best not fully paired design to avoid selection bias.
- Indirect comparisons: While this may not be a study design, it can happen in a review. Indirect comparisons are prone to selection bias. If possible the comparisons reported should be based on fully paired or randomized designs.
Observational study: Assignment of subjects into a treated group versus a control group is outside the control of the investigator.
- Cross-sectional: Examines the relationship of a risk factor and outcome (disease) at a point in time on representative samples of the target population.
- Cohort study: a study in which individuals with differing exposures to a suspected risk factor are observed through time for occurrence of an outcome.
- Case-control study: compares exposure to the risk factor in subjects who have an outcome (the 'cases') with subjects who do not have the outcome, but are otherwise similar (the 'controls') and drawn from the same sampling frame.
- Prevalence survey: Measurement of an outcome at a point in time, without measuring or investigating potential predictors.
- Longitudinal prevalence: A study that measures outcome (prevalence and distribution of disease only) at multiple points in time on the same population.
Experimental study: Each subject is assigned to a treated group or a control group before the start of the treatment.
- Controlled trial: an experimental study in which people are allocated to intervention/comparison groups and evaluated for outcomes. Randomized (RCT) if authors specifically indicate random allocation of treatment/control.
- Controlled before-and-after (CBA) study: A study in which observations are made before and after the implementation of an intervention, both in a group that receives the intervention and in a control group that does not.
- Uncontrolled before-and-after study: observations are made on a population before and after receiving an intervention.

Participants
Parameters to assess before testing:

Management of the SR
This systematic review will be managed in Distiller (Evidence Partners, 2014) and each form will be completed by two reviewers working independently. Conflicts will be resolved by consensus. Data will then be exported to MS Excel and prepared for summarization and analysis in STATA 13.
Relevance confirmation will confirm that the study is relevant to this SR and the study design.
Assessment of the methodological quality will follow the QUADAS-2 tool. All questions and definitions may be found in the Risk of bias and quality assessment tool. Data extraction will include defining the test attributes, the population / stage of disease studied, and all relevant data including sample size, number of positives for each test, sensitivity and specificity, and other available data such as ROC curves; details can be found in the data extraction tool.

Analysis Plan/Options
Studies will be summarized and grouped by test, test comparison, stage of disease, age of the population, targeted Borrelia spp., and study design. Appropriate comparisons, sensitivities and specificities, and other descriptive summaries will be presented in tables and graphs as appropriate.
Hierarchical random effects meta-analysis will be used and, if possible, meta-regression will be used to explore reasons for heterogeneity in STATA 13. If there are not enough studies, then subgroup analysis will be used to evaluate the impact of different study attributes on the effect estimates. Meta-analysis provides us with an estimate of diagnostic test accuracy and the uncertainty and variability of the findings around this estimate. Meta-regression can statistically compare the accuracy of two or more different diagnostic tests and describes how test accuracy varies with different tests, thresholds and other study characteristics.
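To make the idea of random effects pooling concrete, the sketch below pools a single proportion (for example sensitivity) across studies on the logit scale with the DerSimonian-Laird estimator. This is a deliberately simplified, univariate stand-in and not the hierarchical model planned for STATA, which models sensitivity and specificity jointly. The function name, the 0.5 continuity correction and the 1.96 normal quantile are assumptions of this illustration.

```python
import math


def pool_logit_proportions(events, totals):
    """Univariate DerSimonian-Laird pooling of proportions on the logit scale.

    events[i] out of totals[i], e.g. true positives among diseased patients.
    Returns the pooled proportion, an approximate 95% CI and tau^2.
    """
    y, v = [], []
    for e, n in zip(events, totals):
        e_c = e + 0.5            # continuity correction (assumed for all studies)
        non_c = (n - e) + 0.5
        y.append(math.log(e_c / non_c))      # logit of the study proportion
        v.append(1.0 / e_c + 1.0 / non_c)    # approximate within-study variance
    k = len(y)
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    # Cochran's Q and the method-of-moments between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return {
        "pooled": inv_logit(mu),
        "ci_low": inv_logit(mu - 1.96 * se),
        "ci_high": inv_logit(mu + 1.96 * se),
        "tau2": tau2,
    }
```

A tau2 of zero indicates no estimated between-study variability beyond sampling error; larger values widen the confidence interval, which is the behaviour the analysis plan relies on when studies are heterogeneous.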
It will be important to ensure that the studies are similar enough, particularly in the participants recruited: changes or differences in patient selection criteria will alter the spectrum of disease and non-disease in the population, which can impact test accuracy.
Ultimately diagnostic tests and testing protocols will be compared for their positive and negative predictive values and the differences or apparent equivalencies across different diagnostic tests will be evaluated. Tables will summarize 1) the number of studies/individuals for each analysis, 2) diagnostic test accuracy, 3) comparative accuracy, 4) results of any heterogeneity investigation, 5) results of sensitivity analysis (10.3.5).
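Because predictive values depend on how common the disease is in the tested population, it can help to see the calculation spelled out. The sketch below derives positive and negative predictive values from sensitivity, specificity and an assumed prevalence using Bayes' theorem; the function name is hypothetical and any numbers passed to it are illustrative, not results from this review.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence.

    All three inputs are proportions between 0 and 1.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Expected fraction of the tested population in each 2x2 cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    # Guard against empty margins (no positive or no negative results).
    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if (true_neg + false_neg) > 0 else float("nan")
    return ppv, npv
```

Running the function with the same sensitivity and specificity at two different prevalences shows why a protocol with very high specificity still yields many false positives when the pre-test likelihood of Lyme disease is low.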
Evaluating accuracy of a test: Average Sn/Sp and potential summary ROC curve for varying thresholds will be most appropriate. Where prioritized, particularly for the two-tiered method, an investigation into heterogeneity will be considered if there are enough studies to do so. Important population and test protocol characteristics have been identified and are captured in the data-extraction form.

- Which tests?
  - 2-tier method approved
Comparing two or more tests: Pairwise or multiple tests can be compared. Considerations for multiple test comparisons (statistical issue) and what studies to include in the comparisons are needed (should the comparison be restricted to only the studies that make a direct comparison either by testing all patients or a random sample of patients?).
- Which tests?
  - All others relative to two-tier
  - Evaluate variations in two-tier, particularly approved vs. not approved
DATA: Definition of a test positive for each test; if there are multiple thresholds then we need to capture that information. Direct comparisons will be presented, and indirect comparisons will form part of the sensitivity analysis, to support the information presented to decision makers. Substantial differences will be thoroughly explored and discussed.
1) Binary data: positive vs. negative.
2) Ordinal: ordered set of categories (5) from definitely positive to definitely negative.
3) Continuous or count: outcome reported on a continuous scale or as a count (concentration or number of features observed). These are often dichotomized by predefined thresholds.
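A short sketch of how a continuous index test result could be dichotomised at a predefined threshold and cross-tabulated against the clinical reference standard is given below. The function names, the choice that values at or above the threshold count as positive, and the cell labels a, b, c, d (true positives, false positives, false negatives, true negatives) are assumptions of this illustration.

```python
def dichotomise(values, threshold, positive_if_at_or_above=True):
    """Turn continuous or count results into positive (True) / negative (False)."""
    if positive_if_at_or_above:
        return [v >= threshold for v in values]
    return [v < threshold for v in values]


def two_by_two(test_positive, diseased):
    """Cross-tabulate test results against the reference standard.

    Returns (a, b, c, d) = (true positives, false positives,
    false negatives, true negatives).
    """
    if len(test_positive) != len(diseased):
        raise ValueError("inputs must have the same length")
    a = b = c = d = 0
    for positive, has_disease in zip(test_positive, diseased):
        if positive and has_disease:
            a += 1
        elif positive and not has_disease:
            b += 1
        elif not positive and has_disease:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical optical density readings and reference standard status.
readings = [0.2, 1.5, 0.9, 2.1]
reference = [False, True, True, False]
cells = two_by_two(dichotomise(readings, threshold=1.0), reference)
```

When a paper reports several thresholds, the same readings can be passed through `dichotomise` once per threshold to obtain one 2x2 table for each.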
- For meta-analysis the ordinal, continuous or count outcomes need to be dichotomised, which means a threshold "cut-off" needs to be established.
- Diseased and non-diseased status is established by the clinical reference standard and everything else is compared to that.
- A 2x2 table can be drawn, with cells a (true positives), b (false positives), c (false negatives) and d (true negatives):
  o Sensitivity: the probability that the index test result will be positive in a diseased case. Sn = P(T+|D+) = a/(a+c). Also referred to as detection rate, true positive rate or true positive fraction.
  o Specificity: the probability that the index test result will be negative in a non-diseased case. Sp = P(T-|D-) = d/(b+d). Also referred to as true negative rate or true negative fraction.
  o Negative predictive value (NPV): the probability that a test-negative case is non-diseased. NPV = P(D-|T-) = d/(c+d).
- Likelihood ratios
  o Bayesian MA: likelihood ratios can be used to update a pre-test probability of disease using Bayes theorem. For an informative test, a positive result (LR > 1) gives a post-test probability higher than the pre-test probability and a negative result (LR < 1) gives a lower one; an LR close to 1 leaves the pre-test probability essentially unchanged.
- If the study moving forward is not one of these it should have been excluded above!
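The quantities listed above can be computed directly from the four cells of the 2x2 table. The sketch below assumes the usual layout implied by the formulas (a = true positives, b = false positives, c = false negatives, d = true negatives); the function names are hypothetical, and the likelihood ratio definitions LR+ = Sn/(1-Sp) and LR- = (1-Sn)/Sp are the standard ones rather than something stated in the protocol.

```python
import math


def accuracy_from_table(a, b, c, d):
    """Sensitivity, specificity, NPV and likelihood ratios from a 2x2 table.

    Assumes at least one diseased, one non-diseased and one
    test-negative subject, so that no denominator is zero.
    """
    sn = a / (a + c)
    sp = d / (b + d)
    npv = d / (c + d)
    lr_pos = sn / (1.0 - sp) if sp < 1.0 else math.inf
    lr_neg = (1.0 - sn) / sp if sp > 0.0 else math.inf
    return {"sensitivity": sn, "specificity": sp, "npv": npv,
            "lr_pos": lr_pos, "lr_neg": lr_neg}


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via the odds form of Bayes' theorem."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if math.isinf(likelihood_ratio):
        return 1.0
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)
```

With a hypothetical table of a=90, b=5, c=10, d=95, the positive likelihood ratio is about 18, so a pre-test probability of 10% rises to roughly two in three after a positive result, while a likelihood ratio of 1 leaves it at 10%.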

Risk of Bias and Quality Assessment Tool
Only answer for sections applicable to this paper, i.e. reference test and index test, or index test and comparison test.
SR Question: Evaluation of the sensitivity and specificity of diagnostic test regimes for diagnosis of Lyme disease in humans, a systematic review and meta-analysis of the evidence.
Patients: People suspected of having Lyme disease based on symptoms and possible exposure.
Index Test: This would be any variation of the two-tiered method or a test that is in competition with the two-tiered method for diagnosing Lyme disease.
Reference Standard: For this SR the two-tiered method is the clinical reference standard; thus a patient with symptoms consistent with Lyme disease for >2 weeks can be tested with an EIA and, if positive, a WB.

Comparison tests: Any test being evaluated against currently accepted methods to diagnose Lyme disease.
Question: Indicate the study set-up briefly. Answers: [Text]. Explanation: Briefly indicate how participants were tested, in what order etc. so the analyst can understand the study.

Domain 1: Patient Selection
Describe methods of patient selection. [text] Copy and paste from paper.

Was a consecutive or random sample of patients enrolled? □
Applicability - Is there concern that the included patients do not match the review question?
□ Low concern ___ □ Unclear concern ___ □ High concern ___
Low concern = There was a spectrum of likely Lyme disease patients included.
High concern = Patients included differ from those targeted by the review as the study only focused on a subset of Lyme disease cases. Subsets by severity, demographics, differential diagnoses and comorbidities are typical.
If a threshold was used, was it pre-specified?
Yes, threshold given. Unclear, not discussed at all. No, threshold doesn't appear to be prespecified. NA - no threshold for this index test.

Risk of bias - Could the conduct or interpretation of the index test have introduced bias?
□ Low ROB □ Unclear ROB □ High ROB
Low ROB - blinding, established thresholds and objective interpretation of the test. Unclear ROB - one or more deficiencies noted. High ROB - concerns of bias due to deficiencies.
Applicability - Is there concern that the index test, its conduct or interpretation differs from the review question?
□ Low concern ___ □ Unclear concern ___ □ High concern ___
Variations in test technology, execution, or interpretation may affect estimates. Given we are interested in exploring the variations, it is most important to note if there is a test that would not be applicable to this review question.
High ROB - concerns of bias due to deficiencies.
Applicability - Is there concern that the comparison test(s), its conduct or interpretation differs from the review question?
□ Low concern ___ □ Unclear concern ___ □ High concern ___
Variations in test technology, execution, or interpretation may affect estimates. Given we are interested in exploring the variations, it is most important to note if there is a test that would not be applicable to this review question.

Domain 4 -Flow and Timing
Is the time period between clinical reference standard and index test appropriate to be reasonably sure that the target condition did not change between the two tests?

Population demographics
In what continent was the study conducted?
In what country was the study conducted? □ text
- Counts in group 1 ___
- SD in group 1 ___
- N in group 1 ___
- Counts in group 2 ___
- SD in group 2 ___
- N in group 2 ___
- Define group 1 ___
- Define group 2 ___
- P-value (exact only) ___
- T value ___
- For matched studies, specify pre/post correlation ___
- Outcome units ___
- Outcome scales (i.e. lowest/highest possible values) [Detection limit or analytical sensitivity] ___
- Threshold for dichotomization as suggested by the author.

Continuous: Sufficient information includes:
- Mean, sample size, + EITHER a measure of variability (e.g. SD, CIs) or exact P-value/t-value, or
- Sample size and P-value/t-value from t-test, or
- Difference in means and a measure of variability (SD, SE, CIs, variance), or
- Difference in means, sample size, + EITHER a common SD or an exact P-value/t-value.
For meta-analysis the ordinal, continuous or count outcomes need to be dichotomised, which means a threshold "cut-off" needs to be established for positive / negative groups.
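When a paper reports a standard error or a confidence interval for a group mean rather than the SD, the SD can usually be recovered. The sketch below shows the standard conversions under the assumption of a normal approximation for a single group mean; the function names are hypothetical, and z = 1.96 corresponds to a 95% interval (for small samples a t-distribution quantile would be more appropriate).

```python
import math


def sd_from_se(se, n):
    """Standard deviation from the standard error of a mean and the sample size."""
    if n <= 0:
        raise ValueError("n must be positive")
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, z=1.96):
    """Standard deviation from a confidence interval around a single group mean.

    z is the normal quantile matching the CI level (1.96 for 95%).
    """
    if upper < lower:
        raise ValueError("upper limit must not be below lower limit")
    return math.sqrt(n) * (upper - lower) / (2.0 * z)
```

These conversions apply to a single group; a CI reported for a difference in means between disease-positive and disease-negative groups needs a different formula that accounts for both sample sizes.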

Difference in means from index or comparison test outcomes (between disease positive and disease negative)
- Difference in means (value) ___
- N (total sample size) ___
- Common SD ___
- SE ___