Bayesian latent class models for identifying canine visceral leishmaniosis using diagnostic tests in the absence of a gold standard

Background
As with many infectious diseases, there is no practical gold standard for diagnosing clinical visceral leishmaniasis (VL). Latent class modeling has been proposed to estimate a latent gold standard for identifying disease. These proposed models for VL have leveraged information from diagnostic tests with dichotomous serological and PCR assays, but have not employed continuous diagnostic test information.


Methods/Principal findings
In this paper, we employ Bayesian latent class models to improve the identification of canine visceral leishmaniasis using the dichotomous PCR assay and the Dual Path Platform (DPP) serology test. The DPP test has historically been used as a dichotomous assay, but can also yield numerical information via the DPP reader. Using data collected from a cohort of hunting dogs across the United States, which were identified as having either negative or symptomatic disease, we evaluate the impact of including numerical DPP reader information as a proxy for immune response. We find that inclusion of DPP reader information allows us to illustrate changes in immune response as a function of age.

Conclusions/Significance
Utilization of continuous DPP reader information can improve the correct discrimination between individuals that are negative for disease and those with clinical VL. These models provide a promising avenue for diagnostic testing in contexts with multiple, imperfect diagnostic tests. Specifically, they can easily be applied to human visceral leishmaniasis when diagnostic test results are available. Also, appropriate diagnosis of canine visceral leishmaniasis has important consequences for curtailing spread of disease to humans.

Introduction
Leishmaniasis is a neglected tropical disease that is endemic in 98 countries and three territories. The most severe manifestation of leishmaniasis is visceral (VL); it is fatal in 10% of human cases. Six countries, India, Bangladesh, Sudan, South Sudan, Ethiopia, and Brazil, account for more than 90% of human VL cases [1]. It also presents a serious risk to domesticated dogs, with a prevalence between two percent and 33 percent, depending on location [2]. In the Americas, this infection is caused by the parasite Leishmania infantum (L. infantum) and is zoonotic; dogs are recognized as the main animal reservoir [3]. In this capacity, canine visceral leishmaniasis (CVL) is a significant risk factor for human disease, which makes disease identification in dogs critical to public health [4][5][6]. Diagnostic tools for identifying CVL include parasite culture, serology, PCR, and clinical assessment of symptoms by licensed veterinarians. As noted by Solcà et al., the reliability of serological and PCR test results in particular depends heavily on whether the sample is extracted from the site of infection [7]. Parasite culture, which is often considered the "gold standard" for CVL diagnosis [8,9], has low sensitivity for individuals that have a low parasite load [10,11]. Clinical assessment relies on the feasibility of performing physical examination and is subject to expert variability.
Latent class analysis (LCA) represents a statistical solution to the problem of imperfect or infeasible diagnostic tools. Indeed, LCA has been used to evaluate the accuracies of various dichotomized diagnostic tests, including quantitative PCR and serology for CVL, as well as hematological parameters [7,12,13]. In this paper, we compare two Bayesian latent class models based on the Dual Path Platform (DPP) serology test and PCR. The DPP test, while historically used as a dichotomous assay, also can yield numerical test information from the DPP reader. The aim of this study is to evaluate the utility of including numerical DPP reader information, as a proxy for strength of immune response, in a latent class model. This is carried out for symptomatic and negative dogs as defined below, and is compared to a traditional approach, which incorporates dichotomized test information in identifying an underlying disease state. While LCA models are applied to CVL in this paper, these methods are sufficiently general to be easily applied to human clinical VL identification.

Ethics statement
All dog caretakers gave signed informed consent, following a protocol approved by the University of Iowa Institutional Animal Care and Use Committee (IACUC).

Study population
A cohort of hunting dogs from multiple locations across the United States had whole blood and serum collected. These dogs were naturally infected by L. infantum transplacentally. Physical exams were performed by veterinarians after the blood was collected. Demographic characteristics are summarized in Table 1. Data are available on ResearchGate: https://www.researchgate.net/project/Leishmania-detection.

Clinical status
Dogs were defined as negative for L. infantum infection if they had both a negative qPCR and a negative DPP CVL test result. Dogs were defined as asymptomatic if they were positive on at least one diagnostic test (qPCR, DPP CVL) and presented with < 2 clinical signs. Dogs were defined as symptomatic if they had ≥ 2 clinical signs and were positive on one or both diagnostic tests (qPCR, DPP CVL). Clinical signs include lymphadenopathy, cachexia, weight loss, dermatitis, alopecia, poor hair coat, conjunctivitis, epistaxis, and splenomegaly. Only dogs that were defined as negative or symptomatic, based on these criteria, were included in this analysis. Asymptomatic dogs vary widely in their immune responses and in their ability to develop antibodies to Leishmania parasites, which makes their DPP reader scores highly variable and unreliable and very difficult to include in this kind of study. For the purposes of this research they were excluded.

Statistical methods
In this paper, we introduce Bayesian latent class hierarchical models to estimate the underlying disease state for each observation based on properties and results from two diagnostic tests, as discussed above. Further, the demographic variables age, sex, an indicator for senior (6 years or older), and an interaction between senior and age were included in the model based on their prior association with leishmaniasis and disease progression [4,16,17]. Of the 1309 observations included in the analysis, 307 are classified as senior. We assume that all observations are independent. We also assume that the two diagnostic tests are conditionally independent. This is common practice in latent class modeling for this type of application; it is a valid assumption here because the biological mechanisms on which the tests rely are different. The DPP test relies on antibody production by the canine immune system, indicating previous exposure to the parasite, which may or may not be coincident with concurrent infection, although rising rK28 antibody levels are associated with disease progression [18]. In contrast, a positive qPCR result is indicative of current infection by Leishmania parasites at a detectable level in the bloodstream, and is independent of the humoral response of the host.
Notation. For $i \in \{1, \ldots, n\}$, where $n$ is the total number of observations, and diagnostic test $j \in \{1, 2\}$:
• $Y_{ij}$: test $j$ outcome for observation $i$;
• $\eta_j$: sensitivity of test $j$;
• $\gamma_j$: specificity of test $j$;
• $D_i$: underlying disease status for observation $i$;
• $\theta$: linear effects for intercept, sex (male baseline), age, senior (≥ 6 years), and an age/senior interaction.
Data model. While our approach can be easily generalized to more complex scenarios, here we consider two diagnostic tests. The first, DPP, can be either dichotomized or continuous, due to the presence of the DPP reader.
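If the test is used as a dichotomous assay, then

$$Y_{i1} \mid D_i \sim \text{Bernoulli}\big(\eta_1 D_i + (1 - \gamma_1)(1 - D_i)\big). \tag{1}$$

Alternatively, if it is being treated as continuous, then we model it using a mixture of gamma distributions,

$$f(y_{i1} \mid D_i) = D_i \, g_1(y_{i1}) + (1 - D_i) \, g_0(y_{i1}), \tag{2}$$

where $g_1(\cdot)$ is the density function corresponding to a $\Gamma(\alpha_1, \beta_1)$ distribution and $g_0(\cdot)$ is the density function corresponding to a $\Gamma(\alpha_0, \beta_0)$ distribution. Discussion of how these distribution parameters are determined follows in the Prior Model and Hyperparameters section. The second test, PCR, is dichotomous, so

$$Y_{i2} \mid D_i \sim \text{Bernoulli}\big(\eta_2 D_i + (1 - \gamma_2)(1 - D_i)\big). \tag{3}$$

The underlying disease status is modeled as $D_i \sim \text{Bernoulli}(\pi_i)$, where

$$\text{logit}(\pi_i) = x_i^T \theta, \tag{4}$$

where $\theta_0$ is related to the logit of the prevalence of the disease in the population, although the interpretation is complex; $\theta_1$ and $\theta_2$ are the effects of sex (baseline is male) and age (in years), respectively; $\theta_3$ is the effect of senior; $\theta_4$ is the effect of the interaction between senior and age. The design matrix, $x_i^T$, is a 1 × 5 matrix: $x_i^T = [\,1 \;\; I_{\text{Female}} \;\; a \;\; I_{\text{Senior}} \;\; a \times I_{\text{Senior}}\,]$; $I_{\text{Female}}$ is 1 if individual $i$ is female and 0 otherwise, $I_{\text{Senior}}$ is 1 if individual $i$ is older than 5 and 0 otherwise, and $a$ is the age, in years, of individual $i$.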
Prior model and hyperparameters. Informative priors are placed on $\eta_j$, $\gamma_j$, $\alpha_0$, $\alpha_1$, $\beta_0$, and $\beta_1$. The first two priors are $\eta_j \sim \text{Beta}(S_j/(1 - S_j), 1)$ and $\gamma_j \sim \text{Beta}(P_j/(1 - P_j), 1)$, where $S_j$ and $P_j$ are the mean sensitivity and specificity for test $j$, respectively. This approach allows researchers to utilize prior information on the sensitivity and specificity of various assays, which is generally available in the literature, while also not ignoring the remaining uncertainty in these terms.
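The choice of first shape parameter can be checked directly: a Beta(a, 1) distribution has mean a/(a + 1), so setting a = S/(1 − S) gives a prior mean of S. The short Python sketch below verifies this numerically; the value 0.9 is a hypothetical literature sensitivity used only for illustration, and the function names are ours.

```python
import math

def beta_shape_for_mean(mean_accuracy):
    """First shape parameter a such that Beta(a, 1) has the given mean."""
    if not 0.0 < mean_accuracy < 1.0:
        raise ValueError("mean accuracy must lie strictly between 0 and 1")
    return mean_accuracy / (1.0 - mean_accuracy)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1.0)))

S = 0.9  # hypothetical mean sensitivity taken from the literature
a = beta_shape_for_mean(S)
print(f"a = {a:.3f}, prior mean = {beta_mean(a, 1.0):.3f}, "
      f"prior sd = {beta_sd(a, 1.0):.3f}")
```

The prior standard deviation shows how much uncertainty about the assay's accuracy remains after centering the prior on the literature value.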
The priors for the parameters of the two gamma distributions, $\alpha_0$, $\alpha_1$, $\beta_0$, and $\beta_1$, are determined using past information about the mean and variance of the DPP score for negative and symptomatic dogs, respectively, as determined using clinical status. Note, clinical status is not explicitly included in the model in another capacity, and it is used to ensure identifiability for the mixture of gammas. For negative dogs, the average and standard deviation of the DPP score were $m_0 = 2$ and $s_0 = 1$, respectively. For symptomatic dogs, as defined in the methods section on clinical status, the average DPP reader score was $m_1 = 150$ and the standard deviation was $s_1 = 103$. Using these estimates, along with the analytical form of the mean and variance for gamma random variables, we established exponential hyperpriors on the shape and rate parameters of $g_0$ and $g_1$ with means equal to $m_0^2/s_0^2$ and $m_0/s_0^2$, and $m_1^2/s_1^2$ and $m_1/s_1^2$, respectively. The prior distribution for the coefficients in Eq 4 was $\theta \sim \text{MVN}(\mu_\theta, \Sigma_\theta)$; the first element of $\mu_\theta$ is selected to correspond to a prevalence of 0.1 [19]. It is standard practice in these models to use an informative prior for the prevalence.
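The hyperprior means follow from moment matching: a gamma distribution with shape α and rate β has mean α/β and variance α/β², so a target mean m and standard deviation s give α = m²/s² and β = m/s². The Python sketch below applies this to the reported DPP summaries. The `logit` helper shows one way a prevalence of 0.1 could be mapped to an intercept, assuming the correspondence is made through the logit link; the source does not state the exact value used.

```python
import math

def gamma_moment_match(mean, sd):
    """Return (shape, rate) of the gamma distribution with this mean and sd."""
    shape = mean ** 2 / sd ** 2
    rate = mean / sd ** 2
    return shape, rate

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

# DPP reader summaries reported in the text.
shape0, rate0 = gamma_moment_match(2.0, 1.0)      # negative dogs
shape1, rate1 = gamma_moment_match(150.0, 103.0)  # symptomatic dogs

print(f"negative:    shape = {shape0:.3f}, rate = {rate0:.4f}")
print(f"symptomatic: shape = {shape1:.3f}, rate = {rate1:.4f}")
# Assumption: prior mean of the intercept set via the logit of 0.1.
print(f"logit(0.1) = {logit(0.1):.3f}")
```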

Simulation study
In this simulation study, we examine how sample size and prevalence relate to posterior predictive variability in the context of the latent class model that incorporates the DPP reader score through a mixture of gammas. To accomplish this, we simulate data with demographic properties similar to those in the real data employed here; we alter the population prevalence by varying the simulation value of $\theta_0$, which is related to disease prevalence for the average-aged male. We fix $\theta_1$ through $\theta_4$ according to the effect sizes observed in the real data model fit. Then, we fit the simulated data sets according to the Statistical Methods section. Relevant files for conducting this simulation study are included in the S1 Appendix.
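The data-generating step of such a simulation can be sketched as follows. This is an illustrative Python outline, not the code in the S1 Appendix: the sex ratio, the uniform age distribution, the PCR accuracy, the gamma parameters, and the choice to set the remaining coefficients to zero are all assumptions made here for simplicity, whereas the study fixes those coefficients at the fitted effect sizes.

```python
import math
import random

def simulate_cohort(n, theta, sens_pcr, spec_pcr, gamma_pos, gamma_neg, seed=1):
    """Simulate n dogs; gamma_pos and gamma_neg are (shape, rate) pairs."""
    rng = random.Random(seed)
    dogs = []
    for _ in range(n):
        female = int(rng.random() < 0.5)   # assumed sex ratio
        age = rng.uniform(0.5, 12.0)       # assumed age distribution
        senior = int(age >= 6.0)
        x = [1.0, female, age, senior, age * senior]
        eta = sum(xk * tk for xk, tk in zip(x, theta))
        diseased = int(rng.random() < 1.0 / (1.0 + math.exp(-eta)))
        # PCR is positive with probability sensitivity or (1 - specificity).
        p_pcr = sens_pcr if diseased else 1.0 - spec_pcr
        pcr = int(rng.random() < p_pcr)
        shape, rate = gamma_pos if diseased else gamma_neg
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        dpp = rng.gammavariate(shape, 1.0 / rate)
        dogs.append({"female": female, "age": age, "diseased": diseased,
                     "pcr": pcr, "dpp": dpp})
    return dogs

def prevalence(dogs):
    """Fraction of simulated dogs whose latent status is diseased."""
    return sum(d["diseased"] for d in dogs) / len(dogs)

# Vary the intercept to change prevalence (all values are illustrative).
for theta0 in (-3.0, -2.0, -1.0):
    cohort = simulate_cohort(1000, [theta0, 0.0, 0.0, 0.0, 0.0],
                             0.7, 0.95, (2.0, 0.015), (4.0, 2.0))
    print(theta0, round(prevalence(cohort), 3))
```

Each simulated cohort would then be passed to the model-fitting step, with the latent `diseased` field withheld and used only to assess the posterior predictive probabilities.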

Statistical software
Each of the models was fit using Markov chain Monte Carlo; this was implemented in the open-source software R using the nimble and coda packages [20][21][22]. Model convergence was assessed using the Gelman-Rubin diagnostic. The relevant files are available to the reader in the S1 Appendix.
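For readers unfamiliar with the Gelman-Rubin diagnostic, the Python sketch below computes its basic form, which compares between-chain and within-chain variance for a single parameter; values near 1 suggest the chains agree. This is a simplified illustration and not the coda implementation, which applies additional corrections. The chains here are synthetic normal draws, not output from the models in this paper.

```python
import random
import statistics

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance; B: between-chain variance scaled by n.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

rng = random.Random(42)
# Two synthetic chains targeting the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(2)]
# Two synthetic chains stuck in different regions.
stuck = [mixed[0], [v + 5.0 for v in mixed[1]]]

print(f"well-mixed chains: {gelman_rubin(mixed):.3f}")
print(f"separated chains:  {gelman_rubin(stuck):.3f}")
```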

Results
The demographic variables, summarized by the test results, are recorded in Table 1. The majority of the observations corresponded to PCR and DPP negative results. The median ages were comparable across the test result combinations. The percent male was also comparable across the test results; there were only eleven dogs with a PCR positive and DPP negative result; nine were male.

Model 1: Dichotomized PCR and DPP
The posterior mean and median estimates, and corresponding 95% credible intervals for the intercept, the effects of sex, age, senior, and the interaction between age and senior, as well as those for sensitivity and specificity for PCR and DPP are given in Table 2. The mean population prevalence for average aged males (3.9 years in this data set) is estimated at 9.5%, with a 95% credible interval of (1.8%, 37.9%). This interval is consistent with prevalence ranges reported for CVL [3,19]. The mean odds of disease for males are 1.30 times those for females in this population, with a posterior 95% credible interval of (0.78, 2.27). There is an 83% chance that the mean odds of disease for males is greater than that for females in this population. This is consistent with past findings on the importance of considering sex as a risk factor for VL [23]. Fig 1 shows the median posterior predictive probability of disease as a function of age for each of the possible four combinations of PCR and DPP test results for Model 1. The probability of disease is clearly positively associated with age for all four test result combinations for those that are no more than 5 years old. In this age range, those that are DPP positive tend to have higher median probability of disease, although there is still considerable overlap in credible intervals among these four groups. In the canine population represented in this study, dogs that become infected with the Leishmania parasite tend to progress to clinical disease by age 5, after which they often succumb to disease. Survivors, represented by those that live with infection beyond age 5, tend to live to age 12, at which point they succumb to disease or die from other natural causes. The results from Model 1 are consistent with these observations. 
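Summaries such as the posterior mean, median, 95% credible interval, and the probability that an effect exceeds zero are computed from the MCMC draws of each parameter. The Python sketch below shows one way to do this; the evenly spaced values stand in for posterior draws of a log odds ratio and are purely hypothetical, so the printed numbers do not reproduce Table 2.

```python
import math
import statistics

def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac

def summarize(draws):
    """Posterior mean, median, 95% credible interval, and P(parameter > 0)."""
    ordered = sorted(draws)
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "ci95": (quantile(ordered, 0.025), quantile(ordered, 0.975)),
        "p_gt_0": sum(d > 0 for d in ordered) / len(ordered),
    }

# Hypothetical stand-in for MCMC draws of a log odds ratio.
log_or_draws = [i / 100 for i in range(-20, 81)]
summary = summarize(log_or_draws)
# Exponentiating the interval endpoints gives an interval for the odds ratio.
odds_ratio_ci = tuple(math.exp(v) for v in summary["ci95"])

print(summary)
print(odds_ratio_ci)
```

Because quantiles are preserved under the monotone map exp(·), the credible interval for an odds ratio can be obtained by exponentiating the endpoints of the interval for the log odds ratio; the posterior mean does not transform this way and must be computed from the exponentiated draws.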
Table 2. Posterior mean and median estimates, 95% posterior credible intervals, and the posterior probability that a parameter is greater than 0 for Model 1, which uses dichotomized PCR and DPP test results, as well as sex, age in years, and an indicator for senior. Senior is defined as 6 years or older (307 of 1309 observations); male is the baseline for sex. For parameters that are bounded above 0, P(X > 0) is omitted. DPP = Dual Path Platform.

Model 2: Dichotomized PCR and DPP reader
Table 3 contains information on the posterior mean and median estimates, 95% posterior credible intervals, and posterior probability that a parameter is greater than 0 for Model 2, which differs from the model discussed in the previous section as it contains DPP reader information, rather than a dichotomized test result. Based on this model, the mean population prevalence for average-aged males (3.9 years) is estimated at 17.2%, with a 95% credible interval of (5.4%, 42.4%). This interval is higher than that obtained from Model 1 but is consistent with findings in the United States in dog populations where leishmaniasis is endemic [3].
The median posterior predictive probability of disease as a function of age for Model 2 is shown in Fig 2 for each combination of PCR test result and DPP reader range (DPP < 10 corresponds to negative DPP; 10 ≤ DPP < 100 and DPP ≥ 100 correspond to two potential levels of immune response). In this plot, we observe a similar trend to that from Model 1 for individuals less than 6 years old. Those that are PCR positive or have a DPP reader score greater than 100 have the highest median posterior predictive probability of disease at a recorded age of 5 years. Individuals that are PCR positive with a lower DPP reader score may be experiencing "rapid onset" of disease, which indicates that they are unable to control parasite replication and increases the likelihood of disease at an earlier age. Those that are PCR negative and DPP > 100 may be experiencing one of two immune responses that account for their test results. Individuals in this group may be "controllers", which are not as highly infected, as evidenced by the negative PCR test result, but are controlling parasite replication through an immune response, which accounts for the high DPP reader score. Alternatively, individuals in this group may be highly infected with Leishmania and no longer mounting a strong immune response. This model cannot distinguish between these two scenarios. Those that are PCR negative with DPP > 100 that survive past age 5 attain a similar risk of disease by around age 9.
Generally, those that survive past age 5 have a higher probability of disease if they have a positive PCR test. As in Model 1, the posterior credible intervals are still quite wide, so there is uncertainty in this model.

Simulation study
The results from the simulation study for Model 2 are summarized in Fig 3. The uncertainty in posterior predictive probabilities clearly decreases for larger sample sizes for all included values of disease prevalence. Thus, the gamma mixture model approach performs better as sample size increases, which is expected.

Discussion
In this paper, we employ Bayesian latent class modeling to improve the identification of clinical CVL. While this type of analysis has been performed before using a variety of dichotomized diagnostic tests for leishmaniasis [7], in this paper we compare the utility of including numeric serology information from the DPP reader to that of including dichotomized serology test results. The modeling results presented in this paper suggest that inclusion of numeric DPP reader information, in the context of a gamma mixture model, can be helpful for illustrating changes in immune response as a function of age. Specifically, the DPP reader information allows us to reconcile conflicting dichotomized test results, particularly in cases where the DPP reader score is larger than 100 but the PCR test is negative. These results were obtained in a sample of dogs that were classified either as negative or symptomatic for L. infantum based on clinical examination. While they play an important role in transmission, asymptomatics were excluded from this study since it was important to evaluate the model performance for the more distinct groups of negative and symptomatic dogs. Application of this latent class modeling approach that incorporates the DPP reader information for asymptomatics (based on clinical examination) is a future line of study. It is possible that the additional information afforded by the DPP reader could help address the challenging problem of identifying and treating asymptomatic individuals appropriately; however, this will require further inquiry and is not a conclusion of this paper.

Table 3. Posterior mean and median estimates, 95% posterior credible intervals, and the posterior probability that a parameter is greater than 0 for Model 2, which utilizes the DPP reader (continuous) and dichotomized PCR test results, as well as sex, age in years, and an indicator for senior. Senior is defined as 6 years or older; male is the baseline for sex. For parameters that are bounded above 0, P(X > 0) is omitted. DPP = Dual Path Platform.

PLOS NEGLECTED TROPICAL DISEASES

Conclusion
While these models were developed in the context of a canine population, the methods, as well as possible extensions, are easily applied to human VL, where similar diagnostic tests and clinical examinations are utilized in disease diagnosis. In situations where clinical examination may not be immediately possible, but diagnostic testing is, models such as those presented in this paper may facilitate diagnosis, flagging individuals who are in need of further examination. It is also important to note that effectively diagnosing CVL has important consequences for curtailing human VL infection, since dogs serve as the primary animal reservoir [3]. Furthermore, studying VL in dogs can yield important information about human infections and disease, since parallels exist in infection and disease progression between humans and their canine companions [24,25].
There are several potential limitations of the analyses presented. First, most of the individuals included in these data had negative PCR and DPP results, which is reflected in the low average posterior probabilities of disease depicted in Fig 1. Also, the small number of observations with positive PCR test results may hamper the estimated PCR sensitivity in Model 2, in the presence of numeric DPP. As demonstrated in the simulation studies, the variability associated with the posterior predictive probabilities for the LCA with the gamma mixture model decreases for larger sample sizes. Interestingly, it does appear that including numeric DPP test results in addition to the dichotomous PCR assay may bolster identification of diseased individuals, even when the number of positive PCR tests in the sample is low. Second, the prior information for the mixture of gammas to model DPP reader score was based on clinical status, which was not included in the model in another capacity, from the same study. Hyperprior distributions were placed on the gamma parameters, but strictly speaking, prior information on these parameters should be taken from other prior studies, which were not available in this case. One alternative method of imposing ordering on the gamma distributions, $g_1(\cdot)$ and $g_0(\cdot)$, would be to introduce appropriate constraints concerning their means. Specifically, one could employ the constraint $\alpha_1/\beta_1 > \alpha_0/\beta_0$, where $\alpha_1$, $\beta_1$ and $\alpha_0$, $\beta_0$ are the shape and rate parameters for $g_1(\cdot)$ and $g_0(\cdot)$, respectively. Then, vague gamma priors could be placed on those shape and rate parameters. This approach took considerably more time to fit, however, and ran into convergence problems. In subsequent applications of these methods to other data sets, this prior information on DPP score as a function of clinical status would be available, so this is not a limitation of the method. Rather, it is a limitation that exists only due to the relative novelty of the DPP reader.
Supporting information
S1 Appendix. R code for simulations and model fitting. Separate files are included for each model implementation, as well as dependent functions. (ZIP)