A Likelihood Approach to Estimate the Number of Co-Infections

The number of co-infections of a pathogen (multiplicity of infection or MOI) is a relevant parameter in epidemiology as it relates to transmission intensity. Notably, such quantities can be built into a metric in the context of disease control and prevention. Having applications to malaria in mind, we develop here a maximum-likelihood (ML) framework to estimate the quantities of interest at low computational and no additional costs to study designs or data collection. We show how the ML estimate for the quantities of interest and corresponding confidence-regions are obtained from multiple genetic loci. Assuming specifically that infections are rare and independent events, the number of infections per host follows a conditional Poisson distribution. Under this assumption, we show that a unique ML estimate for the parameter () describing MOI exists which is found by a simple recursion. Moreover, we provide explicit formulas for asymptotic confidence intervals, and show that profile-likelihood-based confidence intervals exist, which are found by a simple two-dimensional recursion. Based on the confidence intervals we provide alternative statistical tests for the MOI parameter. Finally, we illustrate the methods on three malaria data sets. The statistical framework however is not limited to malaria.


Introduction
Infections are ubiquitous and ecologically complex processes. Indeed the chain of events conducing to the colonization and replication of parasites within a host involves many environmental, physiological, and genetic factors both in the host and the infectious agent. A common observation in many host-parasite interactions is that there are multiple genetically distinct lineages of the pathogen infecting the same individual host [1][2][3]. Whereas in some diseases such as malaria, this is considered an important parameter, in others it is still somehow a neglected aspect that is just starting to be considered [2].
The observation of multiple genetic variants or multiplicity of infection (MOI) is indicative of the transmission dynamics since it allows for the co-transmission of different parasite variants or the overlap of several genetic variants due to multiple infectious contacts. Thus, the incidence of MOI or superparasitism per se is an important metric of exposure [2,[4][5][6][7]. In addition to its epidemiological importance, as many other ecological processes involving genetically distinct individuals, MOI leads to several outcomes derived from the interactions among lineages. This process is usually referred to as the intra-host dynamics [3].
During the last two decades, the outcomes of intra-host dynamics have been the subject of several theoretical and experimental investigations exploring a broad spectrum of scenarios. Usually, such studies focus on major effects that different interconnected factors have in terms of parasite dispersion (parasite fitness) and/or the elicited manifestations of disease that may lead to an effect on the host's fitness [3,[8][9][10][11]. Furthermore, intra-host dynamics also affect the spread of parasite lineages with adaptive mutations conferring resistance to antimicrobial agents or that allow the evasion of immune and/or vaccine-mediated protection [12,13]. Under all these circumstances, following or measuring MOI as a parameter is essential whenever epidemiological inferences or models involving intrahost dynamics are formulated.
Although it is possible to control or measure the number of distinctive parasite lineages in models and experimental settings (e.g. [14]), a totally different scenario is the one faced by those studying naturally occurring infections in the context of ecological and epidemiological investigations [4][5][6]15,16]. Under such circumstances, MOI is usually measured by ad hoc metrics that rely on a set of genetic markers or the observed polymorphism in one or several genes [2]. The need for an experimental definition of MOI has generated approaches based on phylogenetic frameworks (e.g. many viruses) or some form of multi-locus genotyping [2,17]. Whereas such approximations have been useful, there is still need for a formal statistical framework that allows the estimation of the actual number of lineages and other approximations to MOI that facilitates and/or considers confounding factors.
Given the broad spectrum of genetic architectures observed in parasitic organisms, it is not possible to define a universal framework of MOI. E.g. HIV accumulates mutations at a rate that allows for the use of phylogenetic base methods [17]. On the other hand, eukaryotic parasites such as Plasmodium, Trypanosoma, Toxoplasma, and Schistosoma [18,19] and bacteria such as Mycobacterium [16] evolve at a rate at which it is possible to determine a stable number of genetically distinct lineages during the course of an infection given a set of genetic markers. In this investigation, we describe a formal statistical framework to estimate MOI that allows, among other aspects, building formal tests for comparing groups, e.g., before or after deploying an intervention such as a vaccine, complicated versus non-complicated cases, populations with different exposures, among other possibilities.
More specifically, we further develop the maximum-likelihood framework introduced by [20], which allows to estimate MOI and prevalences of pathogen lineages from a single genetic marker, e.g., microsatellite loci. We establish how to compute ML estimates and confidence intervals (or regions) for all involved parameters. Based on these, we show how statistical tests can be constructed to test the parameters. Although, the framework is -in principle -not restricted to a particular disease or species, we applied it to malaria by comparing data sets from three endemic regions with different levels of endemicity.
The philosophy behind the method section's structure is the following. We first establish the general methods and then refine them assuming that the number of co-infections follows a conditional Poisson distribution. This structure embraces a better understanding of how to derive particular results for alternative choices to the Poisson distribution. Moreover, rigorous mathematical proofs are shifted to the appendix. Readers less interested in these technical details should feel free to skip them.

Methods
We adapt the maximum-likelihood method of [20] to estimate the average MOI. This approach is fully compatible with the model of [12,21] which describes the hitchhiking effect associated with drug resistance in Malaria, for which MOI is a fundamental quantity. Being able to estimate MOI, the model can be 'reverse engineered' to reconstruct the evolutionary process underlying drug resistance. By doing so, a formal means is provided to identify those among the many compounding factors, which can be influenced to slow-down or prevent the spread of drug resistance in the course of public health initiatives.

Model background
Assume n different 'lineages' A 1 , . . . ,A n of a pathogen, e.g., n alleles at a marker locus (or haplotypes in a non-recombining region), circulate in a given population. Particularly, we have neutral markers in mind characterizing linages, so that their frequencies do not change too rapidly, e.g., due to selection. The n lineages considered are those that contribute to infection, not new variants that are generated by mutation inside hosts, but 'fail' to participate in transmission.
Because we identify a pathogen with the allele at the considered locus, we will use the terms 'lineage' and 'allele' synonymously. (We refrain from using the term strain, as we refer here to a genotypic characterization and the term strain may have different meanings across pathogens.) In vector notation, the lineages' relative frequencies are p~(p 1 , . . . ,p n ). An individual (host) is infected by m (not necessarily different) lineages of the pathogen with probability k m . The m lineages are sampled randomly from the pathogen population. Hence, within an infection, the combination of pathogen linages follows a multinomial distribution with parameters m and p 1 , . . . ,p n . Consequently, the probability that m k of the infecting linages carry allele A k (m 1 z . . . zm n~m ) is given by P(mDm)~m m p m , where m~(m 1 , . . . ,m n ), m m :~m ! m 1 ! . . . m n ! is a multinomial coefficient, and p m :~p m1 1 . . . p mn n . Clearly, m summarizes the pathogen configuration infecting a host.
In practice, m is unknown for a given host. It is possible to detect which alleles (or lineages) are present in a clinical sample, but it is difficult to reliably reconstruct m without using next generation sequencing, a technology that is not practical to use in many settings. For instance, if only a single allele, say A 1 , is found in a clinical sample, the patient might have been infected by just one parasite lineages (m~1), or co-infected by several lineages (m~2,3, . . . ), all of which carry allele A 1 . Hence, it is convenient to represent an infection (lineages detected in a patient) by a vector of zeros and ones of length n, referring to the detected alleles (lineages). Hence, a clinical sample is represented by a vector i~(i 1 , . . . ,i n )[f0,1g n \f0g, where i k~1 if A k is found in the infection, and otherwise i k~0 . In mathematical terms i~sign m~(sign m 1 , . . . ,sign m n ). (Remember sign 0~0 and sign x~1 for xw0). Note that the vector~(0, . . . ,0) is excluded, which corresponds to no infection. In the following, m will always denote a vector of nonnegative integers and i a vector of zeros and ones.
Let m be the multiplicity of infection (MOI) with distribution k m . Because k m is unknown in practice, we aim to estimate it from clinical samples -or rather some summary statistics characterizing k m . Assume a total of N clinical samples, taken from different hosts roughly at the same time. We assume that the n lineages A 1 , . . . A n detected in the samples are all lineages circulating in the population. (There is no knowledge of undetectable lineages.) Each clinical sample contains one or more of the n lineages (alleles). (We assume that lineages that infected the host have not vanished due to intra-host dynamics, e.g., drug treatments, and that new lineages have not emerged inside the host, e.g. by mutation, recombination etc.) A clinical specimen with allelic (or lineage) configuration i could descend from an infection with pathogen configuration m as long as sign m~i. Let Q i denote the expected frequency of clinical specimen with allelic configuration i. Then, where the first sum runs over all integers larger than or equal to DiD :~i 1 z . . . zi n , as this obviously is the minimum number of parasite lineages that could have caused the infection. The second sum runs over all possible configurations m of exactly m parasites that lead to the allelic configuration i (i.e. sign m~i), and hence could have potentially infected the host. It follows, that for a given allele-frequency distribution p, Q i is determined by the distribution k m . If infections with the pathogen are rare, a natural assumption is that the number of pathogens infecting a host is Poisson distributed, or more precisely follows a conditional Poisson distribution (CPD), i.e., Of note, this conditions on the fact that each host is infected by at least one pathogen. The mean value of this distribution is m~l e l e l {1 : Assuming the CPD (2), Q i can explicitly be derived. In Analysis (subsection 4.1) it is shown that Since the (natural) likelihood for observing these samples is P i Q ni i , the log-likelihood is given by Assuming the CPD for the number of lineages infecting a host, it is shown in Analysis (subsection 4.2) that the log-likelihood becomes where N k~X i:i k~1 n i~X i[f0,1g n i k n i is the number of samples that contain allele A k . The prevalence of allele k is then N k =N.
Notably, X n k~1 N k §N with equality if and only if exclusively singlelineage infections occur. This is one of two special cases that need to be treated separately. In the other special case all lineages are found in every infection. These cases are somewhat non-generic. We shall therefore formulate the following generic assumption. Assumption 1 Assume that the sum over the alleles' prevalences is larger than one, but not all alleles are 100% prevalent. In other words, more than one lineage is found in at least one infection, i.e., X n k~1 N k wN and not all lineages are found in every infection, i.e., N k =N for at least one k.

Results
In the following l will refer to the parameter of the CPD, or in the general case, to the parameter (or parameter vector) summarizing the distribution k m . In the latter case l~0 has to be interpreted as k 1~1 .
We shall start by deriving the maximum likelihood (ML) estimates for the parameters of interest. Before we do so, we shall start by a rather intuitive observation.
Not surprisingly l~0 can never be an ML estimate if multiple alleles are found in at least one sample, as l~0 implies single infections only. We summarize this in the following remark which is proved in Analysis (subsection 4.3).
Remark 1 If at least one sample contains more than one allele, i.e., X n k~1 N k §N, l~0 is not the maximum likelihood estimate.
To obtain the ML estimate for h~(l,p 1 , . . . ,p n ), (4) needs to be maximized on the simplex, either using the method of Lagrange multiplies or by eliminating one of the redundant variables, i.e., by setting e.g., p n~1 { X n{1 i~1 p i . When using Lagrange multipliers we need to find the zeros of the derivatives of where (Dh t , Db t ) is the solution of the system of linear equations and (h 1 ,b 1 ) is any initial choice of h and b. Here, H(h t ,b t ) denotes the (transposed) Hessian matrix evaluated at (h t ,b t ), i.e., If, in the general case, l is a parameter vector, the derivatives above have to be interpreted accordingly.
In the case of the conditional Poisson distribution (2) the entries of the Hessian matrix are derived in Analysis (subsection 4.4).
Clearly, instead of (6) also can be iterated, which, however, is numerically less recommendable. Alternative approaches would be using an iterative least-square algorithm or the EM algorithm (cf. e.g. [22]).
Of note, in general, an ML estimate does neither necessarily exist, nor is it unique, not to mention that closed formulas typically do not exist. Unfortunately, assuming the CPD (2), the ML estimate indeed cannot be calculated explicitly. However, the estimate exists and is unique. Furthermore, although it can be straightforwardly derived by the above methods, the complexity of whole procedure can be greatly simplified.
Result 1 Assume the conditional Poisson distribution (2) for k m . Under Assumption 1 there is a unique maximum likelihood estimatê h h~(l l,p p 1 , . . . ,p p n ). The first componentl l is the unique positive solution of the equation.
It is found by iterating which converges monotonically and at quadratic rate from any initial value l 1 §l l.
The maximum likelihood estimates of the allele frequencies are given bŷ The result is proven in Analysis (subsection 5.1).
For the sake of completeness we shall also consider the instances in which Assumption 1 is violated. In the first situation, only one pathogen lineage is found in each infection, i.e., there is no indication whatsoever of co-infections. The results are summarized in the following remark which is proven in Analysis (subsection 5.1).
Remark 2 Assume that each sample contains only one allele, i.e., X n k~1 N k~N . Then the ML estimates arel l~0 andp p k~N k N .
In the other non-generic case that all alleles are found in every sample an ML estimate does not exist, more precisely, it is ?, implying that -with probability one -all alleles are in every sample independently of the allele-frequency distribution.
Remark 3 Assume N~N k for all k. Then the ML estimate is ''l l~z?'' for every allelic distribution.
A proof can be found in Analysis (subsection 5.1). Of note, the maximum likelihood has an intuitive interpretation. We summarize this as the following result which is proven in Analysis (subsection 5.1).
Remark 4 The maximum likelihood estimateĥ h~(l l,p p 1 , . . . ,p p n ) is the set of parameters for which the observed number of samples containing allele A k equals its expectations, i.e., Hence, the maximum likelihood maximizes the expectation of the loglikelihood.

Confidence intervals from the profile-likelihood
Letĥ h~(l l,p p 1 , . . . ,p p n ) denote the ML estimate. Confidence intervals can be derived from the profile-likelihood for each parameter.
We are interested in finding a confidence interval (CI) for l. For a fixed value of l, the profile likelihood is defined as i.e., as the maximum likelihood taken over the remaining parameters while keeping the parameter of interest fixed. Moreover, denote the maximum likelihood byL L (clearlŷ L L~max l L (l) ). Suppose l 0 is the true parameter and L (l 0 ) the corresponding profile likelihood. Then i.e. twice the difference of the maximum likelihood minus the profile likelihood assuming the true parameter is x 2 distributed with one degree of freedom (cf. e.g. [23], chapter 4). This can be used to construct confidence intervals for the true parameter l 0 .
To construct a CI at the (1{a) level, we need to find all l satisfying 2(L L{L (l) )ƒc 1,1{a , i.e., we need to find lƒl satisfying 2(L L{L (l) )2 (L L{L (l) )~c 1,1{a , where c n,a denotes a-quantile of the x 2 distribution with n degrees of freedom. In other words, the equation L (l) {L Lzc 1,1{a =2~0 needs to be solved. By definition of L (l) , this means that L l,p ð Þ{L Lzc 1,1{a =2~0 needs to be solved with respect to (l,p), while simultaneously maximizing L(l,p) with respect to p. The latter is done using the method of Lagrange multipliers for fixed l, i.e., . Therefore, following [24] the bound of the confidence intervals are found by solving the following system of equations where l Ã~L L{c 1,1{a Clearly, f (l,p,b)~0 can be straightforwardly solved by a Newton method, i.e., by iterating where (Dl t , Dp t , Db t ) is the solution of the system of linear and (l 1 , p 1 ,b 1 ) is any initial choice of l, p and b. The derivative +f(l n , p n ,b n ) is identical to (7) except for the first line, which needs to be replaced by The derivatives of L are given by (39). Hence, +f(l t ,p t ,b t ) is given by where all derivatives are given by (39) and (40). Again, alternatively (l tz1 ,p tz1 ,b tz1 )~(l t ,p t ,b t ){(+f (l t ,p t ,b t )) {1 : f (l t ,p t ,b t ) can be iterated, which however requires to invert the matrix +f (l t ,p t ,b t ) in every iteration step. The alternatives to the Newton method are again the EM algorithm or an iterated least-mean-square algorithm.
To obtain the confidence bounds l and l it is necessary to iterate (13) from two different initial values. Of note, obtaining one bound for the confidence interval is numerically only as demanding as obtain the ML estimate.
Confidence intervals for the allele frequencies p i are obtained similarly by iterating (13) with obvious changes. Namely, the first component of the function f needs to be replaced by LL(l,p,b) Ll and the (iz1)-th component by L(l,p){l Ã , i.e., f is the gradient of L with the derivative with respect to p i replaced by L(l,p){l Ã . Consequently +f is identical to (7) with the (iz1)-th component replaced by (14).
Importantly, existence and uniqueness of the confidence bounds l and l can be proved under the assumption of the CPD (2). Moreover, it is possible to significantly reduce the complexity of the Newton method (13) to find the CI's bounds. We obtain the following result, which is proven in Analysis (subsection 5.2).
Result 2 Suppose Assumption 1 holds. If k m is given by the conditional Poisson distribution (2), the confidence interval forl l (based on the profile likelihood) is uniquely defined.
The bounds of the confidence interval (l and l) forl l are obtained by iterating where and There are exactly two possible solutions (l,b) and (l,b). The algorithm is converging quadratically for any initial values (l 1 ,b 1 ) sufficiently close to the one of the solutions. The proof is found in Analysis (subsection 5.2). Formally, the above result holds true in the non-generic cases X n k~1 N k §N and N k~N . If all samples contain just one lineage, i.e., X n k~1 N k §N, the ML estimate isl l~0 and the confidence interval has the form ½0,l. If all samples contain all lineages, i.e., N k~N the maximum likelihood estimate iŝ l l~? and the confidence interval has the form ½l,z?), hence it is infinitely large. Although, formally the result still holds, the asymptotic (11) is no longer true, as discussed in Analysis (subsection 6), rendering the result inapplicable if Assumption 1 is violated.

Asymptotic confidence intervals
As an alternative to the profile likelihood, one can use the asymptotic normality of the maximum likelihood to construct confidence intervals. Asymptotically the difference of the maximum likelihood (h~(l l,p p)) and the true parameter (h 0~(l l 0 ,p 0 )) is normally distributed. However, it is important to notice that -unless one eliminates one of the redundant allele frequencies -the Lagrange multiplier b needs to be treated like a regular parameter. The corresponding likelihood function is of course given by (5). Hence, the actual parameters involved areq q~(h,b). The difference of the maximum likelihood (q q) and the true parameter (q q 0 ) is asymptotically distributed according to PLOS ONE | www.plosone.org where I N (q q)~{E( L 2 L Lq 2 )D q~q q is the expected Fisher information and J N (q q)~{ L 2 L Lq 2 D q~q q is the observed Fisher information (based on sample size N). The matrix { L 2 L Lq 2 D q~q q is the transposed Hessian matrix given by (7).
The expression (q q{q 0 )*N (0,I {1 N (q q)) is the convenient, although imprecise notation, for (I N (q q)) 1 2 (q q{q 0 )*N (0,II), where II is the (nz2)-dimensional identity matrix and (I N (q q)) 1 2 the symmetric square root of the Fisher information. Namely, any positive semi-definite, symmetric matrix A (as it is the case of any covariance matrix, and particularly the Fisher information) has a spectral decomposition A~ODO T , where O is orthogonal and D is the diagonal matrix that contains all eigenvalues. These are real and nonnegative, and the diagonal matrix that contains the square roots of the eigenvalues is denoted by D  2 . An often used alternative notation is with I(q q)~1 N I N (q q) and J(q q)~1 N J N (q q).
From (17) the asymptotic distribution of the parameters of interest q follows immediately by dropping the 'dummy' variable b and the corresponding rows and column in the inverse Fisher information. Of note, this is not identical to 'formally' derive the inverse Fisher information based on L and h. Namely, it is important to drive the asymptotic covariance matrix with respect to L and q.
Since (q q{q 0 )*N (0,J {1 N (q q)) the bounds for the (1{a) CI for l 0 are given byl and those for the components of p 0 bŷ Here, z a denotes the a quantile of the standard normal distribution.
Of course, when using the expected Fisher information, J N needs to be replaced by I N . Under the assumption of the conditional Poisson distribution (2), the second derivatives L 2 L L 2 needed to derive the Fisher information are calculated in Analysis (subsection 4.4; eq.39). Moreover, evaluated at the maximum likelihood estimate, N k~E N k , it is seen that the expected and observed Fisher information are identical, i.e., J(q q)~I(q q)~, when assuming (2).
With some algebraic manipulation it is possible to simplify the expressions for the confidence intervals assuming the CPD (2).
Result 3 Suppose the number of co-infections follow the conditional Poisson distribution (2) and that Assumption 1 holds. Then an asymptotic (1{a)-confidence interval forl l is given bŷ Alternatively, the following formula, requires just the ML estimate for l For a proof, see Analysis (subsection 5.3). In the non-generic case N k~N for all k, the ML estimate is not unique, and we havel l~z?. Hence, asymptotic CIs make no sense in this case, neither for l nor for the frequencies p k .
In the case X n k~1 N k~N , it also impossible to derive CIs as the asymptotics (17) break down (cf. subsection 6 in Analysis). Explicit formulas for the CIs of the allele frequencies are obtained similarly.
Result 4 Under the same assumptions as Result 3, an asymptotic (1{a)-confidence interval forp p j is given bŷ The proof can again be found in Analysis (subsection 5.3).

Testing the parameters
In practice, data from several loci is typically available, each of which yields a different ML estimate or there might be some prior estimate for the parameters of interest. Depending on particular properties of the marker loci (mutation rate, allele-frequency spectrum, biochemical issues in determining motif repeats, etc.) different marker loci will lead to different ML estimates. Hence, it is desirable to test whether different estimates are significantly different. The confidence intervals can be adapted to test the parameters.
Clearly, at different marker loci, different alleles will segregate and the allele-frequency spectra will be very different. Hence, for the present purpose, it is meaningless to compare the allele frequencies at different loci. However, the estimate for l should be consistent, as this parameter is the same for all loci. Consequently, in the following we will focus on testing l and present three alternative tests for the null hypothesis H 0 : l~l 0 vs. the alternative H 1 : l=l 0 .
3.1 The likelihood-ratio test. The first test is rather straightforward. Since under the null hypothesis l~l 0 , it is rejected at significance level a if 2(L L{L (l) ) §c 1,1{a : In other words, we reject the null hypothesis for any l 0 that lies outside the (1{a)-confidence interval ofl l, which are obtained as outlined above in ''Confidence intervals from the profile likelihood''. Therefore, this test requires no additional numerical effort if the confidence intervals were already derived.
The corresponding p-value is given by To calculate the p-value, L (l0) needs to be derived first. Similarly as in section in ''Confidence intervals from the profile likelihood'', this leads to the equations +L L l (p,b)( LL Lp 1 , . . . , LL Lp n , LL Lb )~0. Therefore, the system of equations needs to be solved by a Newton method, i.e., by iterating where (Dp t ,Db t ) is the solution of the system of linear equations and (p 1 ,b 1 ) is any initial choice of h and b. The derivative +g l 0 (p t ,b t ) is obtained from (7) by deleting the first row and column and substituting l~l 0 , i.e., where all derivatives are given by (39) and (40).
Result 5 Suppose Assumption 1 and l 0 =0 holds. In the case of the conditional poisson distribution, the p-value under the null hypothesis H 0 : l~l 0 is given by (24), where L (l0) is given by (4) with l~l 0 andp 0 given byp The proof is presented in Analysis (subsection 5.4).
In case of H 0 : l~l 0~0 , there are two possibilities. If X n k~1 N k wN, then L(0,p)~{?. Hence, the null hypothesis is always rejected. This is clear, because if l 0~0 is the true parameter, it is impossible to observe data X with X n k~1 N k wN (see Remark 7 in Analysis, subsection 6). However, if X n k~1 N k~N , then l l~0 and L(0,p)~L L, and the null hypothesis is always accepted. Therefore, in the case of l 0~0 the test can still be formally performed in a meaningful way. However, note that the asymptotic (23) does not long hold true, as l 0~0 does not lie in the interior of the parameter space.
3.2 The score test. In the following, for any parameter choice l 0 , letp p 0 by the corresponding profile-likelihood estimate, i.e.,p p 0~a rg max where Im m 0m m is obtained from the Fisher information with the first row and column deleted. The definitions of the remaining submatrices follow accordingly.
A test for the null hypothesis H 0 : l~l 0 vs. the alternative H 1 : l=l 0 is obtained by using the fact that (cf. Remark 6 in subsection 5.4 of Analysis). The function serves as test statistic, where the data is X~( The correspond- Note that it is legitimate to write L on the left-hand side of (30) because LL Ll~L and that based on the expected Fisher information is The p-values are 2(1{W(DT(X)D)) in either case. The frequencieŝ p p 0~(p p 1 , . . . ,p p n ) are derived as specified in Result 5.
Of note, instead of (30) the ML estimate can be used as a plug-in estimate for the asymptotic variance, i.e., LL Ll D l0,m m 0 *N(0,Il ll l {Il lm m 0 I {1 m m 0m m 0 Im m 0l l ). In this case, it is not necessary to distinguish between the expected and observed Fisher information as they coincide (cf. section ''Asymptotic confidence intervals'').
In summary one obtains: Remark 5 Under the assumptions of Result 6, a test statistic for the null hypothesis H 0 : l~l 0 vs. the alternative H 1 : l=l 0 is whereN N andn n are sample size and number of alleles, in the data yielding the estimatel l.
The proof is analogously to the one of Result 6.
The test cannot be applied in the special cases X n k~1 N k~N or N k~N for all k, as the asymptotic (30) no longer holds true (cf. subsection 6 of Analysis).

The Wald test.
A third test for the null hypothesis H 0 : l~l 0 is an adaptation of the Wald test for the profile likelihood. It is based on the same asymptotic properties that we used to derive confidence intervals namely (q q{q 0 )*N (0,I {1 N (q q)). This is exactly the same as the asymptotiĉ q (l l{l 0 )*N (0,1). Hence, the test statistic q can be used. The p-value is 2(1{W(DT(X)D). Now, we shall consider again the CPD. An explicit expression for (I {1 N (q q)) 11 is given by (54). Hence, we obtain: Result 7 Under the assumptions of Result 5, the Wald test for the null hypothesis H 0 : l~l 0 vs. the alternative H 1 : l=l 0 has the test statistic based on the (expected or observed) Fisher information.
The p-values are 2(1{W(DT(X)D) in either case. Here,l l and the frequenciesp p 0~(p p 1 , . . . ,p p n ) are derived as specified in Result 1.
Alternatively, if the profile-likelihood estimate based on l 0 is used as a plug-in for the asymptotic variance, one can employ (q q{q 0 )*N ( ,J {1 N (l 0 ,m m 0 )) or (q q{q 0 )*N (0,I {1 N (l 0 ,m m 0 )). In the first case, using (53) implies that the test statistic changes to In the second case, (54) implies that the test statistic changes to Also the Wald test cannot be applied in the special cases X n k~1 N k~N or N k~N for all k, as the asymptotic for (q q{q 0 ) no longer holds true (cf. subsection 6 of Analysis).

Testing the method
Although -as we have seen -most of the theory works quite general, assuming a CPD for the number of co-infections permits to derive explicit results or, at least, reduces the complexity significantly. However, assuming a CPD might not be justified. Therefore, it is desirable to have a test for the model's fit. Namely, let be the likelihood assuming a perfect fit to the data, in which the expected frequencies of infection with stain configuration i equal their observed frequencies. In other words, L N is the maximum likelihood of the saturated model. As there are 2 n {1 possible allelic configurations i infecting a host, L N has 2 n {2 degrees of freedom. The maximum likelihoodL L~L(l l,p p) of the reduced model (assuming the CPD) has n{1 independent allele frequencies and one Poisson parameter. Therefore, Hence, the following test can be used. Result 8 To test H 0 : ''the conditional Poisson distribution is justified'' vs. H A : ''the conditional Poisson distribution is not justified'', the test-statistic T(X)~2(L N {L L) can be used. The p-value is given by .x 2 2 n {n{2 (T(X)) It should be mentioned that the above test might perform poorly if the number of lineages or alleles n is large. The reason is that the x 2 distribution has too many degrees of freedom. This might be the case when using hyper-mutable microsatellite markers with 10 or more alleles found across samples.

Application to data
As an illustration, the methods are applied to three previouslydescribed data sets [25][26][27]. Each of which comprises molecular data from P. falciparum-infected blood samples from endemic areas with different levels of malaria incidence. For each blood sample, parasite DNA was extracted and several microsatellite markers assayed.

Preliminary remarks
It is important to note beforehand that only (selectively) neutral markers should be included in the analysis. Namely, loci linked to others that are targets of selection (e.g., mdr1, crt, dhfr, dhps in P. falciparum that are associated with selection for drug resistance) will have skewed allele-frequency distribution. Hence, using these markers might lead to artifacts and severe misinferences. In practice, a marker located on a chromosome not carrying a strongly selected gene (e.g. resistance-conferring gene), can be regarded to be neutral. Moreover, clinical samples from groups that will be compared need to consider confounding effects such as differences in treatment polices, control interventions, and changing transmission intensities (e.g., a group should not contain samples from two time points during which treatment policies changed). By not considering such effects, the estimates of MOI would be inappropriate. For these reasons, we only used parts of the available data sets.

Data description
The first data set emerged from a longitudinal study conducted in Asembo Bay, a hyper-endemic region in Kenya, and was described in [27]. We included five (neutral) microsatellites on chromosome 2 and four (neutral) markers on chromosome 3. Additionally, we included two markers on chromosome 8, quite close to dhfr, which are common to all three data sets and meet Assumption 1. Only blood samples collected in the first study year (mid 1993 to mid 1994) were included, resulting in 42 blood samples.
The second data set described in [26] is from a study from Yaoundé, Cameroon, a region of intermediate/high transmission. Besides the two markers on chromosome 8 mentioned above, we included all eight available (neutral) microsatellite markers on chromosomes 2 and 3 from all 331 blood samples (data of one of the 332 original samples was unavailable).
The third data set is from Bolivar State, Venezuela, a region of low transmission. It was described in [25] and consists of 97 blood samples. Due to the low transmission intensities, for most markers each blood samples contains only one allele, violating Assumption 1. We included all markers that met Assumption 1 as well as all available neutral markers. Particularly, we included four on chromosome 2 and three on chromosome 3, two markers on chromosome 8 and one on chromosome 4, which are sufficiently distant from respectively dhps and dhfr to be considered neutral, and the two makers on chromosome 4, which were also included in the other data sets. All 97 blood samples were used.

Results
The results are summarized in Figures 1 and 2 and Tables 1-3. In all cases, the test for the model fit (cf. Result 8) justified the assumption of the CPD (cf. Tables 1-3). This is important because the three locations exhibit different transmission intensities. In all three regions, the ML estimatesl l or rather the mean MOI, l lel l el l {1 , obtained from different marker loci are fairly consistent.
As expected, most variation in the estimates is observed in Kenya because of the low sample size. Moreover, the transmission intensities are stronger, which leads to more variation in allelefrequency spectra among marker loci, resulting in more variation among the ML estimates. From Figure 1 it is apparent that the estimates for MOI are highest in Kenya, followed by Cameroon, whereas they are very low in Venezuela. This is summarized in Figure 2 showing that the average ML estimates across the regions differ by several standard deviations.
The 95% profile-likelihood CIs forl lel l , are reasonably large for the data sets from Cameroon and Venezuela (cf. Figure 1). However, due to the relatively small sample size, they are much less informative for the Kenya dataset. The asymptotic confidence intervals agree well with the profilelikelihood CIs (cf. Figure 1 and Tables 1-3). This is particularly true for Cameroon, as expected because of the large sample size. In relative terms, this is more pronounced in Venezuela than in the Kenya data set. The reason is that the ML estimates (l l) from the Venezuela data are close to zero, i.e., the boundary of the parameter range. This results in a very skewed likelihood function, yielding quite asymmetric profile-likelihood CIs. On the contrary, in Kenya, the ML estimates are rather large, and the likelihood function tends to be symmetric around its maximum. Furthermore, we tested for pairwise differences between the estimates based on different marker loci. Tables 4-6 report the pvalues for the likelihood-ratio, the Score, and the Wald test for the three regions. In all data sets, all tests perform equally well. There are some discrepancies, mainly due to the above mentioned skewness of the likelihood function. In the case of a skewed likelihood function, the likelihood-ratio test is the most preferable, because it accounts for the skewness.
Tables 7-9 compare the three versions of the Score test, while Tables 10-12 compare those for the Wald test. The results are fairly consistent. However, the versions given by eqs. 34, 37 and 36 of the Score and Wald tests, respectively tend to be most inconsistent with the other tests, especially the likelihood-ratio test. The reason is that these use the roughest approximations.
Overall, the methods perform well for all data sets and provide meaningful results. However, the statistical tests also yielded   (Tables 4-12). The allele frequencies differ of course but all are based on the same true parameter l. If the estimates for l are significantly different, some of them cannot be trusted. This can have various reasons. First, it can be a type I error. However, this occurs only with small probability if the CIs are well calibrated, i.e., their nominal coverage (1{a) is close to the actual coverage. Asymptotic CIs and tests based on them (Wald, Score) will be more affected than profile-likelihood-based intervals, because the former are inherently forced to be symmetric. This is particularly true if the estimates for l are close to zero. To quantify this effect, and to suggest heuristic methods to recalibrate the CIs, a systematic numerical robustness study of the approach is planned. Preliminary investigations, however, have shown that particularly the profile-likelihood-based CIs are well calibrated. Second, the tests are designed to compare the ML estimate based on the data with a value l 0 , which has to be interpreted as prior knowledge. Strictly speaking, it is not meant to be estimated from data itself, or at least data which is available. A test designed to compare two estimates, should incorporate information from both data sets (data from both markers). A standard approach to resolve this is as follows. One could calculate the product of the maximum likelihood from both markers and compare it with the maximum likelihood of both markers conditioned on equality of l. This however would require much more numerical effort than the tests here. Note further, that the structure of the data does not allow to perform a permutation test, because the allele-frequency distributions are expected to be different. This is true for two different marker loci in the same endemic region as well as for the same marker in two different populations. Third, the model assumptions might be violated, i.e., the underlying Poisson distribution might not be correct. This can again be quantified in the coarse of a robustness study.
Fourth, the allele-frequency spectra of two different marker loci is very different, and the method might be sensitive to this. For instance strong skewness in the data distributions might bias the estimates. This is obviously the case if one marker shows no variation at all. Moreover, the number of different allele at different markers is very different, which results in very different probabilities of the ML estimates. These issues again need to be investigated in a numerical study.
Fifth, some STR markers tend to be hyper-mutable. As a result, not just the frequency distribution might be more problematic, but it is also more challenging to correctly identify the tandem repeat numbers. Hence, for hyper-mutable markers the data might have very bad quality. In our examples the marker labelled L1 appears to be hyper-mutable.
Because of all these possible reasons, it would be pre-mature to suggest a heuristic on how to decide, which estimates can be trusted the most. A systematic numerical follow-up study is planned to investigate all these possibilities in detail to provide suggestions on the criteria upon which the data is chosen.

Discussion
The number of genetically distinct lineages co-infecting a hostcommonly referred to as ''multiplicity of infection'' (MOI) -is a key quantity in epidemiology. First, it relates with transmission intensity since it provides a metric for the number of secondary infections after a primary infection; assuming that the lineages circulating are identifiable (e.g. secondary infections within a clonal outbreak simply cannot be traceable). Second, it measures the possibility of genetic exchange among those lineages as determined by the genetic system of the pathogen in question. Finally, if phenotypic differences are associated with those lineages, MOI could lead to very complex dynamics driven by natural selection.
Measuring MOI is desirable in a variety of infectious diseases, but -in many instances -only feasible if it can be measured at low cost and with a reasonable effort. Optimally it should fit into standard study designs and should be easily computable with whatever genotyping data can be collected from clinical specimens. In order to meet these goals, we further developed the maximum-likelihood (ML) method originally proposed by [20] and applied it to three malaria datasets as examples.
From a total of N samples (e.g. blood samples), the number of genetically distinguishable lineages present in each host are recorded. From the resulting data, assuming that hosts are infected randomly by those lineages according to their prevalence, we derived the likelihood function. If infections with the pathogen are rare events, a natural choice for the number of co-infecting lineages is a conditional Poisson distribution (CPD). This distribution comes with the appealing feature that it is characterized by a single parameterl l, whose transforml lel l el l {1 is the average MOI. Assuming a CPD, the likelihood function simplifies as well as the procedure to derive the ML estimates. Although, this was previously described by [20], we were able to derive a number of important results: First, the ML estimate always exists and is unique. Second, it has the intuitive interpretation of being the parameter vector under which the observed are the expected prevalences for the distinguishable lineages, i.e., the observation is the expectation, if the ML estimate is the true parameter vector. Third, the recursion to compute the ML estimate forl l reduced from a multi-to a one-dimensional recursion, which just depends on the number of samples N and the observed prevalences. The ML estimates for the lineages frequencies are explicit functions of l l. Fourth, the recursion forl l converges (at least) from every initial value l 0 wl l. Convergence is monotonically, at quadratic rate, and typically occurs within a few iterations. Besides the obvious computational advantages provided of our results their actual foremost importance is that they justify the ML approach. Using an ML estimates is only appropriate if it has a significantly higher probability than distant alternative parameter choices, which is difficult to evaluate in a multi-dimensional space. However, the form of the ML estimate here -particularly because the lineages prevalences depend continuously onl l -indicates that the observation will have significantly lower probability under distant alternative parameter choices. The method worked well for the three malaria datasets to which it was applied, and gave similar results when applied to different independent microsatellite loci. Although, our results justify the ML approach, it is nevertheless of fundamental importance to provide confidence intervals (CIs). We reported here on asymptotic and profile-likelihood-based CIs for all parameters. Asymptotic CIs are either based on the observed or the expected Fisher information, which under the CPD coincide. Explicit formulas for the CIs for all involved parameters were derived. Profile-likelihood based CIs were already emphasized by [20]. However, it was important to note that they can actually be derived at low numerical costs by using the method of Lagrange multiplies. This reduces the numerical effort to the same magnitude as for the ML estimate. Assuming the CPD, we proved that the CI for the parameterl l, yielding the estimate for the MOI, is uniquely defined. The confidence bounds are derived by a two-dimension recursion, which converges locally at quadratic rate. Both kinds of CIs gave meaningful results for the three data sets to which we applied the methods and they agree well. Although the asymptotic CIs are easier to derive, we suggest to use the profile-likelihood-based CIs if sample size is low and/or the ML estimate forl l is small for the reasons discussed in the application section. Although, we discussed CIs for the linages' frequencies, these are somewhat less interesting, unless one focuses on the prevalence of a particular linage. Otherwise one should derive confidence regions on the simplex for the lineage frequencies, which is done as outlined, but numerically more demanding.
To test the ML estimate against other parameter choices typically three statistical tests are used, the likelihood-ratio, the Score, and the Wald test. The latter two are based on the asymptotic CIs, while the likelihood-ratio test builds upon the profile-likelihood-based CIs. Motivated by our intention to apply the methods to malaria we focused on using these tests to compare estimates for the parameterl l. Namely, several genetic markers characterizing linages are typically available (e.g., several microsatellite markers), to all of which the methods are applicable. While the true parameterl l is of course the same for all markers, the ML estimates obtained from them will differ. It is therefore important to test whether these estimates differ significantly. The parameterl l changes on temporal and spatial scales. An obvious question is, whether MOI changes over time (e.g. before and after the implementation of control measures) or varies across endemic regions. Hence, it is important to test for significant differences in estimates forl l.
Not surprisingly all tests described perform equally well as they are asymptotically equivalent. However, as in the case of CIs we suggest to use the likelihood-ratio test if sample size is small or the parameters compared are small. If interested in p-values additional effort is required for the likelihood-ratio test, because a twodimensional iteration needs to be performed. However, numerically this is only as demanding as obtaining the CIs. Because the test statistics for the Score and Wald tests can be derived, it is easy to derive p-values in these cases. For each of these two tests we provided three alternative variants, which all worked almost equally well in the provided examples. We should point out that it was our intention to indicate only how tests for the parameters can be constructed. With the usual approaches one could compare multiple parameters at the same time, including the information of    Table 5. See description of Table 4 but for the Cameroon data set.  Table 6. See description of Table 4 but for the Venezuela data set.    Table 8. See description of Table 7 but for the Cameroon data set.  Table 9. See descriptions of Table 7 but for the Venezuela data set and Table 6.  Table 10. The same as Table 4 Table 11. See description of Table 10 but for the Cameroon data set.  Table 12. See description of Table 10 but for the Venezuela data set and Table 6.  all these markers. This however, exceeds both our intention and the scope of this article. Finally, as a justification for using the CPD, which simplifies the method to a great extent, we summarized the test suggested by [20]. Although the test will be uninformative if many lineages are present it provides a justification for the approach. Of note, the CPD is an intuitive assumption if infections are relatively rare events. This does not relate with the overall prevalence but rather with how high the observed incidence is in a given population in terms of the time scale required for the pathogen to complete its transmission cycle. Such relationship is hard to establish without complex simulations but it is worth noting that there could be biologic scenarios (particular pathogens or epidemiologic settings) where this assumption does not hold. Thus, it is advisable to check whether the CPD assumption is violated using the tests for the model fit proposed in this investigation. In our case of study, we observe robust estimates across very different epidemiologic settings. Overall, the methods developed here can be used to compare groups under different exposures, different manifestations of disease, groups of patients that have different genotypes (e.g. sickle cell or any other hemoglobinopathies associated with protection), or the efficacy of a given vaccine. Biologically, this method assumes that the rate of evolution of the marker used is ''low'' relative to the time of the infection. That is, there is a ''numerable'' set of lineages that can be estimated and no variants are generated during the time scale of one infection. Thus, it is not suitable for pathogens such as HIV or any other hypervariable virus. The second assumption is that the set of markers used to detect and characterize the MOI are effectively neutral, so they are not linked to genes under selection. Thus, the loci cannot be associated with antigens or drug resistance. As presented, each loci is considered independent, which is a typical assumption of genotyping base approximations used in molecular epidemiology. We also want to emphasize that this MOI estimate depends on the number of detectable lineages given a laboratory method. Thus, results from different markers such SNPs or microsatellites are expected to differ as a function of their differences in mutation rates and mode of evolution. One could actually calculate the fit of individual loci and then exclude potential outliers if there is any biological reason to do so (e.g. microsatellites under different evolutionary models where one is hyper-variable or nonvariable when compared with others). The method is sensitive enough to detect differences in MOI under different epidemiologic settings as indicated by the analyses of empirical data. Whereas this is not per se a ''genomic'' method, in the sense that is not designed to estimate MOI directly from reads generated from next generation sequence (NGS) data, it can do so from a given set of SNPs or microsatellites detected by using NGS. Whereas the method was originally intended for applications to malaria, it can be applied to other parasitic or microbial diseases where the assumptions are not violated. E.g. variation on the VNTRs in a multi-clonal infection of Mycobacterium tuberculosis. Unlike empirical approaches where simply alleles are counted and then averaged, the proposed ML method provides a robust and computationally efficient statistical framework that can be integrated in epidemiological investigations.

Analysis
1 The Model Here, Q i given by (1) is explicitly derived under the assumption that k m is given by the CPD (2). Namely, because Nv X n k~1 N k (note that this holds also true if p k~0 for some k). This proves that l~0 is not a maximum likelihood estimate, which is quite intuitive. % 1.4 Derivatives of the log-likelihood. Assuming the CPD (2) the log-likelihood function is given by (4) and the derivatives of (5) are hence straightforwardly calculated to be LL Ll~L and since L does not depend on b The entries of the Hessian matrix (7), i.e., the second derivatives of L, given by (5), are calculated to be . Therefore, we obtain le l e l {1 or proving the last assertion. Hence, it remains to prove the statements forl l. By using (41) and equating (39c) to zero, we obtain , which is equivalent Therefore, the ML estimate is a solution of (42). Straightforward calculation gives Note that, f (0)~0 and f '(0)~1{ X n k~1 N k N v0, because Since, N k vN for at least one k, f ''(l)w0, implying that f (l) is strictly convex for l §0. Because f is strictly convex there can be at most one positive solutionl l of f (l)~0. Moreover, f is strictly monotonically increasing for lwl l.
The solution can be found by a Newton method. Because f (l) is strictly convex and monotonically increasing for l §l l, the Newton method converges monotonically to the solutionl l. Moreover, because f ''' is continuous, the rate of convergence is at least quadratic. Noting that l tz1~lt { f (l t ) f '(l t ) yields (9) completes the proof. % The special case, in which only single infections occur, is summarized by Remark 2. It can be proven as follows.
Proof of Remark 2. Examining the proof of Result 1 yields that that the ML estimate is any positive root of f (l)~0. In the present case f (0)~f '(0)~0. However, since N k vN must hold for at least one k, f is still strictly convex. This implies that f (l)w0 for all lw0. Hence, no maximum likelihood estimate with lw0 exists.
Moreover, since which is maximized at p k~N k N . Particularly, the likelihood function is finite in this case. % In the other non-generic situation, every lineage is found in all samples, which is described in Remark 3 and can be proven as follows.
Proof of Remark 3. The proof of Result 1 yields f (l)~l(1{n). Hence, f (l)~0 has no positive solution, and hence no ML estimate with lw0 exists. Clearly, Remark 1 states that l~0 is also not an ML estimate.
In this case the log-likelihood function simplifies to L(l,p)~lim Since L~0 implies that the likelihood is one, this limit case, which is -of note -independent of the allele-frequency distribution, is the maximum likelihood. % Remark 4 states that the expected number of samples containing a given lineage equals the observed number of samples containing this allele if the ML estimate is the true parameter. The proof is as follows.
Proof In the following we will use that Q i~E n i N . To simplify the notation assume k~n. Hence, Since the alleles can be arbitrarily labeled, we obtain The proof is completed by noting that EL is obtained from (4) by replacing N k with EN k .
2.2 Profile likelihood based confidence intervals. The existence sand uniqueness of the profile-likelihood-based confidence intervals are proven as follows.
Proof of Result 2. The proof consists of several parts.
Part A: Existence in the generic case. We first assume X n k~1 N k wN and N k =N for at least one k and prove the CI's existence.
The CI's bounds satisfy (12). The equations LL Lp k~0 , yield b~N k l e mpk e mp k {1 , or which implies that bwN k l must hold for all k. Since, X n k~1 p k~1 , by summing up the above expression one arrives at ). Thus, for fixed l the Lagrange multiplier b is a zero of the function Its derivative is given by Hence, g is strictly monotonically increasing in b, and consequently has at most one zero b(l).
is a solution of (12), where p is given by (44). This proves the existence of the CI's bounds. Part B: Uniqueness in the generic case. Next, the uniqueness of the confidence intervals is proven. Assume two values l 1 vl 2 with L(l 1 ,b(l 1 )){l Ã~L (l 2 ,b(l 2 )){l Ã~0 . Since L(l,b(l)) is continuously differentiable the mean value theorem implies that there exists l 1 vl lvl 2 with dL dl (l l,b(l l))~0. Application of the chain rule yields dL dl (l,b(l))~L L Ll (l,b(l))z X n k~1 LL Lp k (l,b(l)) Lp k (b(l)) Ll .
By definition of b(l), the relation LL Lp k (l,b(l))~b(l) holds.
L dl (l l,b(l l))~L L dl (l l,b(l l))~L L dl (l l,p p), wherep p is given by (44) with b~b(l l). This implies that (l l,p p,b(l l)) is a zero of (39), or, in other words, thatl l is a maximum likelihood estimate. Because of its uniquenessl l~l l, and l 1 vl lvl 2 . Hence, l 1 vl 2 vl l or l lvl 1 vl 2 is impossible, and the CI is therefore uniquely defined. Part C: Existence and uniqueness in the non-generic cases. In the case X n k~1 N k~N the same proof holds with obvious modifications. As It follows that at least one solution of (12) exist. The above proof of uniqueness, implies that this is the only solution.
Similarly, for N k~N for all k, (48) is violated and becomes lim l?0 L(l,p)~0, from which the existence of exactly one solution of (12) follows from the same proof as in the generic case.
Part D: Derivation of the CIs in the generic case. Parts A and B reveal that the bounds of the CI's bounds are the two solutions ((l,b) and (l,b)) of the equations L(l,b){l Ã~0 and g l (b)~0, where g l (b) is given by (45), and L(l,b)~L(l,p) with p~p(b) given by (44). A little algebraic manipulations yields that L(l,b) is given by (16e).
The solutions can be found by a Newton method. Straightforward calculation gives where A~A(l,b) and g l (b) are given by (16c) and (45) or (16d), respectively. Hence, the Newton method leads to the following iteration Due to its relatively simple form, the above matrix can be easily inverted and the iteration can be rewritten as (16a) and (16b).
The Newton methods converges locally quadratically if the above matrix is nonsingular in the solutions. Part A of the proof reveals that these solutions satisfy g l (b)~1, yielding Lg l Ll~1 l 2 (l{A). Hence, the matrix simplifies to According to the proof of Result 1 this condition is only fulfilled at the unique ML estimate. Hence, det M=0 in (l,b) and (l,b). Therefore, the Newton method converges quadratically for any initial value sufficiently close to the respective solution. %

Asymptotic confidence intervals.
Proof of Result 3. This proof is slightly more general than necessary as we will re-use part of it later.
First, consider a matrix M~(m ij ) with the following structure LetM M~(m m ij ) ij~M {1 . We aim to derivem m 11 . We do so by inverting M blockwise. Namely, The formulae applies whenever, d i =0 and the 2|2 matrix We are now ready to derive the confidence interval given by (18). To derive (J {1 N (q q)) 11 we first note that (7), (40) and rearrangement of the parameters imply that the Fisher information matrix has the form (50), with a~{ L 2 L Ll 2 (q), b k~{ L 2 L Lp k Ll (q) and d k~{ L 2 L Lp 2 k (q) given by (40), and (J {1 N (q q)) 11 corresponds tom m 11 . Therefore, Substituting the above with q~q q into (18) -using the fact that I N (q q)~J N (q q) -yields (20) after after a little algebraic manipulation.
Substituting this into (54) gives Substitution of the above evaluated atq q (using the fact that N k~E N k ) into (18) yields (21) after some rearrangement.
Proof of Result 4. To simplify the notation, we first derive the formulas for the confidence interval of p n . By re-arranging the parameters as in the proof of Result 3, it is obvious that the matrix M given by (50)   By replacing n by j, one obtains the confidence interval ofp p j given by (22). Second, both l 0 and p 0 can be replaced by the ML estimate (l l,p p). In this case the expected and observed Fisher information coincide.
Proof of Result 6. The remark is proven by explicitly deriving the test statistic. To simplify the notation we write l and p for l 0 and p 0 , respectively. To derive J ll {J lm J {1 mm J ml (or I ll {I lm I {1 mm I ml ) we can follow the proof of Result 3. From the blockwise inversion formula (51) the relation follows immediately, where the denominator on the left-hand side is given by the reciprocal of (53).
Noting that LL Ll~L Of course, (56) also holds if J N is replaced by I N , where (I {1 N (q)) 11 is given by (54). Thus, the same reasoning as above yields (33). %

The casel l~0
3.1 Log-likelihood. In the limiting case that the true parameter is l~0 the conditional poison distribution becomes k m~1 for m~1, 0 for mw1 : Following the derivations in subsection 4.1, Q i becomes Q i~p k for i~e k , 0 else ,