Gaussian Mixture Models of Between-Source Variation for Likelihood Ratio Computation from Multivariate Data

In forensic science, trace evidence found at a crime scene and on suspect has to be evaluated from the measurements performed on them, usually in the form of multivariate data (for example, several chemical compound or physical characteristics). In order to assess the strength of that evidence, the likelihood ratio framework is being increasingly adopted. Several methods have been derived in order to obtain likelihood ratios directly from univariate or multivariate data by modelling both the variation appearing between observations (or features) coming from the same source (within-source variation) and that appearing between observations coming from different sources (between-source variation). In the widely used multivariate kernel likelihood-ratio, the within-source distribution is assumed to be normally distributed and constant among different sources and the between-source variation is modelled through a kernel density function (KDF). In order to better fit the observed distribution of the between-source variation, this paper presents a different approach in which a Gaussian mixture model (GMM) is used instead of a KDF. As it will be shown, this approach provides better-calibrated likelihood ratios as measured by the log-likelihood ratio cost (Cllr) in experiments performed on freely available forensic datasets involving different trace evidences: inks, glass fragments and car paints.


Introduction
A likelihood ratio represents a ratio of likelihoods between two competing hypothesis. In the context of forensic science, these two hypotheses are that of the prosecution, H p (for instance, the suspect originated the crime scene mark), and that of the defence, H d (for instance, the suspect is not the origin of the crime scene mark). If some samples of a given material coming from a known source (control data) and some others coming from an unknown source (recovered data) are given, both known as the evidence (E), and some other information (I) related to the crime is available, the trier of fact (judge or jury) looks for the ratio between the probabilities be shown, this KDF approach overestimates the between-source density function in some areas of the feature space for datasets where sources are grouped in several clusters.
In order to avoid this problem, an alternative approach is presented in this work, in which the between-source distribution is represented by means of a Gaussian mixture model (GMM) [16,17], whose parameters are obtained through a maximum-likelihood (ML) criterion, with the aim of obtaining a better representation of how the parameter being modelled (sources mean) varies across the different sources observed in the background population. As being also a probabilistic method for clustering data, GMMs provide a better representation of such kind of datasets, which leads to obtain better calibrated likelihood ratios.
The rest of the paper is organized as follows. In Section [Likelihood ratio computation], the likelihood ratio computation method is presented and the generative model defined. Section [Models for between-source distribution] describes the expressions to be used for a normally distributed between-source variation and those to be used when it is represented by means of a Gaussian mixture; for this latter case, the KDF expression used in [10] is also shown. In Section [GMMs for non-normal between-source distributions], the GMM training process is described, and the differences between using the KDF and the GMM approaches are highlighted. Section [Experimental framework] describes the forensic databases, the experimental protocols and the evaluation metrics, while the results are presented and discussed in Section [Results and Discussion]. Finally, conclusions are drawn in Section [Conclusions].

Likelihood ratio computation
In order to compute the likelihood ratio, the probability of the evidence has to be evaluated under the two competing hypothesis, H p and H d , where the evidence consists in both the control (y 1 ) and the recovered (y 2 ) datasets (see the mathematical notation given in the [Appendix]). If H p is assumed true, the joint probability of both datasets has to be evaluated; on the other hand, if H d is assumed true, each dataset is generated from a different source and hence they are independent.

LR ¼
PðEjH p Þ PðEjH d Þ ¼ Pðy 1 ; y 2 jH p Þ Pðy 1 ; y 2 jH d Þ ¼ If a generative model with parameters Λ for the observed samples is assumed, the Bayesian solution is obtained by integrating out these parameters (if they vary from one source to another) for a given distribution which is usually obtained from a background population dataset, p(Λ|X). pðy 1 ; y 2 Þ pðy 1 Þ Á pðy 2 Þ ¼ R L pðy 1 ; y 2 jLÞ pðLjXÞ dL R L pðy 1 jLÞ pðy 2 jLÞ pðLjXÞ dL Final expressions for the numerator and denominator of the likelihood ratio will depend on the assumed generative model, which defines both the parameters Λ and the specific density functions. In this Section, we will describe the generative model used in [10], and the withinsource distribution will be defined.

The generative model
The two-level random effect model [18] used in [10] can be seen as a generative model in which a particular observed feature vector x ij coming from source i is generated through where θ i is a realization of the source random variable Θ and ψ j is a realization of the additive random noise C representing its within-source variation. This noisy term is taken to be constant among different sources and randomly distributed following where W is the within-source covariance matrix. Thus, the conditional distribution of the random variable X i (from which x ij is drawn), given a particular source i, follows a normal distribution with mean θ i and covariance matrix W Within-source covariance matrix can be computed from a background population dataset, comprising N = m Á n samples coming from m different sources, through being S w the within-source scatter matrix given by where x i is the average of a set of n feature vectors from source i. As the assumed generative model has only one varying parameter, θ, characterizing the particular source, and the observed samples are assumed i.i.d. conditioned on the knowledge of θ, the numerator and the denominator of the likelihood ratio given in Eq 4 can be expressed, respectively, by pðy 1 ; y 2 Þ ¼ Z y pðy 1 jy; WÞ pðy 2 jy; WÞ pðyjXÞ dy ð10Þ where the parameter θ jointly varies for both control and recovered conditional probabilities, as they are assumed to come from the same source (say θ 1 = θ 2 = θ), and pðy 1 Þ Á pðy 2 Þ ¼ Z y pðy 1 jy; WÞ pðyjXÞ dy Â Z y pðy 2 jy; WÞ pðyjXÞ dy ð11Þ where these conditional probabilities can be integrated out independently as they are assumed to come from different sources (say θ 1 6 ¼ θ 2 ). Similarly to the random variable X ij , the conditional distribution of a random variable X i representing the average of a set of n feature vectors {x 1 ,x 2 , ‥,x n } coming from a particular source i is given by Thus, when evaluating the conditional probability of a set of n 1 control samples, y 1 , or a set of n 2 recovered samples, y 2 , they will be evaluated in terms of their sample mean. That is, pðy l jy l ; WÞ ¼ pð y l jy l ; W n l Þ ¼ Nð y l ; y l ; D l Þ; This leads to the following expressions for the previously shown integrals: Nð y 1 ; y; D 1 Þ Nð y 2 ; y; D 2 Þ pðyjXÞ dy ð14Þ and pðy l Þ ¼ where only the distribution of the parameter θ remains undefined.

Models for between-source distribution
Regarding the distribution p(θ|X) from which the parameter characterizing the source θ is drawn, its shape depends on how the between-source variation is modelled. In this Section, two different types of distribution of such parameter, obtained from a background population, are shown. First, we will describe the expressions for a normally distributed between-source variation. While this is not the case under analysis in this work, it will serve to derive the expressions for the non-normal case, which is expressed in terms of a weighted sum of Gaussian densities.

Normal case
If sources means can be assumed normally distributed, Y $ N ðm; BÞ, then where μ and B are, respectively, the mean vector and the covariance matrix of the betweensource distribution. These hyperparameters can be obtained from a background population (with m sources, n samples per source and N total samples) through where the between-source scatter matrix, S b , is given by Using this distribution for the parameter θ of the generative model, the integrals involved in the likelihood ratio computation can be written Using the Gaussian identities given in the Appendix, the numerator of the likelihood ratio can be shown to be equal to: where and Finally, each of the integrals in the denominator is given by Non-normal case When the normal assumption does not hold for the distribution of sources means among the background population data, the between-source distribution p(θ|X) can be approximated by a weighted sum of C Gaussian densities in the following form: pðyjXÞ ¼ where {π k } c = 1, . . ., C are the weighting factors and have the following constraints With this distribution as the prior probability for the parameter θ of the generative model, the integrals involved in the likelihood ratio computation can be written As it can be seen, the Gaussian mixture expressions become a weighted sum of the expressions given for the normal case, and so the probabilities involved in the likelihood ratio computation can be easily derived, resulting in pðy 1 ; y 2 Þ ¼ Nð y 1 ; and pðy l Þ ¼ In [10], between-source distribution p(θ|X) is approximated through a KDF [15] where the kernel function K(Á) is taken to be a multivariate normal function with smoothing parameter, or bandwidth, H = h 2 B: Nðy; As it can be seen, this Gaussian KDF is in fact a Gaussian mixture whose parameters, equating terms in Eq 26, are given by Thus, the between-source variation is approximated by an equally weighted sum of multivariate Gaussian functions placed at every source mean present in the background population, x i , being their covariance matrices given by h 2 B. That is, a weighted version of the betweensource variation is translated to each source mean present in the background. As we will show later on, this will lead to overestimations of the between-source density in some areas of the feature space.

GMMs for non-normal between-source distributions
In this work, we propose to use a Gaussian Mixture Model (GMM) trained by means of a maximum-likelihood (ML) criterion in order to represent the distribution of the parameter θ characterizing the source. This model assumes that the observations are generated from a mixture of a finite number of Gaussian densities with unknown hyperparameters. Thus, it has been widely used to model the distribution of datasets in which the observations are grouped in several clusters, being each of them represented by a Gaussian density. In the case at hand, the observations are the means of the sources ( x i ) present in the background population dataset (X), from which the distribution p(θ|X) is going to be modelled.

GMM training
Maximum likelihood (ML) is a method of determining the parameters F of a model that makes the observed samples the most probable given that model. Conversely to KDF, where the parameters ( x i , H) are first established and the density function p(θ|X) arises from them, in the GMM approach the density function is obtained by maximizing the likelihood of the observed data given the model, p(X|F), from which the optimum parameters of the model are derived. In the case of a GMM of C components in the form of Eq 26, the ML parameters of the model, F = {π c , μ c , S c } c = 1, . . ., C , are obtained [17] by maximizing the following log-likelihood: This can be done through the well known expectation-maximization (EM) algorithm [17,19], which is an iterative method that successively updates the parameters F of the model until convergence. A recipe for this iterative process can be found in [17].
For a faster convergence of the algorithm, usually some steps of the k-means algorithm [17,20] are previously iterated in order to obtain a good initialization of the GMM, as this clustering method provides the mean vectors {μ c } c = 1, . . ., C (known as centroids) and the initial assignment of samples to clusters, from which {π c } c = 1, . . ., C and {S c } c = 1, . . ., C can be obtained.
The specific number of components, C, can be set by different methods. If the feature vectors are low-dimensional, the number of components can be visually estimated by inspecting a 2-D or 3-D projection of the background population data; however, depending on the structure of the data, there can be a lot of ambiguity in this process. Another option is to apply the elbow method [21] in the initial clustering stage, in which the cost function is plotted for different (increasing) number of clusters; for the first number of clusters there will be a great change when increasing the number of clusters, but at some point the marginal gain will drop indicating the proper number of clusters. A similar method can be applied by training GMMs for different numbers of components and evaluating the gain in terms of likelihood when increasing the number of them. Finally, similarly to the previous approach, if different GMMs for different number of components are trained, some model selection methods, like the Bayesian information criterion (BIC) [22] or the Akaike information criterion (AIC) [23], can be applied.
In this work, results are reported for several number of components in order to analyse how the evaluation metrics vary depending on this parameter, and the proper number of components related to the log-likelihood of the background data given the between-source density. For a given number of components, the k-means algorithm is iterated until convergence previously to the EM algorithm. In order to avoid local minima in k-means clustering, 100 random initializations are performed for a given number of components.

GMM versus KDF
For the purpose of illustrating the differences between KDF and GMM approaches, a synthetic 2-dimensional dataset has been generated (see Fig 1), in which 10 samples from 50 sources are drawn from normal distributions with the same covariance matrix (having then the same within-source variation). Sources means are drawn from 2 different normal distributions (25 sources each), each centred at a different separated point of the feature space, and one having a larger variance than the other in one of the dimensions. As a consequence, samples coming from different sources are grouped in two clearly separated clusters, one of them having a larger local intra-cluster between-source variation than the other. Also, the overall betweensource variation is higher in one of the dimensions.
As already shown in Section [Models for between-source distribution], the density function p(θ|X) given by KDF approach is an equally weighted sum of Gaussian densities centred at each background source mean with covariance matrices h 2 B (see Eq 32). Thus, a weighted version of the overall between-source variation is translated to every source mean, reproducing this variation locally at each source mean. The resulting density function p(θ|X) for our synthetic dataset can be seen in Fig 2, where it is shown that the local intra-cluster between-source variation in dimension 1 is highly overestimated for both clusters, and slightly overestimated in dimension 2 for one of them due to the larger variation in the other one.
Conversely to KDF, in the GMM approach the Gaussian components are not forced to be centred at each source mean present in the background population, but a smaller number of components can be established allowing different sources means being generated from the same Gaussian component. Moreover, covariance matrices are neither fixed in advance, allowing to be locally learned for each component. As a consequence, the resulting density function can better fit the local between-source variation and the clustered nature of the dataset, as it is shown in Fig 3 for a 2 However, care must be taken in order to avoid overfitting when computing the density function through maximum likelihood. For a ML-trained GMM, the degree of fitting to the background data can be controlled through both the number of components C of the mixture and the number of EM iterations. In this work, for a given number of components, only two EM iterations are performed in order to avoid overfitting.

Accounting for within-source variation in the background population
When training a GMM from background sources means by maximizing the log-likelihood in Eq 35, it is assumed that there is no uncertainty in these mean values. However, the number of samples per source in the background population can be limited in forensic scenarios, and so these means cannot be reliably computed. In order to account for the uncertainty in these mean values, every observation belonging to those sources can be used to train a GMM by maximizing the following log-likelihood: While there can be not much difference in the values obtained for components means μ c in a well balanced background dataset (same number of samples per source), taking into account the variation of the samples from each source around its mean value through Eq 36 provides a more conservative background density, as every background sample is considered as a possible mean value of a source. Furthermore, this also helps to avoid Gaussian collapsing when a reduced number of sources are assigned to a particular component. The effect on our synthetic dataset is shown in Fig 4, where the Gaussian densities are placed at the same locations as in Fig 3 but larger variances and covariances are obtained, specially for the cluster with lower intra-cluster between-source variation.

Forensic datasets
In order to test the approach proposed in this work, several types of forensic datasets have been used, being one of them the glass-fragments dataset also used in [10], which can be downloaded from http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1467-9876/homepage/glass-data.txt. A detailed description of the other two datasets can be found in [12], and can be downloaded from http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470972106.html.  Table 1 gathers the already mentioned characteristics of these three datasets, while Figs 5, 6 and 7 show 2-dimensional projections of their sources means. As it can be seen, sources means in the last two datasets (glass fragments and car paints) present a clustered nature, while those in the first one (inks) are normally distributed [12].

Protocols
The protocol followed in [10] used the whole glass-fragment dataset in order to obtain the between-source probability density function p(θ|X). Then, for each source, the first 3 samples (out of 5) were used as control data and the last 3 were used as recovered data, having so both datasets one sample in common. While this non-partitioning protocol alleviates the lack of data due to the small size of the dataset, it may lead to overoptimistic results as the different subsets (background, control and recovered) are overlapped.
In this work, a cross-validation protocol is also used in order to avoid overoptimistic results, in which the dataset is divided into two non-overlapping subsets devoted to: • obtain the between-source distribution p(θ|X) (known data or training subset), and  • compute same-source and different-source likelihood ratios (unknown data or testing subset). This subset is further divided into two non-overlapping halves acting as control and recovered data.
In order to alleviate the lack of data, this procedure is carried out in the following way. For each of the m(m − 1)/2 possible pairs of sources in the dataset, all the samples belonging to those two sources are taken apart from the dataset in order to be used as the testing subset, being the remaining sources used as the training subset. Each of the two sources in the testing subset is divided into two non-overlapping halves ({1a, 1b} and {2a, 2b}) that can be used either as control or recovered data to perform 2 same-source comparisons (1a-1b, 2a-2b) and 4 different-source comparisons (1a-2a, 1a-2b, 1b-2a, 1b-2b). Although the same control and recovered data from a particular source is used in all the different pairs in which it is involved, as the remaining sources change for each different pair, different between-source distributions p(θ|X) are involved in likelihood ratio computations. This procedure allow us to perform a total number of m(m − 1) same-source comparisons and 2 × m(m − 1) different-source comparisons for a given dataset, instead of the m same-source comparisons and m(m − 1)/2 different-source comparisons performed in [10], while the between-source distribution p(θ|X) used in every   Table 2.

Evaluation Metrics
The main evaluation metric used in order to compare the different approaches is the log-likelihood ratio cost function (C llr ) [2,24], which evaluates both the discrimination abilities of the computed log-likelihood ratios and the goodness of their calibration. Given a set of log-likelihood ratios fLg ¼ fL 1 ; L 2 ; :::; L C g obtained from C comparisons, the C llr can be computed in the following way: where 'ss' is the set of N ss same-source comparisons and 'ds' is the set of N ds different-source comparisons. As it is a cost function, the larger the C llr value, the worse the verification method, being C llr = 0 the minimum achievable cost. Note also that this metric allows to define a neutral reference which does not provide support for any of the two hypothesis (that is, L c ¼ 0 for every comparison), providing a reference value of C llr = 1. Thus, a verification method for which C llr is larger than 1 means that it is providing misleading likelihood ratios. An important aspect of the C llr is that it can be decomposed into two additive terms, one due to the discrimination abilities (C min llr ) and another one due to the calibration of the verification method (C cal llr ) where  and C min llr is obtained by means of the Pool Adjacent Violators (PAV) algorithm [25,26] and represents the minimum achievable C llr in the case of having an optimally calibrated log-likelihood ratios set fL 0 g (details can be found in [24]).
In order to show the performance over a wide range of prior probabilities, the Empirical Croos-Entropy (ECE) plots [27,28] will be used. These figures (see, for example, Fig 8) graphically represent what would be the accuracy (solid curve) when using the set of logLR values fLg for each of the prior probabilities (represented as logarithmic odds) in the given range. Additionally, the discriminating power is also plotted (dashed curve) for the optimally calibrated (ideal) logLRs set fL 0 g, along with the neutral reference (dotted curve).

Inks dataset
For this dataset, as the background sources means are normally distributed, GMMs with a single component has been trained by maximizing either Eq 35 or Eq 36. Table 3 shows the detailed results (C llr , C min llr and C cal llr ) for KDF and GMM approaches (Eq 35 and Eq 36) when applying both the non-partitioning and the cross-validation protocols.
First, it should be noted that results in the non-partitioning protocol are slightly better for every method as it is an overoptimistic framework where data is shared between training and testing subsets. Regarding the comparison between methods, it can be seen that no significant improvement is obtained by the GMM approach as the sources means for this dataset do not present a clustered nature. Moreover, among the two GMM variants, the results obtained when maximizing Eq 35 are slightly better, presumably due to the fact that enough number of samples per source are available (n = 10), compared to the number of features (d = 3), to compute reliable sources means, and further uncertainty accounted for Eq 36 seems to be counterproductive.
Finally, Fig 8 show ECE plots for KDF and GMM (Eq 35) approaches when applying the cross-validation protocol, where it can be seen that both present similar performance for a wide range of prior probabilities.

Glass-fragments dataset
For this dataset, several GMMs have been trained, by maximizing Eq 35, in order to analyse how the main evaluation metric (C llr ) varies as a function of the number of components, C. In the experiments carried out, the maximum number of components has been limited to 6 in order to avoid Gaussian collapsing due to a reduced number of observations (sources means) per component (62 total sources in the whole dataset). Results for the non-partitioning protocol can be seen in Fig 9 for both KDF and GMM (Eq 35) approaches, where also the log-likelihood of the background data (sources means) given the between-source density has been plotted.
As it was expected for this non-partitioning protocol, C llr decreases as the number of components increases, due to the shared data between training and testing subsets, which can lead to overfit the background density. However, as soon as the log-likelihood for the GMM surpass that obtained for the KDF density, better results are obtained with the GMM approach. It is also worth noting that this happens for a number of components (2-3) around that which could be expected from visual inspection of the 2-dimensional projections shown in Fig 6.  Fig 10 show the same analysis for the cross-validation protocol. In this case, the log-likelihood is not plotted as the GMM change for every testing sources-pair (being trained on the remaining sources). Similar conclusions than before can be drawn, but here the overfitting problem affecting the non-partitioning protocol is revealed, as the C llr for the cross-validation Table 3. Performance of KDF and GMM approaches on the inks dataset for the non-partitioning and cross-validation protocols.

Non-partitioning
Cross-validation GMMs of Between-Source Variation for LR Computation from Multivariate Data protocol reaches a minimum value for a given number of components (C = 4) and then increases. Results are also shown for GMMs trained by maximizing Eq 36, with similar conclusions but slightly better results, presumably due to the small number of samples per source (n = 5) compared to the number of features (d = 3). Table 4 shows the detailed results (C llr , C min llr and C cal llr ) for KDF and GMM approaches (Eq 35 and Eq 36) when applying both the non-partitioning and the cross-validation protocols. For GMM approaches, results are given for the optimum number of components (C = 4) when the cross-validation protocol is applied. Again, as the non-partitioning protocol constitutes an over-optimistic framework, results are slightly better for every method compared to the crossvalidation protocol. This is also the reason of obtaining better results when GMMs are trained by maximizing Eq 36, as the same sources are present in both training and testing subsets. However, when the cross-validation protocol is applied, there is no shared data between those subsets, and so the additional uncertainty accounted by Eq 36 provides slightly better results. In any case, both GMM approaches outperform the KDF one due to their better calibration properties for this clustered dataset.
Finally, Fig 11 shows the comparative results between KDF and GMM (Eq 36) in the form of ECE plots when the cross-validation protocol is applied.

Car-paints dataset
An equivalent analysis to that shown for the glass-fragments dataset has been performed for the car-paints one. Fig 12 shows both the C llr and the log-likelihood of the background data   C llr value is reached is slightly larger (C = 5); this also happens for the non-partition protocol, as the log-likelihood of the training data (observations) given the model for the GMM do not surpass that of the KDF until a larger number of components (C = 4) is reached. Table 5 shows the detailed results (C llr , C min llr and C cal llr ) for KDF and GMM approaches (Eq 35 and Eq 36) when applying both the non-partitioning and the cross-validation protocols. For GMM approaches, results are given for the optimum number of components (C = 4 for Eq 35, C = 5 for Eq 36) when the cross-validation protocol is applied. Similar conclusions to those obtained for the glass-fragments dataset can be drawn, but much better results are obtained by GMMs approaches presumably due to the distance among clusters, which lead to KDF densities which overestimate the between-source distribution in some areas of the feature space (as shown in Fig 2 for the synthetic dataset). Among GMM approaches, the maximization of Eq 36 leads to much better results for the cross-validation protocol due to the small number of samples per source (n = 3) compared to the number of features (d = 7), which lead to unreliably computed sources means when training GMMs by maximizing Eq 35.

Conclusions
In this work, we present a new approach for computing likelihood ratios from multivariate data in which the between-source distribution is obtained through ML training of the parameters of a GMM. Using the same generative model as in [10], a common derivation of the LR expressions is presented for both Gaussian KDF and GMM, in which the between-source distribution is represented in terms of a weighted sum of Gaussian densities. Then, differences between KDF and GMM approaches are highlighted, and the effects on the obtained probability density are shown for a synthetic dataset. Furthermore, a variant in GMM training has been tested in order to account for the uncertainty in sources means when few samples per source are available in the background data.
The proposed approach has been tested on three different forensic datasets and compared with the KDF approach. Additionally to the non-partitioning protocol applied in [10], a more realistic cross-validation protocol is applied in order to avoid overoptimistic results, as MLtrained GMMs can overfit the background population density. Performance is evaluated in terms of the log-likelihood ratio cost function (C llr ), which allows to decompose the performance in a term due to the discrimination abilities and another one due to the calibration properties. ECE plots have been used to show the behaviour in a wide range of prior probabilities, which is needed in forensic science.
Results show that, although KDF and GMM approaches present similar discrimination abilities, when the datasets have a clustered nature, the between-source distribution is better described by a GMM, leading to better calibrated likelihood ratios. If clusters are not easily distinguishable, the between-source distribution still can be modelled by one single component, obtaining similar results to the KDF approach. Specially remarkable are the results obtained for the car-paints dataset, where *50% improvement in terms of calibration performance is obtained.

Gaussian identities
Product of two multivariate Gaussian functions.
Nðx; a; AÞ Á Nðx; b; BÞ ¼ Nða; b; a þ BÞ Á Nðx; c; CÞ ð 40Þ Convolution of two multivariate Gaussian functions. Z x Nðx; a; AÞ Nðy À x; b; BÞ dx ¼ Nðy; a þ b; A þ BÞ ð43Þ Expressions for a normal between-source distribution Derivation of the numerator. First, we solve the product of the two Gaussian functions depending on either the control or the recovered data means, obtaining the following expression where z ¼ ðD 1 þ D 2 Þ À1 ðD 2 y 1 þ D 1 y 2 Þ ð45Þ and Being Nð y 1 ; y 2 ; D 1 þ D 2 Þ independent of θ, we can solve the remaining integral as a convolution of two Gaussian functions: Finally, replacing D l = W/n l , l = 1, 2, in z and Z we obtain pðy 1 ; y 2 Þ ¼ Nð y 1 ; Derivation of the denominator. Each of the integrals in the denominator of the LR can be solved by the convolution of two Gaussian functions pðy l Þ ¼