A novel computational strategy to predict the value of the evidence in the SNP-based forensic mixtures

This study introduces a methodology for inferring the weight of the evidence (WoE) in single nucleotide polymorphism (SNP)-typed DNA mixtures of forensic interest. First, we redefined some algebraic formulae to approach the semi-continuous calculation of likelihoods and likelihood ratios (LRs). To address allelic dropouts, a peak height ratio index ("h," an index of heterozygous state plausibility) was incorporated into the semi-continuous formulae to act as a proxy for the "split-drop" model of calculation. Second, the original ratio at which a person of interest (POI) entered the mixture was inferred by evaluating the DNA amounts conferred by unique genotypes to any possible permutation of any locus of the typing protocol (unique genotypes are genotypes that appear just once in the relevant permutation). We compared this expected ratio (MR_ex) to all the mixing ratios emerging at all other permutations of the mixture (MR_obs) using several (1 - χ²) tests to evaluate the probability of each permutation to exist in the mixture according to quantitative criteria. At the level of each permutation state, we multiplied the (1 - χ²) value by the genotype frequencies and the h index. All the products of all the permutation states were finally summed to give a likelihood value that accounts for three independent properties of the mixtures. Owing to the (1 - χ²) index and the h index, this approach qualifies as a fully continuous methodology of LR calculation. We compared the MRs and LRs emerging from our methodology to those generated by the EuroForMix software ver. 3.0.3. When the true contributors were tested as POIs, our procedure generated highly discriminant LRs that, unlike those of EuroForMix, never exceeded the corresponding single-source LRs. When false contributors were tested as POIs, we obtained much lower LR values than those from EuroForMix. These two findings indicate that our computational method is more reliable and realistic than EuroForMix.


Introduction
In the last two decades, multi-allelic polymorphisms (essentially short tandem repeats; STRs) have been widely used in the routine casework of forensic genetics. Single nucleotide polymorphisms (SNPs) have recently emerged as promising polymorphisms to be adopted in this field. SNPs are mostly biallelic, and the Mendelian distribution of their genotypes is determined by

Genotypic matrices and likelihoods
Mixtures are a class of synthetic forensic evidence (E) contained in a biological trace and supposed to originate from a collection of "n" individual genotypes (n > 1; n is also the number of individuals involved in a crime and leaving their DNA in the trace). Mixtures based on autosomal STRs have multiple patterns of evidence (each pattern is denoted by a definite number of alleles) and a variable number of contributors, each of whom confers on the trace one or two alleles (depending on their homozygous or heterozygous condition). When nothing else is known a priori about the trace content except the number of contributors, the evidence should be explained by a series of equally possible combinations of genotypes elicited from classical combinatorial work. If, for some reason, the order in which these genotypes are listed in each combination matters (as in the case of quantitative likelihood calculations), combinations turn into permutations, and the number of admissible records equally accounting for the same evidence increases. Therefore, mixtures may be conveniently represented as genotype permutation matrices, comprising as many columns as the alleged contributors and as many records as the permutation states embodying the evidence.
Mastering the combinatorial work of STR mixtures involves harnessing a system with multiple matrices and thousands or tens of thousands of states, a task that may be challenging and error-prone [37]. By contrast, the combinatorial work involving SNP mixtures is simple. Here, the evidence may only be single-allele, E{a}, or double-allele, E{a;b}. With E{a}, no matrix is generated. E{a;b}, in turn, generates only one matrix for each number of contributors, all with an acceptable number of permutations (one can count 7 permutation records in the 2PM matrix, 25 in the 3PM matrix, 79 in the 4PM, 241 in the 5PM, 727 in the 6PM, 2,185 in the 7PM, and 6,559 in the 8PM; that is, 3^n - 2 records for n contributors, because only the all-AA and all-BB states fail to show both alleles). If a minimal, standard nomenclature is adopted for the combinatorial items, only one matrix will be suitable for whatever locus of interest. Here, for example, we will adopt: "a" for the quantitatively predominant allele, "Q_a" for the number of its DNA molecules, "f_a" for its gene frequency; "b" for the minor-quantity allele, "Q_b" for its DNA quantity, "f_b" for its gene frequency; AA, AB, and BB for the biallelic genotypes; Q_AA, Q_AB, Q_BB for their quantities; P(AA) = f_a², P(AB) = 2·f_a·f_b, P(BB) = f_b² for their probabilities in the reference population. In our matrices, the alleles at all loci of the protocol were renamed in the same way, replacing the wildtype major/mutant minor allele frequency (MAF) nomenclature most often found in the literature: as expected, the genotypes behaved accordingly (e.g., Q_AA > Q_BB). Permutation matrices generated by E{a;b} under the sole prior hypothesis that "n", and nothing more, is known represent the most generic way of interpreting mixtures and the most basic likelihood to calculate. All individuals contributing to the trace are here regarded as unknown (we will use a 'U' to denote every single unknown individual within a synthetic likelihood formula).
When further prior hypotheses are formulated introducing η persons of interest (POIs; also denominated as "known contributors"; with 1 ≤ η ≤ n; we will use a letter 'K' to denote each individual POI placed within a synthetic likelihood formula), the inference to draw becomes conditional on both n and η [38]. As noted for E, the POI reference sets can have a single-allele or biallelic pattern (K{a}; K{a;b}). A POI genotype is used to transform the all-unknown, general matrix (a likelihood with n Us) into one with fewer permutations (only those permutations that contain the POI genotype; synthetically: a likelihood with one K and (n-1) Us). More than one POI will further restrict the number of selected permutations. For practical examples of how to reproduce this procedure on a worksheet, the reader should refer to the matrices (NITZq_SNPs_2PM_upl.xlsx; NITZq_SNPs_3PM_upl.xlsx; NITZq_SNPs_4PM_upl.xlsx) to be found in the S1 File of this article.
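The size of each E{a;b} matrix can be checked by brute-force enumeration. The following is a minimal sketch, assuming that a permutation record is any ordered tuple of n genotypes drawn from AA, AB and BB that shows both alleles; the function name `count_permutation_records` is ours and is not part of the NITZq worksheets.

```python
from itertools import product

GENOTYPES = ("AA", "AB", "BB")

def count_permutation_records(n):
    """Count ordered n-tuples of genotypes that explain the evidence E{a;b}."""
    count = 0
    for state in product(GENOTYPES, repeat=n):
        alleles = set("".join(state))  # e.g. ("AA", "AB") -> {"A", "B"}
        if alleles == {"A", "B"}:      # both alleles must be visible
            count += 1
    return count

for n in range(2, 6):
    # Only the all-AA and all-BB tuples are rejected, hence 3**n - 2.
    print(n, count_permutation_records(n), 3**n - 2)
```

Enumeration is affordable here because the matrices stay small; for larger n the closed form 3^n - 2 gives the same count directly.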

Semi-continuous likelihoods
When compared to one another, the evidence and the POI reference sets generated nine assortments, as shown in Fig 1; this figure also shows the matching cases and exclusions. Exclusions only emerged when the evidence had a single allele. When this happens, either K and E are disjoint sets (K ∩ E = ∅; no allele in common; for example, E{a} compared to K{b} and vice versa) or E is a subset of K (E ⊂ K), whereas in the case of compatibility one would normally expect the reverse (K ⊆ E) to be true.
All of these nine cases may be solved according to the binomial properties:
1. at a single-source trace, the whole evidence is: P(E{a;b}) = (f_a + f_b)²
These formulae are a generalization of those previously reported [34], and they have been modified by the introduction of rule 3 described in the previous section. As previously noted by these authors, these six formulae do not account for cases where there is a mismatch between the evidence and the POI.

Incompatible likelihoods.
We need to assume that the monoallelic evidence derives from originally biallelic evidence by the loss of one allele (allele dropout). Within our nomenclature system, only the minor-quantity allele can drop out of the evidence, with some probability (P(D)) that is inversely proportional to the template concentration. In the available literature, dropouts are "stochastic events", and the P(D) values have been inferred from experiments conducted on various template dilutions using logistic regression analysis [39,40]. The resulting experimental values were incorporated into the semi-continuous calculations. A widely used computational approach relying on such dropout probabilities is the "split-drop approach" [41]. The split-drop approach was readapted to the case of SNPs [34] and is now adopted by two widely used software packages, LRMIX [35] and EuroForMix [36].
Instead of using the split-drop approach, we will incorporate dropout probabilities (denoted by the letter 'h') into some of our algebraic formulae: P(E{a;b_d} | K{b}; P(h); n; ...) and P(E{a;b_d} | K{a;b}; P(h); n; η) = h or ... It should be noted that, with reference to Fig 1, E{a;b_d} replaces the original E{a} notation and E{a_d;b} replaces E{b}. P(h) is our proxy for the probability of the minor allele b dropping out of the evidence, replacing the experimental logistic regression values. For its meaning, importance and use, the reader is referred to the following paragraphs.

Dropped out and still compatible cases.
Although the concept of dropout is used to denote cases of contributors that do not match the evidence, "exclusion" and "dropout" are not logically equivalent. There may be instances where an allele drops out of the evidence, but the POI remains compatible. Such "latent" dropout cases can be modeled by turning Eqs 1 and 2 into: It should be noted that, as long as our conventional nomenclature (a, major-read allele; b, minor-read allele) holds, Eq 2D is never actually used ("a" cannot drop out unless "b" drops out first; if both drop out, the whole evidence disappears and Eq 2D becomes useless).

Re-defining the inferential table.
The algebraic solutions we have just introduced (1d, 2d, and 3-9) can now cover all the inferential scenarios expected to occur at any test based on biallelic markers. An overview of the nine relevant cases and formulas is shown in

The h index as a proxy for experimental P(D) values.
Dropout also occurs at the SNP level [42], but specific dropout values are not available for all SNPs of forensic interest. This is not surprising, as SNPs are very numerous. Their P(D) values should be inferred (by treating experimental data with logistic regression) for each locus in use in a given protocol, possibly by each laboratory adopting the protocol, and this task is not easy to achieve. For this reason, a good deal of the semi-continuous calculations on SNP mixtures (such as those performed by typical users of LRMIX and EuroForMix) tend to rely on arbitrary P(D) values. This encourages the search for other solutions.
We found that a convenient proxy for P(D) is the h index [43] in its simplest form (the minor read divided by the major read; in NGS technology, a read is the equivalent of a peak height in an STR electrophoretic diagram). There are several reasons to introduce this index instead of logistic regression-derived P(D) values:
a. h applies to single-source stains and mixtures alike.
b. In mixtures, h may represent the entire propensity of the minor-read allele to drop out of the mixed evidence.
c. (1-h) can estimate the probability of the existence of two-allele evidence.
d. It has been noticed [44] that a P(D) can automatically emerge from the peak height analysis, without any external logistic regression modeling; P(D) values emerging from the peak area analysis can only be the h index.
The index h enters the likelihood formulae, as shown in the previous paragraphs. We will further use it for quantitative calculations as well.
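As an illustration of the index in its simplest form, the sketch below computes h from two read counts. The function name `h_index` and the read counts are hypothetical; only the definition (minor read divided by major read) and the reading of (1 - h) are taken from the text.

```python
def h_index(major_reads, minor_reads):
    """Return h = minor read count / major read count (0 <= h <= 1)."""
    if major_reads <= 0:
        raise ValueError("major_reads must be positive")
    if not 0 <= minor_reads <= major_reads:
        raise ValueError("minor_reads must lie between 0 and major_reads")
    return minor_reads / major_reads

# Hypothetical read counts for one locus (illustrative values only).
h = h_index(major_reads=800, minor_reads=200)
print(h)      # used in the text as the proxy for P(D) of the minor-read allele
print(1 - h)  # used in the text as the estimate for two-allele evidence
```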
Dropins.
According to the definition issued by a commission of the International Society for Forensic Genetics [45], a dropin is an allele that is not associated with the crime sample and remains unaccounted for under both the prosecution and the defense hypotheses. In principle, a dropin phenomenon is nothing more than an additional contributor to a mixture, and its case can be modeled by increasing the number of contributors. Because SNP-based evidence generates relatively simple permutation matrices even with high numbers of contributors, we will address the dropin case by systematically switching the current scheme of computation to a scheme with one more contributor.

Number of contributors.
Predicting the number of contributors entering a SNP mixture may be difficult. The importance of the whole issue is directly proportional to the impact that the value of n may have on the LR values. This issue was explored by simulating 100 mixtures with two to six virtual contributors and calculating likelihoods at various quantitative POI proportions [36]. It was found that, when the POI is not quantitatively predominant, the two-person mixture model returns higher LRs than the three- or more-person models. It was also found that, from three-person mixtures onwards, increasing the number of contributors has a negligible effect on the LR of a POI, regardless of its quantitative predominance. We have an explanation for this phenomenon: unlike higher-rank mixtures, 2-person mixtures cannot have permutation states with the full assortment of biallelic genotypes (AA + AB + BB). Switching from 2PM to 3PM has, therefore, the effect of completing the genotype assortment, elongating each permutation, increasing the number of permutations, increasing the uncertainty and lowering the LR. The same does not occur when switching from 3PM to 4PM or higher-rank models. Consequently, it is logical to believe that 3PM is the minimum standard for biallelic mixture analysis. In this study, however, we always calculated LRs under three distinct hypotheses (2PM, 3PM, and 4PM). Fig 2 may represent the viewpoint of prosecution or defense in a typical judicial test. A ratio between two of these likelihoods will return the value of the available prevailing evidence. The two parties may choose to opt for the same or different n values, but they usually diverge on the choice of the η value. The prosecutor's choice η1, placed in the numerator of the ratio, is always higher than the defense's choice η2, placed in the denominator ((η1 - η2) ≥ 1). In this paper, the classical LR formula [46] is therefore written as:

LRs. Each likelihood in
Quantitative likelihoods

An empirical, quantitative approach.
The issue of quantitative mixture analysis has been reviewed several times [47-51] and, in the opinion of most authors, a key aspect of this issue is finding a suitable way to infer the peak fraction values pertaining to each contributor from the whole evidence. In a pioneering study [26], the gamma distribution was used to deduce all the individual peak height fractions that coalesce into the mixed STR profile. Concurrently, Markov chain Monte Carlo (MCMC) simulations were introduced to infer the same information [23]. These two methods gained much relevance in the 2010-2020 decade, and numerous computer programs now calculate LRs based on the gamma model or MCMC simulations.
Herein, we adopt a novel empirical method for calculating likelihoods based on quantitative evidence. The rationale of our method is as follows. First, regardless of any available POI, we quantitatively denote each genotype of each permutation of each locus of interest by converting the synthetic locus-specific evidence into individual genotypic quantities [E(Q_a;Q_b) → E(Q_AA;Q_AB;Q_BB)] according to basic Mendelian rules. Starting from these individual DNA amounts, we assign a mix ratio (MR_obs) to each observed permutation. With specific reference to the calculation examples we report in this study, a 2PM protocol based on 133 loci will have 931 MR_obs (133 × 7); a 3PM protocol will have 3,325 (133 × 25).
Second, when a POI is available, we estimate the quantitative contribution of the POI to the mixture by selecting the POI unique genotypes (a unique genotype is a genotype that appears just once in the permutation) to be found at each locus permutation of the protocol, summing the relevant DNA amounts and dividing this sum by the total DNA amount in the protocol. An overall mix ratio (MR_ex) is obtained. There will be only one MR_ex per mixture protocol, and MR_ex will be regarded as the ratio expected to invariably show up at each locus when the POI is the true trace contributor.
Finally, we give each permutation state appearing at each locus a distinctive probability to exist based on a (1 - χ²) statistical test. To follow the previously cited examples, 931 (1 - χ²) tests will be performed at the 2PM/133 loci and 3,325 at the 3PM. At each (1 - χ²) test, MR_ex will be the expected MR dataset, and the permutation-specific MR_obs will be the observed dataset. Calculating (1 - χ²) is an essential step in the process of calculating quantitative likelihood values.

Giving MRs to all permutation states.
As a first step in the quantitative calculation, the synthetic evidence E{Q_a;Q_b} is to be converted into genotypic evidence E{Q_AA;Q_AB;Q_BB} by using rules that can apply to all mixture schemes. We found that, regardless of the contributor number, the most parsimonious genotype combinations generating the E{a;b} evidence are:
rAA+sBB, with n = r+s
rAA+sAB, with n = r+s
rAA+sBB+tAB, with n = r+s+t (not applicable to a 2PM)
rAB+sBB, with n = r+s
nAB
These five combinations account for all possible mixture permutations, whatever the number of contributors may be. Because apportioning individual quantities to identical genotypes is impossible, groups of identical genotypes have to be assigned just one DNA amount. The following rules will consequently make it possible to calculate the mixture ratio everywhere: With the exception of rule 4, these rules are based on elementary autosomal Mendelian principles, and they return the same mixture ratio value whatever the number of contributors may be. A network of quantities is consequently distributed to all permutation states, as shown in the three matrix examples (NITZq_SNPs_2PM_upl.xlsx; NITZq_SNPs_3PM_upl.xlsx; NITZq_SNPs_4PM_upl.xlsx) to be found in the S1 File of this article.
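That the five combinations cover every permutation state can be verified by enumeration. The sketch below is illustrative only: it assumes a state is an ordered tuple of AA, AB and BB genotypes, and the helper names `combination_class` and `class_counts` are ours.

```python
from collections import Counter
from itertools import product

GENOTYPES = ("AA", "AB", "BB")

# Map the set of genotypes present in a state to its combination label.
COMBINATIONS = {
    frozenset({"AA", "BB"}): "rAA+sBB",
    frozenset({"AA", "AB"}): "rAA+sAB",
    frozenset({"AA", "AB", "BB"}): "rAA+sBB+tAB",
    frozenset({"AB", "BB"}): "rAB+sBB",
    frozenset({"AB"}): "nAB",
}

def combination_class(state):
    """Return the combination label, or None for all-AA / all-BB states."""
    return COMBINATIONS.get(frozenset(state))

def class_counts(n):
    """Tally the permutation states of an n-person matrix by combination."""
    tally = Counter()
    for state in product(GENOTYPES, repeat=n):
        label = combination_class(state)
        if label is not None:  # skip states that do not show E{a;b}
            tally[label] += 1
    return tally

print(class_counts(2))  # no rAA+sBB+tAB entry: it needs three contributors
print(class_counts(3))
```

Every state that shows both alleles receives exactly one of the five labels, and the three-genotype combination first appears with three contributors.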

Finding the 'expected' mixture ratio of a trace.
In SNP matrices, the genotypes have to overlap at individual alleles (AA + AB; AA + AB + BB) and to be redundant (for example AA+AA; AB+AB; BB+BB; "redundant" is said here of a genotype that appears more than once within the same combination) to make room for additional contributors. When contributors grow in number, the relevant combinations generate overlaps and redundancies of various orders of complexity. However, even within the context of heavy genotype stacking, a network of unique genotypes persists regardless of the mixture type. Unique genotypes are genotypes that share their DNA amount with no other and are of two types:
1. Non-redundant, non-overlapping genotypes (NOGs; with parsimonious structure AA+(n-1)BB; (n-1)AA+BB). Here, if the unique genotype is BB, then Q_BB = Q_b; if the unique genotype is AA, then Q_AA = Q_a.
2. Non-redundant, simply overlapping genotypes (SOGs; with structure AA+(n-1)AB; (n-1)AA+AB). Here, if the unique genotype is AB, the 'B' half quantity is Q_b and Q_AB = 2Q_b; if AA is unique, then Q_AA = Q_a - Q_b.
Interestingly, mixture matrices grow by accumulating redundant/overlapped combinations and by retaining the same NOGs/SOGs network (Fig 3).
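The four quantity assignments stated for NOGs and SOGs can be written as a small lookup. This is a sketch under the nomenclature used here (Q_a > Q_b); the function name `unique_genotype_quantity` and the example quantities are hypothetical.

```python
def unique_genotype_quantity(kind, unique_genotype, q_a, q_b):
    """DNA amount of the unique genotype in a NOG or SOG combination.

    kind: "NOG" (AA and BB genotypes only) or "SOG" (AA and AB genotypes only).
    q_a, q_b: quantities of the major and the minor allele (q_a > q_b).
    """
    if q_a <= q_b:
        raise ValueError("the nomenclature requires q_a > q_b")
    rules = {
        ("NOG", "AA"): q_a,        # unique AA holds all of allele a
        ("NOG", "BB"): q_b,        # unique BB holds all of allele b
        ("SOG", "AB"): 2 * q_b,    # the B half is q_b, so Q_AB = 2 * q_b
        ("SOG", "AA"): q_a - q_b,  # allele a left once the AB share is removed
    }
    try:
        return rules[(kind, unique_genotype)]
    except KeyError:
        raise ValueError(f"no rule for {kind} with unique {unique_genotype}")

# Hypothetical allele quantities (illustrative values only).
print(unique_genotype_quantity("SOG", "AB", q_a=900, q_b=100))
print(unique_genotype_quantity("SOG", "AA", q_a=900, q_b=100))
```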
NOGs/SOGs are of no use when no conditional hypothesis on the presence of a POI is formulated. In a matrix with no POI, NOG/SOG quantities are similar to all the other genotype quantities, so that the final mixture ratio equalizes (2PM = 0.5:0.5; 3PM = 0.33:0.33:0.33; 4PM = 0.25:0.25:0.25:0.25). Only once a POI is available do their unique genotypes align with one another and become discernible from the pool of unknown contributors. For every given POI, the following unique genotype alternatives arise (POI unique genotypes in red typeface, non-POI unique genotypes in blue in the typeset article):
POI AA: AA+(n-1)BB or (n-1)AA+BB; but also: AA+(n-1)AB or (n-1)AA+AB
POI AB: AB+(n-1)AA or (n-1)AB+AA; but also: AB+(n-1)BB or (n-1)AB+BB
POI BB: BB+(n-1)AA or (n-1)BB+AA; but also: BB+(n-1)AB or (n-1)BB+AB
The availability of two combinations containing the same unique genotype generates uncertainty in the way unique POI and unique non-POI quantities are assigned. However, within our nomenclature system some of these combinations (AB+BB; BB+AB) are unrealistic because they contradict the Q_a > Q_b rule (there cannot be more 'b' than 'a' DNA molecules). A combinatorial exercise easily shows that, as long as matrices are built by the Q_a > Q_b rule, unrealistic combinations appear in large numbers only when a false contributor is chosen as POI, whereas choosing a true contributor as POI will mostly intercept AA+AB combinations.
It therefore makes sense to collect unique genotype quantities only from:
POI AA: AA+(n-1)AB and (n-1)AA+AB
POI AB: AB+(n-1)AA and (n-1)AB+AA
POI BB: BB+(n-1)AA and (n-1)BB+AA
To intercept the underlying overall MR, a series of POI and non-POI unique genotypic amounts has to be collected from just one of these three combinations at each locus (the type of combination depending on the POI genotype); the three datasets are then summed separately and a mix ratio (MR_ex) is calculated as follows. Let i be the number of loci where POI = AA, j the number of loci where POI = AB, and l the number of loci where POI = BB. Then the expected POI DNA amount is: The expected amount for the unknowns (Us) is: And the expected mix ratio is: For details on how to implement this procedure on a spreadsheet, the reader should refer to a practical example (ex_MR_calc_115_XEN46_P1P2P3RND.xlsx) included in the S1 File attached to this article.
The expected MR has the following properties: if a true contributor is chosen as a POI, all POI NOG/SOG DNA amounts (collected from all loci) will cluster at or around the same MR; if the POI is a false contributor, all NOG/SOG combinations will return balanced ratios (e.g., 2PM with 0.5:0.5 MR; 3PM with 0.33:0.33:0.33 MR; 4PM with 0.25:0.25:0.25:0.25 MR). True contributors within a balanced mixture behave as if they were false contributors. If more than one POI is available, the respective NOGs/SOGs should be extracted from the permutations that contain them all. To conclude, MR_ex is the expected value for a series of (1 - χ²) tests to be set at every permutation state of each locus of an SNP protocol.
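The aggregation step can be sketched as follows. This is not the article's own formula, which is given in the S1 File; it is a minimal illustration assuming that the per-locus unique amounts of the POI and of the unknowns have already been collected, and that the ratio is taken over the sum of the two totals. The function name `expected_mix_ratio` and the amounts are hypothetical.

```python
def expected_mix_ratio(loci):
    """Return (POI fraction, unknowns fraction) from per-locus unique amounts.

    loci: sequence of (poi_amount, unknown_amount) pairs, one per locus.
    Assumption: the denominator is the sum of both totals.
    """
    loci = list(loci)  # allow generators; we iterate twice
    poi_total = sum(poi for poi, _ in loci)
    unknown_total = sum(unknown for _, unknown in loci)
    total = poi_total + unknown_total
    if total <= 0:
        raise ValueError("total DNA amount must be positive")
    return poi_total / total, unknown_total / total

# Hypothetical unique-genotype amounts at three loci (illustrative only).
amounts = [(200, 800), (150, 850), (250, 750)]
poi_fraction, unknown_fraction = expected_mix_ratio(amounts)
print(poi_fraction, unknown_fraction)
```

Summing before dividing yields one ratio for the whole protocol, in line with the statement that there is a single MR_ex per mixture protocol.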

Setting the (1 - χ²) statistics.
It is now necessary to distribute distinctive probabilities of existence to each permutation state of a mixture. This will be accomplished by setting a (1 - χ²) test within every permutation.
The rationale underlying the use of the (1 - χ²) test is the following:
a. Quantities originating from unique genotypes (MR_ex) invariably reflect the original ratio established by the true contributors.
b. Quantities assigned by the autosomal Mendelian rules to all permutation states (we will call them 'MR_obs') will comply with the MR_ex only if the corresponding genotypes exactly coincide with those originally contributed by the true mixture founders; in all other cases, the MR values will be scattered.
c. Assuming MR_ex as the expected value and all other "Mendelian" MR_obs as the individual observed values, and setting a (1 - χ²) test, will give each permutation state a distinctive probability based on the DNA apportioning.
As previously stated, there is just one MR_ex but numerous MR_obs. In a previous section ("Giving MRs to all permutation states") we gave rules to calculate them. In the S1 File attached to this article we apply these rules, with the relevant algebra, to the cases of 2PM and 3PM.

Calculating quantitative likelihoods by three statistical indexes.
Our quantitative scheme of analysis now contains three well-established statistical indices: the (1 - χ²) MR value, the h index (the minor peak read divided by the major peak read), and the population genotype frequency. The three values reflect three independent properties of the mixtures: the MR pertains to the specific trace, the h index is a Mendelian autosomal property, and the genotype frequency is a population statistic. The three indices recur with different values at each permutation state; they can be multiplied at this level to give each state a synthetic probability of existence. All products of all permutation states are summed to give the value of any likelihood of interest.
For example, the probability of a mixed evidence E{a;b} under the prior assumption of two contributors and one POI with genotype AB and the following experimental data: Therefore, the value of the likelihood is 0.47 + 0.03 + 0.18 ≈ 0.687 (addends shown rounded to two decimals). We report formulae for calculating the simplest likelihoods in the S1 File attached to this paper. These formulae are also implemented within three calculation worksheets available as S1 File.
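The sum-of-products structure of the likelihood can be sketched in a few lines. The values below are invented for illustration and are unrelated to the worked example in the text; the type `PermutationState`, its field names and the function `quantitative_likelihood` are hypothetical, and the worksheets in the S1 File remain the reference implementation.

```python
from typing import NamedTuple

class PermutationState(NamedTuple):
    one_minus_chi2: float  # quantitative index of the state
    h_term: float          # h-based index entering the formula for the state
    genotype_freq: float   # population frequency term of the state

def quantitative_likelihood(states):
    """Sum, over permutation states, the product of the three indices."""
    return sum(s.one_minus_chi2 * s.h_term * s.genotype_freq for s in states)

# Invented values for a likelihood with three admissible states.
states = [
    PermutationState(0.9, 0.8, 0.50),
    PermutationState(0.1, 0.8, 0.25),
    PermutationState(0.5, 0.8, 0.25),
]
print(quantitative_likelihood(states))
```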

Testing datasets
To test the performance of our calculation methods, we downloaded a collection of evidence data (massively parallel sequencing read data referring to 140 forensic SNP loci) and reference data (three "true" contributors, or individuals P1, P3, P2, typed for the same 140 SNPs) available at the open-source EuroForMix website (http://www.EuroForMix.com/?q=data). These mixtures were prepared in vitro [36] by blending DNA aliquots of P1, P3, P2 so as to reproduce a series of 2PMs (individuals P1 and P2; ratios 1:1, 1:3, 1:9, 3:1, 9:1) and 3PMs (individuals P1, P3, P2; ratios 1:1:1, 1:1:5, 1:5:1, 1:5:5, 5:1:1, 5:1:5 and 5:5:1). A selection of all available reference data (the XEN13_a and the XEN46_a subsets) was considered for our statistical analysis and encoded according to our alphabetical nomenclature. The original 140 loci in this evidence were reduced to 133 loci by eliminating incomplete typing data records. An array of European gene frequencies available on the EuroForMix website was used. To complete our inferential scheme, we obtained reference data for 40 more individuals to use as false-contributor POIs, graciously provided by Professor Andreas Tillmar (Department of Forensic Genetics and Forensic Toxicology, National Board of Forensic Medicine, Linköping, Sweden).
Each group of evidentiary data was analysed by adopting our 2-person, 3-person and 4-person likelihood calculation routines, regardless of whether the original evidence originated from a two-person or a three-person mix. Our MRs and LRs were compared to those generated by the EuroForMix shareware (version 3.0.3, developer: Dr Ø. Bleka).
It is appropriate to note here that neither the EuroForMix shareware nor our Excel worksheets allow LRs to be computed for mixture models based on more than 4 contributors.
The corresponding figures show twelve series of mix ratio predictions as they come out from our NOGs/SOGs-based method and from the EuroForMix software. We used the three NITZq worksheets to calculate the NOGs/SOGs MR, and will refer to the NOGs/SOGs MR as the "NITZq MR." We used EuroForMix version 3.0.3. Evidentiary datasets were XEN46a (a mix made of three contributors P1, P3, P2 mixing at ratios of 1:1:1, 5:5:1, 5:1:5, 1:5:5, 5:1:1, 1:1:5, 1:5:1; quoted in the order P1, P3, P2) and XEN13a (made of two contributors P1 and P2 blending at ratios 1:1, 1:3, 1:9, 3:1, 9:1; the first quoted quantity is P1). P1, P3, and P2 were individually used as the POI along with one, two, and three additional unknown (U) contributors (KU; KUU; KUUU; here the synthetic notation K stands for "POI"). The POI is here a "true" contributor (a person whose DNA had been used to assemble the mixture evidence), whose genotypes were downloaded from the EuroForMix website. Only one of the three hypotheses on the number of contributors used to interrogate each evidentiary example was assumed to be "true", based on what was declared on the EuroForMix website. The other predictions were made by assuming an incorrect number of contributors. These data show that NITZq quantitative predictions of the POI DNA amount are similar to those issued by the EuroForMix software, whereas the predictions of the two methods for the unknown contributors may occasionally diverge. In principle, NITZq always flattens the unknown MR proportions to exactly the same value; EuroForMix sometimes assigns differential values and at other times distributes just one value to all unknowns.
Predicting quantities under an incorrect assumption about the true number of contributors has a limited effect on the share assigned to the POI, in the range of two to four contributors. In principle, however, expanding the number of contributors shrinks the quantitative proportion left to a true POI. Both NITZq and EuroForMix sometimes diverged from the mix ratio originally declared by the authors who built the experimental mixtures [36], presumably owing to experimental dilution errors and, occasionally, to clerical errors committed by the data producers (as acknowledged by the authors [36], at XEN46 ratio 1:1:5, P1 (and not P2) is in reality the predominant contributor; at XEN46 ratio 5:1:1, P2 (and not P1) is the predominant DNA).

MRs, with false contributors in the role of POI
Another series of MR predictions was produced by replacing the "true" P1, P3, and P2 with false contributors in the role of POI. False contributors were taken one by one from a population of 40 unrelated Swedes (data provided by Prof. Tillmar). The predictions followed the same scheme as before but, for simplicity, each POI was matched to fewer evidentiary datasets (XEN13, ratios: 1:1, 1:9; XEN46a, ratios: 1:1:1, 1:5:5, 1:1:5). With both NITZq and EuroForMix, the quantitative fractions predicted for each of the 40 POIs were so close to one another that they could be conveniently averaged with little standard deviation, as shown in Figs 7 and 8. These figures reflect the different approaches of the two methods: EuroForMix seeks the best-fitting quantitative proportions in numerous random simulations, whereas NITZq singles out values tending to 1/n, which is the usual result of a free permutation (with no available POI). We envision that the 1/n value is eventually reached by predicting MRs over a very large number of loci. Increasing the number of contributors has the effect of halving the quantitative fraction assigned by EuroForMix. NITZq moves in a similar direction, but the decrease in the POI quantitative proportion has its limit at 1/n. Matching the same POIs to balanced or unbalanced mixtures has little effect on the predicted proportions.

LR values from true contributors as POI
The Log10(LR) values calculated from the two tutorial datasets are shown in Figs 9-11. Here again, we used our three NITZq worksheets to produce our LR data; thus, we will

PLOS ONE
henceforward refer to our quantitative LRs as the NITZq LRs. To produce these data, we set NITZq to call the biallelic evidence E{a;b} every time the minor allele read was greater than or equal to the major read multiplied by 0.05, and to call the monoallelic E{a} below this threshold. Both NITZq and EuroForMix LRs return a comfortably high WoE in tests where the POIs are true contributors and match the evidence. However, NITZq returns LRs that are lower than those of EuroForMix by one to as many as ten orders of magnitude (or bans) every time the POI amounts predominate within the evidence. Although both methods give very high LRs, this circumstance prompted us to investigate further. Therefore, we selected experiments with unbalanced evidence, extracted all individual locus-specific LRs obtained from both computation methods, and compared them to the corresponding single-source LR values (1/P(GT), or 1 divided by the genotype frequency of the POI).
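The calling rule described in this paragraph amounts to a dynamic threshold on the minor read. A minimal sketch follows; the function name `call_evidence` and the read counts are hypothetical, while the 0.05 factor is the one quoted in the text.

```python
def call_evidence(major_reads, minor_reads, threshold=0.05):
    """Call E{a;b} when minor >= threshold * major, otherwise E{a}."""
    if major_reads <= 0 or minor_reads < 0:
        raise ValueError("read counts must be positive (major) and >= 0 (minor)")
    if minor_reads >= threshold * major_reads:
        return "E{a;b}"  # minor-read allele is called
    return "E{a}"        # minor-read allele is treated as dropped out

# Hypothetical read counts (illustrative values only).
print(call_evidence(major_reads=1000, minor_reads=60))
print(call_evidence(major_reads=1000, minor_reads=40))
```

Because the threshold scales with the major read, the same minor read count can be called at a low-coverage locus and rejected at a high-coverage one.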
To provide a typical example, the unbalanced evidence XEN46 1:1:5 (originally made up by mixing five parts of the contributor P1 DNA with one part P2 and one part P3; then an unbalanced 3PM) was compared to the three true contributors as POI. The locus-specific LRs were calculated under two, three, and four contributors' prior hypotheses (the 3PM being the true hypothesis), then copied into a worksheet. The NITZq and EuroForMix datasets were then separately compared to a series of 1/P(GT) data corresponding to each of the 133 loci, using the relevant pair of gene frequencies as data sources. All results and comparisons about this example were collected in an Excel file (LR_over_1_GT_at_115_xen_46_P1P2P3.xlsx) to be found in the S1 File of this article As a result of this comparison and concerning the content of this documental file, we can state that: As it has been repeatedly asserted that the WoE against a suspect based on a mixture model can never be stronger than the WoE obtained by a single-source DNA profile [27,50], it can be concluded that the EuroForMix WoE predictions in these cases are seriously overestimated.
2. When the POI (P3 and P2 in XEN46a 1:1:5) enters the mixture evidence with a proportionally lower DNA amount than the other contributors, the two computational methods return nearly the same WoE, and neither the EuroForMix version 3.3 nor the NITZq LRs overcome the 1/P(GT) ceiling.
These two facts are a proof of principle that EuroForMix may occasionally perform better than NITZq (for example, XEN46a 1:1:5 with POI = P1: EuroForMix πLR = 1.7E+45; NITZq πLR = 1.2E+37) only at the cost of violating the fundamental rule of forensic inference: LRmixture < 1/P(GT). Because of these findings, we can conclude that NITZq predictions are more reliable and realistic than those issued by EuroForMix, which is a key point in favor of our mathematical method.
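The locus-by-locus comparison against the single-source ceiling can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the genotype frequency is computed from the allele frequency under Hardy-Weinberg proportions, which is an assumption made here and is not stated in the text.

```python
import math

def genotype_frequency(genotype: str, freq_a: float) -> float:
    """P(GT) at a biallelic SNP, assuming Hardy-Weinberg proportions."""
    freq_b = 1.0 - freq_a
    table = {"AA": freq_a ** 2, "AB": 2 * freq_a * freq_b, "BB": freq_b ** 2}
    return table[genotype]

def single_source_lr(genotype: str, freq_a: float) -> float:
    """Single-source ceiling, 1/P(GT), for the POI genotype."""
    return 1.0 / genotype_frequency(genotype, freq_a)

def loci_over_ceiling(mixture_lrs, genotypes, freqs_a, rel_tol=1e-9):
    """Indices of loci whose in-mixture LR exceeds the 1/P(GT) ceiling."""
    flagged = []
    for i, (lr, gt, fa) in enumerate(zip(mixture_lrs, genotypes, freqs_a)):
        ceiling = single_source_lr(gt, fa)
        if lr > ceiling * (1.0 + rel_tol):  # small tolerance for rounding
            flagged.append(i)
    return flagged

def log10_gap(lr_1: float, lr_2: float) -> float:
    """Difference between two LRs in orders of magnitude (bans)."""
    return math.log10(lr_1) - math.log10(lr_2)
```

Applied to the two products quoted above, `log10_gap(1.7e45, 1.2e37)` is about 8.15, that is, the EuroForMix result lies roughly eight bans above the NITZq result for POI = P1.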

LR values for false contributors as POI
These datasets were produced using the same analysis scheme detailed in the previous section. An excerpt of the whole data, showing the maximum and minimum values of the NITZq and EuroForMix LRs along with the internal percentile distribution, is shown in Figs 12 and 13 (log10(LR) values are shown). NITZq clearly gives much lower LRs to false contributors, thus ruling out type-two errors (false inclusions) more efficiently than EuroForMix. This is essentially for three reasons:
a. NITZq is highly efficient in spotting the homozygous evidence, owing to the dynamic threshold (based on the minor vs. major read ratio) used to call the minor-read allele.
b. NITZq allows for more exclusion cases to emerge, with evidence E{a} facing either a POI = AB or a POI = BB.
c. NITZq deals with these two classes of exclusion using algebra, according to Eqs 6-9; this results in low (E{a} vs. K = AB) or very low (E{a} vs. K = BB) LR values.

Correlation between MR and LR values, with true contributors as POI
The LR vs. MR relationship was investigated for both the XEN46a and XEN13a evidence by matching P1, P2, and P3 (the "true" contributors) as POI to variously imbalanced mixtures. For simplicity, the LR values were calculated by adopting the true number of contributors (at XEN13a: KU/UU; at XEN46a: KUU/UUU). Then, the Log10(LRquant) minus Log10(LRsemicont) value was paired to the quantitative fraction predicted for each POI. Pairs of values were plotted together, regardless of the differences among the genetic profiles from which the data points originate.
We found that when the true contributors are chosen as POI, a directly proportional relationship between the POI MR proportion and the Log10(LR) value emerges from the NITZq-generated series of data (Figs 14 and 15). The higher the MR fraction of the POI, the higher the LR value. This relationship holds even when an incorrect number of contributors is used to calculate the LRs.
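One way to quantify the relationship plotted in Figs 14 and 15 is an ordinary least-squares fit with a Pearson correlation coefficient. The sketch below is an assumption about how such a check could be done and is not the analysis used to produce the figures; the function names are hypothetical, and the inputs would be the paired values described above (the POI MR fraction and the Log10(LRquant) minus Log10(LRsemicont) difference).

```python
import math

def log10_lr_gain(lr_quant: float, lr_semicont: float) -> float:
    """Log10(LRquant) minus Log10(LRsemicont) for one POI."""
    return math.log10(lr_quant) - math.log10(lr_semicont)

def fit_line(xs, ys):
    """Least-squares slope, intercept, and Pearson r for paired values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("values must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

A positive slope with r close to 1 would correspond to the coherent trend reported for true contributors, whereas r close to 0 would correspond to a sparse, incoherent scatter.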

MR and LR relationship, with false contributors as POI
This relationship was investigated using the same analysis scheme adopted in the previous section. Forty false contributors were chosen one at a time as the POI in the following schemes of analysis: XEN46a ratio 1:1:1, KUU/UUU; XEN13a ratio 1:1, KU/UU. In the false contributors' tests, no coherent relationship was found between the MR and the LR (Figs 16 and 17).

Similarities and differences between EuroForMix and NITZq
In short summary:
a. EuroForMix adopts an absolute threshold for calling the minor SNP allele, whereas NITZq is based on a dynamic threshold; because of this difference, false inclusions may occur with EuroForMix but not with NITZq.
b. When a POI is present in small quantity, EuroForMix assigns that POI a small share of the mixture, whereas NITZq tends to turn the relevant MR into that of a balanced mixture.
c. EuroForMix lets some in-mixture LRs overcome the corresponding single-source likelihood ratios, whereas NITZq never exceeds the ceiling value of 1/P(GT).

Discussion
In this study, we have introduced a series of original procedures to redefine the way semi-continuous and quantitative LR calculations are performed using biallelic markers. On the semi-continuous side, we justified the viewpoint that the routine SNP evidence in a mixture can only be biallelic (E{a;b}), and we modelled this predominant example of SNP evidence by matrices of multiple, equally possible genotypic permutations. These matrix schemes can be easily reduced to algebra, and we have introduced a series of notations that generalize the use of previously reported formulae [34] in a semi-continuous context. If our general formulae are used, semi-continuous inference at SNPs can be reduced to simple arithmetic and does not require precompiled software. Our formulae address all cases of likelihoods without any limitation on the number of contributors or the number of POIs to compute on. We interpreted the monoallelic evidence at an SNP mixture (E{a}) as "residual" evidence, that is, the product of biallelic evidence having undergone dropout events. We assigned the E{a} locus evidence only when the h index at a given locus fell below a predetermined dynamic threshold, reflecting extreme imbalance between the minor and major reads of an individual NGS record. We noticed that just a few cases of monoallelic evidence E{a} emerging in an SNP protocol can modify the inferential value of an identity test to a large extent, by either considerably increasing the LR (case: E{a} vs. POI = AA) or considerably lowering it (cases: E{a} vs. POI = BB; E{a} vs. POI = AB). To calculate the relevant LRs in these "residual but important" cases, algebraic solutions covering all possible instances of E{a}/POI genotype assortment were considered. Within these formulae, the "h" index (the minor vs. major read ratio of a typical NGS experiment) played the role of an acceptable proxy of the dropout probability, and no experimental P(DO) value was needed.
The entire procedure summarized here is, in our view, an acceptable equivalent of the split-drop approach.
As far as the quantitative calculation is concerned, we have introduced a novel, empirical way to capture the ratio at which any given POI mixes with unknown contributors. We identified all the permutations containing unique POI genotypes (NOGs; SOGs), collected the unique NOG/SOG amounts, and calculated an expected "POI-vs-unknowns" mixing ratio (MRex). Then, we compared this unique MR to all the MRs observed at all permutation states of the entire experimental protocol. The degree of compliance between MRex and each of the observed MR values was measured as a (1 - χ2) value, which returned the probability of any permutation state to exist from a quantitative viewpoint. Compared to other methods [23-26], our method for calculating the mixture MR is surprisingly simple. Its implementation in a spreadsheet returns quantitative ratios that, in most cases, approximate well the corresponding MCMC and MLE (maximum likelihood) estimates.
To calculate the quantitative LRs, we integrated the (1 - χ2) values, the h values (the index evaluating the heterozygous imbalance), and the Mendelian genotype probabilities. Being independent, these three probabilities were multiplied together to yield the value of a likelihood, according to a unique procedure of our conception (in fact, a fully continuous methodology of LR calculation). We implemented our procedure within a series of freely available spreadsheets (NITZq). Borrowing the nickname of our worksheets, we refer to our quantitative procedure as the "NITZq procedure." We tested NITZq on a protocol of 133 SNP loci (a subset of a previous SNP protocol [52]), using the "dataset 4: SNP data" package available at the EuroForMix website as evidence and assuming the relevant EuroForMix calculations as the benchmark for comparison. We have shown that when true contributors are tested as the POI, NITZq computations provide LR values of comparable magnitude to those returned by EuroForMix. However, when the POI is quantitatively over-represented, EuroForMix LRs may overcome NITZq LRs by one to ten orders of magnitude. We have evidence that EuroForMix outperforms NITZq by infringing the logical rule of not letting the LR in a mixture overcome the single-source WoE of 1/P(GT). This is a proof of principle, demonstrating that our method is more reliable and realistic than that underlying the EuroForMix package.
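The way the three quantities combine into a likelihood can be sketched as a sum of products over permutation states. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, the (1 - χ2) and h values are taken as precomputed inputs rather than derived from reads, and the enumeration of permutation states under each hypothesis is left to the caller.

```python
from dataclasses import dataclass

@dataclass
class PermutationState:
    fit: float            # (1 - chi2) compliance between MRex and the observed MR
    h: float              # index of heterozygous state plausibility
    genotype_prob: float  # product of the Mendelian genotype frequencies

def likelihood(states) -> float:
    """Sum over permutation states of fit * h * genotype probability."""
    return sum(s.fit * s.h * s.genotype_prob for s in states)

def likelihood_ratio(states_hp, states_hd) -> float:
    """LR at one locus: likelihood under Hp divided by likelihood under Hd."""
    denominator = likelihood(states_hd)
    if denominator == 0:
        raise ZeroDivisionError("likelihood under Hd is zero")
    return likelihood(states_hp) / denominator
```

A permutation state that fits the expected mixing ratio poorly receives a small `fit` weight and therefore contributes little to the sum, so the quantitative information enters the likelihood as a per-state weight rather than as a separate filter.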
Type-one errors (mediocre or negative log10(LR) values) occasionally occur with NITZ, essentially when the quantitative share of the true contributor becomes too low and tends to disappear from the mixture. When false contributors are chosen as POI, NITZq returns a much lower LR than EuroForMix. Type-two errors (false inclusions) occasionally occur with EuroForMix, whereas they do not occur with NITZq. We believe that this is because NITZq detects incompatible events much more efficiently than EuroForMix.
When a true contributor is the POI, a linear relationship exists between its quantitative share in the mixture and the log10(LR) value, as calculated by NITZ. When the POI is a false contributor, this relationship becomes incoherent and graphically sparse. Betting on the wrong number of contributors does not necessarily result in lower LRs.