Risk assessment for airborne disease transmission by poly-pathogen aerosols

In the case of airborne diseases, pathogen copies are transmitted by droplets of respiratory tract fluid that are exhaled by the infectious, stay suspended in the air for some time, and, after partial or full drying, are inhaled as aerosols by the susceptible. The risk of infection in indoor environments is typically modelled using the Wells-Riley model or a Wells-Riley-like formulation, usually assuming the pathogen dose follows a Poisson distribution (mono-pathogen assumption). Aerosols that hold more than one pathogen copy, i.e. poly-pathogen aerosols, break this assumption even if the aerosol dose itself follows a Poisson distribution. For the largest aerosols, where the number of pathogen copies in each aerosol can sometimes be several hundred or several thousand, the effect is non-negligible, especially in diseases where the risk of infection per pathogen is high. Here we report on a generalization of the Wells-Riley model and dose-response models for poly-pathogen aerosols by separately modeling each number of pathogen copies per aerosol, while the aerosol dose itself follows a Poisson distribution. This results in a model for computational risk assessment suitable for mono-/poly-pathogen aerosols. We show that the mono-pathogen assumption significantly overestimates the risk of infection for high pathogen concentrations in the respiratory tract fluid. The model also includes the aerosol removal due to filtering by the individuals, which becomes significant for poorly ventilated environments with a high density of individuals, and systematically includes the effects of facemasks in the infectious aerosol source and sink terms and dose calculations.

If α, β_k, γ, and w are all constant with respect to time, the model has both an explicit and a recursive solution for the concentration density and the dose. Otherwise, it may not be possible to obtain a closed-form analytical solution from the general solution, and one would need to solve the model numerically. Of course, even with an analytical solution available, one can still solve the model numerically. There are a number of numerical pitfalls with both the analytical and the numerical solutions, which arise as M_c becomes large.
For a numerical solution, the time step of integration must be small compared to the smallest time scale in the model. The smallest time scale could be the time scale on which α, β_k, γ, and/or w change over time; but for large enough M_c it will always be the time scale of the total sink for k = M_c aerosols. The total sink coefficient for n_k is −α(d_0, t) − kγ(t), which scales linearly in M_c for n_{M_c} when M_c is large enough. The time step δt must therefore satisfy δt < [α + M_c γ]^{−1}, and thus the total number of time steps N_t scales as N_t ∼ M_c for large M_c. Since the total number of terms also scales linearly in M_c (as long as ∂n_k/∂t is not calculated in dense vector-matrix form, or A is stored in a sparse format), the total computational effort scales as O(M_c²). Note also that the model is stiff when M_c γ(t) ≫ α(d_0, t) + γ(t), since the total sink time scale for k = M_c is then much shorter than the total sink time scale for k = 1. The analytical solutions have additional difficulties: avoiding numerical overflow and maintaining accuracy. The problem is the binomial coefficients C(i, k) and C(i − k, p), where i → M_c and p ∈ [0, i − k]. They can be calculated naively by computing each factorial and then doing the multiplication and division (i! will be the largest); or more carefully, with the cancellation handled explicitly or by using logarithms of the gamma function, which overflows much later. And even if all the binomial coefficients avoid overflow, their products and sums can still overflow (the explicit solution is worse than the recursive solution in this regard, due to its products of binomial coefficients).
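The time-step constraint and the stiffness can be illustrated with a toy forward-Euler integration of the fastest-decaying bin k = M_c; the rates α = 1.0 h⁻¹ and γ = 0.5 h⁻¹ and the value M_c = 200 below are assumed illustrative values, not ones from the model.

```python
# Toy illustration of the time-step constraint dt < (alpha + Mc*gamma)^-1
# for the fastest-decaying bin k = Mc, with assumed constant rates
# alpha = 1.0 h^-1 and gamma = 0.5 h^-1 (illustrative values only).
alpha, gamma, Mc = 1.0, 0.5, 200
rate = alpha + Mc * gamma           # total sink for n_Mc, linear in Mc

def euler(n0, dt, steps):
    # Forward Euler for dn/dt = -rate * n.
    n = n0
    for _ in range(steps):
        n += dt * (-rate * n)
    return n

dt_ok = 0.5 / rate    # satisfies the constraint: the solution decays
dt_bad = 2.5 / rate   # violates even the 2/rate stability limit: blows up
print(euler(1.0, dt_ok, 1000))   # ~9.3e-302, decaying toward zero
print(euler(1.0, dt_bad, 1000))  # ~1.2e+176, numerically unstable
```

With a constant total sink the update is just multiplication by (1 − δt·rate) each step, so any δt much above 1/rate makes that factor leave [0, 1] and the iteration oscillates or diverges, which is why the k = M_c bin dictates the step size.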
To see this, we will find the upper bound for V_k(y, x) in the recursive solution. Consider its formula

$$V_k(\vec{y}, x) = \sum_{i=k}^{M_c} \binom{i}{k} (1 - x)^{i-k} y_i . \qquad (1)$$

In the model, all the elements of y, which is always either β or n_0, are non-negative. Since x ∈ [0, 1], 1 − x ∈ [0, 1] and is therefore also non-negative. This means that all terms in the sum in V_k are non-negative, meaning V_k ≥ 0, and thus we only need to find the upper bound to determine the risk of overflow (the lower bound, zero, is not a worry). Now, y_i ≤ max(y) for all i, where max(y) shall denote the maximum element of y. Since x ∈ [0, 1], 0 ≤ (1 − x)^{i−k} ≤ 1. Then we just need to get an upper bound for the binomial coefficients. The binomial coefficient $\binom{j}{m} = j!/[m!(j-m)!]$ is at its greatest value for fixed j when m = ⌊j/2⌋ or m = ⌈j/2⌉, where ⌊·⌋ and ⌈·⌉ denote the floor and ceiling operators respectively. This value is

$$\binom{j}{\lfloor j/2 \rfloor} = \frac{j!}{\lfloor j/2 \rfloor! \, \lceil j/2 \rceil!} ,$$

which is maximized when j = i = M_c. Then, the upper bound for V_k is

$$V_k(\vec{y}, x) \le (M_c - k + 1) \max(\vec{y}) \, \frac{M_c!}{\lfloor M_c/2 \rfloor! \, \lceil M_c/2 \rceil!} .$$

For any M_c, the largest upper bound for any of the V_k is found for k = 1, so the upper bound we need to worry about is

$$V_k(\vec{y}, x) \le M_c \max(\vec{y}) \, \frac{M_c!}{\lfloor M_c/2 \rfloor! \, \lceil M_c/2 \rceil!} . \qquad (4)$$

Now, the factorial function has the bounds [1]

$$\sqrt{2\pi}\, n^{n + 1/2} e^{-n} \le n! \le e\, n^{n + 1/2} e^{-n} .$$

We will still have an upper bound for V_k if we use the factorial lower bound in place of the factorials in the denominator of Eq (4) and the factorial upper bound for the factorial in the numerator. This leads to

$$V_k(\vec{y}, x) \le \frac{e}{2\pi} M_c \max(\vec{y}) \, \frac{M_c^{M_c + 1/2}}{\lfloor M_c/2 \rfloor^{\lfloor M_c/2 \rfloor + 1/2} \, \lceil M_c/2 \rceil^{\lceil M_c/2 \rceil + 1/2}} .$$

If we take the log_2, we get the value of the base-2 exponent. Taking the log_2 of both sides,

$$\log_2 V_k \le \log_2 \max(\vec{y}) + \log_2\frac{e}{2\pi} + \left(M_c + \tfrac{3}{2}\right) \log_2 M_c - \left(\lfloor M_c/2 \rfloor + \tfrac{1}{2}\right) \log_2 \lfloor M_c/2 \rfloor - \left(\lceil M_c/2 \rceil + \tfrac{1}{2}\right) \log_2 \lceil M_c/2 \rceil ,$$

where we have used the fact that ⌊M_c/2⌋ + ⌈M_c/2⌉ = M_c to cancel the exponentials. To represent all the V_k for a particular M_c without overflowing, a floating point format's maximum supported base-2 exponent, emax (using the same notation as the IEEE 754 standard [2]), must be at least this value (emax ≥ log_2 V_k). But even if the value log_2(V_k) might not overflow with this minimum value, calculating the binomial coefficients when max(y) < 1 could overflow before the multiplication drops the magnitude.
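The bounding chain above can be checked numerically for small M_c. This is a sketch assuming the recursive-solution kernel has the form V_k(y, x) = Σ_{i=k}^{M_c} C(i, k) (1 − x)^{i−k} y_i as read from the text; the function name and the dict-based y are ours.

```python
import math

def V(k, y, x, Mc):
    # Assumed kernel: V_k(y, x) = sum_{i=k}^{Mc} C(i,k) (1-x)^(i-k) y[i],
    # with y a dict indexed 1..Mc (our reading of the recursive solution).
    return sum(math.comb(i, k) * (1.0 - x) ** (i - k) * y[i]
               for i in range(k, Mc + 1))

Mc = 30
y = {i: 1.0 for i in range(1, Mc + 1)}   # all-ones y and x = 0: worst case
central = math.comb(Mc, Mc // 2)          # C(Mc, floor(Mc/2)), the largest
for k in range(1, Mc + 1):
    # Bound from the text: (Mc - k + 1) * max(y) * C(Mc, floor(Mc/2)),
    # largest at k = 1.
    bound = (Mc - k + 1) * max(y.values()) * central
    assert V(k, y, 0.0, Mc) <= bound
print("all bounds hold for Mc =", Mc)
```

For the all-ones y with x = 0 the sum reduces to Σ C(i, k) of exact integers, so the check exercises exactly the worst case used for Fig 1's left panel.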
So for the minimum emax, we must replace max(y) with max[1, max(y)], and we get

$$\mathrm{emax} \ge \log_2 \max[1, \max(\vec{y})] + \log_2\frac{e}{2\pi} + \left(M_c + \tfrac{3}{2}\right) \log_2 M_c - \left(\lfloor M_c/2 \rfloor + \tfrac{1}{2}\right) \log_2 \lfloor M_c/2 \rfloor - \left(\lceil M_c/2 \rceil + \tfrac{1}{2}\right) \log_2 \lceil M_c/2 \rceil . \qquad (8)$$

To see how much of an overestimate this is for emax, let's compare it to log_2 max(V_k) for a few simple cases. First, let's compare it to V_k(1, 0), since y = 1 (the all-ones vector) has all elements equal to the maximum element, x = 0 maximizes (1 − x)^{i−k}, and all terms in the sum are integers. In Fig 1 (left panel), log_2 max(V_k) is compared to emax from Eq (8) and the emax values of the four smallest IEEE 754-2019 binary floating point formats [2]. We calculated max(V_k) using variable-sized integers, which were converted to multi-precision floating point numbers with the largest supported exponent before taking the log_2, using the GNU Multiple Precision Floating-point Reliable Library (MPFR, see https://www.mpfr.org) and the GNU Multiple Precision Arithmetic Library (GMP, see https://gmplib.org) in Python via the gmpy2 package (https://pypi.org/project/gmpy2). The minimum emax from Eq (8) is only barely larger in a logarithmic sense than log_2 max(V_k) except for small M_c, so it isn't an excessive overestimate of the required emax.
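As a sketch of how Eq (8) can be used in practice, the following evaluates the reconstructed bound and picks the smallest IEEE 754 binary format whose emax [2] covers it; the function names are ours, and the bound expression is our reading of Eq (8) (valid for M_c ≥ 2).

```python
import math

# emax values of the IEEE 754 binary interchange formats [2].
EMAX = {"binary16": 15, "binary32": 127, "binary64": 1023, "binary128": 16383}

def min_emax(Mc, max_y):
    # Eq (8)-style minimum base-2 exponent, with max(y) clamped below by 1
    # so intermediate binomial coefficients cannot overflow first. Mc >= 2.
    lo, hi = Mc // 2, Mc - Mc // 2   # floor(Mc/2) and ceil(Mc/2)
    return (math.log2(max(1.0, max_y)) + math.log2(math.e / (2 * math.pi))
            + (Mc + 1.5) * math.log2(Mc)
            - (lo + 0.5) * math.log2(lo) - (hi + 0.5) * math.log2(hi))

def smallest_safe_format(Mc, max_y):
    need = min_emax(Mc, max_y)
    for name, emax in EMAX.items():   # ordered smallest to largest
        if emax >= need:
            return name
    return "multi-precision needed"

print(smallest_safe_format(100, 1.0))    # -> binary32
print(smallest_safe_format(2000, 1.0))   # -> binary128
```

Since the dominant term is roughly M_c log_2 M_c − M_c log_2(M_c/2) ≈ M_c bits of exponent, the required emax grows essentially linearly in M_c, which is why fixed-size formats are exhausted by M_c of a few thousand.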
Second, we will compare it to a typical y used in the model. We will base y on β_{I,k} for a single infectious person, but remove all of the environment parameters and person-specific parameters except for the expected average multiplicity ⟨k⟩_j = (π/6) d_0³ ρ_{p,j}. We shall use the vector

$$\psi_k = \frac{V}{\rho_j \lambda_{I,j} S_{I,m,out,j}} \, \beta_{I,k} = P_P(\langle k \rangle_j, k) .$$
For a range of ⟨k⟩_j, M_c was determined with the single-infectious-person production heuristic M_{c,I,j} for thresholds T of 10^{−1} and 10^{−9}. Then the V_k(ψ, 0) were calculated in binary floating point with a 256-bit mantissa, emax = 32768, and minimum exponent emin = −32767, with gmpy2, MPFR, and GMP as before. The binomial coefficients were calculated exactly before conversion to floating point. To improve the accuracy, the sum in Eq (1) was done as a sorting sum: all the terms were sorted in a list, the two smallest terms removed, added together, and the result inserted back into the list; this process was repeated until only a single term (the total sum) remained. The right panel of Fig 1 shows the results. The minimum emax from Eq (8) is a noticeable overestimate in a logarithmic sense for ⟨k⟩_j < 10, but not by much in a logarithmic sense for larger ⟨k⟩_j. For small ⟨k⟩_j it suggests an overkill emax, but binary16 (half precision) and binary32 (single precision) are typically the smallest floating point formats with hardware support on most computers, and the bulk of the computational effort is spent on the diameter bins with the largest M_c, so there isn't much reason to use a smaller format for small ⟨k⟩_j.
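The sorting sum described above maps naturally onto a min-heap, which maintains the smallest-two-first order without re-sorting after every insertion; a minimal sketch (the function name is ours):

```python
import heapq

def sorting_sum(terms):
    # The accuracy-improving summation described in the text: repeatedly
    # remove the two smallest terms, add them together, and insert the
    # partial sum back, until a single term (the total sum) remains.
    heap = list(terms)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, a + b)
    return heap[0] if heap else 0.0

# Why it helps: adding many small terms to one huge term left-to-right
# loses each small term to rounding, while smallest-first accumulates
# them into a partial sum large enough to survive the final addition.
terms = [1e16] + [1.0] * 10000
naive = 0.0
for t in terms:
    naive += t
print(naive)               # 1e+16: every 1.0 was rounded away
print(sorting_sum(terms))  # 1.000000000001e+16: the exact total
```

This only guards against magnitude disparity; since all terms in V_k(ψ, 0) are non-negative, there is no cancellation to worry about and smallest-first ordering is the main accuracy lever.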
binary64 (double precision) is sufficient for ψ in this case up to ⟨k⟩_j = 1000. The exact values of the other terms in β_I would determine whether binary64 would be safe for that expected multiplicity with β_I; the emax requirement would additively increase by log_2 max[1, ρ_j λ_{I,j} S_{I,m,out,j}/V]. binary128 and x87 FPU 80-bit floating point numbers, which have the same emax [3], would have some headroom for the magnitude of the other coefficients in β_I even for ⟨k⟩_j = 10⁴.
To investigate the required floating point precision for calculating n_k, we will look at the precision in bits required to calculate all U_k(d_0, y, x) for a given M_c to within a specified relative tolerance δ of the exact values. We chose y with all elements equal (to 10¹, 10⁵, or 10⁹) and rational values of x (0, 5/13, and 1), which makes U_k(d_0, y, x) a rational number for any M_c. For several M_c, the U_k(d_0, y, x) were calculated exactly using variable-sized rational arithmetic. Then, the smallest floating point precision was found for which the maximum relative error of the U_k(d_0, y, x) calculated in floating point at that precision (note, the binomial coefficients were calculated first as variable-sized integers and then converted to floating point) is less than δ, using both the explicit and the recursive formulas for U_k(d_0, y, x). The calculations were done using gmpy2, MPFR, and GMP as before. The required precisions for δ of 10%, 10 PPM (parts per million), and 1 PPB (parts per billion) are shown in Fig 2 and compared to the precisions of various standard floating point formats.
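The exact-versus-floating-point comparison can be mimicked with the standard library's exact rational arithmetic in place of GMP. Since the U_k formula is not reproduced here, this sketch uses a V_k-style kernel as a stand-in (an assumption on our part), with the same uniform-y, rational-x choice that keeps the exact value rational:

```python
from fractions import Fraction
import math

def Vk_exact(k, x, Mc):
    # Exact rational evaluation of a stand-in kernel of the same shape as
    # the recursive-solution terms: sum_{i=k}^{Mc} C(i,k) (1-x)^(i-k),
    # with all elements of y equal to 1 and x rational.
    return sum(Fraction(math.comb(i, k)) * (1 - x) ** (i - k)
               for i in range(k, Mc + 1))

def Vk_float(k, x, Mc):
    # Same sum in binary64: exact integer binomials converted to float,
    # mirroring the procedure described in the text.
    f = float(1 - x)
    return sum(math.comb(i, k) * f ** (i - k) for i in range(k, Mc + 1))

x = Fraction(5, 13)   # rational x keeps the exact value a rational number
Mc = 60
worst = max(abs(Vk_float(k, x, Mc) / float(Vk_exact(k, x, Mc)) - 1)
            for k in range(1, Mc + 1))
print(worst)  # well below 1 PPB for this positive-term sum
```

Because every term is non-negative, the binary64 relative error only accumulates roundoff (no cancellation), which is consistent with the recursive formula meeting the 1 PPB tolerance in double precision.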
Using the explicit formula, even quadruple precision (binary128) is insufficient by M_c = 80 for this y, even for a tolerance of 10%. But using the recursive formula, double precision (binary64) is good enough for a tolerance of 1 PPB even for the largest M_c = 5000 that was checked. Given this, and the recursive solution's number of terms scaling as O(M_c²) instead of O(M_c³), the recursive solution is much more amenable to calculation than the explicit solution.
Considering both Figs 1 and 2, double precision (binary64) seems to be suitable, depending on the largest values in y, for moderate M_c up to a few hundred. To go to M_c of a few thousand, quadruple precision (binary128) or x87 FPU 80-bit floating point numbers are required. Above this, either multi-precision floating point with higher exponents must be used, or the model must be solved numerically.
The number of terms to compute scales quadratically in M_c for both the numerical solution and the recursive analytical solution, and cubically for the explicit analytical solution; and the analytical solutions have the additional problems of avoiding numerical overflow and of the computational effort to calculate some terms increasing with M_c. For any particular fixed-size number format, there is an M_c above which the analytical result will overflow, and one must solve the model numerically instead.

Fig 2. Required precision in bits to calculate all U_k(d_0, y, x) to within relative tolerances δ of 10%, 10 PPM, and 1 PPB, using the explicit and recursive formulas, for y with all elements equal to 10¹, 10⁵, or 10⁹ and x ∈ {0, 5/13, 1}; compared to the precisions of the IEEE 754 binary16, binary32, binary64, and binary128 formats [2] and the x87 FPU 80-bit floating point format [3]. Both panels share the same legend, which is in the left panel.