Practical sampling of constraint-based models: Optimized thinning boosts CHRR performance

Thinning is a sub-sampling technique to reduce the memory footprint of Markov chain Monte Carlo. Despite being commonly used, thinning is rarely considered efficient. For sampling constraint-based models, a highly relevant use case in systems biology, we here demonstrate that thinning boosts the computational and, thereby, sampling efficiency of the widely used Coordinate Hit-and-Run with Rounding (CHRR) algorithm. By benchmarking CHRR with thinning on simplices and genome-scale metabolic networks of up to thousands of dimensions, we find a substantial increase in computational efficiency compared to unthinned CHRR, in our examples by orders of magnitude, as measured by the effective sample size per time (ESS/t), with performance gains growing with polytope (effective network) dimension. Using a set of benchmark models, we derive a ready-to-apply guideline for tuning thinning to efficient and effective use of compute resources without requiring additional coding effort. We validate our guideline using three (out-of-sample) large-scale networks and show that it allows sampling convex polytopes uniformly to convergence in a fraction of the time, thereby unlocking the rigorous investigation of hitherto intractable models. The derivation of our guideline is explained in detail, allowing future researchers to update it as new model classes and more training data become available. CHRR with deliberate use of thinning thereby paves the way to keep pace with progressing model sizes derived with the constraint-based reconstruction and analysis (COBRA) tool set. Sampling and evaluation pipelines are available at https://jugit.fz-juelich.de/IBG-1/ModSim/fluxomics/chrrt.


Introduction
Constraint-based modelling of metabolism provides a versatile mathematical framework for integrating genomic and biochemical knowledge with multi-omics data and for interrogating network models [1][2][3]. As such, the constraint-based reconstruction and analysis (COBRA) methodology has established an indispensable tool set with numerous applications in metabolic engineering, biotechnology, microbiology, and systems as well as synthetic biology [4][5][6][7]. Nowadays, genome-scale models (GEMs) are routinely reconstructed from genomic data using automated workflows [8]. Powerful reconstruction pipelines such as CarveMe [9], AGORA2 [10], or metaGEM [11] generate high-quality models of increasing size and at accelerated speed [12]. For instance, these tools recently enabled the generation of more than 7,300 context-specific yeast GEMs [10]. Multi-organismal GEM repositories are publicly available, such as the BiGG database [13,14], which contains 108 models, with the largest metabolic model, Recon3D [15], accounting for more than ten thousand reactions. Modeling initiatives are now starting to target the description of microbial communities, pan- and meta-genomes, and even the whole human body [16][17][18], escalating model sizes further.
GEMs are particularly useful for predicting phenotypes, suggesting genetic interventions, e.g., by applying bi-level strain design approaches [19], and foreseeing the outcome of experiments [20]. Besides comprehensiveness, the capability of GEMs to deliver useful insights is intimately tied to knowledge of the unknown model parameters, i.e., the intra- and extracellular metabolic reaction rates (fluxes). Here, in particular, enzyme constraints have been shown to effectively improve the models' prediction performance [21]. The stoichiometry-induced mass balances at steady state and further constraints introduce linear dependencies among the fluxes, implying that all feasible flux configurations are located within a convex hyper-polytope [22,23], with a dimension dictated by the nullspace of the stoichiometric matrix. Interestingly, such convex polytopes arise as solution spaces in many applied modeling contexts, including operations research [24], ecological modeling [25], computational finance [26], astronomy [27], physics [28], and wherever network-flow problems are to be solved.
The expansion in GEM sizes has raised the need for scalable tools to analyze the high-dimensional flux solution spaces. In this context, uniform convex polytope sampling (UCPS), i.e., drawing representative random numbers from a truncated uniform distribution defined over the bounded flux polytope, is routinely used for characterizing the solution spaces of metabolic [29,30] and gene networks [31], identifying metabolic flux couplings [32], designing experiments [20,33], and assessing the effects of uncertainties in biochemical network formulations [34,35]. A simple UCPS strategy is rejection sampling, a Monte Carlo technique that iteratively generates samples within easily accessible objects such as polytope-enclosing (hyper)cubes and keeps only those draws that lie in the interior of the polytope, until the truncated uniform target distribution is approximately covered [36]. While initial efforts in the field of constraint-based modeling used this technique [37], rejection sampling was quickly recognized to be computationally infeasible for more comprehensive network models, because the number of rejections per feasible sample grows super-exponentially with polytope (effective model) dimension.
The class of Hit-and-Run (HR) algorithms has been developed to overcome this bottleneck [38]. HR is a Markov chain Monte Carlo (MCMC) technique which, starting from an initial flux configuration within the polytope, generates a sequence of flux states (a Markov chain) by drawing randomly from inner chords directed along random directions [39]. The generated states (or a fraction of them) are the MCMC samples. These samples, albeit correlated by construction, approximate the target distribution asymptotically [40].
How quickly the target distribution is approximated well in terms of wall-clock time, the so-called sampling efficiency, is critical for the practical applicability of HR. Sampling efficiency is determined by two components: (1) the statistical efficiency, which specifies how well a set of samples represents a target distribution, and (2) the computational efficiency of the HR sampling algorithm, i.e., how many computational operations are required per generated state. Formally, the statistical efficiency of N samples produced by a Markov chain is given by the effective sample size (ESS) [36]:

  ESS = N / (1 + 2 Σ_{t=1}^∞ ρ_t),  (1)

with ρ_t being the autocorrelation of the sequence at lag t. The sampling efficiency of the MCMC implementation is then measured by the ESS normalized to wall-clock time t, i.e., ESS/t. According to Eq (1), two strategies to raise MCMC sampling efficiencies are (1) reducing the computational cost per sample, and (2) decreasing the autocorrelation between subsequent states. For HR, the cost per sample is reduced by updating the states in a purely coordinate-wise fashion instead of drawing random directions, a technique also known as Gibbs sampling [41]. To decrease the autocorrelation, the average distance between subsequent states has to be increased. In UCPS, the achievable distance is limited by the geometry of the polytope. Intuitively, in isotropic geometries longer steps are possible on average than in anisotropic spaces. Since heterogeneous parameter scales are the rule in GEMs, an affine transformation is employed to homogenize the flux scales prior to sampling, the so-called rounding transformation [22]. After sampling, the states have to be mapped back to the original space by inverting the rounding transformation. Together, the two strategies culminated in the celebrated Coordinate Hit-and-Run with Rounding (CHRR) algorithm [42], the tried-and-tested workhorse for UCPS in the COBRA domain [4,43,44], with several implementations available [42,45-47].
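As a concrete illustration of Eq (1), the following sketch estimates the ESS of a one-dimensional chain with numpy. Truncating the infinite autocorrelation sum at the first non-positive lag is a simplifying assumption of this sketch; production diagnostics such as arviz use more robust estimators.

```python
import numpy as np

def effective_sample_size(chain, max_lag=1000):
    """Estimate ESS = N / (1 + 2 * sum_t rho_t) for a 1-D chain.

    The infinite sum over autocorrelations rho_t is truncated at the
    first non-positive estimate -- a simplification for illustration.
    """
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    # empirical autocovariances up to max_lag (direct O(n * max_lag) sketch)
    acov = np.array([x[: n - t] @ x[t:] / n for t in range(min(n, max_lag))])
    rho = acov / acov[0]
    s = 0.0
    for t in range(1, len(rho)):
        if rho[t] <= 0.0:
            break
        s += rho[t]
    return n / (1.0 + 2.0 * s)

rng = np.random.default_rng(0)
iid = rng.normal(size=10_000)          # independent draws: ESS close to N
ar = np.empty(10_000)                  # AR(1) chain: strongly autocorrelated
ar[0] = 0.0
for i in range(1, ar.size):
    ar[i] = 0.9 * ar[i - 1] + rng.normal()
print(effective_sample_size(iid), effective_sample_size(ar))
```

The autocorrelated chain yields a far smaller ESS than the independent chain of the same length, which is exactly the gap that thinning trades against compute time below.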
CHRR is commonly applied together with thinning, an MCMC post-processing technique, to reduce the memory footprint of the sampling results. The simplest and most widely used form of thinning is fixed-frequency thinning with thinning constant τ, which means that only every τ-th sample is kept. While thinning indeed reduces the autocorrelation of the samples generated by CHRR, the reduction comes at the cost of producing a less accurate approximation of the target distribution from a fixed number of generated states. Indeed, thinning is known to be statistically inefficient in all but very few cases [48,49].
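In code, fixed-frequency thinning is a one-line slicing operation on the stored chain (illustrative snippet):

```python
# Keep every tau-th state of a chain (fixed-frequency thinning).
chain = list(range(1, 21))     # stand-in for 20 consecutive MCMC states
tau = 5
thinned = chain[tau - 1 :: tau]
print(thinned)  # [5, 10, 15, 20]
```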
Recognizing the "need for speed" for sampling large-scale GEMs, in this work we study the impact of thinning on CHRR systematically from the perspective of practical sampling efficiency, based on principled statistical criteria (cf. Fig 1). By analyzing simplices and GEMs of widely varying dimensionality, we demonstrate that CHRR is one of the rare cases for which thinning boosts sampling efficiency, measured in terms of ESS/t, and show that current thinning practices are computationally wasteful. From our benchmarks, we derive simple, yet effective and statistically rigorous guidelines for tuning CHRR effectively and efficiently, which we verify using large-scale GEMs such as Recon3D. Our findings thus guide practical sampling efforts for COBRA models towards achieving more effective samples with less compute, thereby updating previous wisdom [42].

Methods and models
We start with a brief background on UCPS, describe the pre- and post-processing of flux polytopes relevant for the CHRR sampling procedure, list test and validation problems, and give details on MCMC diagnostic checks along with implementation details.

Convex GEM flux polytopes and their pre- and post-processing
GEMs are stoichiometric network models that are compiled from genomic information and biochemical knowledge [50]. By mass balancing at steady state, a linear equation system is derived from these models for the D unknown metabolic reaction rates (fluxes) ν ∈ ℝ^D:

  A'_eq · ν = b'_eq,  (2)

where A'_eq ∈ ℝ^(n'_eq × D) is the (extended) stoichiometric matrix, and b'_eq ∈ ℝ^(n'_eq) contains the time derivatives of the (intra- and extracellular) metabolite concentrations. As a convention, we use the symbol "·" for the matrix-vector product. In addition, the fluxes ν are subject to linear inequalities originating from physiologically motivated lower and upper limits on their values:

  A'_in · ν ≤ b'_in,  (3)

with A'_in ∈ ℝ^(n'_in × D) the constraint matrix and b'_in ∈ ℝ^(n'_in) containing the flux bounds. The convex polytope (Eqs 2 and 3) is constructed such that it is always bounded in all flux directions; otherwise the uniform sampling problem is ill-defined. Since A'_in typically contains redundant constraints that cause numerical instabilities in the sampling workflow, in a first pre-processing step the redundant inequalities are eliminated from Eq (3), giving

  A_in · ν ≤ b_in,  (4)

with A_in ∈ ℝ^(n_in × D) the constraint matrix and b_in ∈ ℝ^(n_in) the flux bounds. Then, inequalities that are effectively equalities (fluxes bound to ranges equal to or smaller than 1e-7) are identified and reformulated as equalities, which creates the modified stoichiometric system

  A_eq · ν = b_eq,  (5)

with A_eq ∈ ℝ^(n_eq × D) and b_eq ∈ ℝ^(n_eq). Together, Eqs 4 and 5 define the D-dimensional effective convex-bounded flux polytope of the given GEM:

  P_D = {ν ∈ ℝ^D : A_eq · ν = b_eq, A_in · ν ≤ b_in}.  (6)

We note that, due to the pre-processing, P_D is slightly smaller than its original counterpart defined by Eqs (2) and (3). However, because the threshold below which inequalities are interpreted as equalities is selected to be orders of magnitude smaller than the achievable flux precision, the impact of the shrinkage on the flux values is in fact marginal.
Instead, the beneficial effect on the numerical stability of the sampling algorithm outweighs the risk of losing relevant flux information [44]. Next, the equalities (5) are exploited to reduce the dimension of the sampling problem, without changing the polytope volume, by expressing the polytope P_D in terms of independent, still interpretable flux coordinates ν_indep ∈ ℝ^d, d ≤ D [51]. Therefore, we determine a basis K ∈ ℝ^(D × d) of the null space of the modified stoichiometric system (5), subject to the inequality constraints (4):

  ν = ν_0 + K · ν_indep,  (7)

where ν_0 is a particular solution within the flux polytope P_D. A typical choice for ν_0 is the Chebyshev center of P_D. By going from Eqs (4) and (5) to Eq (7), all equalities are eliminated from P_D in Eq (6), resulting in the dimension-reduced flux polytope P_d:

  P_d = {ν_indep ∈ ℝ^d : A_in · (ν_0 + K · ν_indep) ≤ b_in},  (8)

with ν_0 as above and n_in > d in all but toy networks. Henceforth, we call d the effective model dimension. As an example, Recon3D, as downloaded from BiGG, has 10,600 bounded reactions. After dimension reduction and flux coordinate transformation, n_in = 11,195 inequality constraints and d = 4,861 effective dimensions remain. Flux polytopes are known to have anisotropic shapes, a characteristic that slows down the mixing of MCMC sampling schemes (see [22] for details). To counteract this, dimension-preserving, so-called rounding transformations have been proposed that transform the polytope P_d into a flux polytope P_d^round having a more isotropic shape [22]. For more technical discussions of the isotropy of convex polytopes, we refer to [39,52,53]. In this work, using the approach in [44], we determine the linear rounding transformation R ∈ ℝ^(d × d) by computing the Maximum Volume Ellipsoid (MVE) inscribed in P_d [54]. Specifically, the inverse mapping R⁻¹ is determined such that it maps the MVE to the unit sphere.
The rounded polytope P_d^round is then given by:

  P_d^round = {ν_round ∈ ℝ^d : A_round · ν_round ≤ b_round},  (9)

with

  A_round = A_in · K · R and b_round = b_in − A_in · ν_0.  (10)

Finally, any d-dimensional independent flux vector ν_round in the rounded space has to be mapped back to a flux vector ν in the original D-dimensional flux space to be biologically interpretable, using the affine back-transformation T : ν_round ↦ ν given by:

  ν = T(ν_round) = ν_0 + K · R · ν_round.  (11)
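The affine back-transformation T can be sketched with numpy; the small K, R, and ν_0 below are toy stand-ins for the nullspace basis, rounding matrix, and interior point produced by the pre-processing, not values from any real GEM.

```python
import numpy as np

rng = np.random.default_rng(1)
D, d = 6, 2                       # full and effective dimensions (toy sizes)
K = rng.normal(size=(D, d))       # nullspace basis of the equality system
R = np.diag([2.0, 0.5])           # rounding transformation (diagonal toy case)
nu0 = rng.normal(size=D)          # interior point, e.g., the Chebyshev center

def back_transform(nu_round):
    """T: nu_round -> nu = nu0 + K @ R @ nu_round."""
    return nu0 + K @ (R @ nu_round)

# Precomputing M = K @ R turns T into a single dense matrix-vector
# product per stored sample -- the dominant per-sample cost discussed below.
M = K @ R
nu_round = np.array([0.1, -0.3])
assert np.allclose(back_transform(nu_round), nu0 + M @ nu_round)
```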

Coordinate Hit-and-Run with Thinning
Thinning is an integral part of many MCMC algorithms. We henceforth denote our implementation of CHRR [42] that makes deliberate use of thinning by CHRRT. Algorithm 1 gives a pseudocode description of the CHRRT algorithm. With a UCPS task at hand, defined by Eqs (4) and (5), first the rounded polytope P_d^round is determined (L 1). Starting from an initial flux state, samples in P_d^round are iteratively generated by constructing chords along the coordinates (L 6-10). Fixed-frequency thinning is applied with thinning constant τ (L 11). Lastly, every sample that is not discarded due to thinning is transformed back to the full-dimensional flux space P_D (L 12).
Algorithm 1 Coordinate Hit-and-Run with Rounding and Thinning (CHRRT)

Input:
Equality constraints: A_eq · ν = b_eq; inequality constraints: A_in · ν ≤ b_in.

Concerning the computational costs of CHRRT, determining an internal point ν_0 and computing the rounded polytope P_d^round are one-time costs, both being independent of the number of samples. We therefore do not consider these costs further in this work. The most expensive operation per sample is the back-transformation T that maps the generated sample to the original polytope (Algorithm 1, L 12). The back-transformation amounts to a dense matrix-vector multiplication, with worst-case costs of O(n_in · d) per sample. All other operations are element-wise additions, multiplications, or divisions with per-sample costs of O(d).
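To make the procedure concrete, the following is an illustrative Python re-implementation of coordinate hit-and-run with fixed-frequency thinning on an already-rounded polytope {x : A · x ≤ b}. It is a sketch, not the HOPS implementation, and it omits the rounding step and the back-transformation.

```python
import numpy as np

def chrr_thinned(A, b, x0, n_samples, tau, rng):
    """Coordinate Hit-and-Run with fixed-frequency thinning (sketch).

    Samples uniformly from {x : A @ x <= b}, assumed bounded and rounded,
    starting from the interior point x0; only every tau-th state is kept.
    """
    x = np.array(x0, dtype=float)
    d = x.size
    kept = []
    for i in range(n_samples * tau):
        j = rng.integers(d)                      # random coordinate direction
        slack = b - A @ x                        # >= 0 while x is feasible
        col = A[:, j]
        # chord along e_j: step t must satisfy t * A[:, j] <= slack
        with np.errstate(divide="ignore", invalid="ignore"):
            ratios = slack / col
        lo = np.max(ratios[col < 0], initial=-np.inf)
        hi = np.min(ratios[col > 0], initial=np.inf)
        x[j] += rng.uniform(lo, hi)              # uniform draw on the chord
        if (i + 1) % tau == 0:                   # fixed-frequency thinning
            kept.append(x.copy())
    return np.array(kept)

# toy example: the 2-D unit square written as {x : A x <= b}
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.array([1.0, 1.0, 0.0, 0.0])
rng = np.random.default_rng(42)
samples = chrr_thinned(A, b, x0=[0.5, 0.5], n_samples=500, tau=5, rng=rng)
print(samples.mean(axis=0))   # both coordinates close to 0.5
```

Note that in this sketch the chord endpoints are recomputed from scratch in O(n_in · d) per update; efficient implementations such as HOPS maintain the slack incrementally to reach the O(d) per-update cost stated above.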

Test problems
We consider two types of test problems: simplices, representing well-defined and easy-to-scale problems, and GEMs, being our main concern. A d-dimensional simplex S_d is defined by:

  S_d = {ν ∈ ℝ^d : ν_i ≥ 0 for all i, Σ_{i=1}^d ν_i ≤ 1}.  (12)

Being easily scalable to different dimensions, simplices are ideal to study the dependency between sampling complexity and polytope dimension. In contrast to GEMs, simplices are constructed from non-redundant constraints and, therefore, do not need to be dimension-reduced. However, as for GEMs, polytope rounding improves the geometric isotropy of simplices. Thus, Eq (10) remains valid, with the matrix K set to the identity. In this work, simplices in 64, 256, 1,024, and 2,048 dimensions were benchmarked. Besides simplices, we selected eleven GEMs of varying sizes for this study: eight for training our thinning guideline and three (Yeast8, ecYeast8, Recon3D) for validation purposes. Effective polytope dimensions range from 24 to 4,861. In contrast to simplices, for GEMs there is no consistent relationship between the number of reactions, the number of constraints, and the effective polytope dimension d. We refer to Table A in S1 Appendix for more details on the test problems.
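For reference, the simplex test problems can be written directly in the inequality form used by the sampler; this hypothetical helper builds the d + 1 non-redundant constraint rows of the standard simplex:

```python
import numpy as np

def simplex_constraints(d):
    """Standard d-simplex as {x : A @ x <= b}:
    -x_i <= 0 for every coordinate, plus sum_i x_i <= 1."""
    A = np.vstack([-np.eye(d), np.ones((1, d))])
    b = np.concatenate([np.zeros(d), [1.0]])
    return A, b

A, b = simplex_constraints(1024)
print(A.shape)                        # (1025, 1024): d + 1 rows, no redundancy
x = np.full(1024, 1.0 / 1025)         # the centroid is strictly interior
assert np.all(A @ x < b)
```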

MCMC convergence diagnostics
Assessing whether a (sub)sequence of MCMC samples has converged to the target distribution is not only vitally important in practical inference, but also critical for reliably benchmarking sampling performances. The ESS and rank-normalized R̂ values are computed for each of the D fluxes separately and set to the minimum/maximum value of these, i.e., considering the worst case. We compute the ESS in the original flux space, because this is the space for which we are interested in statistics about estimates, such as the mean flux. Particular care must be taken here, since the estimated ESS is typically noisy and may therefore be unreliable for sample sets that are far from converged. We follow the advice given by [55] to use four independent chains and compute the rank-normalized R̂ diagnostic along with the ESS to test for convergence. Vehtari et al. [55] argue that the estimation of R̂ and ESS requires reliable estimates of the variances and autocorrelations of the samples; that is, the estimation of R̂ and ESS is only reliable if the ESS of a set of samples is sufficiently large. Therefore, in our study, the threshold for convergence is set to R̂ < 1.01 and ESS > 400, which is suitable for a setup with four independent Markov chains. The limit of 400 for the ESS is derived by considering that, for a single chain, each part of the split chain (50/50 split) is required to contain ESS > 50, which amounts to a total of at least 400 when considering the multiple, independent chains [55]. Exemplary ESS and rank-normalized R̂ values are shown in Figs B and C in S1 Appendix.
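For illustration, the classic split-R̂ underlying this check can be sketched in a few lines of numpy. Rank normalization, which Vehtari et al. [55] add on top and which arviz implements, is omitted in this sketch.

```python
import numpy as np

def split_rhat(chains):
    """Split-R-hat for an array of shape (n_chains, n_draws).

    Each chain is split in half and the potential scale reduction factor
    is computed over the 2 * n_chains half-chains. Rank normalization is
    omitted here for brevity.
    """
    c = np.asarray(chains, dtype=float)
    n = c.shape[1] // 2
    halves = np.concatenate([c[:, :n], c[:, n : 2 * n]], axis=0)
    within = halves.var(axis=1, ddof=1).mean()       # W: mean within-chain var
    between = n * halves.mean(axis=1).var(ddof=1)    # B: between-chain var
    var_plus = (n - 1) / n * within + between / n
    return float(np.sqrt(var_plus / within))

rng = np.random.default_rng(0)
good = rng.normal(size=(4, 1000))        # four well-mixed chains
bad = good + np.arange(4)[:, None]       # chains stuck at different levels
print(split_rhat(good), split_rhat(bad))
```

The well-mixed chains yield R̂ close to 1, while the offset chains clearly violate the R̂ < 1.01 threshold used in this study.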

Implementation details
GEMs were downloaded in SBML (Systems Biology Markup Language) format [56] (see Table A in S1 Appendix for details) and processed with PolyRound v0.1.8, the state-of-the-art Python package for polytope rounding [44]. The highly optimized polytope sampling library HOPS v3.0.0 was used for MCMC sampling [46]. CHRRT, as implemented in HOPS, was used for all benchmark runs to ensure their comparability. In each sampling run, four parallel chains were used according to the recommendations by Vehtari et al. [55]. The samples, the rounded polytopes, and the accompanying transformations were stored. For each flux, trace plots were visually examined, and the maximum rank-normalized R̂ metric and the minimal ESS over all fluxes were computed to check that the MCMC convergence criteria are met, as described in the section on MCMC convergence diagnostics. These calculations were performed using arviz v0.12.1 [57]. All numerical experiments were run on an AMD Ryzen 9 5950X 16-core processor.

Results and discussion
After motivating the idea of using thinning in CHRR to reallocate computational resources, we quantify achievable sampling efficiencies for the test problems. From that, we derive simple and useful guidelines for optimized thinning choice, which we compare to previous advice using three validation problems.

The role of thinning for CHRR
In constraint-based metabolic modeling, some studies using CHRR report thinning constants of 100, 1,000, and 10,000 [4,43]. However, these works neither provide guidance for a suitable selection of τ for a sampling problem at hand, nor do they discuss the implications of the different choices of τ. On the other hand, Haraldsdóttir et al. [42] pointed out that τ should be selected depending on the problem dimension. Specifically, the authors suggest setting τ, as a rule of thumb, to eight times the squared polytope dimension, stating that this choice ensures statistical independence of the thinned samples. However, no derivation of their rule or further underpinning of their argument is given in that work.
Spurred by the seminal theoretical work of Geyer [48], we set out to investigate whether the sampling efficiency of CHRR benefits from thinning. The question is whether, for a fixed computational budget, CHRR with τ > 1 converges faster than unthinned CHRR with τ = 1. To this end, we consider the ESS of a set of N samples in relation to the time cost of producing the samples, t_{N,τ}. The time cost t_{N,τ} is given by the sum of the time costs t_update for producing N · τ states ν_round and t_transform for back-transforming the N stored samples:

  t_{N,τ} = N · (τ · t_update + t_transform).  (13)

Geyer argued that, for thinning to be beneficial, the time cost t_{N,τ} has to grow more slowly with τ than the ESS, indicating that thinning is not advantageous for every combination of sampling problem and MCMC algorithm [48]. In this vein, Link et al. criticize the routine application of thinning in ecology, where thinning is often detrimental to sampling efficiency [49].
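Geyer's criterion can be made concrete with a toy model: assume, purely for illustration, AR(1)-style autocorrelation, so that stored samples at thinning τ have correlation ρ^τ, and use the time-cost model t = N · (τ · t_update + t_transform) described above. The efficiency optimum then shifts towards larger τ as the ratio t_transform/t_update grows:

```python
import numpy as np

def efficiency(tau, t_update, t_transform, rho, n_stored=1000):
    """ESS/t under fixed-frequency thinning, assuming AR(1)-style
    autocorrelation rho**tau between consecutive stored samples."""
    r = rho**tau
    ess = n_stored * (1 - r) / (1 + r)               # ESS of an AR(1) chain
    t = n_stored * (tau * t_update + t_transform)    # time-cost model
    return ess / t

taus = np.arange(1, 5001)
# small vs. large transform/update time ratio (cf. the measured ratios)
eff_cheap = np.array([efficiency(t, 1.0, 4.0, 0.999) for t in taus])
eff_costly = np.array([efficiency(t, 1.0, 1000.0, 0.999) for t in taus])
print(taus[eff_cheap.argmax()], taus[eff_costly.argmax()])
```

Under these assumed numbers, the optimal thinning constant for the expensive back-transformation is markedly larger than for the cheap one, mirroring the benchmark results reported below.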
To investigate the role of thinning for UCPS with our CHRRT implementation, we measured t_update and t_transform for a set of synthetic and real test problems (Table A in S1 Appendix). For both time costs, we took the wall-clock times and averaged them over the number of times each operation was applied. The resulting time ratios t_transform/t_update are shown in Fig 2. Among all test problems, only the two models with the fewest dimensions, i.e., the 64-dimensional simplex and the 24-dimensional E. coli core model, have time ratios below ten. For the remaining models, t_transform is up to three orders of magnitude larger than t_update. Specifically, for the simplices, the time ratio increases quadratically with dimension. We also observed a stark increase in the ratio for GEMs. Unlike for simplices, however, for GEMs additional factors contribute to the time ratios, such as the number of fluxes D (more fluxes increase t_transform) and the number of constraints n_in (more constraints increase t_update), resulting in a less succinct relation between effective model dimension and time ratio.
Across all test problems, t_transform is generally considerably larger than t_update, with a time ratio of at least four for the smallest tested models, which increases starkly with the dimension of the sampling problem. Given such large time ratios, it is plausible to expect thinning to benefit the sampling efficiency of CHRR in the context of UCPS. However, only by considering the relation between the time ratios and the autocorrelation, mediated by the ESS and specific to the combination of model characteristics and CHRR, can we determine whether thinning is actually beneficial in practice. In conclusion, low autocorrelation should not be the only deciding factor when suggesting rules of thumb for thinning constants.

Benchmarking CHRR from the perspective of thinning
To study how the performance of CHRR depends on thinning for the two classes of test problems, we benchmarked the sampling efficiency, in terms of ESS/t, for a wide variety of thinning constants. To find thinning constants that yield high ESS/t values, pre-runs with τ = d were performed for each test problem. From the pre-runs, we estimated appropriate ranges of problem-specific thinning constants and selected a set of at least five τ's distributed over each range (Table A in S1 Appendix). For simplices and GEMs, Fig 3 shows the obtained ESS/t relative to the unthinned CHRR baseline. The plots fan out in a series of curves for the test problems with different model dimensions. As a rule, higher dimensions achieve higher relative ESS/t values. All curves share the characteristic that, with increasing τ, the relative ESS/t first increases before dropping again. The peak indicates the optimal thinning constant, specific to the investigated test problem. While pronounced peaks emerge for simplices, the curves for GEMs show plateaus, indicating that a range of thinning constants perform equally well. This apparent insensitivity of the ESS/t performance for GEMs is convenient, since it implies that the hyperparameter tuning of τ does not need to be precise to achieve near-optimal ESS/t values, as long as τ is set to a suitable order of magnitude. Clearly, thinning constants at the lower end of the plateau are to be preferred due to better resource usage (see below).
To study how the optimal (in terms of sampling efficiency) thinning constant relates to the problem dimension, we then quantified the speed-up of CHRRT for the apparently best thinning constant, henceforth denoted τ̂, among the set of tested constants. Results are shown in Fig 4A. For simplices, relative speed-ups grow roughly with the square of the polytope dimension. The relative speed-ups achieved for GEMs also grow with effective model dimension, with values ranging from 6 for the E. coli core model up to 770 for Recon1. For the latter, CHRRT with τ̂ required only 0.32 s to converge and reached an ESS of 4,954; the unthinned variant required almost 80 times longer (25 s) to converge, while reaching an ESS of only 507. Notably, the speed-ups observed for GEMs surpass those observed for simplices of similar dimensions. Hence, our benchmarks show that the beneficial effect of thinning, when appropriately tuned, is not only plausible in theory, but also of high practical relevance for solving high-dimensional UCPS problems.
Since thinning drastically affects performance, our findings imply that the consequences of selecting τ values need to be taken into account when benchmarking aspiring novel UCPS algorithms against CHRR, such as truncated log-concave sampling with reflective Hamiltonian Monte Carlo [53] or Riemannian Hamiltonian Monte Carlo in a constrained space [58].

Simple thinning guidelines for simplices and GEMs
We use the previously benchmarked GEMs and simplices as training data for deriving practical guidelines on how to configure thinning for CHRR for optimized performance. For d-dimensional simplices, a linear relation between τ̂ and d explains the measured data perfectly. For GEMs, on the other hand, a quadratic relation of the form 0.164 · d² matches the data well. Considering Fig 3B, which indicates that optimal sampling performance is not overly sensitive to the choice of thinning constant, we propose, as a memorable rule of thumb for efficient GEM sampling:

  τ̂ = d²/6,  (14)

which only marginally deviates (less than 1.5%) from the fitted line. Compared to the advice previously given by Haraldsdóttir et al. [42], i.e., 8 · d², our GEM guideline advocates thinning constants that are a factor of 48 smaller. To put the difference between the former and our updated guideline into perspective, a 48-fold increase in τ means a 48-fold increase in CHRR update steps, of which the vast majority is discarded. As we have argued before, discarding samples can improve sampling efficiency, but only if the loss of information due to dropping samples is over-compensated by the information from additional CHRR update steps. In addition, Fig 3 shows that selecting the thinning constant too large risks missing the performance peak, which wastes computational resources. In particular, we observe that a 48-fold increase in τ would lead to lower sampling efficiencies for eight out of the twelve test problems, because the performance peak would be clearly missed. For the remaining four test problems, i.e., GEMs of higher dimensionality, we did not attempt to characterize the plateau around the best thinning constants, since this would be of little practical relevance. It is generally preferable to set the thinning constant to the minimal value on the plateau, because then less work is discarded and more samples are stored than for larger thinning constants, while achieving the same ESS/t. Nonetheless, it is conceivable that the performance plateaus for the four largest GEMs stretch across an even wider range of thinning constants, meaning that the difference in ESS/t resulting from the previous rule of Haraldsdóttir et al. [42] and our guideline may turn out to be smaller.
While more work could be invested into fine-tuning τ̂ for the problem at hand, this risks investing more computational work than is eventually gained, because the estimation of the ESS is noisy for short CHRR runs. In summary, we showed that a previous guideline for GEM sampling is suboptimal from the perspective of sampling efficiency. Instead, we promote a new GEM guideline, given by Eq (14), which empirically maximizes sampling performance without unnecessarily discarding computational work (green computing).
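In practice, the rule of thumb reduces thinning-constant selection to a one-liner; a hypothetical helper, with d being the effective polytope dimension after dimension reduction, might look like:

```python
def gem_thinning_constant(d):
    """Rule-of-thumb thinning constant for GEM sampling: tau = d**2 / 6,
    rounded to an integer (and at least 1)."""
    return max(1, round(d * d / 6))

# Recon3D: d = 4,861 effective dimensions
print(gem_thinning_constant(4861))  # 3938220
```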

Result verification
To verify our proposed guideline for GEM sampling using CHRR, we applied it to three large, out-of-sample GEMs: Yeast8 (1,108 dimensions), ecYeast8 (3,4109 dimensions), and Recon3D (4,861 dimensions), the largest GEM currently available in the BiGG database [13]. All validation models were larger than the largest training model, Recon1 (cf. Table A in S1 Appendix).
Given the large validation models, it is computationally infeasible to benchmark inefficient thinning constants (especially unthinned CHRR). Therefore, we need alternative ways to test whether our guideline leads to optimized sampling. For the yeast models, Yeast8 and ecYeast8, we benchmarked our guideline τ = d²/6, a smaller thinning constant τ = d²/36, and a larger thinning constant τ = d². This experimental setup allowed us to test whether our guideline actually finds the (potentially wide) peak of ESS/t. After checking for convergence, we measured the ESS/t, see Fig 5. For Yeast8, setting τ according to our guideline clearly produced the best performance. For ecYeast8, a six-times-smaller thinning constant was slightly more efficient (by around 4%). However, choosing a six-times-larger thinning constant than our guideline decreased performance substantially. Because the ESS/t as a function of τ is not multimodal, we can conclude that the 48-times-larger τ suggested previously [42] performs worse. The estimated time ratios for Yeast8 (957) and ecYeast8 (2,090) let us conclude that very small thinning constants, such as τ = 1, are inefficient, because too much time would be spent on the back-transformation. From these two test cases, it is clear that the thinning constant should be neither too small nor too large and that our guideline leads to good practical performance.
For testing our guideline with the largest validation model, Recon3D, we needed to minimize the computational work even further to achieve results in a reasonable time. We selected two thinning constants to test: our guideline, see Eq (14), and the previous advice (τ = 8 · d²) [42]. By benchmarking these two thinning constants, we test whether our guideline improves over existing advice.
For the previous advice (τ = 8 · d² = 8 · 4,861² = 189,034,568), we observed that CHRRT requires 1.1 h to produce and store one sample per chain. Because it is computationally infeasible to run τ = 8 · 4,861² to convergence, i.e., to obtain a representative number of samples, we estimate an upper bound on the efficiency of this thinning constant. Assuming optimistically that each of these samples is uncorrelated, i.e., ESS = #stored samples, we project that it takes approximately 110 h until convergence, because both parts of the split chain are required to reach an ESS of 50 [55]. For four parallel and independent chains, as in our benchmarks above and as advised [55], this corresponds, in the best of all cases, to an ESS/t of at most 4/(1.1 h) ≈ 0.001 s⁻¹. Using the newly proposed guideline in Eq (14), i.e., a thinning constant of ≈ 3,938,220, CHRRT takes 27 h to generate 1,000 samples per chain. Using four parallel chains, this setup results in an ESS of 726 with an R̂ of ≈ 1.01. Therewith, our guideline yielded an ESS/t of 0.0075 s⁻¹, which is 7.5 times more efficient than the overly optimistic upper bound we found for the previously advised rule, when both methods use four chains. In summary, the new guideline is not only more efficient, but it also produces a converged set of samples for storage in a fraction of the time. Using three validation models, we have thus empirically demonstrated that our guideline leads to efficient sampling, also for larger GEMs.

Conclusion and outlook
CHRR is the leading algorithm for sampling convex polytopes uniformly and has a single adjustable parameter, the thinning constant. In this work, we showed that the choice of the thinning constant has dramatic consequences for sampling efficiency, which is instrumental for solving high-dimensional UCPS problems. Here, selecting an appropriate thinning constant can make the difference between sampling success and failure and is thus the key to sampling challenging network models to convergence. Especially for contemporary GEM sampling, it is therefore crucial to have a guideline at hand for selecting a thinning constant that achieves near-optimal performance.
With a range of UCPS problems at hand, we studied the effect of thinning on CHRR sampling performance quantitatively, after explaining the statistical underpinnings of the efficiency metric (ESS/t). From our quantitative benchmarks, we derived simple guidelines for choosing the optimal CHRR thinning constant for simplices and GEMs, which we validated using three out-of-sample, larger GEMs. Applying these guidelines is not only beneficial for obtaining uniform samples of polytopes faster, but also helps to move UCPS towards green computing, as both the CPU time and the storage cost of samples are reduced. Concerning the question of how the computational requirements of CHRR with optimal thinning scale with effective model dimension, our numerical results reveal quadratic and linear scaling for simplices and GEMs, respectively. Using the methods presented in our study, our guideline can be updated as new types of models and data become available.
Benchmarking new UCPS algorithms and comparing their performances with those of leading algorithms, such as CHRR, is important to advance the field and a topic of active research [53,58]. Recognizing the substantial impact of thinning on the performance of CHRR, we advise performing comparisons with tuned thinning and reporting the thinning constant used, which facilitates reproduction of the results.
Replacing the sequential per-sample by a one-shot or batched back-transform (Algorithm 1, line 12) could be more efficient, as long as memory is not limiting (storing unthinned samples of large models, such as Recon3D, consumes prohibitive amounts of memory). To combat memory bottlenecks, Stein thinning [59], a technique to optimally compress MCMC output, may be applied before sample back-transformation. However, while this combination of techniques is promising, the implementation is not straightforward. In contrast, fixed-frequency thinning, as discussed in this work, is already implemented in many existing CHRR packages [4,42,46] and immediately boosts performance without requiring additional work.
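To illustrate the batching idea, the sketch below compares a sequential per-sample back-transform with a single batched matrix product. The affine form x = T v + shift, as well as the names T, shift, and V, are our simplification of the actual rounding transform in CHRR; both variants compute the same result, but the batched one amortizes the cost over all stored samples.

```python
import numpy as np

def back_transform_loop(V, T, shift):
    """Sequential back-transform: one matrix-vector product per sample."""
    return np.stack([T @ v + shift for v in V])

def back_transform_batch(V, T, shift):
    """One-shot back-transform: a single matrix-matrix product."""
    return V @ T.T + shift

# toy example: 100 samples in a 3-dimensional rounded space,
# mapped back to a 5-dimensional flux space
rng = np.random.default_rng(0)
T = rng.normal(size=(5, 3))
shift = rng.normal(size=5)
V = rng.normal(size=(100, 3))
X = back_transform_batch(V, T, shift)  # shape (100, 5)
```

The batched variant is only applicable when all thinned samples of a chain fit in memory at once, which is exactly the trade-off discussed above.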
Despite thinning being a common MCMC practice, its use is still debated. Statisticians have pointed out that thinning is often unnecessary and typically wastes computational resources, unless the cost of using the samples is high [48,49]. Even then, Geyer argues that a thinning constant of two or three times the problem dimension should suffice in nearly all cases. Consequently, the conventional advice given to the MCMC practitioner is to not thin MCMC outputs, unless memory or post-processing capabilities are practically limiting. By showing that CHRR stands out among MCMC algorithms in that thinning is critical to performance for sampling high-dimensional convex polytopes, such as GEMs, our study encourages researchers to systematically examine conventional MCMC advice in their specific application domain.
Supporting information
S1 Appendix. Overview of benchmark problems, measured sampling efficiencies, convergence diagnostics, and exemplary flux distributions for selected GEMs. (PDF)