Obstructions to Sampling Qualitative Properties

Background Sampling methods have proven to be a very efficient and intuitive method to understand properties of complicated spaces that cannot easily be computed using deterministic methods. Therefore, sampling methods became a popular tool in the applied sciences. Results Here, we show that sampling methods are not an appropriate tool to analyze qualitative properties of complicated spaces unless RP = NP. We illustrate these results on the example of the thermodynamically feasible flux space of genome-scale metabolic networks and show that with artificial centering hit and run (ACHR) not all reactions that can have variable flux rates are sampled with variables flux rates. In particular a uniform sample of the flux space would not sample the flux variabilities completely. Conclusion We conclude that unless theoretical convergence results exist, qualitative results obtained from sampling methods should be considered with caution and if possible double checked using a deterministic method.


Introduction
Given a space S R n , we are interested in computing a set of sample points s 1 , . . ., s k 2 S that represent the space S and its properties. Thus, by randomly generating sample points this offers one approach to overcome the curse of dimensionality. For example, if S is a polyhedron, we can compute nearly uniformly distributed sampling points of S in polynomial time [1] and from this approximate the volume of the polyhedron [2,3]. In contrast no deterministic polynomial-time algorithm can compute the volume of convex sets with less than exponential relative error in n [4,5].
Thus, sampling methods are nowadays used in many application areas, for example in the analysis of flux spaces in genome-scale metabolic networks. A genome-scale metabolic network models the chemical reactions possible in an organism. Using the assumption that no internal substance can be over-or under-produced a flow problem is obtained. This then leads to a polyhedron of feasible flows (also called fluxes) through the network (called flux space): Here, S 2 R m×n is called the stoichiometric matrix. It encodes for each reaction r 2 R = {1, . . ., n} which and how much of the metabolites M = {1, . . ., m} are consumed resp. produced. ℓ, u are bounds on the reaction rates. Due to the size of these networks, deterministic methods to enumerate extreme points and thus a representative set of feasible flows [6,7] are unpractical.
Therefore, tools and methods have been developed to sample the whole flux space [8][9][10][11] or only the extreme points [12] and then used to derive biological insights [13][14][15][16]. Typical properties that are analyzed are correlations between fluxes through different reactions [17][18][19][20], or the distribution of flux rates through a given reaction [21] While points in polyhedra can be sampled efficiently (in theory), often additional constraints are added to make the results more biological reasonable. However, this often makes the space of feasible solutions non-convex and associated decision problems NP-hard. This is for example the case with "looplaw"-thermodynamic constraints [21,22], here sloppily represented using the phrase "v thermo. feasible": To test how well sampling works, we consider the scenario that we want to decide if the flux space has a given property. In Practical Obstructions to Sampling we consider the problem of determining if positive resp. negative flux through a reaction is possible. There, we show that with artificial centering hit and run (ACHR) [8,23] we cannot determine this property correctly for several reactions in genome-scale metabolic networks. In particular, the artifacts are not only observed in non-convex flux spaces, but also for polyhedral flux spaces.
In Theoretical Obstructions to Sampling we show that for the non-convex flux space T this problem is not specific to ACHR, but more fundamental. Therefore, we generalize the problem to decide if a space S has a given property (formulated as a decision problem PROB) by using a sampling algorithm. We define the concept of non-trivial polynomial time sampling algorithm and show how it can be used to solve decision problems in randomized polynomial time. We show that if the decision property is NP-hard, then there exists no polynomial time sampling algorithm that samples S in a non-trivial way w.r.t. to the property formulated by PROB unless NP = RP, where RP is the class of problems that can be solved in randomized polynomial time [24].
For the toy network shown in Fig 1 "looplaw"-thermodynamic constraints imply that positive flux through r 1 and r 2 at the same time is not possible, because they form a stoichiometrically balanced internal cycle. Similarly, positive flux through r 1 , r 3 , and r 4 at the same time is also not possible. This implies that the flux space looks as shown in Fig 2. Clearly, this flux space is not convex. Furthermore, we observe that if we sample the flux space uniformly, we would not sample any flux with positive flux through r 1 , since the flux space with positive flux through r 1 is 1-dimensional while the rest of the flux space is 2-dimensional.
For "looplaw"-thermodynamic constraints we showed that deciding if a reaction can carry positive flux is NP-hard [37]. Thus, according to our theoretical results in Theoretical Obstructions to Sampling, it should be harder to sample flux spaces with thermodynamic constraints than without. Therefore, we test how well we can predict the reactions with variable flux rates by sampling fluxes through the network. Finding all reactions with variable flux rate is called flux variability analysis (FVA). If it is applied without thermodynamic constraints, it can be solved efficiently using linear programming [38,39]. With thermodynamic constraints, we can typically solve it in practice using mixed integer linear programming techniques (MILP) [21,37]. For a good sampling algorithm we expect that for every reaction that can have positive flux (as determined by FVA) we also get samples whith positive flux through the reaction. Therefore, we count for how many of the reactions we fail this goal, i.e. where we do not even obtain a single sample with a positive flux through the reaction.   Flux space of toy network. The gray area denotes the flux space. In this example it was assumed that input/output flux values are constrained to at most 1. It can be seen that flux v 1 through r 1 is exclusive to fluxes v 2 and v 3 through r 2 and r 3 respectively. Since fluxes through r 2 can be combined with fluxes through r 3 , the flux space with v 2 , v 3 > 0 is two-dimensional, while the flux space with v 1 > 0 is only one-dimensional. Hence, a uniform sample of the flux space would almost surely have zero flux through r 1 . Of particular interest are the reactions R C of reactions contained in internal cycles, because they are affected by thermodynamic constraints. The reader is referred to [37] for more details and a precise definition. The reactions not contained in internal cycles R NC on the other hand should not be affected by "looplaw"-thermodynamic constraints [37] Method To verify the impact of the theoretical results, we implemented the following computational experiment to analyze the difference between sampling with thermodynamic constraints and without thermodynamic constraints. For a given metabolic network with flux space P (with or without thermodynamic constraints) we do the following: 1. Sample n points in the flux space P.
2. Run flux variability analysis (FVA) on P and define: • R + ≔ reactions that can have positive flux, • R − ≔ reactions that can have negative flux.
3. From this we define the following 4 reaction classes: For each reaction class A R + , we count the number of reactions for which we never sampled positive flux n P A and then compute the ratio r P A ≔ n P A jAj .
5. For each reaction class A R − , we count the number of reactions for which we never sampled negative flux n P A and then compute the ratio r P A ≔ n P A jAj .
We do this for the steady-state flux space F (without thermodynamic constraints) and for the thermodynamically constrained flux space T. Since positive lower bounds and negative upper bounds for reactions in internal cycles make it already NP-hard to find a thermodynamically feasible flux distribution, we set all positive lower bounds and all negative upper bounds to 0.
For the sampling method we chose to use the ACHR method implemented in the COBRA toolbox [40], since it is one of the most established tools for sampling flux spaces. They also offer a flag to activate thermodynamic constraints. Unfortunately this flag has no effect in the current version (2.0.5). Hence, we implemented a simple post-processing step to turn thermodynamically infeasible fluxes into thermodynamically feasible fluxes by deleting internal cycles [37]. To check that our results are not an artifact of our post-processing step, we also implemented the post-processing method suggested by Schellenberger et al. [21], where for each sample point a thermodynamically feasible flux vector is computed that minimizes the L 1norm distance to the sample point. We remark that this method solves an MILP in the postprocessing step and hence, cannot be considered a polynomial-time sampling method.
The sampling method was run with default parameters, except that the number of points per file is half the number of output points. This means that for each sample point ACHR performed at least 200 steps and potentially biased samples from the beginning were dropped. We choose 10000 output points, since this allowed to run the analysis in a couple of hours. In contrast, the variability of reactions (with and without thermodynamic constraints) can be computed deterministically using the FVA-method in [37] in a few minutes.
We selected a set of genome-scale metabolic networks based from the BiGG-database [41] as a test-set, since these networks are well curated and well established test-networks. We did not select Human Recon 1., since we were not able to run thermodynamically constrained flux variability analysis on it. Instead, we also added the more recent E. coli iJO1366 network [42]. Also, we did not select the M. barkeri network because the sampling algorithm from the COBRA toolbox crashed. The matlab scripts for the computations can be found in S1 code.

Results
The computed results for 10000 samples using the cycle deletion method [37] can be seen in Fig 3. The L 1 -minimization method [21] was not practically applicable for most of the test networks, because it took more than 10 minutes to compute the closest thermodynamically feasible flux vector for even the first sample point. Only for H. pylori iIT341 and S. cerevisiae iND750 this was a feasible approach. For these networks, however, already the simple cycle deletion method already managed to sample non-zero fluxes for all reactions in internal cycles that can have non-zero flux. Hence, these results are not separately shown.
We observe that the ratio of reactions where no positive/negative fluxes were sampled is larger for the case with thermodynamic constraints than without. Also, as expected, the results for R NC are the same for F and T. However, we are surprised to find that even without thermodynamic constraints, we miss about 5% of all possible reaction directions. This shows us that ACHR might not be a very good approach for sampling steady-state flux spaces of genome-scale metabolic networks, as has also been observed in [43].
One potential cause for the bad performance of ACHR is that the flux space of metabolic networks is badly conditioned, i.e., that there exist reactions with very low variability and reactions with high variability. While this is indeed the case (see Fig 4a and

Theoretical Obstructions to Sampling
Let PROB:J ! {0, 1} be an NP-hard decision problem on a set J of inputs (commonly we use the set of words over the alphabet {0, 1} as input, i.e., J = {0, 1} Ã and the length of an input is just the length of the word). To solve PROB by sampling, we require that the structure of the sampling space represents PROB in a certain way. This we encode using a function X that maps every input I 2 J into a subset of R n , i.e. an element of the powerset of R n (P(R n )). This space X(I) will be the space from which we will draw samples. Note that we will make no assumptions on the size of n compared to the input size jIj. Additionally, we use a test-function f that tests whether a point of the sample-space x 2 X(I) has a non-trivial property, i.e. f(I, x) > 0.
Definition 1 (Sampling space) Given a decision problem PROB:J ! {0, 1}, we call (X, f) a sampling space for PROB if X : J ! P(R n ) and f : J × R n ! R satisfy: • f(I, x) is continuous in x for all x 2 R n .
• f(I, x) can be computed in time polynomial in the encoding length of I and x. For x 2 R without a finite encoding length, we assume that f(I, x) is well defined, but its computation does not terminate.
• It holds for all I 2 J that Let (O, F, P) be a probability space, i.e., a sample space O, events F, and probability function P. It will serve us as the space from which we draw the seeds for the sampling algorithm. Here, we assume that the sampling method is given as a function S : J × N × O ! R n , i.e., for every time point we get a sample. With this formalism we want to capture the behavior of randomwalk sampling methods that do a random walk through X(I) and can be run for arbitrarily long times to improve the sampling result. Classical sampling algorithms can also be captured by this formalism by iteratively running the sampling method and computing a consensus value. If the sampling algorithm did not produce a result for an (early) time point, it could simply return a default value. Since we will only consider asymptotic behavior, this will not be of any importance. • S(I, k, Á) ! X in distribution for k ! 1, and • S(I, k, Á) converges to X in polynomial time, i.e., for every closed set A R n holds jPðSðI; k; ÁÞ 2 AÞ À PðX 2 AÞj < ε for all k > q(jIj, ε −1 ). Assume there exists such a sampling method S : J × N ! R n that samples the feasibility space X(I) of the NP-hard optimization problem PROB for each given instance I 2 J in a nontrivial way, i.e., without losing any features (represented by f): Definition 4 (Non-trivial Sampling Algorithm) S : J × N × O ! R n is a non-trivial sampling algorithm w.r.t. f : J × R n ! R if for every I 2 J there exists a random variable X : O ! R n such that • S(I, k, Á) ! X in distribution for k ! 1.
• If 9x 2 X(I) with f(I, x) > 0, then 1Àt pðjIjÞ for a polynomial p. We can then use S to construct a probabilistic algorithm that will decide PROB. The probabilistic algorithm that we are going to construct will belong to the class RP (randomized polynomial time) [24].

Definition 5 (Complexity Class RP) A decision problem p is in RP if there exists a probabilistic algorithm that
• runs in polynomial time, • if the answer to p is NO, it outputs NO, and • if the answer to p is YES, it outputs YES with probability at least 1 2 . Since RP = NP is an open problem in theoretical computer science, it is very unlikely that a given probabilistic polynomial-time sampling algorithm of the thermodynamically constrained flux space actually solves the RP = NP problem. Hence, it is much more likely that the sampling algorithm samples the feasible flux space incompletely.
Theorem 1 Let PROB: J ! {0, 1} be an NP-hard decision problem with sampling space (X, f). Unless RP = NP, there exists no feasible, non-trivial, polynomial time sampling algorithm S : PROOF Assume there exists such a sampling algorithm. We construct an algorithm in RP for PROB.
For I 2 J define t(I) ≔ P(X 2 A(I)) for A(I) ≔ {x 2 R n : f(I, x) 0}. Since f(I, Á) is continuous, it follows that A(I) is closed and Borel-measurable. Hence, t(I) is well defined.
To prove that the problem would be in RP, we still have to increase the probability of YES in the positive case. This can be done by re-running the algorithm.
By Def. 4 there exists a polynomial p with 1 1ÀtðIÞ pðjIjÞ. We choose t ≔ pðjIjÞÀ1 pðjIjÞ and it follows that 1 1Àt ¼ pðj I jÞ and t(I) t. Hence, we can apply Lemma 1 without having to know t(I). By construction of Alg. 1 the computation of X k takes time Oðqðj I j; 2 1Àt ÞÞ. We observe that the encoding for the computed sample X k is bounded by the computation time Oðqðj I j; 2 1Àt ÞÞ. Hence, by Def. 1 there exists a polynomial g such that the runtime of Alg. 1 is bounded by O g j I j; qðj I j; 2 1Àt Þ À Á À Á . To obtain a correct result if the correct answer is YES with probability at least 1 2 , we re-run the algorithm at least 1 Since the probability of NO in one run is at most tþ1 2 , it follows that the probability for NO in all runs is at most We can estimate the number of iterations by observing that Using the Theorem of l'Hopital we have Hence, we can bound the number of iterations by ¼ OðpðjIjÞÞ: Thus, we get a YES if the correct answer is YES with probability at least 1 2 after a running time of O g jIj; q jIj; 2 1 À t 1 O ðg ðjIj; qðjIj; 2pðjIjÞÞÞpðjIjÞÞ: We have shown under the assumption of the existence of a sampling algorithm with the given properties that PROB is in RP. Since PROB is also NP-hard, the existence of such a sampling algorithm implies RP = NP. Hence, no such sampling algorithm can exist if RP 6 ¼ NP.

Discussion
We observe that the conditions that we require for Thm. 1 on the sampling algorithm are very weak. We do not require uniform distribution, we only require that with some polynomially small probability we also sample fluxes unequal to zero in our target distribution and that we converge in polynomial time to this target distribution.
Assuming RP 6 ¼ NP, it follows that for every sampling algorithm on the thermodynamic flux space there exist networks where the algorithm has one of the following properties: • The sampling algorithm does not converge in polynomial time to the target distribution, or • the target distribution is trivial (i.e., the probability of sampling 0 is 1).
Of course, we may be lucky and the algorithm actually samples a non-trivial distribution for the input networks. However the result says that there are networks for which the sampling algorithm will only sample 0 fluxes for some reactions and indeed, we saw that this happens also in practice not only for sampling the thermodynamically constrained flux space but also the ordinary steady-state flux space. It might be a property of the ACHR method that even for the ordinary steady-state flux space only 0 fluxes for some reactions are sampled, because in theory classical hit and run sampling (with appropriate rounding to remove the heterogeneous scales of metabolic networks) is guaranteed to sample uniformly [1]. De Martino et al. [43] showed that indeed ACHR seems to have problems with high-dimensional instances, like 500 dimensional uniform hypercubes. Other sampling methods, e.g. loopy-belief propagation [10,11] or poling-based methods [9] might not have these problems.
Unreliable sampling is very critical, since we then may be led to the false assumption that the reaction is never used, although it actually could be. To make sure that such results are true, it is essential to verify them with a deterministic method. In the case of deciding whether flux is possible through a given reaction, we can decide this by solving an optimization problem [21,37].
We have shown that sampling artifacts happen for the flux variability problem with thermodynamic constraints (and in practice they even happen without thermodynamic constraints with ACHR sampling). However, sampling is used to check a wide variety of different properties. Although the result does not directly imply that sampling results for these other properties are unreliable as well, caution is highly advised. For example, consider correlation / flux coupling analysis [20]. If a reaction always carries zero flux in all samples by an artifact, although it can also carry non-zero flux, it follows that this reaction seems uncorrelated to all other reactions. However, it may very well be correlated / coupled. In Fig 1, we see such an example. Assume the flux space (see Fig 2) is sampled using a uniform distribution. Then, we will almost surely never sample non-zero flux through reaction r 1 . Correlation analysis would yield that flux through r 1 is uncorrelated (they are even independent) to flux through r 2 and r 3 , although the fluxes are actually exclusive (e.g. r 1 and r 2 cannot carry flux at the same time). On the other hand, if reaction r 2 would be removed (because it is simply the aggregation of r 3 and r 4 ) the result would change significantly. Then, with uniform sampling, positive fluxes through r 1 and through r 3 would be sampled with equal probability.

Conclusion
Although sampling has been used successfully for the analysis of many different kind of problems, the results obtained by sampling should be used with caution. In the field of metabolic network analysis in particular, we showed (assuming RP 6 ¼ NP) that for every polynomialtime sampling method obeying thermodynamic constraints there exist networks for which the sampling method will produce artifacts. Hence, qualitative results obtained by sampling complex spaces should always be double checked by a different method.
Supporting Information S1 Code. Source code. Matlab scripts used to compute the results in the manuscript. (ZIP)

Author Contributions
Conceived and designed the experiments: ACR. Performed the experiments: ACR. Analyzed the data: ACR. Wrote the paper: ACR.