SPOTting Model Parameters Using a Ready-Made Python Package

The choice for specific parameter estimation methods is often more dependent on its availability than its performance. We developed SPOTPY (Statistical Parameter Optimization Tool), an open source python package containing a comprehensive set of methods typically used to calibrate, analyze and optimize parameters for a wide range of ecological models. SPOTPY currently contains eight widely used algorithms, 11 objective functions, and can sample from eight parameter distributions. SPOTPY has a model-independent structure and can be run in parallel from the workstation to large computation clusters using the Message Passing Interface (MPI). We tested SPOTPY in five different case studies to parameterize the Rosenbrock, Griewank and Ackley functions, a one-dimensional physically based soil moisture routine, where we searched for parameters of the van Genuchten-Mualem function and a calibration of a biogeochemistry model with different objective functions. The case studies reveal that the implemented SPOTPY methods can be used for any model with just a minimal amount of code for maximal power of parameter optimization. They further show the benefit of having one package at hand that includes number of well performing parameter search methods, since not every case study can be solved sufficiently with every algorithm or every objective function.


Introduction
Ecological models are often very complex and contain many parameters that need to be optimized prior to model application. Reliable parameter estimation is highly dependent on various criteria, including the selected algorithm, the objective function and the definition of the prior parameter distribution. Difficulties involved in calibrating for example hydrological models have been partly attributed to the lack of robust optimization tools [1]. Numerous parameterization methods have been developed in the past (e.g. [2][3][4][5][6]), often published without access to the source code. They are widely accepted to determine the values of non-measureable parameters for a model [7]. Many of the methods have been established as part of the parameterization problem in hydrological modeling as early as in the 1990s [8,9]. The application of these methods has now become more widespread in other ecological disciplines and therefore, 4. Scalability: This requirement is an extension of the portability claim [26]. While we agree, that published codes should run both on Windows PC for method testing, as well as on Linux based high performance computing (HPC) systems, scalability goes beyond the portability claim. Scalability means on the one hand, a simple parallelization of the parameter search, where the algorithm allows parallel computation. A package should allow using the parallel power of HPC systems without the need for extensive knowledge of parallel systems. On the other hand, scalability means the possibility to optimize the computational performance of the model. The runtime of models that are fast to evaluate, like e.g. HBV [29] is often dominated by the time needed to load the parameters and input data from disk, and not by the CPU time. Tweaking the model to accept input data through memory can speed up the model evaluation by a magnitude. A scalable package should therefore allow such optimizations and not rely on input file manipulation as an interface between the parameter estimation method and the model alone, as it is the case for most model independent estimation packages. 5. Accessibility: Since a broad, modular package for parameter-estimation carries already a generalized infrastructure for parameter estimation, publishing the package as a free software enables method developers to extend it, without the need to reinvent for example likelihood definitions or parallelization structures. As such, new methods using the existing infrastructure can easily use all existing methods without further development. However, making the source code available for the public is not sufficient for accessibility. The source must also be modular in its structure and well documented, to simplify the adoption of the underlying infrastructure.
We have developed the parameter-spotting package SPOTPY in agreement with these five criteria. We have implemented and tested a wide range of commonly used algorithms into SPOTPY, to allow a user-friendly access to these powerful techniques, and to give an overview, which algorithms and which objective functions can be useful under specific parameter search problems.

Concept of SPOTPY
SPOTPY is broad as it comes along with different global optimization approaches. We included the Monte Carlo (MC), Latin Hyper Cube Sampler (LHS) [3] and Robust Parameter Estimation (ROPE) [5] methods that belong to the first group of stochastic probabilistic methods. They are all-around algorithms, applicable for uncertainty and calibration analysis. MC and LHS can furthermore be utilized within the GLUE methodology. Simulated Annealing (SA) is a heuristic subgroup of the stochastic probabilistic methods. We included a version by Kirckpatrick et al. [6]. The Maximum Likelihood Estimation method (MLE) belongs to the subgroup of hill climbing algorithms and is suited for monotonic response surfaces. Markov Chain Monte Carlo (MCMC) methods, a subgroup of the probabilistic methods, support the ability to jump away from local minima. We implemented the standard Metropolis MCMC sampler [4]. To cover the second group of probabilistic methods (evolutionary algorithms) we included the evolution strategy of SCE-UA. It is suited to calibrate models with high parameter space. Furthermore, the Differential Evolution Markov Chain (DE-MC Z ) was included to provide a Bayesian solution suited for optimization problems in high parameter space.
SPOTPY is modular since prior parameter distributions, model inputs, evaluation data and objective functions can be selected and combined by the user. The user-defined combination of the inputs can be run with the parameter search algorithms and results are saved either on the working storage or in a csv file. The database structure enables the analyses of the results in SPOTPY with pre-build plotting functions and statistical analyses like Gelman-Rubin diagnostic [30] or the Geweke test [31]. The database can also be used for any other external statistical software or computer language.
SPOTPY is independent as the model is wrapped in a "black box". One parameter set is defined as input; the model results are defined as output. Both deterministic and stochastic models can be analyzed. SPOTPY scalability is realized by using the Python programming language, since it has an increasing support from the scientific community and is a recommended programming language for scientific research [32]. Pure Python code can run on every operating system without any complicated building mechanism. Parallel computing on HPC systems is supported by using a Message Passing Interface (MPI) code. Five of the eight implemented algorithms are suitable to for parallel computing (MC, LHS, SCE-UA, DE-MC Z , ROPE). The MPI code depends on the open source python package mpi4py [33]. A sequential run does not have any dependencies to non-standard python libraries.
SPOTPY is accessible as open-source on the Python package index PyPI and comes along with tutorials to allow a user-friendly start without the need of a graphical user interface and the benefit that everyone can use the most recent version of the code [34]. The code follows object-orientated style, where it supports modularity and is conform to the Open Source Definition [35].

Structure of SPOTPY
The design of SPOTPY brings different parameter estimation approaches within one set-up to allow users testing a variety of different combinations and methods. Fig 1 shows the main processes of this package, consisting of six consecutive steps when applying SPOTPY.
The different steps included are the following: Step 1) Parameter distribution. Let θ = {p 1 , p 2 ,. . .p n } be the initial input set of parameters of a (ecological-) model M. The fp i g n i¼1 random variables are selected from a joint probability prior distribution. This can be any user-defined distribution. We have pre-built the distributions Uniform, (log-) Normal, Chi-square, Exponential, Gamma, Wald and Weilbull with NumPy [36]. Each parameter p i is marked with a user defined name, step size and optimal guess (initial parameter set), which are used as prior information by the algorithms and the database. The parameter name is used by the database, while the step size is an information needed for MCMC, MLE and SA to jump to the next point of the prior distribution. The optimal guess is the start point for all algorithms. The better this value is chosen, the faster convergence can be achieved.
Step 2) Simulation and evaluation. The output of M given a parameter set θ i is defined as simulation S. The observed data X is characterized as evaluation. The simulation function is designed to call a model, returning a list of simulated values. The observation data is loaded in the evaluation function. One can also analyze a model with SPOTPY, which is only returning an objective function. Both functions and the following objective function offer the user flexibility to analyze almost every model with SPOTPY.
Step 3) Objective function. The objective function (also known as cost-function or goodness-of-fit-measure) quantifies how well the simulated data fits the evaluation data. Various objective functions are available (e.g. [37,38]) and have been proposed to account different sorts of errors in the simulation [39,40]. A guidance, which objective function to take under specific circumstances, is given by [41]. Hence, SPOTPY comes along with a wide set of objective functions, from which the user can select one or more for a specific issue (BIAS; Nash-Sutcliff efficiency (NSE); logarithmized Nash-Sutcliff efficiency (logNSE); Correlation Coefficient (r); Coefficient of Determination (r 2 ); Covariance (cov); Decomposed (dRMSE), Relative (rRMSE) and Root Mean Squared Error (RMSE); Mean Absolute Error (MAE); Wilmott Agreement Index (AI)). The user has the option to combine different objective functions as only one function can be inaccurate [42]. A detailed description of the objective functions implemented in SPOTPY can be found for example in [43].
Step 4) Parameter estimation methods. The algorithms included in SPOTPY cover widely used parameter estimation methods from different approaches in recent publications. They can be connected with setup files containing the above-mentioned information about parameter distribution, simulation-and evaluation data as well as the objective functions. The simplest automatic parameter estimation method included is the MC method. It is used to sample random parameter values from a prior distribution. The structural LHS algorithm subdivides the distribution of each parameter into m equally probable non-overlapping intervals and creates a matrix by sampling from all created intervals. The algorithm has shown good projection properties [44][45][46][47]. MC and LHS can form the basis for the GLUE method [10,25], to get information about the posterior distribution of input parameters. GLUE has been widely applied in hydrology, but also in many other ecological disciplines, such as biogeochemistry or crop growth modeling [48][49][50]. If one is just interested in a fast calibration of a simple model (with nearly monotonically response function), the MLE is an efficient choice. To test whether the MLE algorithm is applicable for calibrating the desired model, it is recommend to test the model with MC first [51]. MLE maximizes the likelihood during the sampling, by adapting the parameter only in directions with an increasing likelihood. The famous Metropolis MCMC method can also deal with non-monotonically response functions. Nevertheless, it works similar as MLE. After each sampling, the likelihood is compared with last one. If the likelihood is better, the sampler jumps to the new sampled point. If not, it samples from the old position. Depending on a Metropolis decision, the sampler can also accept worse likelihoods (in order to avoid trapping at local optima). The MCMC algorithm can find a (quasi-) global optimum, but with a still remaining risk to stuck in local minima. The risk can be reduced by starting several chains/complexes that evolve individually in the parameter space. This technique is used in the global optimization strategy SCE-UA [1]. Each complex evolves independently to optimize the parameter. The population is periodically shuffled and new complexes are created with information from the previous complex. SCE-UA has found to be very robust in finding the global optimum of hydrological models and is one the most widely used algorithm in hydrological applications today [44]. Another robust method is SA. Thyer et al. [52] reported SA to be not as robust as the SCE-UA algorithm, but SA can be very efficient, when it is adopted to an optimization problem. After each step, a better objective function results in a new position. A worse objective function can be accepted with a Boltzman decision. If the new point is not accepted, the sampler jumps to a new parameter value. A variable controls a decreasing possibility to accept worse objective functions with increasing iterations. Thus, the risk to jump away from a global optimum is reduced. One of the most recent algorithms we present here is the DE-MC Z . It requires a minimal number of three chains that learn from each other during the sampling. It has the same Metropolis decision as the MCMC algorithm and has found to be quite efficient compared with other MCMC techniques [53]. Like SCE-UA and SA, DE-MC Z does not require any prior distribution information. Another non-Bayesian approach is to determine parameter uncertainty estimations with the concept of data depth. This has the benefit, that the resulting parameter sets have proven to be more likely giving good results when space or time period of the model changes, e.g. for validation [54]. This approach is realized in the ROPE algorithm.
Step 5) Database. The database can store results from every parameter estimation method. Either in the working storage, which is fast, or in a csv file, which is comfortable. Saved information for every iteration are the objective function (-s), every parameter setting, optional the simulation results and the chain number (for algorithms with multiple threads like SCE-UA and DE-MC Z ). The database can be analysed in any statistical software, programming language or the SPOTPY extension Analyser.
Step 6) Analyser. The Analyser module is an optional, but very powerful extension, which can read the SPOTPY database. Prebuild plots are provided for objective function and parameter traces, parameter interactions and best model runs. Posterior parameter sets can be selected and basic statistical analysis of the samples can be performed with Gelman-Rubin diagnostic [30] or the Geweke test [31].
To install SPOTPY, one just has to type pip install spotpy into the OS console. After that, SPOTPY can be used from any Python console: import spotpy #Import the package from spotpy_setup_rosenbrock import spotpy_setup #Import an example setup sampler = spotpy.algorithms.sceua (model_setup()) #Initialize an algorithm sampler.sample(10000) #Run the model 10000 times results = sampler.getdata() #Load the results from spotpy import analyser #Import optional extension spotpy.analyser.plot_parametertrace (results) #Plot the results

Set up of algorithms
The setting of the algorithms for the following case studies are depicted in Table 1. Two things are changed during the case studies: 1) The number of repetitions. 2) For efficiency reasons the set-up of the algorithms was slightly changed when, sampling from the Ackley function in the third case study: SA with Tini = 30, Ntemp = 30, SCE-UA with ngs = 2 and DE-MC Z with nChains = dim.
All settings of the algorithms should be adjusted, when dealing with other optimization problems.

Case Studies
We show five different case studies to depict the capability of the different algorithms integrated in SPOTPY under different parameter optimization problems. Three of these case studies cover classical numerical optimization problems with a known posterior target distribution, one a hydrological model simulating real-world measured soil moisture values and one a biogeochemistry model where we tested the influence of different objective functions.

Rosenbrock function
The Rosenbrock function [55] is often used to test and compare the performance of optimization methods [56][57][58][59]. It can be described as a flat parabolic valley (Fig 2) and is defined by where we set the parameter space of the control variables to x 2 [−10,10] and y 2 [−10,10]. The global minimum is located at (x opt ,y opt ) = (1,1). At this point the function value is f Rosen (x,y) = 0. Due to its shape, it is an easy playground for optimization algorithms to find the flat valley, but it is hard to find the deepest point. Trace plots were created after sampling n = 5,000 times from parameter space of the Rosenbrock function. Fig 3 depicts the behavior of the algorithms. MC and LHS sample from the complete parameter distribution over the whole time. These algorithms find a few parameter

Griewank function
The two dimensional Griewank function [60] is defined as where we selected the parameter space for x 2 [−50,50] and y 2 [−50,50]. One of the characteristics of the function is that it has many regularly distributed local minima (Fig 4), which makes it challenging to find the global optimum located at (x opt ,y opt ) = (0,0). The demanding function has been used for algorithm performance testing by others [61][62][63]. The surface of this function allows the investigation the algorithm performance under equifinality. The different algorithms were applied to the Griewank function (n = 5,000 iterations). The parameter interactions are shown as combined dotty plots (Fig 5). We added a surface plot of the Griewank function to show the locations of the various local minima. We conducted the GLUE methodology to MC and LHS by selecting the 10% best runs. One can see samples for MC and LHS on almost every local minima and the global optimum. The random walk of the MLE jumps between three local minima after the burn in, without finding the global optimum. The MCMC algorithm reaches several local minima in intermediate steps and found the global minimum. Nevertheless, the samples orientate not on the local minima and form clouds around the optimum. The SCE-UA samples parameter combinations from the whole range and reduces the range more and more to the global optimum. It stops the search after 4,000 iterations; nevertheless, the remaining parameter uncertainty is still high. SA did not find the optimal value and samples only negative values for the parameter y. DE-MC Z found many local minima and the global optimum, which is representing the hilly response surface very good. ROPE reduced the investigated parameter range gradually centered to the optimal point.

Ackley function
The Ackley function is defined as where x = (x i ,..,x d ) and the domain is defined as x i 2 [−32.768,32.768] [64]. The function has many regularly distributed local minima in the outer region, and a large funnel as the global  SPOTting Model Parameters Using a Ready-Made Python Package optimum in the center located at f Ackley (0,. . .,0) = 0 (Fig 6). The function is widely used for algorithm testing [65][66][67]. We used setups with 2, 3, 5, 10, 20, 30 and 50 domains to investigate the algorithms behavior when dealing with an increasing number of parameters, while finding a very small global optimum. We used n = 15,000 iterations for every setup testing the algorithm's performance (Fig 7). All algorithms perform worse with increasing dimensions. MC and LHS struggle even with two domains to find the exact optimum. With five domains, MLE, MCMC, SA and ROPE get close to the global optimum but do not find the exact position. With 10 domains DE-MC Z does not reach the exact global optimum during the 15,000 iterations, but got close with a remaining RMSE of 2-5. With 20 and 30 domains MLE, MCMC and DE-MC Z still give reasonable results, and can gather information during the iterations to get close to the optimum. Only SCE-UA is able to find the global optimum of the Ackley function with 50 domains during the given number of iterations.

Catchment Modelling Framework
We used the Catchment Modelling Framework (CMF) developed by [68] to investigate the performance of the algorithms when dealing with a real measured world optimization problem. CMF is a toolbox to build water transport models from a set of pre-built process descriptions. The toolbox has been used before to model different catchments in one and two dimensions [45,49,69,70] and enables the test of hypotheses in hydrology [71]. In the application presented here, CMF is set up to simulate soil moisture in a one-dimensional soil column. Evapotranspiration is predicted by the Shuttleworth-Wallace method and soil water fluxes are modeled with the Richards equation. We searched for parameter sets to describe the shape of the water retention curve according to van Genuchten-Mualem [72] with four parameters: alpha, porosity, n and k sat . The prior parameter distributions are based on results from [49], where soil moisture was simulated with CMF for an agricultural site in Muencheberg. We used data from a Free Air Carbon dioxide Enrichment (FACE) grassland study site A1 in Linden, Germany [73]. The soil is classified as a Fluvic Gleysol. Meteorological data was used for the weather simulation and groundwater table data for the groundwater influence on this site. For the model evaluation, we utilized daily measured soil moisture data from the topsoil layer (0-0.1 m). The simulation time was from 01/06/1998 to 01/01/1999 as burn-in and simulation results until 01/01/ 2000 were used for evaluation.
We started 10,000 iterations with a MPI structure. Twenty parallel threads on a HPC were used, resulting in a nearly linear speed up. The minimal RMSE was used to evaluate model performance. The best model runs of CMF found with the different algorithms are shown in Fig 8. All algorithms performed almost equally well. The ROPE, SCE-UA and MLE found the best parameter sets for predicting soil moisture with an RMSE as low as 3.2096. All other algorithms performed only slightly worse with RMSE between 3.2098 and 3.2153. Overall, the model simulations follow the main trend of the observations, especially during the first seven months when soil moisture decreased from 45 to 20%. The following flashy soil moisture curve is indicating that the model has deficiencies in simulating rapid changes in soil moisture of the uppermost soil layer, at least with the given forcing precipitation data and available information on soil parameters. This is a problem, which cannot be solved with parameter calibration and needs further investigation, e.g. by improving the model structure, adding more prior information into the process based model, or by testing other models. Fig 9 shows the parameter distribution of the best performing parameter sets as well as the prior and posterior distribution (derived by selecting the best 10% of the sampling). The calibration algorithms MLE and SCE-UA resulted in a small posterior distribution. MCMC and DE-MC Z reduced the parameter uncertainty of the posterior distribution by over 90% for parameter n and by 20% for parameter k sat . The other algorithms failed in reducing the parameter ranges. The optimal parameter setting for k sat was found on a wide range from 0.8 (MLE) to 1.9 (MC) m day -1 and not in the center of the posterior distributions. Optimal settings for the parameter porosity were found in the upper range of the prior distribution, with small posterior distributions. The optimal parameter settings found for alpha, porosity and n are close to the center of the posterior distribution.
We do not know the true optimal parameter set of our hydrological model, or whether it exists at all. The optimal parameter sets we found differ from each other, indicating a high equifinality of the model. The optimal parameter settings for porosity were found in a small range from 0.6 to 0.63 for all algorithms. This values are in line with measured porosity of 0.60 SPOTting Model Parameters Using a Ready-Made Python Package to 0.65 [74]. The tested algorithms resulted all in similar best fits, with an RMSE = 3.2 Vol. % soil moisture. A direct comparison to other models is not possible, as this is the first study modelling soil moisture on the Linden FACE site. Nevertheless, results are not as good as others, e.g. [75] who used SCE-UA and found after 6,000 HYDRUS simulations remaining errors of RMSE = 0.03 Vol. % soil moisture on a different site. However, we attribute our relatively high remaining error to model deficiencies in capturing all natural effects, which might be a changing k sat in the upper most soil layer after heavy rainfall on this site [76].

LandscapeDNDC
We used LandscapeDNDC (LDNDC) developed by Haas et al. [69] to investigate the influence of the chosen objective function on the best selected model run. LDNDC is a biogeochemistry model to simulate greenhouse gas emissions and nutrient turn over processes. We used the model to simulate CO 2 emissions from the soil of the Linden FACE site. The emissions were measured with the closed chamber method [74]. We setup the model with a warm-up period of one year and simulated the time from 01/01/1999 to 13/06/2006. Thirty parameters were sampled in a LHS with 50,000 runs. We selected four different widely used objective functions from SPOTPY to quantify the fit of the resulting simulations to the observations (Fig 10). The selected objective functions were the BIAS (ranging from -1 to +1, with 0 indicating an unbiased simulation), coefficient of determination (r 2 ranging from 0 total disagreement, to 1 perfect regression), Root Mean Squared Error (RMSE ranging from -1 total disagreement to 0 perfect fit) and the Wilmott Agreement Index (AI, ranging from 0 total disagreement to 1 perfect fit). The best BIAS found has a value of 0.03, which is close to its optimum of zero. However, soil emissions are overestimated in winter with 20 kg C ha -1 and underestimated in the summer months with 20 kg C ha -1 . Looking at the distribution of the residuals, over-and underestimations are nearly Gaussian, resulting in a mean error near zero over the whole simulation period. The simulation with the best r 2 has a relative high value of 0.75, but the simulations substantially underestimate the emissions from the soil during the whole model run. Nevertheless, the simulations follow the seasonal trend well, reflecting a reasonable timing of the model (Fig 10). To improve the fit of absolute emissions with the model, RMSE and AI are good options in SPOTPY. The distribution of the residual errors RMSE are narrower than the ones for AI, which indicates that the observations are better represented by the RMSE optimized model. In contrast, AI optimized simulations are superior in matching the absolute peaks of observed emissions.

Discussion
All algorithms work well in SPOTPY, which was shown by the different case studies. Our intention was not to accept or reject algorithms but rather show their functionality within SPOTPY. Our results show reasonable effects, which have been reported in other algorithm comparison papers. The Rosenbrock case study showed us well performing algorithms when searching for a single optimal parameter set, like MLE and SCE-UA. Vrugt et al. [42] tested SCEM-UA (similar to SCE-UA) and MCMC on the Rosenbrock function, and reported that the first algorithm was faster in convergence. We found SA struggling in finding the optimum of the Rosenbrock, an observation also reported by Wang et al. [56]. When dealing with many SPOTting Model Parameters Using a Ready-Made Python Package local minima like it is true for the Griewank function, we got good results, when we conducted MC and LHS with the GLUE concept. They represent best the surface of the function. SCE-UA needed 4,000 iterations to stop the parameter search on the function, Jung et al. [28] found the optimum during 40,000 iterations. This difference in efficiency is most likely due to the setting of the algorithm. With an increasing amount of parameters on the Ackley function, we have seen good results for MLE, MCMC and DE-MC Z and very good results for SCE-UA. Karaboga et al. [77] tested the swarm intelligence algorithm ABC on the Ackley function with 30 domains. They found after the optimum after 1,000 iterations, which is even better than the best performing algorithm of SPOTPY (SCE-UA). This algorithm could be a nice extension for the SPOTPY package. Behrangi et al. [78] used SCE-UA in a similar set up and found the optimum of a 30 dimensional Ackley function after around 4,000 iterations, exactly as we found it. Genetic algorithms give poor results on the Ackley function with 30 domains [77]. Madsen et al. [79] calibrated a hydrological model with SCE-UA and SA, showing that the first one worked better-similar to our case studies. Huang et al. [80] recommend MCMC to deal with many parameters. Our findings on the Ackley function show that evolution algorithms are even better suited for higher dimensional search problems. Ter Braak and Vrugt [2] showed that the evolution algorithm DE-MC Z can be 5-26 times more efficient than MCMC. Gong [81] come to the same conclusion when testing the evolution algorithm SCEM-UA against the stochastic algorithm MLE. Good results were reported when using MC on a hydrological model with small parameter space [80]. We found that the rather simple MC and LHS often performed worse when searching the exact global optimum, but give reliable results under equifinality, like it is the case for our hydrological model build with CMF. We recommend using these simple search algorithms with the GLUE concept.
The LDNDC case study revealed that conclusions based on the model performance can be flawed when it is analysed with a not well-suited objective function. For example, the BIAS can reduce the overall model error, but it does not guarantee that the model fits the temporal variations of the observed data. The r 2 is suited to find good parameter sets to predict timing of the system, but this objective function does not take the absolute values into account. RMSE and AI are well suited to find model realizations fitting the absolute values of the observed data. Legates and McCabe [40] pointed out that the coefficient of determination (r 2 ) is inappropriate for model quantification because it is oversensitive to high flow but insensitive to additive and proportional differences between model simulations and observations. They recommended RMSE as the model evaluation tools.
Guinot et al. [82] generally classified objective functions into two types: distance-based objective function (e.g. RMSE) and weak form-based objective function (e.g. BIAS and r 2 ). They concluded that although the distance-based objective functions have the advantage to search an identifiable model-parameter set, they may cause the local extremes in the response surface and lead to mis-calibration i.e. being trapped around secondary optima. By contrast, the weak form-based objective functions are more monotone than the distance-based objective functions. Depending on the aim of the model approach, it can be beneficial to combine several objective functions to find reliable posterior simulations [49]. While this is not a surprising or new result, the advantage of SPOTPY is, that it facilitates an easy comparison of currently eleven objective functions in a pre-and post-processing mode.
In general, the findings reveal that not every algorithm is suited for every parameter search problem. Even more, every algorithm has its advantages and disadvantages. Therefore, the overview in Table 2 showing the main capabilities of the algorithms might help the end-user to select a suited and efficient algorithm, without the need to understand and test every possible optimization technique. The approximate Bayesian compute techniques MC and LHS are very well suited to calibrate the model on multiple outputs with different objective functions. Nevertheless, they are very inefficient in high parameter space, like shown in the Ackley case study. Contrasting, the Metropolis MCMC method can be very efficient. However, it has the disadvantage that it is not possible to be used in parallel computing systems. DE-MC Z is suited to be used in parallel, but gets inefficient when too many chains need to converge. ROPE is fully parallelizable but the generation of the parameter space after each subset needs a long computation time. All implemented non-Bayesian techniques (MLE, SCE-UA and SA) search only for one optimal parameter set, which makes them in general more efficient than the Bayesian approaches, but the outcome is very dependent on the used objective function and the parameter space, which is why they have to be chosen carefully. Furthermore, SCE-UA and SA need a pre-testing of the algorithm settings. They should not be used, without an adaption to a specific parameter search problems. MLE can be used straightforward, but the user has a higher risk to get stuck in a local optima. Unfortunately, there is no perfect algorithm and no perfect objective function. It depends. In this regard, SPOTPY was developed to help users to find their specific optimal solution.

Conclusion
As a final aspect, we want to check, if our five defined criteria are met by SPOTPY. We conclude that SPOTPY is a broad package, combining several optimization approaches. We hope that it is helpful to users, as no other parameter estimation package provides such a wide range of implemented techniques and is so easy to use. Optimization experts can still accessed and adopted the complexity of the algorithms. Modularity is given as the entire package is coded in Python. The independency of SPOTPY makes it applicable to every model; in contrast to other packages, e.g. the presented toolbox of the SWAT [20]. The scalability claim of SPOTPY is valid. The straightforward MPI support results in a nearly linear time boost when analyzing time-consuming model runs and is as easy as tipping: parallel = 'mpi'. Finally, the open-source accessibility of SPOTPY makes it available for everyone to every field of science, where parameter optimization is useful. We will maintain the code at least for the next two years and expand the functionality systematically. For instance, the most recent version comes along with a sensitivity analysis algorithm (FAST) and more possibilities to structure the simulation data in the database. Finally yet importantly, we welcome new contributors to share their results or to provide new ideas for features.