PLoSWiki
http://topicpages.ploscompbiol.org/wiki/Main_Page
MediaWiki 1.17.0
Approximate Bayesian computation
92
545
2012-04-11T11:32:58Z
Mikaelsunnaker
13
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has rapidly grown in popularity in recent years, in particular for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of application and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain for ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, owing to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in [?]). Since its advent in [?], ABC has prompted the scientific community to develop improved versions of the basic method that further increase its computational efficiency (e.g., see [?]).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography [?]. However, it was pointed out that a significant portion of the criticism is not aimed at ABC directly, but more generally at methods rooted in Bayesian statistics [?]. A large part was also shown to originate from misunderstandings of the mathematical foundations and semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between an ABC method and its usage. However, the arguments also exposed fundamental and currently unsolved issues, and concerns have lately been raised within the ABC community itself [?]. Yet it may be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
In the first part of this paper, we introduce the ABC rejection algorithm, emphasizing its underlying assumptions and approximations, and review recent improvements of the method. We then review the recent criticisms and sort them into invalid ones, valid general ones, and valid ABC-specific ones. Finally, we discuss the consequences of the valid criticisms for model-based analysis with ABC and comment on the challenges associated with the trend toward ever more complex models, a trend accelerated by the availability of ABC-based methods.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D)=\frac{p(D|H)\,p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)\,p(\theta),\quad \theta\in\Theta</math>
From Eq. (2) it is evident that the likelihood <math>p(D|\theta)</math> must be evaluated in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood [?], which motivates the use of ABC to circumvent this issue.
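The Bayesian update rule above can be sketched numerically. The two-hypothesis coin example below is purely illustrative (the hypotheses, data, and probabilities are not taken from the text); it shows how the evidence normalizes the likelihood-weighted priors into a posterior:

```python
# Toy Bayes update for two hypotheses about a coin, p(heads) = 0.5 vs. 0.8.
# Data: 8 heads and 2 tails. All numbers are illustrative assumptions.
priors = {"fair": 0.5, "biased": 0.5}                    # p(H)
likelihoods = {
    "fair":   0.5**8 * 0.5**2,   # p(D | fair)
    "biased": 0.8**8 * 0.2**2,   # p(D | biased)
}
evidence = sum(likelihoods[h] * priors[h] for h in priors)            # p(D)
posterior = {h: likelihoods[h] * priors[h] / evidence for h in priors}  # p(H|D)
print(posterior)
```

Note that the evidence only rescales the posterior; ranking the hypotheses requires just the products <math>p(D|H)\,p(H)</math>, which is what the proportionality in Eq. (2) exploits.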
==The ABC Rejection Algorithm==
All ABC-based methods approximate the likelihood function by simulations whose outcomes are compared to the observed data. More specifically, in the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho(\hat{D},D)\le\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> under a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (the event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a sample of parameter values approximately distributed according to the desired posterior distribution, and, crucially, obtained without explicitly computing the likelihood function.
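The rejection scheme just described can be sketched in a few lines. The normal-mean toy model, the uniform prior, the sorted-data Euclidean distance, and the tolerance value below are all illustrative assumptions, not part of the method description above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "observed" data; in a real application D is given, not simulated.
theta_true = 2.0
D = rng.normal(theta_true, 1.0, size=100)

def simulate(theta, rng):
    """Forward simulation of the model M for parameter value theta."""
    return rng.normal(theta, 1.0, size=100)

def rho(D_hat, D):
    """One possible distance: Euclidean norm between sorted data sets."""
    return float(np.linalg.norm(np.sort(D_hat) - np.sort(D)))

eps = 15.0                               # strictly positive tolerance
accepted = []
for _ in range(5000):
    theta = rng.uniform(-10.0, 10.0)     # sample a parameter point from the prior
    if rho(simulate(theta, rng), D) <= eps:
        accepted.append(theta)           # keep points whose simulations are close

# The accepted values approximate a sample from p(theta | D).
print(len(accepted), float(np.mean(accepted)))
```

The choice of metric and tolerance governs the trade-off between acceptance rate and approximation quality; sorting before differencing is merely one way to make the comparison insensitive to sample ordering.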
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data [?]. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
==Model Comparison with Bayes Factors==
==Quality Controls==
==Further Methodological Developments==
=Recent Debate About ABC=
=General Criticisms of Bayesian Methods=
==Small Number of Models==
==Prior Distribution and Parameter Ranges==
==Large Data Sets==
=ABC Specific Criticisms=
==Curse-of-Dimensionality==
==Approximation of the Posterior==
==Choice and Sufficiency of Summary Statistics==
==Bayes Factor with ABC and Summary Statistics==
==Indispensable Quality Controls==
=Outlook=
=Acknowledgements=
=References=
553
2012-04-11T12:46:18Z
Mikaelsunnaker
13
Added/edited quality controls
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has rapidly gained popularity in recent years, in particular for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain for ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in [?]). Since its advent [?], ABC has prompted the scientific community to develop improved versions of the basic method that further increase computational efficiency (e.g., see [?]).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography [?]. However, it was pointed out that a significant portion of the criticism is not aimed directly at ABC, but more generally at methods rooted in Bayesian statistics [?]. A large part was also shown to originate from misunderstandings of the mathematical foundations and semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and its usage. However, the arguments also exposed fundamental and currently unsolved issues. Concerns have lately been raised within the ABC community as well [?]. Yet it may be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
In the first part of this paper, we introduce the ABC rejection algorithm, emphasizing the underlying assumptions and approximations, and review recent improvements of the method. We then review the recent criticisms and sort them into invalid ones, valid general ones, and valid ABC-specific ones. Finally, we discuss the consequences of the valid criticisms for model-based analysis with ABC and comment upon the challenges associated with the trend toward ever more complex models, a trend accelerated by the availability of ABC-based methods.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood [?], which motivates the use of ABC to circumvent this issue.
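Where the likelihood is tractable, the proportionality in Eq. ? can be checked numerically. The following sketch evaluates an unnormalized posterior on a parameter grid for a hypothetical Bernoulli model with a uniform prior; the model, data, and grid are illustrative assumptions, not part of the methods reviewed here.

```python
import numpy as np

# Hypothetical example: theta = success probability of a Bernoulli process,
# uniform prior on a grid, data D = 7 successes in 10 trials.
theta = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(theta) / theta.size          # uniform prior p(theta)
likelihood = theta**7 * (1 - theta)**3            # p(D|theta), binomial kernel
unnormalized = likelihood * prior                 # p(theta|D) up to p(D)
posterior = unnormalized / unnormalized.sum()     # normalize over the grid

# The posterior mode lands near the maximum-likelihood estimate 7/10.
print(theta[np.argmax(posterior)])
```

Normalizing the grid values makes the evidence <math>p(D)</math> irrelevant, which is precisely why it can be dropped from the proportionality above.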
==The ABC Rejection Algorithm==
All ABC-based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, in the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (the event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter samples approximately distributed according to the desired posterior distribution, and, crucially, obtained without the need to explicitly compute the likelihood function.
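In code, the rejection algorithm amounts to a single loop. The simulator, prior, distance, and tolerance below (a Gaussian model with unknown mean, a uniform prior, and a Euclidean distance on sorted observations) are hypothetical choices for illustration; in practice each of these is problem-specific.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model for illustration: n i.i.d. Gaussian observations with
# unknown mean theta and known unit variance.
n = 50
true_theta = 2.0
D = rng.normal(true_theta, 1.0, size=n)            # "observed" data

def simulate(theta):
    """Simulate a data set D-hat under the model for parameter theta."""
    return rng.normal(theta, 1.0, size=n)

def rho(D_hat, D):
    """Euclidean distance between the sorted simulated and observed data."""
    return np.linalg.norm(np.sort(D_hat) - np.sort(D))

eps = 5.0                                          # tolerance epsilon >= 0
accepted = []
for _ in range(20000):
    theta = rng.uniform(-10, 10)                   # draw from the prior
    if rho(simulate(theta), D) < eps:              # accept within tolerance
        accepted.append(theta)

# The accepted values approximate the posterior of theta; no likelihood
# evaluation was needed.
print(len(accepted), np.mean(accepted))
```

Shrinking the tolerance improves the approximation of the posterior, but the acceptance rate, and hence the effective sample size, drops accordingly.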
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> within a small distance of <math>D</math> typically decreases as the dimensionality of the observational data grows. A common approach to improving the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower-dimensional summary statistics <math>S(D)</math>, selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e. given the sufficient statistic, the parameter θ is irrelevant for the conditional distribution of the data [?]. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside the exponential family, to identify a finite set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in ABC applications.
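Continuing the hypothetical Gaussian setting, the sketch below accepts on two summary statistics, the sample mean and sample standard deviation, rather than on the full data set. For a Gaussian with unknown mean and variance these summaries happen to be sufficient, but in most ABC applications no such guarantee is available; all priors and tolerances here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 100 Gaussian observations with unknown mean and s.d.
D = rng.normal(3.0, 2.0, size=100)

def S(x):
    """Summary statistics: sample mean and sample standard deviation."""
    return np.array([x.mean(), x.std()])

eps = 0.2
accepted = []
for _ in range(50000):
    mu = rng.uniform(-10, 10)                      # prior on the mean
    sigma = rng.uniform(0.1, 10)                   # prior on the s.d.
    D_hat = rng.normal(mu, sigma, size=100)
    if np.linalg.norm(S(D_hat) - S(D)) < eps:      # accept on summaries only
        accepted.append((mu, sigma))

accepted = np.array(accepted)
print(accepted.shape[0], accepted.mean(axis=0))
```

Comparing two numbers instead of two 100-dimensional data sets raises the acceptance rate dramatically, at the cost of discarding whatever information the summaries fail to capture.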
==Model Comparison with Bayes Factors==
ABC can also be used to evaluate the relative plausibility of two models <math>M_1</math> and <math>M_2</math> whose likelihoods are intractable. A useful approach to comparing the models is to compute the Bayes factor, defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that computing <math>B_{1,2}</math> in Eq. ? requires marginalizing over the uncertain parameters through integration. The posterior ratio of <math>M_1</math> to <math>M_2</math> given the data (which can be thought of as the support in favor of one model) is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for interpreting the strength of evidence associated with values of the Bayes factor was originally published in [?] (see also [?]) and has been used in a number of studies [?]. However, conclusions from model comparison based on Bayes factors should be treated with caution, and we will later discuss some important ABC-related concerns.
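In the ABC setting the evidence integrals are not computed explicitly; a common approximation is the ratio of acceptance frequencies when both models are simulated under their priors with the same summary statistic and tolerance. The sketch below uses two hypothetical count models (Poisson and geometric) with assumed priors; all specifics are illustrative assumptions, not models from the text.

```python
import numpy as np

rng = np.random.default_rng(2)

# Pseudo-observed counts; in this sketch the data happen to be Poisson.
D = rng.poisson(4.0, size=40)
s_obs = D.mean()                                   # single summary statistic
eps = 0.3                                          # common tolerance

def acceptance_frequency(simulate, n_sim=50000):
    """Fraction of prior simulations whose summary lands within eps of s_obs.

    Up to a constant shared by both models, this fraction approximates the
    evidence p(D|M), so the constant cancels in the Bayes factor.
    """
    hits = sum(abs(simulate().mean() - s_obs) < eps for _ in range(n_sim))
    return hits / n_sim

# M1: Poisson counts with rate lambda ~ Uniform(0, 10) (assumed prior)
f1 = acceptance_frequency(lambda: rng.poisson(rng.uniform(0, 10), size=40))
# M2: geometric counts (shifted to start at 0) with p ~ Uniform(0.01, 1)
f2 = acceptance_frequency(lambda: rng.geometric(rng.uniform(0.01, 1), size=40) - 1)

B12 = f1 / f2                                      # approximate Bayes factor
print(f1, f2, B12)
```

Note that such an estimate approximates the Bayes factor based on the chosen summary statistic rather than on the full data set, which is one source of the concerns mentioned above.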
==Quality Controls==
Quality control is an important part of ABC-based inference for assessing the validity and robustness of results and of the conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in [?], such as quantifying the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether the inference yields valid results, regardless of the observational data. For instance, models with fixed parameter sets, typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
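A minimal version of such a PODS-based control can be sketched as follows; the Gaussian-mean model, the priors, and the tolerances are hypothetical choices for illustration. A known parameter value is drawn, pseudo-observations are simulated, and the ABC rejection step is run to see how well the known value is recovered.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setting: Gaussian observations with unknown mean, uniform
# prior on [-5, 5], and the sample mean as summary statistic.
def abc_posterior(D_obs, eps=0.5, n_sim=5000):
    accepted = []
    for _ in range(n_sim):
        theta = rng.uniform(-5, 5)                 # draw from the prior
        D_hat = rng.normal(theta, 1.0, size=30)
        if abs(D_hat.mean() - D_obs.mean()) < eps: # accept on the summary
            accepted.append(theta)
    return np.array(accepted)

errors = []
for _ in range(20):                                # 20 pseudo-observed data sets
    theta_true = rng.uniform(-5, 5)                # known ground truth
    pods = rng.normal(theta_true, 1.0, size=30)    # pseudo-observed data
    post = abc_posterior(pods)
    errors.append(abs(post.mean() - theta_true))   # recovery error per PODS

print(np.mean(errors))                             # average recovery error
```

A systematically large recovery error across PODS would signal that the chosen summaries, distance, or tolerance are inadequate before any real data are analyzed.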
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparison of the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in [?]. Beyond that, cross-validation techniques [?] and predictive checks [?] represent promising strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because the posterior support of a particular model can then appear overwhelmingly conclusive even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
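For instance, a posterior predictive check of the kind mentioned above can be sketched as follows (again a hypothetical Gaussian toy model; the tolerance and the tail-quantile choice are illustrative):

```python
import random
import statistics

random.seed(3)

def summary(theta, n=30):
    return statistics.mean(random.gauss(theta, 1.0) for _ in range(n))

observed = 0.4

# A small ABC rejection run to obtain an approximate posterior sample.
posterior = [theta for theta in (random.uniform(-2.0, 2.0) for _ in range(4000))
             if abs(summary(theta) - observed) < 0.1]

# Posterior predictive check: re-simulate summaries from the accepted
# parameters and ask whether the observed summary is typical of them.
predictive = sorted(summary(theta) for theta in posterior)
k = max(1, len(predictive) // 20)        # ~5% in each tail
lo, hi = predictive[k], predictive[-k]
observed_is_typical = lo <= observed <= hi
```

An observed summary falling far outside the predictive interval would indicate that even the best-fitting parameters cannot reproduce the data, i.e., a model misspecification that high posterior support alone would not reveal.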
Fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions with respect to a comprehensive set of statistics. These statistics need not be the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that agree with many aspects of the data simultaneously [?], with model inconsistency detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters [?]. The deviance information criterion is then used as a measure of model fit. Interestingly, it was also shown that the models preferred by this criterion can conflict with those supported by Bayes factors. For this reason, it is useful to combine different model-selection methods to obtain robust conclusions.
==Further Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC [?], which yields a higher acceptance rate for ABC-MCMC than for plain ABC rejection. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated posterior samples [?], and relatively poor parallelizability [?].
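The ABC-MCMC idea can be sketched as follows (our own toy example; with a flat prior and a symmetric Gaussian proposal, the Metropolis-Hastings ratio reduces to the ABC acceptance indicator):

```python
import random
import statistics

random.seed(4)

def summary(theta, n=30):
    return statistics.mean(random.gauss(theta, 1.0) for _ in range(n))

observed, eps = 0.5, 0.15

# ABC-MCMC: a Metropolis-Hastings chain in which evaluating the likelihood
# is replaced by simulating data and testing it against the tolerance.
theta, chain = 0.0, []
for _ in range(5000):
    proposal = theta + random.gauss(0.0, 0.3)       # symmetric random walk
    # Flat prior on [-2, 2] and symmetric proposal: accept iff the proposal
    # lies in the prior support and its simulation hits the tolerance region.
    if -2.0 <= proposal <= 2.0 and abs(summary(proposal) - observed) < eps:
        theta = proposal
    chain.append(theta)

posterior_mean = statistics.mean(chain[1000:])      # discard burn-in
```

Because proposals are anchored at the current state, the chain spends most of its time in high-posterior regions, which is the source of the improved acceptance rate over sampling from the prior.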
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) have been adapted to the ABC setting [?]. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods over ABC-MCMC is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively [?].
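The adaptive-tolerance idea can be sketched as follows (a deliberately simplified population scheme on a toy Gaussian model; the importance-weight correction of full ABC-SMC/PMC is omitted here for brevity, so this illustrates only the shrinking-tolerance mechanism):

```python
import random
import statistics

random.seed(5)

def summary(theta, n=30):
    return statistics.mean(random.gauss(theta, 1.0) for _ in range(n))

observed = -0.3

# Start from the prior and tighten the tolerance over generations,
# setting each new tolerance to the median distance of the last one.
population = [random.uniform(-2.0, 2.0) for _ in range(300)]
for generation in range(4):
    distances = sorted(abs(summary(theta) - observed) for theta in population)
    eps = distances[len(distances) // 2]            # adaptive tolerance
    refreshed = []
    while len(refreshed) < len(population):
        # Perturb a randomly chosen member of the previous generation.
        candidate = random.choice(population) + random.gauss(0.0, 0.2)
        if -2.0 <= candidate <= 2.0 and abs(summary(candidate) - observed) < eps:
            refreshed.append(candidate)
    population = refreshed

population_mean = statistics.mean(population)
```

Each generation serves as an importance distribution for the next, so the population concentrates on the posterior gradually instead of paying the full rejection cost of a small tolerance from the start.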
The use of local linear weighted regression with ABC to reduce the variance of the posterior estimate was suggested in [?]. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The resulting regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of nonlinear regression using a feed-forward neural network model [?]. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was subsequently proposed in [?].
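The regression-adjustment step can be sketched as follows (a hypothetical Gaussian example; for clarity this sketch uses an unweighted least-squares fit, whereas the original proposal weights accepted points by their distance to the observed summary):

```python
import random
import statistics

random.seed(6)

def summary(theta, n=30):
    return statistics.mean(random.gauss(theta, 1.0) for _ in range(n))

observed = 0.8

# ABC rejection with a deliberately loose tolerance, keeping the
# (parameter, simulated summary) pairs of the accepted draws.
pairs = []
for _ in range(6000):
    theta = random.uniform(-2.0, 2.0)
    s = summary(theta)
    if abs(s - observed) < 0.5:
        pairs.append((theta, s))

# Regression adjustment: fit theta ~ summary among accepted draws, then
# shift each accepted parameter to where the fit predicts it would lie
# had its summary coincided with the observed one.
thetas = [t for t, _ in pairs]
summaries = [s for _, s in pairs]
t_mean, s_mean = statistics.mean(thetas), statistics.mean(summaries)
slope = (sum((s - s_mean) * (t - t_mean) for t, s in pairs)
         / sum((s - s_mean) ** 2 for s in summaries))
adjusted = [t - slope * (s - observed) for t, s in pairs]

raw_spread = statistics.pstdev(thetas)
adjusted_spread = statistics.pstdev(adjusted)
adjusted_mean = statistics.mean(adjusted)
```

The adjusted sample is noticeably tighter than the raw accepted sample, which is how the regression step buys back some of the precision lost to a loose tolerance.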
=Recent Debate About ABC=
=General Criticisms of Bayesian Methods=
==Small Number of Models==
==Prior Distribution and Parameter Ranges==
==Large Data Sets==
=ABC Specific Criticisms=
==Curse-of-Dimensionality==
==Approximation of the Posterior==
==Choice and Sufficiency of Summary Statistics==
==Bayes Factor with ABC and Summary Statistics==
==Indispensable Quality Controls==
=Outlook=
=Acknowledgements=
=References=
559
2012-04-11T13:27:05Z
Mikaelsunnaker
13
Pitfalls and Controversies around ABC added (to be modified)
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has grown rapidly in popularity in recent years, in particular for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference in highly complex biological models, whose likelihood functions may be costly to evaluate or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, owing to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in [?]). Since its introduction in [?], ABC has prompted the scientific community to develop improved versions of the basic method that further increase its computational efficiency (e.g., see [?]).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography [?]. However, it has been pointed out that a significant portion of this criticism is aimed not at ABC directly, but more generally at methods rooted in Bayesian statistics [?]. A large part was also shown to originate from misunderstandings of the mathematical foundations and semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between an ABC method and its usage. However, the arguments also exposed fundamental and currently unsolved issues. Concerns have lately been raised within the ABC community as well [?]. It may nevertheless be difficult for many readers to distinguish ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
In the first part of this paper, we introduce the ABC rejection algorithm, emphasizing the underlying assumptions and approximations, and review recent improvements of the method. We then review the recent criticisms and sort them into invalid ones, valid general ones, and valid ABC-specific ones. Finally, we discuss the consequences of the valid criticisms for model-based analysis with ABC, and comment on the challenges associated with the trend toward ever more complex models, a trend accelerated by the availability of ABC-based methods.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\quad \theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood [?], which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC-based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, in the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is first sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (the event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter samples approximately distributed according to the desired posterior distribution and, crucially, obtained without the need to explicitly compute the likelihood function.
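The rejection scheme above can be sketched in a few lines. The following is a minimal, illustrative Python example, not from the original text: the Gaussian toy model, the uniform prior, and names such as <code>abc_rejection</code> are our own assumptions. For tractability, the distance here compares sample means rather than full data sets, anticipating the summary statistics discussed next.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=50):
    # Toy simulator: n draws from a Gaussian with unknown mean theta.
    return rng.normal(theta, 1.0, size=n)

def abc_rejection(data, sample_prior, simulate, rho, eps, n_draws=100_000):
    """Basic ABC rejection: keep prior draws whose simulated data
    fall within tolerance eps of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior()
        if rho(simulate(theta), data) < eps:
            accepted.append(theta)
    return np.array(accepted)

# Observed data: 50 points with true mean 2.0.
data = rng.normal(2.0, 1.0, size=50)

# Uniform prior on [-5, 5]; rho compares sample means for tractability.
posterior = abc_rejection(
    data,
    sample_prior=lambda: rng.uniform(-5.0, 5.0),
    simulate=simulate,
    rho=lambda a, b: abs(a.mean() - b.mean()),
    eps=0.1,
)
print(len(posterior), posterior.mean())
```

The accepted draws form an approximate posterior sample; tightening <code>eps</code> improves the approximation at the price of a lower acceptance rate.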
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> within a small distance of <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improving the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower-dimensional summary statistics <math>S(D)</math>, selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic <math>S(D)</math>, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data [?]. The acceptance criterion in Eq. (?) can then be replaced by one based on sufficient summary statistics, so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, outside of the exponential families it is typically impossible to identify a finite-dimensional set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications of ABC methods.
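For a concrete case where an exactly sufficient statistic exists, consider coin flips: under a Bernoulli model, the number of successes carries all information in the data about <math>\theta</math>. A hedged Python sketch (the data set, prior, and tolerance are illustrative assumptions, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(1)

# 100 coin flips with unknown success probability. The number of
# successes is a sufficient statistic for a Bernoulli model, so
# comparing S(D) instead of D discards no information about theta.
n = 100
data = rng.binomial(1, 0.7, size=n)
s_obs = data.sum()

accepted = []
for _ in range(200_000):
    theta = rng.uniform(0.0, 1.0)      # uniform prior on [0, 1]
    s_sim = rng.binomial(n, theta)     # simulate the summary directly
    if abs(s_sim - s_obs) <= 1:        # tolerance on the summary
        accepted.append(theta)

accepted = np.array(accepted)
print(accepted.size, accepted.mean())
```

As the tolerance shrinks to zero, this sampler converges to the exact Beta posterior, illustrating that conditioning on a sufficient statistic loses nothing.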
==Model Comparison with Bayes Factors==
ABC can also be instrumental in evaluating the relative plausibility of two models <math>M_1</math> and <math>M_2</math> whose likelihoods are intractable. A useful approach to comparing the models is to compute the Bayes factor, defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that computing <math>B_{1,2}</math> in Eq. ? requires marginalizing over the uncertain parameters through integration. The posterior ratio of <math>M_1</math> to <math>M_2</math> given the data (which can be thought of as the support in favor of one model over the other) is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for interpreting the strength of evidence associated with values of the Bayes factor was originally published in [?] (see also [?]) and has been used in a number of studies [?]. However, conclusions from model comparison based on Bayes factors should be treated with caution, and we will later discuss some important ABC-related concerns.
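One simple way to estimate a Bayes factor with ABC is to note that, for a common summary and tolerance, each model's acceptance rate approximates its evidence up to a constant shared by both models, so the ratio of acceptance rates estimates <math>B_{1,2}</math>. A toy sketch under assumed Gaussian models (both models, their priors, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

n, eps = 30, 0.05
data = rng.normal(1.5, 1.0, size=n)   # observed data, true mean 1.5
s_obs = data.mean()

def acceptance_rate(sample_theta, n_draws=100_000):
    """Fraction of simulations landing within eps of the observed
    summary; with a shared summary and tolerance, the ratio of these
    rates for two models estimates the Bayes factor."""
    hits = 0
    for _ in range(n_draws):
        mu = sample_theta()
        if abs(rng.normal(mu, 1.0, size=n).mean() - s_obs) < eps:
            hits += 1
    return hits / n_draws

# M1: unknown mean with uniform prior on [-3, 3]; M2: mean fixed at 0.
r1 = acceptance_rate(lambda: rng.uniform(-3.0, 3.0))
r2 = acceptance_rate(lambda: 0.0)
print("estimated Bayes factor B_12:", r1 / max(r2, 1e-12))
```

Here the data strongly favor <math>M_1</math>, so the estimated factor is large; with equal model priors it also equals the posterior ratio, as in the equation above.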
=Quality Controls=
Quality control is an important part of ABC-based inference, for assessing the validity and robustness of the results and the conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in [?], such as quantifying the fraction of parameter variance explained by the summary statistics. A common class of methods assesses whether or not the inference yields valid results regardless of the observational data. For instance, models with fixed parameter sets, typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
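A minimal sketch of the PODS idea, assuming a Gaussian toy model (the simulator, prior, and function names are illustrative assumptions, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(3)

def abc_posterior_mean(data, n_draws=20_000, eps=0.1):
    # Minimal ABC rejection estimate of the posterior mean of theta,
    # for a Gaussian toy model with uniform prior on [-5, 5].
    acc = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)
        if abs(rng.normal(theta, 1.0, size=30).mean() - data.mean()) < eps:
            acc.append(theta)
    return np.mean(acc) if acc else np.nan

# Quality control with pseudo-observed data sets (PODS): draw known
# "true" parameters, simulate artificial data from them, and check
# how well the ABC estimate recovers the truth.
errors = []
for _ in range(10):
    theta_true = rng.uniform(-3.0, 3.0)
    pods = rng.normal(theta_true, 1.0, size=30)
    errors.append(abs(abc_posterior_mean(pods) - theta_true))
print("mean absolute recovery error:", np.mean(errors))
```

A large recovery error on PODS, where the truth is known, signals that the chosen summaries, tolerance, or sample size are inadequate before any real data are analyzed.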
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparing the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in [?]. Beyond that, cross-validation techniques [?] and predictive checks [?] represent promising strategies for evaluating the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because the posterior support of a particular model can then appear overwhelmingly conclusive even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Fundamentally novel approaches to model choice that incorporate quality control as an integral step have recently been proposed. By construction, ABC allows estimation of the discrepancies between the observational data and the model predictions with respect to a comprehensive set of statistics, which need not be the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that agree with many aspects of the data simultaneously [?], with model inconsistency detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters [?]. The deviance information criterion is then used as a measure of model fit. Notably, the models preferred under this criterion can conflict with those supported by Bayes factors, so it is useful to combine different methods for model selection in order to obtain robust conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. In practice, <math>\epsilon</math> is therefore set to a value above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; other summary statistics are used instead, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC-based analysis. This motivates a careful investigation and categorization of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
==Approximation of the Posterior==
==Choice and Sufficiency of Summary Statistics==
==Bayes Factor with ABC and Summary Statistics==
==Indispensable Quality Controls==
==More General Criticisms==
===Small Number of Models===
===Prior Distribution and Parameter Ranges===
===Large Data Sets===
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC [?], which resulted in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated posterior samples [?], and relatively poor parallelizability [?].
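A sketch of the ABC-MCMC idea for a Gaussian toy model, assuming a flat prior and a symmetric random-walk proposal, under which the Metropolis-Hastings ratio reduces to a prior-support check (model, tolerance, and initialization are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

data = rng.normal(2.0, 1.0, size=40)
s_obs = data.mean()
eps = 0.2

def in_prior(theta):
    return -5.0 <= theta <= 5.0    # uniform prior support

# ABC-MCMC: a Metropolis-Hastings random walk in which evaluating the
# likelihood ratio is replaced by an accept/reject test on simulated
# data. With a symmetric proposal and a flat prior, a move is accepted
# iff the proposal lies in the prior support and its simulation hits
# the observed summary within the tolerance.
theta = s_obs                      # start at a point with decent acceptance
chain = []
for _ in range(20_000):
    prop = theta + rng.normal(0.0, 0.5)
    s_sim = rng.normal(prop, 1.0, size=40).mean()
    if in_prior(prop) and abs(s_sim - s_obs) < eps:
        theta = prop
    chain.append(theta)

burned = np.array(chain[5_000:])
print(burned.mean(), burned.std())
```

The chain concentrates proposals near previously accepted parameters, which is the source of the improved acceptance rate, but the resulting samples are correlated, as noted above.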
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting [?]. The general idea is to iteratively approach the posterior from the prior through a sequence of intermediate target distributions. An advantage of such methods over ABC-MCMC is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively [?].
The use of local, weighted linear regression with ABC to reduce the variance of the posterior estimator was suggested in [?]. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The resulting regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model [?]. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in [?].
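The regression adjustment can be sketched as follows. This is a simplified illustration of the local linear idea with uniform weights (the published method additionally weights accepted points by a kernel in the distance, which we omit; the toy model and tolerance are our own assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)

data = rng.normal(2.0, 1.0, size=30)
s_obs = data.mean()

# Step 1: loose-tolerance rejection, keeping (theta, summary) pairs.
thetas, summaries = [], []
for _ in range(50_000):
    theta = rng.uniform(-5.0, 5.0)
    s_sim = rng.normal(theta, 1.0, size=30).mean()
    if abs(s_sim - s_obs) < 1.0:       # deliberately loose tolerance
        thetas.append(theta)
        summaries.append(s_sim)
thetas, summaries = np.array(thetas), np.array(summaries)

# Step 2: fit theta ~ a + b * s on the accepted pairs, then shift each
# accepted theta to the value the fit predicts at the observed summary.
b, a = np.polyfit(summaries, thetas, 1)
adjusted = thetas - b * (summaries - s_obs)

# The adjusted sample concentrates around the observed summary,
# reducing the variance relative to plain rejection with a loose eps.
print(thetas.std(), adjusted.std())
```

The adjustment lets a loose tolerance (and hence a high acceptance rate) stand in for a tight one, at the cost of assuming approximate local linearity between parameters and summaries.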
=Outlook=
=Acknowledgements=
=References=
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\theta_1</math> and <math>\theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that we need to marginalize over the uncertain parameters through integration to compute <math>B_{1,2}</math> in Eq. ?. The posterior ratio (which can be thought of as the support in favor of one model) of <math>M_1</math> compared to <math>M_2</math> given the data is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for interpretation of the strength in values of the Bayes factor was originally published in [?] (see also [?]), and has been used in a number of studies [?]. However, the conclusions of model comparison based on Bayes factors should be considered with sober caution, and we will later discuss some important ABC related concerns.
=Quality Controls=
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in [?], such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparison of the posterior predictive distribution of summary statistics to the summary statistics observed was suggested in ?. Beyond that, cross-validation techniques [?] and predictive checks [?] represent a promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because then the posterior support of a particular model can appear overwhelmingly conclusive, even if all proposed models in fact are poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on to how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions, with respect to a comprehensive set of statistics. These statistics are not necessarily the same as used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously [?], and model inconsistency is detected from conflicting and codependent summaries. Another quality-control based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters [?]. The deviance information criterion is then used as measure of model fit. Interestingly, it was also shown that the models preferred based on this criterion can conflict with those supported by Bayes factors. For this reason it is useful to combine different methods for model selection to obtain correct conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ?, yields an exact result, but would typically make computations prohibitively expensive. Thus, instead, <math>\epsilon</math> is set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; instead, other summary statistics are used, which introduces an additional bias. However, much of the recent criticism has neither been specific to ABC, nor relevant for ABC based analysis. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
==Approximation of the Posterior==
==Choice and Sufficiency of Summary Statistics==
==Bayes Factor with ABC and Summary Statistics==
==Indispensable Quality Controls==
==More General Criticisms==
===Small Number of Models===
===Prior Distribution and Parameter Ranges===
===Large Data Sets===
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC [?], which resulted in a higher acceptance rate for ABC-MCMC than plain ABC. Naturally this method inherits the general burdens of MCMC methods, such as the difficulty to ascertain convergence, correlated samples of the posterior [?], or relatively poor parallelizability [?].
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting [?]. Their general idea consists in iteratively approaching the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels must not be specified prior to the analysis, but are adjusted adaptively [?].
The usage of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in [?]. The method assigns weights to the parameters according to how well simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of observed summaries. The obtained regression coefficients are used to correct sampled parameters in the direction of observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model [?]. However, it was shown that the posterior distribution obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in [?].
=Outlook=
=Acknowledgements=
=References=
565
2012-04-11T21:11:45Z
Mikaelsunnaker
13
Text added to Pitfalls and controversies around ABC
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has rapidly increased in popularity over the last years and in particular for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain for ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in [?]). Since its advent in [?], the spread of ABC has spurred the scientific community to develop improved versions of the basic method, which have further increased the computational efficiency (e.g., see [?]).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography [?]. However, it was pointed out that a significant portion of the criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics [?]. A large part was also shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or of that between the ABC method and its usage. However, fundamental and currently unsolved issues were exposed by the arguments as well. Concerns have lately also been raised within the ABC community [?]. Yet it might be difficult for many readers to differentiate ABC-specific criticisms from general ones, and well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood [?], which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC-based methods approximate the likelihood function by simulations whose outcomes are compared to the observational data. More specifically, in the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter samples distributed according to an approximation of the desired posterior distribution, and, crucially, obtained without the need to explicitly compute the likelihood function.
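As an illustration, the rejection algorithm above can be sketched in a few lines of Python. The toy model (a normal distribution with unknown mean and unit variance, a uniform prior, and a distance based on sample means) is our own choice for demonstration purposes, not part of the original method description.

```python
# Minimal ABC rejection sampler (illustrative sketch; model and names are ours).
# Toy problem: infer the mean theta of a Normal(theta, 1) model from data D,
# with a Uniform(-10, 10) prior on theta.
import random
import statistics

random.seed(1)
observed = [random.gauss(3.0, 1.0) for _ in range(50)]  # observed data D

def rho(simulated, data):
    # deviation between D_hat and D; here the absolute difference of sample means
    return abs(statistics.mean(simulated) - statistics.mean(data))

def abc_rejection(data, n_accept=100, eps=0.2):
    accepted = []
    while len(accepted) < n_accept:
        theta = random.uniform(-10.0, 10.0)                   # sample from the prior
        simulated = [random.gauss(theta, 1.0) for _ in data]  # simulate D_hat under M
        if rho(simulated, data) < eps:                        # accept within tolerance
            accepted.append(theta)
    return accepted

posterior_sample = abc_rejection(observed)
```

Note that comparing sample means already anticipates the use of summary statistics discussed in the next section; with the raw data and a realistic tolerance, hardly any simulation would be accepted.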
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data [?]. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. ?, so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that we need to marginalize over the uncertain parameters through integration to compute <math>B_{1,2}</math> in Eq. ?. The posterior ratio (which can be thought of as the support in favor of one model) of <math>M_1</math> compared to <math>M_2</math> given the data is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
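To make the connection to ABC concrete, the Bayes factor can be approximated by the ratio of ABC acceptance rates under the two models, since each acceptance rate estimates the model evidence up to a common constant. The following sketch uses a toy setup of our own devising (model names, priors, and the mean summary are all illustrative assumptions):

```python
# Hedged sketch: estimating B_{1,2} as a ratio of ABC acceptance rates.
# M1: data ~ Normal(0, 1), no free parameters (the data-generating model);
# M2: data ~ Normal(theta, 1) with theta ~ Uniform(-5, 5).
import random
import statistics

random.seed(2)
observed = [random.gauss(0.0, 1.0) for _ in range(200)]
s_obs = statistics.mean(observed)          # summary statistic S(D)

def model_1():
    # simulate under M1 (fixed mean 0)
    return [random.gauss(0.0, 1.0) for _ in range(200)]

def model_2():
    # simulate under M2: first draw theta from its prior, then the data
    theta = random.uniform(-5.0, 5.0)
    return [random.gauss(theta, 1.0) for _ in range(200)]

def acceptance_rate(simulate, n_sims=5000, eps=0.15):
    hits = sum(abs(statistics.mean(simulate()) - s_obs) < eps for _ in range(n_sims))
    return hits / n_sims

# the ratio of acceptance rates approximates the Bayes factor B_{1,2}
bayes_factor = acceptance_rate(model_1) / acceptance_rate(model_2)
```

Since the data were generated under M1, the estimated Bayes factor should clearly favor M1 over the more diffuse M2.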
A table for interpreting the strength of evidence conveyed by values of the Bayes factor was originally published in [?] (see also [?]), and has been used in a number of studies [?]. However, conclusions of model comparisons based on Bayes factors should be drawn with caution, and we will later discuss some important ABC-related concerns.
=Quality Controls=
Quality control is an important part of ABC-based inference for assessing the validity and robustness of results and of the conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in [?], such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models with fixed parameter sets, typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparison of the posterior predictive distribution of summary statistics to the summary statistics observed was suggested in [?]. Beyond that, cross-validation techniques [?] and predictive checks [?] represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because then the posterior support of a particular model can appear overwhelmingly conclusive, even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
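A minimal posterior predictive check might look as follows in Python; the posterior draws are hypothetical stand-ins for the output of an earlier ABC run, and the toy model is our own:

```python
# Sketch of a posterior predictive check (illustrative, not from the article).
# Replicate data sets are simulated from posterior parameter draws, and we ask
# where the observed summary falls among the replicated summaries; an extreme
# rank hints at model misfit.
import random
import statistics

random.seed(3)
observed = [random.gauss(3.0, 1.0) for _ in range(30)]
s_obs = statistics.mean(observed)

# hypothetical posterior draws for theta, e.g., from a previous ABC run
posterior_draws = [random.gauss(3.0, 0.2) for _ in range(500)]

replicated = []
for theta in posterior_draws:
    rep = [random.gauss(theta, 1.0) for _ in range(30)]  # replicate data set
    replicated.append(statistics.mean(rep))

# posterior predictive "p-value": fraction of replicated summaries below s_obs;
# values very close to 0 or 1 would indicate a poorly fitting model
ppp = sum(s_rep < s_obs for s_rep in replicated) / len(replicated)
```

Here the model is correct by construction, so the observed summary should fall comfortably inside the replicated distribution.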
Fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously [?], and model inconsistency is detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters [?]. The deviance information criterion is then used as a measure of model fit. Notably, it was also shown that the models preferred on the basis of this criterion can conflict with those supported by Bayes factors. For this reason, it is useful to combine different methods for model selection to obtain correct conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus, in practice, <math>\epsilon</math> is set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; instead, other summary statistics are used, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC-based analysis. This motivates a careful investigation and categorization of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in [?] and [?]). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) [?]. This is a well-known phenomenon usually referred to as the curse-of-dimensionality [?]. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
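The rapid decay of the acceptance rate with dimension is easy to demonstrate numerically. In the following sketch (our own toy setup), each coordinate of the difference between simulated and observed summaries is a standard normal, and the global Euclidean acceptance criterion is applied with a fixed tolerance:

```python
# Toy demonstration of the curse of dimensionality in ABC (illustrative sketch):
# with a fixed tolerance, the probability that a d-dimensional simulated summary
# lands within distance eps of the observed one drops sharply as d grows.
import math
import random

random.seed(4)

def acceptance_rate(dim, eps=1.0, n_sims=20000):
    hits = 0
    for _ in range(n_sims):
        # squared Euclidean distance between simulated and observed summaries,
        # modeled as one standard normal deviation per dimension
        sq_dist = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(dim))
        if math.sqrt(sq_dist) < eps:
            hits += 1
    return hits / n_sims

# acceptance rates for increasing summary dimension
rates = [acceptance_rate(d) for d in (1, 2, 5, 10)]
```

With this setup, the acceptance rate falls from roughly two thirds in one dimension to a tiny fraction of a percent in ten dimensions, illustrating why tolerances are loosened in practice at the cost of posterior accuracy.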
Although no computational method seems able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids [?], which could substantially reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect [?]. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate [?] (see also Section ?).
==Approximation of the Posterior==
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, as they represent the maximum amount of information in the simplest possible form [?]. However, one must often resort to heuristics to identify sufficient statistics, and sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the potential loss of information in the parameter estimation [?], and this may also bias the discrimination between models.
An intuitive idea to capture most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics [?]. Instead, a better strategy is to focus on the relevant statistics only, relevancy depending on the whole inference problem, on the model used, and on the data at hand [?].
An algorithm has been proposed for identifying a representative subset of summary statistics by iteratively assessing whether an additional statistic introduces a meaningful modification of the posterior [?]. Another method, proposed in [?], decomposes into two principal steps. First, a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the ABC-approximated posteriors with the reference posterior.
With both of these strategies, a subset of statistics is selected from a large set of candidate statistics. In contrast, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately [?]. Recently, a method for constructing summaries in a semi-automatic manner has attracted considerable interest [?]. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
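The core of the semi-automatic construction can be sketched as follows: parameters are regressed on candidate statistics from a pilot run, and the fitted value, an estimate of the posterior mean, serves as the constructed summary. The single candidate statistic and the toy model below are simplifying assumptions of ours:

```python
# Hedged sketch of semi-automatically constructed summary statistics.
# Pilot run: draw theta from the prior, simulate data, and record a candidate
# statistic; then regress theta on the candidate statistic. The fitted linear
# predictor of theta is used as the summary in subsequent ABC runs.
import random
import statistics

random.seed(5)

pilot = []
for _ in range(2000):
    theta = random.uniform(-5.0, 5.0)                    # prior draw
    data = [random.gauss(theta, 1.0) for _ in range(20)]  # simulated data set
    pilot.append((statistics.mean(data), theta))          # (candidate stat, theta)

xs = [x for x, _ in pilot]
ys = [t for _, t in pilot]
x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
# ordinary least squares for a single covariate
slope = sum((x - x_bar) * (t - y_bar) for x, t in pilot) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar

def summary(data):
    # constructed summary: linear estimate of the posterior mean of theta
    return intercept + slope * statistics.mean(data)
```

With several candidate statistics, the same idea applies with multiple regression; the fitted value remains a one-dimensional summary per parameter.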
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest [?]. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models and may also lead to incorrect model predictions. Importantly, none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
==Indispensable Quality Controls==
As the above discussion makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot currently be based on general rules, and the effect of these choices should be evaluated and tested in each study [?]. Thus, quality controls are achievable, and indeed performed in many ABC-based works, but for certain problems the assessment of the impact of the method-related parameters can be an overwhelming task. However, the rapidly increasing use of ABC can be expected to foster a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider valid, but not specific to ABC; instead, they hold for model-based methods in general. Many of these criticisms have been debated in the literature for a long time, but the flexibility offered by ABC to analyze very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space [?]. Indeed, model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model in some instances, it may then be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options [?]. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead [?]. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only a few models, subjectively chosen and probably all wrong, can realistically be considered, what insight can we hope to derive from their analysis [?]? As pointed out in [?], there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only rarely be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important in this context than the test of a statistical null hypothesis (see also Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” [?], which is connected to classical objections to Bayesian approaches [?].
With any computational method, it is necessary to constrain the investigated parameter ranges. The parameter ranges should, if possible, be defined based on known properties of the studied system, but may for practical applications necessitate an educated guess. However, theoretical results regarding a suitable (e.g., unbiased) choice of the prior distribution are available, for instance based on the principle of maximum entropy [?].
We stress that the purpose of the analysis must be kept in mind when choosing the priors. In principle, uninformative and flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in [?] that part of the data had to be omitted in the ABC-based analysis presented in [?]. Although a number of authors have claimed that large data sets are not a practical limitation [?], this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power, this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics [?], which may be a tractable approach also for ABC-based methods. It should still be kept in mind that realistic models for complex systems are very likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it has been proposed to combine the Metropolis-Hastings algorithm with ABC [?], which resulted in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated samples from the posterior [?], and relatively poor parallelizability [?].
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting [?]. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively [?].
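A heavily simplified version of this sequential idea is sketched below (our own toy example; proper importance weights and an adaptive tolerance schedule, both essential in real ABC-SMC implementations, are omitted for brevity):

```python
# Simplified ABC-SMC-style sketch (illustrative only): a population of particles
# is propagated through a decreasing sequence of tolerances; each new particle
# is a perturbed resample from the previous population, accepted under the
# tighter tolerance. NOTE: importance weights are deliberately omitted here.
import random
import statistics

random.seed(6)
observed = [random.gauss(2.0, 1.0) for _ in range(30)]
s_obs = statistics.mean(observed)

def simulate_summary(theta):
    # summary of a data set simulated under the toy Normal(theta, 1) model
    return statistics.mean([random.gauss(theta, 1.0) for _ in range(30)])

population = [random.uniform(-10.0, 10.0) for _ in range(200)]  # from the prior
for eps in (2.0, 1.0, 0.5, 0.2):   # decreasing tolerances (fixed here for clarity)
    new_population = []
    while len(new_population) < 200:
        theta = random.choice(population) + random.gauss(0.0, 0.5)  # resample + perturb
        if abs(simulate_summary(theta) - s_obs) < eps:
            new_population.append(theta)
    population = new_population
```

Each stage reuses the previous population as a proposal distribution, so far fewer simulations are wasted in low-posterior regions than with plain rejection sampling from the prior.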
The use of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in [?]. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model [?]. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was subsequently proposed [?].
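The regression adjustment can be sketched as follows for a one-dimensional toy problem of our own; for simplicity the regression is unweighted, whereas the published method uses local weighting of the accepted draws:

```python
# Sketch of a (simplified, unweighted) regression adjustment after rejection ABC:
# theta is regressed on the simulated summary s among the accepted draws, and
# each draw is corrected toward the observed summary:
#     theta_adj = theta + b * (s_obs - s)
import random
import statistics

random.seed(7)
observed = [random.gauss(1.0, 1.0) for _ in range(25)]
s_obs = statistics.mean(observed)

accepted = []   # (theta, s) pairs from a deliberately loose rejection step
while len(accepted) < 300:
    theta = random.uniform(-10.0, 10.0)
    s = statistics.mean([random.gauss(theta, 1.0) for _ in range(25)])
    if abs(s - s_obs) < 1.0:            # loose tolerance
        accepted.append((theta, s))

thetas = [t for t, _ in accepted]
ss = [s for _, s in accepted]
t_bar, s_bar = statistics.mean(thetas), statistics.mean(ss)
# least-squares slope of theta on s among the accepted draws
b = sum((s - s_bar) * (t - t_bar) for t, s in accepted) / sum((s - s_bar) ** 2 for s in ss)
# corrected draws, shifted in the direction of the observed summary
adjusted = [t + b * (s_obs - s) for t, s in accepted]
```

The adjusted draws concentrate much more tightly around the region supported by the observed summary than the raw accepted draws, which is exactly the variance reduction the method aims for.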
=Outlook=
=Acknowledgements=
=References=
566
2012-04-12T08:10:27Z
Mikaelsunnaker
13
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has rapidly gained popularity in recent years, in particular for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain for ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, owing to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in [?]). Since its introduction in [?], the spread of ABC has spurred the scientific community to develop improved versions of the basic method, which have further increased its computational efficiency (e.g., see [?]).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography [?]. However, it was pointed out that a significant portion of the criticism is not aimed directly at ABC, but more generally at methods rooted in Bayesian statistics [?]. A large part was also shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or between the ABC method and the usage thereof. However, the arguments also exposed fundamental and currently unsolved issues. Concerns have lately also been raised within the ABC community [?]. Yet it might be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood [?], which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, with the ABC rejection algorithm—the most basic form of ABC—a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math>, and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (the event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter samples approximately distributed according to the desired posterior distribution, and, crucially, obtained without the need to explicitly compute the likelihood function.
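As an illustration, the rejection algorithm can be sketched as follows for a hypothetical toy model. All modeling choices here (a Gaussian sample with unknown mean, a flat prior on <math>[-5,5]</math>, the absolute difference of sample means as the distance) are illustrative assumptions, not part of any published analysis:

```python
import random
import statistics

def abc_rejection(observed, simulate, prior_sample, distance, epsilon, n_samples):
    """Basic ABC rejection: keep prior draws whose simulated data lie
    within epsilon of the observed data under the given distance."""
    accepted = []
    for _ in range(n_samples):
        theta = prior_sample()
        simulated = simulate(theta)
        if distance(simulated, observed) < epsilon:
            accepted.append(theta)
    return accepted

# Toy model (hypothetical): 50 Gaussian observations with unknown mean theta.
random.seed(1)
true_theta = 2.0
observed = [random.gauss(true_theta, 1.0) for _ in range(50)]

posterior = abc_rejection(
    observed=observed,
    simulate=lambda th: [random.gauss(th, 1.0) for _ in range(50)],
    prior_sample=lambda: random.uniform(-5.0, 5.0),   # flat prior on [-5, 5]
    distance=lambda a, b: abs(statistics.mean(a) - statistics.mean(b)),
    epsilon=0.1,
    n_samples=20000,
)
print(len(posterior), statistics.mean(posterior))
```

With this setup the accepted parameter values concentrate around the true mean, while most prior draws are rejected, which already hints at the efficiency concerns discussed later.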
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data [?]. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. ?, so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, outside of the exponential family of distributions it is typically impossible to identify a finite-dimensional set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
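The use of summary statistics can be sketched by extending the toy rejection algorithm above. For a Gaussian model with unknown mean and standard deviation, the sample mean and sample standard deviation are jointly sufficient; the flat priors and the tolerance below are illustrative assumptions:

```python
import random
import statistics
import math

def summaries(data):
    # Sample mean and standard deviation: jointly sufficient for a Gaussian model.
    return (statistics.mean(data), statistics.stdev(data))

def distance(s1, s2):
    # Euclidean distance between summary vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))

random.seed(2)
observed = [random.gauss(3.0, 2.0) for _ in range(100)]
s_obs = summaries(observed)

accepted = []
for _ in range(30000):
    mu = random.uniform(-10.0, 10.0)      # prior on the mean (assumed flat)
    sigma = random.uniform(0.1, 5.0)      # prior on the std dev (assumed flat)
    sim = [random.gauss(mu, sigma) for _ in range(100)]
    if distance(summaries(sim), s_obs) < 0.5:
        accepted.append((mu, sigma))

post_mu = statistics.mean(m for m, _ in accepted)
post_sigma = statistics.mean(s for _, s in accepted)
print(len(accepted), post_mu, post_sigma)
```

Comparing 100-dimensional data sets directly would make acceptances at this tolerance essentially impossible; comparing two-dimensional summaries keeps the acceptance rate workable.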
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that the uncertain parameters are marginalized out through integration to compute <math>B_{1,2}</math> in Eq. ?. The posterior ratio of <math>M_1</math> to <math>M_2</math> given the data (which can be thought of as the support in favor of one model over the other) is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
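In the ABC setting, one common way to estimate a Bayes factor is to simulate from both models with the same number of draws and take the ratio of acceptance counts, since the acceptance rate is a Monte Carlo estimate of the (summary-based) evidence. The following sketch assumes a hypothetical toy comparison of two Gaussian-mean models that differ only in their prior range; note that the result approximates the Bayes factor based on the chosen summary, a caveat we return to later:

```python
import random
import statistics

def acceptance_count(prior_sample, simulate, s_obs, epsilon, n_sims):
    """Count accepted simulations; the acceptance rate estimates the
    summary-based model evidence, up to a factor common to both models."""
    hits = 0
    for _ in range(n_sims):
        sim = simulate(prior_sample())
        if abs(statistics.mean(sim) - s_obs) < epsilon:
            hits += 1
    return hits

random.seed(3)
observed = [random.gauss(0.0, 1.0) for _ in range(50)]
s_obs = statistics.mean(observed)

n_sims, eps = 20000, 0.2
simulate = lambda th: [random.gauss(th, 1.0) for _ in range(50)]
# M1: tight prior around the truth; M2: a much vaguer prior.
hits_m1 = acceptance_count(lambda: random.uniform(-1, 1), simulate, s_obs, eps, n_sims)
hits_m2 = acceptance_count(lambda: random.uniform(-10, 10), simulate, s_obs, eps, n_sims)
bayes_factor = hits_m1 / hits_m2   # estimated B_{1,2}
print(hits_m1, hits_m2, bayes_factor)
```

Both models share the same likelihood, yet the tighter prior yields a much larger evidence, illustrating the prior sensitivity of Bayes factors discussed below.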
A table for interpreting the strength of evidence corresponding to different values of the Bayes factor was originally published in [?] (see also [?]), and has been used in a number of studies [?]. However, conclusions of model comparisons based on Bayes factors should be treated with caution, and we will later discuss some important ABC-related concerns.
=Quality Controls=
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in [?], such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparison of the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in [?]. Beyond that, cross-validation techniques [?] and predictive checks [?] represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because then the posterior support of a particular model can appear overwhelmingly conclusive, even if all proposed models in fact are poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
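A minimal PODS-style quality control of the kind described above can be sketched as follows for the toy Gaussian-mean model: draw "true" parameters from the prior, generate pseudo-observed data sets, rerun ABC, and check how often the true value falls inside the central 95% credible interval. The model, priors, and tolerances are illustrative assumptions:

```python
import random
import statistics

def abc_posterior(observed_mean, n_obs, epsilon, n_sims):
    """Minimal ABC rejection run for the toy Gaussian-mean model."""
    accepted = []
    for _ in range(n_sims):
        theta = random.uniform(-5, 5)
        sim_mean = statistics.mean(random.gauss(theta, 1.0) for _ in range(n_obs))
        if abs(sim_mean - observed_mean) < epsilon:
            accepted.append(theta)
    return accepted

random.seed(4)
covered, trials = 0, 40
for _ in range(trials):
    true_theta = random.uniform(-5, 5)  # draw a "true" parameter from the prior
    pods = [random.gauss(true_theta, 1.0) for _ in range(30)]  # pseudo-observed data
    post = sorted(abc_posterior(statistics.mean(pods), 30, 0.3, 3000))
    if post:
        lo = post[int(0.025 * len(post))]
        hi = post[int(0.975 * len(post)) - 1]
        covered += int(lo <= true_theta <= hi)
print(covered, "/", trials)
```

Coverage close to (or above) the nominal 95% level suggests the chosen tolerance and summary do not badly distort the posterior; markedly lower coverage would flag a problem with the inference setup.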
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions, with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously [?], and model inconsistency is detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters [?]. The deviance information criterion is then used as a measure of model fit. Interestingly, it was also shown that the models preferred based on this criterion can conflict with those supported by Bayes factors. For this reason, it is useful to combine different methods for model selection to obtain robust conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus, instead, <math>\epsilon</math> is set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; instead, other summary statistics are used, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC-based analyses. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in [?] and [?]). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) [?]. This is a well-known phenomenon usually referred to as the curse-of-dimensionality [?]. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
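The exponential decay of the acceptance rate with dimensionality can be demonstrated with a deliberately simple stand-in for a simulation, in which every coordinate of a <math>d</math>-dimensional draw must independently fall within the tolerance. The uniform setup below is a hypothetical illustration, not a realistic model:

```python
import random

def acceptance_rate(dim, epsilon, n_sims):
    """Fraction of prior draws accepted when every coordinate of a
    dim-dimensional simulated summary must fall within epsilon of the target."""
    observed = [0.5] * dim
    hits = 0
    for _ in range(n_sims):
        sim = [random.random() for _ in range(dim)]  # stand-in for a model simulation
        if all(abs(s - o) < epsilon for s, o in zip(sim, observed)):
            hits += 1
    return hits / n_sims

random.seed(5)
rates = {d: acceptance_rate(d, 0.2, 50000) for d in (1, 2, 4, 8)}
for d, r in rates.items():
    print(d, r)
```

Since each coordinate is accepted with probability roughly 0.4 here, the overall acceptance rate falls like <math>0.4^d</math>, which is the curse-of-dimensionality in its simplest form.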
Although no computational method seems able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids [?], which could potentially reduce the simulation times for ABC considerably). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic methods for global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect [?]. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate [?] (see also Section ?).
==Approximation of the Posterior==
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, as they represent the maximum amount of information in the simplest possible form [?]. However, one must often resort to heuristics to identify sufficient statistics, and sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the potential loss of information in the parameter estimation [?], and it may also bias the discrimination between models.
An intuitive idea to capture most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics [?]. Instead, a better strategy consists in focusing on the relevant statistics only—relevance depending on the whole inference problem, on the model used, and on the data at hand [?].
An algorithm was proposed for identifying a representative subset of summary statistics, by iteratively assessing if an additional statistic introduces a meaningful modification of the posterior [?]. Another method was proposed in [?], which decomposes into two principal steps. First a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the posteriors computed with ABC to the reference posterior.
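The iterative idea behind the first of these algorithms can be sketched as follows: add a candidate statistic to the current set only if it shifts the ABC posterior by more than some threshold. The toy model, the two candidate statistics (an informative sample mean and a deliberately uninformative noise statistic), and the posterior-mean-shift criterion are all simplifying assumptions for illustration, not the published algorithm itself:

```python
import random
import statistics

def abc_post(observed, stat_fns, epsilon, n_sims):
    """ABC rejection for the toy Gaussian-mean model; a draw is accepted
    only if every summary statistic matches the observed one within epsilon."""
    s_obs = [f(observed) for f in stat_fns]
    accepted = []
    for _ in range(n_sims):
        theta = random.uniform(-5, 5)
        sim = [random.gauss(theta, 1.0) for _ in range(40)]
        if all(abs(f(sim) - so) < epsilon for f, so in zip(stat_fns, s_obs)):
            accepted.append(theta)
    return accepted

random.seed(6)
observed = [random.gauss(1.5, 1.0) for _ in range(40)]

candidates = {
    "mean": statistics.mean,              # informative about theta
    "noise": lambda d: random.random(),   # carries no information at all
}

threshold = 0.5   # minimum shift in posterior mean to call a statistic relevant
chosen = []
post = abc_post(observed, [], 0.3, 20000)   # no statistics yet: posterior == prior
for name in candidates:
    trial_fns = [candidates[c] for c in chosen] + [candidates[name]]
    trial = abc_post(observed, trial_fns, 0.3, 20000)
    if abs(statistics.mean(trial) - statistics.mean(post)) > threshold:
        chosen.append(name)   # the statistic changed the posterior: keep it
        post = trial
print(chosen)
```

The informative statistic moves the posterior away from the prior and is retained, while the noise statistic leaves it essentially unchanged and is discarded.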
With both of these strategies, a subset of statistics is selected from a large set of candidate statistics. In contrast, the partial least squares regression approach uses information from all the candidate statistics, each weighted appropriately [?]. Recently, a method for constructing summaries in a semi-automatic manner has attracted considerable interest [?]. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
Methods for the identification of summary statistics that also assess their influence on the approximation of the posterior would be of great interest [?]. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, or the acceptance threshold cannot be based on general rules, but the effect of these choices should be evaluated and tested in each study [?]. Thus, quality controls are achievable and indeed performed in many ABC based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. However, the rapidly increasing use of ABC should lead to a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC, and instead hold for model-based methods in general. Many of these criticisms have already been well debated in the literature for a long time, but the flexibility offered by ABC to analyse very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space [?]. It is true that model-based studies often revolve around a small number of models; because evaluating even a single model can be computationally costly, it may then be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options [?]. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead [?]. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only a few models—subjectively chosen and probably all wrong—can realistically be considered, what insight can we hope to derive from their analysis [?]? As pointed out in [?], there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can very seldom be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important than the test of a statistical null hypothesis in this context (see also Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” [?], which is connected to classical objections of Bayesian approaches [?].
With any computational method, it is necessary to constrain the investigated parameter ranges. The parameter ranges should, if possible, be defined based on known properties of the studied system, but may in practical applications necessitate an educated guess. However, theoretical results regarding a suitable (e.g., unbiased) choice of the prior distribution are available, based for example on the principle of maximum entropy [?].
We stress that the purpose of the analysis must be kept in mind when choosing the priors. In principle, uninformative flat priors, which express our subjective ignorance about the parameters, may still yield good parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of the parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
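The sensitivity of the Bayes factor to the prior can be made concrete with a model where the evidence is available in closed form. For one observation <math>x \sim N(\theta,1)</math> with prior <math>\theta \sim N(0,\tau^2)</math>, the marginal likelihood is <math>N(x\,|\,0,1+\tau^2)</math>. The sketch below compares two priors that differ only in width; the model and prior widths are illustrative assumptions:

```python
import math

def gaussian_evidence(x, prior_sd):
    """Marginal likelihood of one observation x ~ N(theta, 1) with prior
    theta ~ N(0, prior_sd^2): analytically equal to N(x | 0, 1 + prior_sd^2)."""
    var = 1.0 + prior_sd ** 2
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

x = 0.0   # one observation, perfectly consistent with theta = 0 under both priors
bf = gaussian_evidence(x, 1.0) / gaussian_evidence(x, 100.0)
print(bf)
```

Although the data are equally compatible with both models' likelihoods, the Bayes factor strongly favors the narrow prior (here by a factor of roughly 70), simply because the vague prior spreads its mass over implausible parameter values.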
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in [?] that part of the data had to be omitted in the ABC-based analysis presented in [?]. Although a number of authors claim that large data sets are not a practical limitation [?], this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics [?], which may be a tractable approach for ABC-based methods as well. It should still be kept in mind that realistic models for complex systems are very likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC [?], which resulted in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated posterior samples [?], and relatively poor parallelizability [?].
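The core idea of ABC-MCMC can be sketched for the toy Gaussian-mean model as follows. With a flat prior and a symmetric random-walk proposal, the Metropolis ratio reduces to an indicator of the prior support, so a move is taken exactly when the proposal is in range and its simulation matches the observed summary. The initialization from a plain-rejection hit, and all model settings, are simplifying assumptions for illustration:

```python
import random
import statistics

def abc_mcmc(observed, epsilon, n_steps, step_sd=0.5, n_obs=50):
    """ABC-MCMC sketch: a random-walk Metropolis chain whose moves are
    accepted only if the corresponding simulation falls within epsilon
    of the observed summary."""
    s_obs = statistics.mean(observed)

    # Initialize from a plain-rejection hit, a common practical choice
    # that avoids a long burn-in from an arbitrary starting point.
    while True:
        theta = random.uniform(-5.0, 5.0)
        sm = statistics.mean(random.gauss(theta, 1.0) for _ in range(n_obs))
        if abs(sm - s_obs) < epsilon:
            break

    chain = []
    for _ in range(n_steps):
        proposal = theta + random.gauss(0.0, step_sd)
        # Flat prior on [-5, 5] and a symmetric proposal: the Metropolis
        # ratio reduces to the indicator of the prior support.
        if -5.0 <= proposal <= 5.0:
            sim = statistics.mean(random.gauss(proposal, 1.0) for _ in range(n_obs))
            if abs(sim - s_obs) < epsilon:
                theta = proposal
        chain.append(theta)   # rejected moves repeat the current state
    return chain

random.seed(7)
observed = [random.gauss(2.0, 1.0) for _ in range(50)]
chain = abc_mcmc(observed, epsilon=0.2, n_steps=5000)
print(statistics.mean(chain))
```

Because proposals are made locally around the current state, far fewer simulations are wasted in low-posterior regions than with plain rejection, at the cost of correlated samples.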
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting [?]. Their general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively [?].
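A population-based variant can be sketched as follows for the same toy model. Each generation targets a smaller tolerance, proposing by perturbing particles resampled from the previous weighted population; the importance weight is the prior density over the mixture proposal density. For simplicity the sketch uses a fixed tolerance schedule rather than the adaptive schedules used in practice, and all model settings are illustrative assumptions:

```python
import random
import math
import statistics

def normal_pdf(x, mu, sd):
    return math.exp(-(x - mu) ** 2 / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

def abc_pmc(observed, epsilons, n_particles=200, kernel_sd=0.5):
    """ABC population Monte Carlo sketch: each generation targets a smaller
    tolerance, proposing from the previous weighted particle population."""
    s_obs = statistics.mean(observed)
    particles, weights = [], []
    for gen, eps in enumerate(epsilons):
        new_particles, new_weights = [], []
        while len(new_particles) < n_particles:
            if gen == 0:
                theta = random.uniform(-5, 5)   # first generation: the prior
            else:
                base = random.choices(particles, weights)[0]
                theta = base + random.gauss(0.0, kernel_sd)  # perturb a resampled particle
                if not -5.0 <= theta <= 5.0:
                    continue   # outside the prior support: reject immediately
            sim_mean = statistics.mean(random.gauss(theta, 1.0) for _ in range(50))
            if abs(sim_mean - s_obs) < eps:
                if gen == 0:
                    w = 1.0
                else:
                    # Importance weight: flat prior over the mixture proposal density.
                    w = 1.0 / sum(wj * normal_pdf(theta, pj, kernel_sd)
                                  for pj, wj in zip(particles, weights))
                new_particles.append(theta)
                new_weights.append(w)
        total = sum(new_weights)
        particles = new_particles
        weights = [w / total for w in new_weights]
    return particles, weights

random.seed(8)
observed = [random.gauss(-1.0, 1.0) for _ in range(50)]
particles, weights = abc_pmc(observed, epsilons=[1.0, 0.5, 0.2])
post_mean = sum(p * w for p, w in zip(particles, weights))
print(post_mean)
```

Tightening the tolerance gradually keeps the acceptance rate workable at every stage, since proposals are always drawn near regions that survived the previous, looser tolerance.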
The use of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in [?]. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model [?]. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in [?].
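The linear regression adjustment can be sketched in one dimension for the toy Gaussian-mean model: accepted parameters are regressed on their summaries with distance-based weights, and each parameter is then shifted to where the fit predicts it would lie at the observed summary. The Epanechnikov weights and all model settings are illustrative assumptions in the spirit of the approach, not the published implementation:

```python
import random
import statistics

def weighted_linreg_adjust(thetas, sums, s_obs, weights):
    """Local-linear adjustment in one dimension: regress theta on the
    summary, then shift each accepted theta to where the fit predicts
    it would sit at the observed summary."""
    wsum = sum(weights)
    s_mean = sum(w * s for w, s in zip(weights, sums)) / wsum
    t_mean = sum(w * t for w, t in zip(weights, thetas)) / wsum
    cov = sum(w * (s - s_mean) * (t - t_mean)
              for w, s, t in zip(weights, sums, thetas)) / wsum
    var = sum(w * (s - s_mean) ** 2 for w, s in zip(weights, sums)) / wsum
    beta = cov / var
    return [t - beta * (s - s_obs) for t, s in zip(thetas, sums)]

random.seed(9)
observed = [random.gauss(0.5, 1.0) for _ in range(100)]
s_obs = statistics.mean(observed)

# Plain rejection with a deliberately loose tolerance.
thetas, sums_acc = [], []
eps = 0.75
for _ in range(20000):
    theta = random.uniform(-5, 5)
    s = statistics.mean(random.gauss(theta, 1.0) for _ in range(100))
    if abs(s - s_obs) < eps:
        thetas.append(theta)
        sums_acc.append(s)

# Epanechnikov weights: simulations closer to the observed summary count more.
weights = [1.0 - ((s - s_obs) / eps) ** 2 for s in sums_acc]
adjusted = weighted_linreg_adjust(thetas, sums_acc, s_obs, weights)
print(statistics.stdev(thetas), statistics.stdev(adjusted))
```

The adjusted sample is markedly tighter than the raw accepted sample, illustrating how regression correction partially compensates for a loose tolerance.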
=Outlook=
=Acknowledgements=
=References=
567
2012-04-12T08:38:53Z
Mikaelsunnaker
13
New sections added
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has rapidly increased in popularity over the last years and in particular for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain for ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in [?]). Since its advent in [?], the spread of ABC has triggered the scientific community to develop improved versions of the basic method, which further increased the computational efficiency (e.g., see [?]).
A sharp criticism has recently been directed at the ABC methods, in particular within the field of phylogeography [?]. However, it was pointed out that a significant portion of the criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics [?]. A large part was also shown to originate from misunderstanding of the mathematical foundations and the semantics of Bayesian statistics, the difference between a model and the underlying system, or between the ABC method and the usage thereof. However, fundamental and currently unsolved issues were exposed by the arguments as well. Concerns have lately also been raised within the ABC community [?]. Yet it might be difficult for many readers to differentiate ABC specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood [?], which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, with the ABC rejection algorithm—the most basic form of ABC — a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math> a data set <math>\hat{D}</math> is simulated under model <math>M</math>, and accepted with tolerance <math>\epsilon \le 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidian distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincide exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution, and, crucially, obtained without the need of explicitly computing the likelihood function.
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e. given the sufficient statistic, the parameter θ is irrelevant for the conditional distribution of the data [?]. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\theta_1</math> and <math>\theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that we need to marginalize over the uncertain parameters through integration to compute <math>B_{1,2}</math> in Eq. ?. The posterior ratio (which can be thought of as the support in favor of one model) of <math>M_1</math> compared to <math>M_2</math> given the data is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for interpretation of the strength in values of the Bayes factor was originally published in [?] (see also [?]), and has been used in a number of studies [?]. However, the conclusions of model comparison based on Bayes factors should be considered with sober caution, and we will later discuss some important ABC related concerns.
=Quality Controls=
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in [?], such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, the comparison of the posterior predictive distribution of summary statistics to the summary statistics observed was suggested in [?]. Beyond that, cross-validation techniques [?] and predictive checks [?] represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because then the posterior support of a particular model can appear overwhelmingly conclusive, even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions, with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously [?], and model inconsistency is detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters [?]. The deviance information criterion is then used as a measure of model fit. Notably, it was also shown that the models preferred on the basis of this criterion can conflict with those supported by Bayes factors. For this reason it is useful to combine different methods for model selection to obtain robust conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus, <math>\epsilon</math> is instead set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; instead, other summary statistics are used, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC-based analysis. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in [?] and [?]). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) [?]. This is a well-known phenomenon usually referred to as the curse-of-dimensionality [?]. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
Although no computational method seems to be able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids [?], which could substantially reduce the simulation times for ABC). However, the applicability of such methods is problem-dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect [?]. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate [?] (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes at the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> should approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance that is large enough for every point to be accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in [?]. Theoretical results for an upper, <math>\epsilon</math>-dependent, bound on the error in parameter estimates have recently been reported [?]. The accuracy of the posterior (defined as the expected quadratic loss) obtained with ABC as a function of <math>\epsilon</math> has also been investigated [?]. However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that deserves more detailed investigation. In particular, methods that distinguish the error due to this approximation from the error due to model mis-specification [?] would be valuable in the context of actual applications.
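The two limiting cases can be observed numerically in a toy model (our own construction, with a normal model and the sample mean as summary): a tolerance large enough to accept everything returns the prior, while a small tolerance concentrates the sample near the posterior.

```python
import numpy as np

rng = np.random.default_rng(3)
obs = rng.normal(1.0, 1.0, size=100)
s_obs = obs.mean()

def abc(eps, n=20000):
    thetas = rng.uniform(-5.0, 5.0, size=n)               # flat prior on [-5, 5]
    # Vectorized simulation: one column of 100 draws per sampled theta.
    sims = rng.normal(thetas, 1.0, size=(100, n)).mean(axis=0)
    return thetas[np.abs(sims - s_obs) < eps]

wide = abc(eps=100.0)   # tolerance so large that every draw is accepted
tight = abc(eps=0.05)   # small tolerance concentrates around the posterior
```

The spread of `wide` matches the prior (standard deviation near <math>10/\sqrt{12}\approx 2.9</math>), whereas `tight` is centered close to the data-generating mean with a much smaller spread.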
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in [?]. The idea is to add noise drawn from a given probability density function to the observed data, since ABC then yields exact inference under the assumption of this noise model. The asymptotic consistency of such “noisy ABC” was established in [?], together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC-based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, since they represent the maximum amount of information in the simplest possible form [?]. However, one often has to resort to heuristics to identify sufficient statistics, and sufficiency can be difficult to assess for many problems. The use of non-sufficient statistics may lead to inflated posterior distributions, due to the potential loss of information in the parameter estimation [?], and may also bias the discrimination between models.
An intuitive idea for capturing most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics [?]. A better strategy is instead to focus on the relevant statistics only, where relevancy depends on the whole inference problem, on the model used, and on the data at hand [?].
An algorithm for identifying a representative subset of summary statistics was proposed, which iteratively assesses whether an additional statistic introduces a meaningful modification of the posterior [?]. Another method, proposed in [?], decomposes into two principal steps. First, a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the ABC-approximated posteriors with the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. In contrast, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately [?]. Recently, a method for constructing summaries in a semi-automatic manner has attracted much interest [?]. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
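A bare-bones sketch of the semi-automatic idea follows; it is our own simplification of the general scheme, with an arbitrary toy model and ad hoc candidate statistics: the parameter is regressed on candidate statistics in a pilot run, and the fitted linear combination then serves as a one-dimensional summary.

```python
import numpy as np

rng = np.random.default_rng(4)

def candidate_stats(data):
    # Ad hoc candidate statistics of varying relevance (our own choices).
    return np.array([data.mean(), data.std(), np.median(data), np.mean(data ** 2)])

# Pilot run: pairs of (theta, candidate statistics of data simulated under theta).
thetas = rng.uniform(-3.0, 3.0, size=2000)
X = np.array([candidate_stats(rng.normal(t, 1.0, size=100)) for t in thetas])

# Linear regression of theta on the candidate statistics; the fitted
# linear combination approximates the posterior mean of theta and is
# subsequently used as the summary statistic.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, thetas, rcond=None)

def summary(data):
    return float(np.concatenate(([1.0], candidate_stats(data))) @ coef)
```

For data simulated at a known parameter value, `summary` should return a value close to that parameter, since the regression approximates the posterior mean under the flat pilot prior.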
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest [?]. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic [?]. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form [?]
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math> if and only if
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which implies that <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that there may be a huge difference between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> if Eq. ? is not satisfied, as demonstrated with a small example model in [?] (previously discussed in [?] and in [?]). Crucially, it was shown that sufficiency for <math>M_1</math>, for <math>M_2</math>, or for both does not guarantee sufficiency for ranking the models [?]. However, it was also shown that any summary statistic that is sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can also be used to rank the nested models [?].
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived [?], which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which actual data sets are compared, such as in typical systems biology applications (e.g., see [?]), circumvents this problem. It is even doubtful that the issue is truly ABC-specific, since importance sampling techniques suffer from the same problem [?].
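To make the role of summary statistics in model choice concrete, the following toy sketch (our own construction; models, summary, and tolerance chosen purely for exposition) approximates posterior model probabilities, and hence <math>B_{1,2}^s</math> under equal model priors, by the relative acceptance frequencies of simulations from each model.

```python
import numpy as np

rng = np.random.default_rng(6)
obs = rng.normal(0.0, 1.0, size=200)   # data actually generated under model 1
s_obs = obs.std()                      # summary statistic: sample standard deviation

counts = {1: 0, 2: 0}
for _ in range(20000):
    m = int(rng.integers(1, 3))        # draw a model from the uniform model prior
    sigma = 1.0 if m == 1 else 2.0     # M1: N(0, 1); M2: N(0, 4)
    s_sim = rng.normal(0.0, sigma, size=200).std()
    if abs(s_sim - s_obs) < 0.1:       # accept on the summary statistic only
        counts[m] += 1

# Relative acceptance frequencies approximate the posterior model probabilities.
p_m1 = counts[1] / (counts[1] + counts[2])
```

Here the sample standard deviation discriminates well between the two models, so model 1 receives essentially all the posterior mass; with a poorly chosen summary the same procedure could be badly misleading, which is precisely the concern raised above.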
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules; instead, the effect of these choices should be evaluated and tested in each study [?]. Thus, quality controls are achievable, and indeed performed in many ABC-based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. However, the rapidly increasing use of ABC should foster a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC, and instead hold for model-based methods in general. Many of these criticisms have already been well debated in the literature for a long time, but the flexibility offered by ABC to analyse very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space [?]. Indeed, model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model, it may in some instances be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options [?]. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead [?]. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only a few models—subjectively chosen and probably all wrong—can realistically be considered, what insight can we hope to derive from their analysis [?]? As pointed out in [?], there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only rarely be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important in this context than the test of a statistical null hypothesis (also see Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” [?], which is connected to classical objections of Bayesian approaches [?].
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should if possible be defined based on known properties of the studied system, but may for practical applications necessitate an educated guess. However, theoretical results regarding a suitable (e.g., non-biased) choice of the prior distribution are available, which are based on the principle of maximum entropy [?].
We stress that the purpose of the analysis is to be kept in mind when choosing the priors. In principle, uninformative and flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of the parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in [?] that part of the data had to be omitted in the ABC-based analysis presented in [?]. Although a number of authors claim that large data sets are not a practical limitation [?], this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics [?], which may be a tractable approach also for ABC-based methods. It should still be kept in mind that any realistic model for a complex system is very likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it has been proposed to combine the Metropolis-Hastings algorithm with ABC [?], which resulted in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated samples of the posterior [?], and relatively poor parallelizability [?].
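A minimal sketch of the ABC-MCMC idea (our own toy model and tuning constants, following the general scheme rather than any specific implementation): a proposed parameter is accepted only if its simulation lies within the tolerance and the Metropolis-Hastings ratio on the prior permits the move.

```python
import numpy as np

rng = np.random.default_rng(7)
obs = rng.normal(1.0, 1.0, size=100)
s_obs = obs.mean()

def prior_pdf(t):
    return 0.1 if -5.0 <= t <= 5.0 else 0.0   # flat prior on [-5, 5]

eps, theta, chain = 0.1, 0.0, []
for _ in range(20000):
    prop = theta + rng.normal(0.0, 0.5)        # symmetric random-walk proposal
    s_sim = rng.normal(prop, 1.0, size=100).mean()
    # Move only if the simulated summary is close to the observed one
    # AND the Metropolis-Hastings ratio on the prior allows it.
    if abs(s_sim - s_obs) < eps and rng.uniform() < prior_pdf(prop) / prior_pdf(theta):
        theta = prop
    chain.append(theta)
chain = np.array(chain)
```

After a burn-in period the chain samples the ABC posterior; note the correlated samples, one of the MCMC burdens mentioned above.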
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting [?]. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively [?].
The use of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in [?]. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are then used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model [?]. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, which led to a reformulation of the regression adjustment that respects the prior distribution [?].
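The regression-adjustment step can be sketched as follows; this is a simplified version with uniform weights inside the tolerance instead of the usual smooth weighting kernel, and the model and constants are our own toy choices.

```python
import numpy as np

rng = np.random.default_rng(8)
obs = rng.normal(1.0, 1.0, size=100)
s_obs = obs.mean()

# Rejection step with a deliberately loose tolerance.
thetas = rng.uniform(-5.0, 5.0, size=20000)
sims = rng.normal(thetas, 1.0, size=(100, 20000)).mean(axis=0)
keep = np.abs(sims - s_obs) < 0.5
t_acc, s_acc = thetas[keep], sims[keep]

# Fit theta as a linear function of the summary among the accepted draws,
# then project each accepted parameter onto the observed summary value.
A = np.column_stack([np.ones(len(s_acc)), s_acc])
(alpha, beta), *_ = np.linalg.lstsq(A, t_acc, rcond=None)
t_adj = t_acc - beta * (s_acc - s_obs)
```

The adjusted sample `t_adj` is more concentrated than the raw accepted sample, partly compensating for the loose tolerance; note that with a bounded prior such an adjustment can push parameters outside the prior support, which is the inconsistency mentioned above.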
=Outlook=
=Acknowledgements=
=References=
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in [?]). Since its advent in [?], the spread of ABC has triggered the scientific community to develop improved versions of the basic method, which further increased the computational efficiency (e.g., see [?]).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography [?]. However, it was pointed out that a significant portion of the criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics [?]. A large part was also shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or between the ABC method and its usage. However, fundamental and currently unsolved issues were exposed by the arguments as well. Concerns have lately also been raised within the ABC community [?]. Yet it might be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood [?], which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, in the ABC rejection algorithm—the most basic form of ABC—a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution, and, crucially, obtained without the need to explicitly compute the likelihood function.
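The algorithm just described can be sketched in a few lines. This is an illustrative toy example (a normal model with unknown mean, a flat prior, and a distance on sorted data sets; all names and constants are our own), not code from any particular ABC package.

```python
import numpy as np

rng = np.random.default_rng(0)
# Observed data: 50 draws from N(2, 1); the goal is to infer the mean.
observed = rng.normal(2.0, 1.0, size=50)

def simulate(theta, rng):
    """Generate a data set under the model for parameter theta."""
    return rng.normal(theta, 1.0, size=50)

def distance(sim, obs):
    """Euclidean distance between the sorted data sets (one choice of rho)."""
    return float(np.linalg.norm(np.sort(sim) - np.sort(obs)))

def abc_rejection(obs, n_samples, epsilon, rng):
    accepted = []
    for _ in range(n_samples):
        theta = rng.uniform(-5.0, 5.0)                # sample from the flat prior
        if distance(simulate(theta, rng), obs) < epsilon:
            accepted.append(theta)                    # keep parameters whose simulations match
    return np.array(accepted)

posterior_sample = abc_rejection(observed, 20000, epsilon=4.0, rng=rng)
```

The accepted parameter values approximate the posterior distribution of the mean, without the likelihood ever being evaluated.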
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data [?]. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. ?, so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
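For instance, in a normal model with known variance the sample mean is sufficient for the unknown mean, so accepting on this one-dimensional summary rather than on the full data set discards no information. A toy sketch (our own construction, with illustrative constants):

```python
import numpy as np

rng = np.random.default_rng(1)
observed = rng.normal(2.0, 1.0, size=100)
s_obs = observed.mean()        # sufficient statistic for the mean of N(theta, 1)

accepted = []
for _ in range(20000):
    theta = rng.uniform(-5.0, 5.0)                    # flat prior
    s_sim = rng.normal(theta, 1.0, size=100).mean()
    if abs(s_sim - s_obs) < 0.05:                     # acceptance on the summary only
        accepted.append(theta)
accepted = np.array(accepted)
```

Because the distance is computed on a single number instead of a 100-dimensional data set, the acceptance rate is far higher for the same quality of posterior approximation.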
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\theta_1</math> and <math>\theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that we need to marginalize over the uncertain parameters through integration to compute <math>B_{1,2}</math> in Eq. ?. The posterior ratio (which can be thought of as the support in favor of one model) of <math>M_1</math> compared to <math>M_2</math> given the data is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for interpretation of the strength in values of the Bayes factor was originally published in [?] (see also [?]), and has been used in a number of studies [?]. However, the conclusions of model comparison based on Bayes factors should be considered with sober caution, and we will later discuss some important ABC related concerns.
=Quality Controls=
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in [?], such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparison of the posterior predictive distribution of summary statistics to the summary statistics observed was suggested in ?. Beyond that, cross-validation techniques [?] and predictive checks [?] represent a promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because then the posterior support of a particular model can appear overwhelmingly conclusive, even if all proposed models in fact are poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on to how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions, with respect to a comprehensive set of statistics. These statistics are not necessarily the same as used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously [?], and model inconsistency is detected from conflicting and codependent summaries. Another quality-control based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters [?]. The deviance information criterion is then used as measure of model fit. Interestingly, it was also shown that the models preferred based on this criterion can conflict with those supported by Bayes factors. For this reason it is useful to combine different methods for model selection to obtain correct conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ?, yields an exact result, but would typically make computations prohibitively expensive. Thus, instead, <math>\epsilon</math> is set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; instead, other summary statistics are used, which introduces an additional bias. However, much of the recent criticism has neither been specific to ABC, nor relevant for ABC based analysis. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in [?] and [?]). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) [?]. This is a well-known phenomenon usually referred to as the curse-of-dimensionality [?]. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
Although no computational method seems to be able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids [?], which could potentially heavily reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect [?]. For certain problems it may therefore be difficult to know if the model is incorrect or if the explored region of the parameter space is inappropriate [?] (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes with the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> is trusted to approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance that is large is enough for every point to be accepted, results in the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon)</math>, was empirically studied in [?]. Theoretical results for an upper <math>\epsilon</math> dependent bound for the error in parameter estimates have recently been reported [?]. The accuracy in the posterior (defined as the expected quadratic loss) of ABC as a function of <math>\epsilon</math> is also investigated in [?]. However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that should be investigated in greater detail. Methods to distinguish the error of this approximation, from the errors due to model mis-specifications [?], that would make sense in the context of actual applications, would be valuable.
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in [?]. The idea is to add noise with a given probability density function to the observed data, since ABC then yields exact inference under the assumption of this noise model. The asymptotic consistency of such “noisy ABC” was established in [?], together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, as they represent the maximum amount of information in the simplest possible form [?]. However, one often has to resort to heuristics to identify sufficient statistics, and sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may inflate the posterior distributions due to the loss of information in the parameter estimation [?], and may also bias the discrimination between models.
An intuitive idea for capturing most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics [?]. Instead, a better strategy is to focus on the relevant statistics only, where relevance depends on the whole inference problem, the model used, and the data at hand [?].
An algorithm was proposed for identifying a representative subset of summary statistics by iteratively assessing whether an additional statistic introduces a meaningful modification of the posterior [?]. Another method, proposed in [?], decomposes into two principal steps. First, a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the ABC-approximated posteriors to the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. On the other hand, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately [?]. Recently, a method for constructing summaries in a semi-automatic manner has attracted much interest [?]. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest [?]. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution, and these errors may corrupt the ranking of models and lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic [?]. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form [?]
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which implies that <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that <math>B_{1,2}</math> and <math>B_{1,2}^s</math> may differ substantially if Eq. ? is not satisfied, as was demonstrated with a small example model in [?] (previously discussed in [?] and [?]). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models [?]. However, it was also shown that any summary statistic that is sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can also be used to rank the nested models [?].
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived [?], which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC based inference in which the actual data sets are compared directly, as in typical systems biology applications (e.g., see [?]), circumvents this problem. It is also doubtful whether the issue is truly specific to ABC, since importance sampling techniques suffer from the same problem [?].
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules; the effect of these choices should instead be evaluated and tested in each study [?]. Thus, quality controls are achievable, and indeed performed in many ABC based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. However, the rapidly increasing use of ABC can be expected to yield a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC, and instead hold for model-based methods in general. Many of these criticisms have already been well debated in the literature for a long time, but the flexibility offered by ABC to analyse very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space [?]. It is true that model-based studies often revolve around a small number of models, and that, due to the high computational cost of evaluating a single model, it may be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options [?]. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead [?]. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only a few models, subjectively chosen and probably all wrong, can realistically be considered, what insight can we hope to derive from their analysis [?]? As pointed out in [?], there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only rarely be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important in this context than the test of a statistical null hypothesis (see also Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” [?], which is connected to classical objections to Bayesian approaches [?].
With any computational method, it is necessary to constrain the investigated parameter ranges. The parameter ranges should, if possible, be defined based on known properties of the studied system, but may for practical applications necessitate an educated guess. However, theoretical results regarding a suitable (e.g., unbiased) choice of the prior distribution are available, for example based on the principle of maximum entropy [?].
We stress that the purpose of the analysis must be kept in mind when choosing the priors. In principle, uninformative flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of the parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in [?] that part of the data had to be omitted in the ABC based analysis presented in [?]. Although a number of authors claim that large data sets are not a practical limitation [?], this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC based inference in phylogenetics [?], which may be a tractable approach also for ABC based methods. It should still be kept in mind that realistic models for complex systems are very likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with some recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC [?], which resulted in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated posterior samples [?], and relatively poor parallelizability [?].
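The ABC-MCMC idea can be sketched as follows. The toy Gaussian model, the flat prior, the symmetric random-walk proposal, and all numeric settings are assumptions for illustration only; with a flat prior and a symmetric proposal, the Metropolis-Hastings ratio reduces to checking prior support, so a move is taken whenever the simulation passes the tolerance check:

```python
import random
import statistics

# Assumed toy problem: infer the mean of N(theta, 1) from 40 observations.
random.seed(3)
observed = [random.gauss(1.0, 1.0) for _ in range(40)]
obs_mean = statistics.mean(observed)

def simulate(theta):
    return [random.gauss(theta, 1.0) for _ in range(40)]

def log_prior(theta):                 # flat prior on [-5, 5]
    return 0.0 if -5 <= theta <= 5 else float("-inf")

eps, theta, chain = 0.3, 0.0, []
for _ in range(5000):
    proposal = theta + random.gauss(0.0, 0.5)      # symmetric random walk
    sim_mean = statistics.mean(simulate(proposal))
    if abs(sim_mean - obs_mean) < eps:             # ABC acceptance check
        # flat prior + symmetric proposal: MH ratio is 1 inside the support
        if log_prior(proposal) > float("-inf"):
            theta = proposal
    chain.append(theta)                            # rejections repeat theta
print(statistics.mean(chain[1000:]))
```

After a burn-in, the chain concentrates near the region of parameter space compatible with the observed data, without ever evaluating a likelihood.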
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting [?]. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but are adjusted adaptively [?].
The usage of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in [?]. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model [?]. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in [?].
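The regression adjustment can be sketched in its unweighted form (the toy Gaussian model, the loose tolerance, and all numeric settings are assumptions for illustration; the method cited above additionally applies local weights, which this sketch omits):

```python
import random
import statistics

# Assumed toy setup: theta uniform on [-5, 5], summary = mean of 10
# draws from N(theta, 1), observed summary fixed at 2.0.
random.seed(4)
observed_summary = 2.0

pairs = []   # (theta, simulated summary) pairs accepted with a loose tolerance
for _ in range(4000):
    theta = random.uniform(-5, 5)
    s = statistics.mean(random.gauss(theta, 1.0) for _ in range(10))
    if abs(s - observed_summary) < 1.0:
        pairs.append((theta, s))

# least-squares slope of theta on the summary among the accepted pairs
t_bar = statistics.mean(t for t, _ in pairs)
s_bar = statistics.mean(s for _, s in pairs)
beta = (sum((s - s_bar) * (t - t_bar) for t, s in pairs)
        / sum((s - s_bar) ** 2 for _, s in pairs))

# shift each accepted theta along the regression line toward the
# observed summary
adjusted = [t - beta * (s - observed_summary) for t, s in pairs]
raw_sd = statistics.pstdev([t for t, _ in pairs])
adj_sd = statistics.pstdev(adjusted)
print(raw_sd, adj_sd)
```

The adjusted sample is typically tighter than the raw accepted sample, which is the variance reduction the regression step is designed to deliver.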
=Outlook=
In the past, the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC, the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, and novel appropriate methods must therefore be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction [3] or modularization. A second approach is a more guided search of the parameter space, e.g., by development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, but not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely improve the situation, but not resolve the curse-of-dimensionality.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, there are certain tasks, for instance model selection with ABC, that are inherently difficult. Also, open problems such as the convergence properties of ABC based algorithms, as well as methods for determining summary statistics in the absence of sufficient ones, deserve more attention.
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has rapidly increased in popularity in recent years, in particular for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain for ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in [?]). Since its introduction in [?], ABC has spurred the scientific community to develop improved versions of the basic method, which have further increased the computational efficiency (e.g., see [?]).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography [?]. However, it was pointed out that a significant portion of the criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics [?]. A large part was also shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and its usage. However, fundamental and currently unsolved issues were exposed by the arguments as well, and concerns have lately also been raised within the ABC community [?]. Yet it might be difficult for many readers to differentiate ABC specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood [?], which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC based methods approximate the likelihood function by simulations whose outcomes are compared to the observational data. More specifically, in the ABC rejection algorithm—the most basic form of ABC—a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math>, and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution, and, crucially, obtained without the need to explicitly compute the likelihood function.
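To make the rejection scheme concrete, it can be sketched as below. The Gaussian toy model, the flat prior, the tolerance, and the sorted-sample distance are all illustrative assumptions, not part of the algorithm itself:

```python
import random
import statistics

# Assumed toy problem: infer the mean of a Gaussian with known unit
# variance from 50 observations.
random.seed(1)
observed = sorted(random.gauss(2.0, 1.0) for _ in range(50))

def distance(a, b):
    # crude full-data metric: mean absolute difference of sorted samples
    return statistics.mean(abs(x - y) for x, y in zip(sorted(a), b))

def abc_rejection(eps, n_draws):
    accepted = []
    for _ in range(n_draws):
        theta = random.uniform(-5, 5)                    # prior draw
        simulated = [random.gauss(theta, 1.0) for _ in range(50)]
        if distance(simulated, observed) < eps:          # acceptance criterion
            accepted.append(theta)
    return accepted

posterior_sample = abc_rejection(eps=0.4, n_draws=5000)
print(len(posterior_sample), statistics.mean(posterior_sample))
```

The accepted parameter values concentrate around the true mean used to generate the observations, even though no likelihood was ever evaluated.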
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic <math>S(D)</math>, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data [?]. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. ?, so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
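The gain in acceptance rate from a summary statistic can be illustrated with a toy comparison. The Gaussian model (for which the sample mean is sufficient when the variance is known), the two tolerance choices, and all other settings are assumptions made for this sketch:

```python
import random

# Assumed setup: n = 20 observations from N(0, 1), flat prior on the
# mean, 2000 simulations; compare a raw-data criterion with a
# summary-based criterion.
random.seed(0)
n = 20
observed = [random.gauss(0.0, 1.0) for _ in range(n)]
obs_mean = sum(observed) / n

def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

raw_hits = summary_hits = 0
for _ in range(2000):
    theta = random.uniform(-3, 3)
    sim = [random.gauss(theta, 1.0) for _ in range(n)]
    if euclid(sim, observed) < n ** 0.5:          # raw-data criterion
        raw_hits += 1
    if abs(sum(sim) / n - obs_mean) < 0.2:        # summary criterion
        summary_hits += 1
print(raw_hits, summary_hits)
```

The summary-based criterion accepts far more simulations for comparable informativeness, which is precisely why summary statistics are used with high-dimensional data.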
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that computing <math>B_{1,2}</math> in Eq. ? requires marginalizing over the uncertain parameters through integration. The posterior ratio (which can be thought of as the support in favor of one model) of <math>M_1</math> compared to <math>M_2</math> given the data is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
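With equal model priors, a simple ABC approximation of the Bayes factor is the ratio of the models' acceptance rates under a common criterion. The sketch below assumes two fixed candidate models without free parameters and a sample-mean summary; all settings are illustrative:

```python
import random

# Assumed setup: observed data from N(0, 1); candidate models
# M1 = N(0, 1) and M2 = N(0.5, 1), each without free parameters;
# summary = sample mean, common tolerance eps.
random.seed(2)
n = 30
observed = [random.gauss(0.0, 1.0) for _ in range(n)]
obs_mean = sum(observed) / n

def acceptance_rate(mu, eps=0.1, trials=20000):
    hits = 0
    for _ in range(trials):
        sim_mean = sum(random.gauss(mu, 1.0) for _ in range(n)) / n
        if abs(sim_mean - obs_mean) < eps:
            hits += 1
    return hits / trials

rate1 = acceptance_rate(0.0)
rate2 = acceptance_rate(0.5)
# the ratio of acceptance rates approximates the Bayes factor on the summary
bayes_factor = rate1 / max(rate2, 1e-9)   # guard against zero acceptances
print(rate1, rate2, bayes_factor)
```

Since the data were generated under the first model, its acceptance rate, and hence the estimated Bayes factor, favors it. Note that this estimates the Bayes factor on the summary, which, as discussed later, need not equal the Bayes factor on the full data.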
A table for interpreting the strength of evidence corresponding to values of the Bayes factor was originally published in [?] (see also [?]), and has been used in a number of studies [?]. However, conclusions from model comparisons based on Bayes factors should be drawn with caution, and we will later discuss some important ABC related concerns.
=Quality Controls=
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in [?], such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
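The PODS idea can be sketched as follows: draw "true" parameters from the prior, generate synthetic data from them, run the same ABC procedure, and check how well the known values are recovered. The toy Gaussian model, the prior range, the tolerance, and the sample sizes are all assumptions for illustration:

```python
import random
import statistics

# Assumed toy model: data are 20 draws from N(theta, 1), flat prior on
# theta over [-3, 3]; ABC uses the sample mean as summary.
random.seed(6)

def abc_posterior_mean(data, eps=0.2, trials=2000):
    obs = statistics.mean(data)
    accepted = []
    for _ in range(trials):
        theta = random.uniform(-3, 3)
        sim_mean = statistics.mean(random.gauss(theta, 1.0) for _ in range(20))
        if abs(sim_mean - obs) < eps:
            accepted.append(theta)
    return statistics.mean(accepted) if accepted else None

errors = []
for _ in range(20):                       # 20 pseudo-observed data sets
    true_theta = random.uniform(-2, 2)    # known "true" parameter
    pods = [random.gauss(true_theta, 1.0) for _ in range(20)]
    estimate = abc_posterior_mean(pods)
    if estimate is not None:
        errors.append(abs(estimate - true_theta))
print(statistics.mean(errors))
```

Small recovery errors across the pseudo-observed data sets indicate that the chosen tolerance and summary retain enough information for reliable inference in this controlled setting.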
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparison of the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in [?]. Beyond that, cross-validation techniques [?] and predictive checks [?] represent promising strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because the posterior support of a particular model can then appear overwhelmingly conclusive, even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions, with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously [?], and model inconsistency is detected from conflicting and codependent summaries. Another quality-control based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters [?]. The deviance information criterion is then used as a measure of model fit. Interestingly, it was also shown that the models preferred based on this criterion can conflict with those supported by Bayes factors; for this reason it is useful to combine different methods for model selection to obtain correct conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eq. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus, <math>\epsilon</math> is instead set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; instead, other summary statistics are used, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC based analysis. This motivates a careful investigation and categorization of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in [?] and [?]). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) [?]. This is a well-known phenomenon usually referred to as the curse-of-dimensionality [?]. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
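The geometric decay of the acceptance rate with parameter dimension can be illustrated with a fully synthetic sketch (each coordinate is accepted independently with the same per-coordinate tolerance; this setting is an assumption made only to expose the scaling):

```python
import random

# Synthetic illustration of the curse-of-dimensionality in ABC: with an
# independent acceptance condition per coordinate, the overall
# acceptance rate decays geometrically in the dimension d.
random.seed(7)

def acceptance_rate(d, eps=0.5, trials=20000):
    hits = 0
    for _ in range(trials):
        theta = [random.uniform(-1.0, 1.0) for _ in range(d)]
        if all(abs(t) < eps for t in theta):   # target "data" at the origin
            hits += 1
    return hits / trials

rates = [acceptance_rate(d) for d in (1, 2, 4, 8)]
print(rates)
```

Each additional dimension multiplies the acceptance probability by the same per-coordinate factor (here 0.5), so the rate collapses exponentially as d grows, which is why the tolerance must typically be relaxed in high dimensions.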
Although no computational method seems to be able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids [?], which could potentially heavily reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect [?]. For certain problems it may therefore be difficult to know if the model is incorrect or if the explored region of the parameter space is inappropriate [?] (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes with the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> is trusted to approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance that is large is enough for every point to be accepted, results in the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon)</math>, was empirically studied in [?]. Theoretical results for an upper <math>\epsilon</math> dependent bound for the error in parameter estimates have recently been reported [?]. The accuracy in the posterior (defined as the expected quadratic loss) of ABC as a function of <math>\epsilon</math> is also investigated in [?]. However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that should be investigated in greater detail. Methods to distinguish the error of this approximation, from the errors due to model mis-specifications [?], that would make sense in the context of actual applications, would be valuable.
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in [?]. The idea to add noise to the observed data for a given probability density function, since ABC yields exact inference under the assumption of this noise. The asymptotic consistency for such “noisy ABC”, was established in [?], together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose representing the maximum amount of information in the simplest possible form [?]. However, we are often referred to heuristics to identify sufficient statistics, and the sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the potential loss of information in the parameter estimation [?] and this may also bias the discrimination between models.
An intuitive idea to capture most information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing numbers of summary statistics [?]. Instead, a better strategy consists in focusing on the relevant statistics only—relevancy depending on the whole inference problem, on the model used, and on the data at hand [?].
An algorithm was proposed for identifying a representative subset of summary statistics, by iteratively assessing if an additional statistic introduces a meaningful modification of the posterior [?]. Another method was proposed in [?], which decomposes into two principal steps. First a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the posteriors computed with ABC to the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. On the other hand, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately [?]. Recently, a method for constructing summaries in a semi-automatic manner has attained much interest [?]. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
Methods for the identification of summary statistics that also assess their influence on the approximation of the posterior would be of great interest [?]. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic [?]. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form [?]
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which implies that <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that <math>B_{1,2}</math> and <math>B_{1,2}^s</math> may differ substantially if Eq. ? is not satisfied, as demonstrated with a small example model in [?] (previously discussed in [?] and in [?]). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models [?]. However, it was also shown that any summary statistic that is sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can also be used to rank the nested models [?].
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived [?], which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which actual data sets are compared directly, as in typical systems biology applications (e.g., see [?]), circumvents this problem. It is even doubtful whether the issue is truly ABC-specific, since importance sampling techniques suffer from the same problem [?].
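The gap between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> can be illustrated with a toy Monte Carlo experiment in the spirit of the Poisson-versus-geometric example discussed in the literature; the specific data set and priors below are hypothetical choices of ours. The sum of the observations is a sufficient statistic for the parameter of each model separately, yet the Bayes factor estimated from the sum alone can differ markedly from the one estimated from the full data set.

```python
import numpy as np

rng = np.random.default_rng(3)
data = np.array([1, 2, 0, 1, 1])    # hypothetical count data
s_obs = data.sum()                  # S(D): sufficient for each model alone

def simulate(model, rng):
    if model == 1:                              # M1: Poisson(lam), lam ~ Exp(1)
        return rng.poisson(rng.exponential(1.0), size=data.size)
    p = rng.uniform()                           # M2: success probability ~ U(0, 1)
    return rng.geometric(p, size=data.size) - 1 # geometric number of failures

def acceptance_counts(n_per_model, rng):
    full = {1: 0, 2: 0}   # exact match of the full (sorted) data set
    summ = {1: 0, 2: 0}   # match of the summary statistic S(D) only
    for model in (1, 2):
        for _ in range(n_per_model):
            d_hat = simulate(model, rng)
            if d_hat.sum() == s_obs:
                summ[model] += 1
                if np.array_equal(np.sort(d_hat), np.sort(data)):
                    full[model] += 1
    return full, summ

full, summ = acceptance_counts(100000, rng)
bf_full = full[1] / full[2]       # Monte Carlo estimate of B_{1,2}
bf_summary = summ[1] / summ[2]    # Monte Carlo estimate of B_{1,2}^s
```

For this data set the full-data Bayes factor favors the Poisson model considerably more strongly than the summary-based one, even though the sum is sufficient within each model.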
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules; instead, the effect of these choices should be evaluated and tested in each study [?]. Quality controls are thus achievable, and indeed performed in many ABC-based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. Nevertheless, the rapidly increasing use of ABC should yield a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC; instead they hold for model-based methods in general. Many of these criticisms have long been debated in the literature, but the flexibility offered by ABC for analysing very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space [?]. It is true that model-based studies often revolve around a small number of models, and because evaluating a single model can be computationally costly in some instances, it may then be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options [?]. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead [?]. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only a few models, subjectively chosen and probably all wrong, can be realistically considered, what insight can we hope to derive from their analysis [?]? As pointed out in [?], there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can very seldom be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important in this context than the test of a statistical null hypothesis (see also Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” [?], which is connected to classical objections to Bayesian approaches [?].
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should, if possible, be defined based on known properties of the studied system, but for practical applications they may necessitate an educated guess. However, theoretical results regarding a suitable (e.g., non-biased) choice of the prior distribution are available, based on the principle of maximum entropy [?].
We stress that the purpose of the analysis must be kept in mind when choosing the priors. In principle, uninformative and flat priors that exaggerate our subjective ignorance about the parameters may still yield reasonable parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in [?] that part of the data had to be omitted in the ABC-based analysis presented in [?]. Although a number of authors claim that large data sets are not a practical limitation [?], this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics [?], which may be a tractable approach for ABC-based methods as well. It should still be kept in mind that realistic models for complex systems are very likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC [?], which resulted in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated samples from the posterior [?], and relatively poor parallelizability [?].
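A minimal sketch of the ABC-MCMC idea, assuming a toy normal-mean model with a flat prior and a symmetric random-walk proposal (all illustrative choices of ours): the Metropolis likelihood ratio is replaced by the indicator that a freshly simulated data set lands within <math>\epsilon</math> of the observed summary.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(2.0, 1.0, size=50)
s_obs = data.mean()   # summary statistic: the sample mean

def log_prior(theta):
    # Flat prior on [-10, 10] (an illustrative choice).
    return 0.0 if -10.0 <= theta <= 10.0 else -np.inf

def abc_mcmc(n_steps, epsilon, rng):
    """ABC-MCMC sketch: a Metropolis step on theta, where the likelihood
    ratio is replaced by the indicator that a simulated data set falls
    within epsilon of the observations."""
    theta = s_obs           # initialize near the data to keep burn-in short
    chain = np.empty(n_steps)
    for i in range(n_steps):
        prop = theta + rng.normal(0.0, 0.5)          # symmetric random walk
        s_hat = rng.normal(prop, 1.0, size=50).mean()
        # Move only if the simulation is close enough AND the
        # Metropolis ratio (here just the prior ratio) allows it.
        if abs(s_hat - s_obs) < epsilon and \
           np.log(rng.uniform()) < log_prior(prop) - log_prior(theta):
            theta = prop
        chain[i] = theta
    return chain

chain = abc_mcmc(20000, epsilon=0.3, rng=rng)
```

Because the chain only moves when a simulation is accepted, it can mix poorly if started far from the high-posterior region, which is one reason convergence diagnostics remain important.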
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting [?]. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively [?].
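The following sketch illustrates the sequential idea with a simple ABC population Monte Carlo scheme in which the tolerance of each round is set adaptively to the median distance of the previous population. The toy normal-mean model, the flat prior, the Gaussian perturbation kernel, and the tolerance schedule are all illustrative assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(2.0, 1.0, size=50)
s_obs = data.mean()   # summary statistic: the sample mean

def distance(theta, rng):
    # Distance between simulated and observed summaries for the toy model.
    return abs(rng.normal(theta, 1.0, size=50).mean() - s_obs)

def prior_pdf(theta):
    return 1.0 / 20.0 if -10.0 <= theta <= 10.0 else 0.0   # flat prior

def abc_pmc(n_particles, n_rounds, rng):
    # Round 0: plain rejection sampling from the prior with a loose tolerance.
    eps, thetas, dists = 2.0, [], []
    while len(thetas) < n_particles:
        th = rng.uniform(-10.0, 10.0)
        d = distance(th, rng)
        if d < eps:
            thetas.append(th)
            dists.append(d)
    thetas = np.array(thetas)
    weights = np.full(n_particles, 1.0 / n_particles)
    for _ in range(n_rounds):
        eps = np.median(dists)                  # adaptive tolerance schedule
        tau = np.sqrt(2.0 * np.var(thetas))     # perturbation kernel scale
        new_t, new_d = [], []
        while len(new_t) < n_particles:
            th = rng.choice(thetas, p=weights) + rng.normal(0.0, tau)
            d = distance(th, rng)
            if prior_pdf(th) > 0 and d < eps:
                new_t.append(th)
                new_d.append(d)
        new_t = np.array(new_t)
        # Importance weights correct for sampling from the perturbed mixture.
        kern = np.exp(-0.5 * ((new_t[:, None] - thetas[None, :]) / tau) ** 2)
        w = np.array([prior_pdf(t) for t in new_t]) / (weights * kern).sum(axis=1)
        weights = w / w.sum()
        thetas, dists = new_t, new_d
    return thetas, weights

thetas, weights = abc_pmc(300, 3, rng)
posterior_mean = float(np.sum(weights * thetas))
```

Each round shrinks the tolerance and reweights the population, so the final weighted particles concentrate on the high-posterior region without the tolerance sequence being fixed in advance.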
The use of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in [?]. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model [?]. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in [?].
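A sketch of the local linear regression adjustment on a toy normal-mean model (an illustrative setting of our own): accepted parameters are weighted with an Epanechnikov kernel on the distance between simulated and observed summaries, a weighted linear regression is fitted, and the parameters are corrected in the direction of the observed summary.

```python
import numpy as np

rng = np.random.default_rng(6)
data = rng.normal(2.0, 1.0, size=30)
s_obs = data.mean()   # observed summary statistic

# Rejection step on a toy normal-mean model with a deliberately loose
# tolerance, so that the regression adjustment has something to correct.
n, eps = 50000, 1.0
theta = rng.uniform(-10.0, 10.0, size=n)                 # flat prior
s_hat = rng.normal(theta, 1.0 / np.sqrt(30), size=n)     # simulated summaries
keep = np.abs(s_hat - s_obs) < eps
theta, s_hat = theta[keep], s_hat[keep]

# Epanechnikov weights: simulations whose summaries lie closer to the
# observed one carry more weight in the local regression.
u = (s_hat - s_obs) / eps
w = 1.0 - u ** 2

# Weighted linear regression of theta on (s_hat - s_obs), then correction
# of the accepted parameters in the direction of the observed summary.
x = np.column_stack([np.ones(theta.size), s_hat - s_obs])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(x * sw[:, None], theta * sw, rcond=None)
theta_adj = theta - coef[1] * (s_hat - s_obs)
```

The adjusted sample is considerably more concentrated than the raw accepted sample, which is precisely the variance reduction the method aims for.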
=Outlook=
In the past, the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC, the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function it may be tempting to attack high-dimensional problems, but ABC methods do not yet address the additional issues encountered in such studies, and novel appropriate methods must therefore be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction [3] or modularization. A second approach is a more guided search of the parameter space, e.g., by the development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely improve the situation, but not resolve the curse-of-dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcome many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, there are certain tasks, for instance model selection with ABC, that are inherently difficult. Also, open problems, such as the convergence properties of ABC-based algorithms and methods for determining summary statistics in the absence of sufficient ones, deserve more attention.
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in [?]). Since its advent in [?], the spread of ABC has spurred the scientific community to develop improved versions of the basic method, which have further increased the computational efficiency (e.g., see [?]).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography [?]. However, it was pointed out that a significant portion of the criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics [?]. A large part was also shown to originate from misunderstandings of the mathematical foundations and semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and its usage. However, the arguments also exposed fundamental and currently unsolved issues. Concerns have lately been raised within the ABC community as well [?]. Yet it might be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood [?], which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC-based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, with the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math>, and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution, and, crucially, obtained without explicitly computing the likelihood function.
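The rejection algorithm above can be sketched in a few lines of Python. The model (a normal mean with known unit variance), the flat prior, the tolerance value, and the Euclidean distance between sorted samples are all hypothetical choices for a toy problem, not part of the algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical observed data: 50 draws from N(2, 1); in the toy model the
# mean is the unknown parameter theta and the variance is known.
data = rng.normal(2.0, 1.0, size=50)

def abc_rejection(n_samples, epsilon, rng):
    """Basic ABC rejection: sample theta from the prior, simulate a data
    set under the model, and accept theta if rho(D_hat, D) < epsilon."""
    accepted = []
    for _ in range(n_samples):
        theta = rng.uniform(-10.0, 10.0)                  # flat prior
        d_hat = rng.normal(theta, 1.0, size=data.size)    # simulate D_hat
        # Euclidean distance between the sorted samples as rho(D_hat, D).
        rho = np.linalg.norm(np.sort(d_hat) - np.sort(data))
        if rho < epsilon:
            accepted.append(theta)
    return np.array(accepted)

posterior_sample = abc_rejection(n_samples=20000, epsilon=6.0, rng=rng)
```

The accepted parameter values approximate the posterior without a single likelihood evaluation; tightening the tolerance sharpens the approximation at the cost of a lower acceptance rate.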
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic <math>S(D)</math>, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data [?]. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. ?, so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, outside the exponential family of distributions it is typically impossible to identify a finite-dimensional set of sufficient statistics. Nevertheless, informative but possibly non-sufficient summary statistics are often used in applications approached with ABC methods.
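As a concrete case where a sufficient statistic is available: for data from a normal distribution with known variance, the sample mean is sufficient for the mean parameter, so the acceptance criterion can compare sample means instead of full data sets. The prior, tolerance, and data below are illustrative assumptions for a toy problem.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=100)
s_obs = data.mean()   # sufficient for the mean of a normal with known variance

def abc_with_summary(n_samples, epsilon, rng):
    """ABC rejection with the acceptance criterion on a summary statistic:
    accept theta if |S(D_hat) - S(D)| < epsilon."""
    accepted = []
    for _ in range(n_samples):
        theta = rng.uniform(-10.0, 10.0)                 # flat prior
        s_hat = rng.normal(theta, 1.0, size=100).mean()  # summary of D_hat
        if abs(s_hat - s_obs) < epsilon:                 # rho(S(D_hat), S(D))
            accepted.append(theta)
    return np.array(accepted)

post = abc_with_summary(20000, epsilon=0.2, rng=rng)
```

Comparing one-dimensional summaries rather than 100-dimensional data sets allows a much tighter tolerance for the same acceptance rate, which is exactly the motivation for using summaries.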
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that we need to marginalize over the uncertain parameters through integration to compute <math>B_{1,2}</math> in Eq. ?. The posterior ratio of <math>M_1</math> to <math>M_2</math> given the data (which can be thought of as the support in favor of one model over the other) is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
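In the ABC setting, the posterior ratio can be approximated directly by sampling a model index from the model prior, simulating a data set under the sampled model, and counting acceptances per model. The toy comparison below (Poisson versus geometric counts, with hypothetical priors and a hypothetical data set of ours) uses exact matching of the small discrete data set, i.e., <math>\epsilon = 0</math>; under equal model priors the ratio of acceptance counts then estimates the Bayes factor.

```python
import numpy as np

rng = np.random.default_rng(7)
data = np.array([0, 0, 1, 0, 2, 1])   # hypothetical count data

def simulate(model, rng):
    if model == 1:                              # M1: Poisson(lam), lam ~ Exp(1)
        return rng.poisson(rng.exponential(1.0), size=data.size)
    p = rng.uniform()                           # M2: success probability ~ U(0, 1)
    return rng.geometric(p, size=data.size) - 1 # geometric number of failures

accept = {1: 0, 2: 0}
for _ in range(100000):
    model = 1 if rng.uniform() < 0.5 else 2     # equal model priors
    d_hat = simulate(model, rng)
    if np.array_equal(np.sort(d_hat), np.sort(data)):   # epsilon = 0
        accept[model] += 1

# With p(M1) = p(M2), the posterior ratio equals the Bayes factor, so the
# ratio of acceptance counts is a Monte Carlo estimate of B_{1,2}.
post_m1 = accept[1] / (accept[1] + accept[2])
bayes_factor = accept[1] / accept[2]
```

Exact matching is only feasible here because the data set is tiny and discrete; for realistic data a positive tolerance or summary statistics become necessary, with the caveats discussed later.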
A table for the interpretation of the strength of evidence conveyed by values of the Bayes factor was originally published in [?] (see also [?]), and has been used in a number of studies [?]. However, conclusions from model comparisons based on Bayes factors should be considered with caution, and we will later discuss some important ABC-related concerns.
==Quality Controls==
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in [?], such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
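A minimal sketch of such a check with pseudo-observed data sets, on a toy normal-mean model with assumptions of our own (flat prior, sample mean as summary statistic): parameters drawn from the prior are treated as ground truth, and the root-mean-square error of the ABC estimates quantifies how well they are recovered.

```python
import numpy as np

rng = np.random.default_rng(8)

def abc_posterior_mean(s_obs, n_sim, epsilon, rng):
    """Vectorized rejection ABC for a toy normal-mean model, returning the
    posterior mean estimate (the summary is the mean of 30 observations)."""
    theta = rng.uniform(-5.0, 5.0, size=n_sim)               # flat prior
    s_hat = rng.normal(theta, 1.0 / np.sqrt(30), size=n_sim) # simulated summaries
    accepted = theta[np.abs(s_hat - s_obs) < epsilon]
    return accepted.mean() if accepted.size else np.nan

# Quality control with pseudo-observed data sets (PODS): draw "true"
# parameters from the prior, simulate data, and check how well ABC
# recovers them in this controlled setting.
truths, estimates = [], []
for _ in range(50):
    theta_true = rng.uniform(-5.0, 5.0)
    pods = rng.normal(theta_true, 1.0, size=30)
    truths.append(theta_true)
    estimates.append(abc_posterior_mean(pods.mean(), 20000, 0.2, rng))

truths, estimates = np.array(truths), np.array(estimates)
rmse = float(np.sqrt(np.mean((estimates - truths) ** 2)))
```

The same scheme extends naturally to coverage checks of credible intervals or to model-recovery rates when several candidate models are simulated.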
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, the comparison of the posterior predictive distribution of summary statistics to the summary statistics observed was suggested in [?]. Beyond that, cross-validation techniques [?] and predictive checks [?] represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because then the posterior support of a particular model can appear overwhelmingly conclusive, even if all proposed models in fact are poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously [?], and model inconsistency is detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters [?]. The deviance information criterion is then used as a measure of model fit. It was also shown that the models preferred on the basis of this criterion can conflict with those supported by Bayes factors, and for this reason it is useful to combine different methods for model selection to obtain robust conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eq. ? or Eq. ? yields an exact result, but would typically make computations prohibitively expensive. In practice, <math>\epsilon</math> is therefore set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available, and other summary statistics are used instead, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC-based analyses. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in [?] and [?]). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) [?]. This is a well-known phenomenon usually referred to as the curse-of-dimensionality [?]. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
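The exponential decay of the acceptance rate can be seen in a small numerical experiment (a toy setting of our own, with a flat prior on a hypercube and one noisy summary per parameter): for a fixed tolerance, the fraction of accepted prior draws collapses as the number of parameters grows.

```python
import numpy as np

rng = np.random.default_rng(9)

def acceptance_rate(dim, epsilon, n_sim, rng):
    """Fraction of accepted prior draws for a toy model in which each of
    dim parameters is observed through one noisy summary statistic."""
    theta = rng.uniform(-1.0, 1.0, size=(n_sim, dim))        # flat prior
    s_hat = theta + rng.normal(0.0, 0.05, size=(n_sim, dim)) # simulated summaries
    s_obs = np.zeros(dim)                                    # pseudo-observations
    dist = np.linalg.norm(s_hat - s_obs, axis=1)             # global criterion
    return float(np.mean(dist < epsilon))

# For a fixed tolerance, the acceptance rate collapses with dimension,
# since the ball of radius epsilon occupies a vanishing fraction of the
# prior volume (the curse-of-dimensionality).
rates = [acceptance_rate(d, 0.5, 20000, rng) for d in (1, 2, 5, 10)]
```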
Although no computational method seems able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids [?], which could substantially reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect [?]. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate [?] (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes with the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> should approximate the actual target distribution <math>p(\theta|D)</math> well. On the other hand, a tolerance that is large enough for every point to be accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was studied empirically in [?]. Theoretical results for an <math>\epsilon</math>-dependent upper bound on the error in parameter estimates have recently been reported [?]. The accuracy of the posterior (defined as the expected quadratic loss) obtained with ABC as a function of <math>\epsilon</math> has also been investigated [?]. However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that deserves more detailed investigation. In particular, methods to distinguish the error of this approximation from the errors due to model mis-specification [?] would be valuable in the context of actual applications.
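The transition from posterior to prior as the tolerance grows can be observed numerically. In the toy normal-mean sketch below (an illustrative setting of our own), the spread of the accepted parameter values widens from roughly the width of the true posterior towards that of the flat prior as <math>\epsilon</math> increases.

```python
import numpy as np

rng = np.random.default_rng(10)
data = rng.normal(2.0, 1.0, size=50)
s_obs = data.mean()   # summary statistic: the sample mean

def abc_posterior_std(epsilon, rng, n_sim=50000):
    """Spread of the ABC posterior sample for a toy normal-mean model."""
    theta = rng.uniform(-10.0, 10.0, size=n_sim)             # flat prior
    s_hat = rng.normal(theta, 1.0 / np.sqrt(50), size=n_sim) # simulated summaries
    return float(theta[np.abs(s_hat - s_obs) < epsilon].std())

# The flat prior on [-10, 10] has standard deviation 20 / sqrt(12) ≈ 5.8;
# as epsilon grows, the accepted sample widens towards this prior spread.
spread = {eps: abc_posterior_std(eps, rng) for eps in (0.1, 1.0, 10.0)}
```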
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in [?]. The idea is to perturb the observed data with noise from a given probability density function, since ABC then yields exact inference under the assumption of this noise model. The asymptotic consistency of such “noisy ABC” was established in [?], together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC-based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose representing the maximum amount of information in the simplest possible form [?]. However, we are often referred to heuristics to identify sufficient statistics, and the sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the potential loss of information in the parameter estimation [?] and this may also bias the discrimination between models.
An intuitive idea to capture most information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing numbers of summary statistics [?]. Instead, a better strategy consists in focusing on the relevant statistics only—relevancy depending on the whole inference problem, on the model used, and on the data at hand [?].
An algorithm was proposed for identifying a representative subset of summary statistics, by iteratively assessing if an additional statistic introduces a meaningful modification of the posterior [?]. Another method was proposed in [?], which decomposes into two principal steps. First a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the posteriors computed with ABC to the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. On the other hand, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately [?]. Recently, a method for constructing summaries in a semi-automatic manner has attained much interest [?]. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest [?]. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above asseses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic [?]. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form [?]
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which results in that <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that there may be a huge difference between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> if Eq. ? is not satisfied, which was demonstrated with a small example model in [?] (previously discussed in [?] and in [?]). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models [?]. However, it was also shown that any sufficient summary statistic for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested, can also be used to rank the nested models [?].
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived [?], which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC based inference in which actual data sets are compared, as in typical systems biology applications (e.g., see [?]), circumvents this problem. It is even doubtful whether the issue is truly ABC specific, since importance sampling techniques suffer from the same problem [?].
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, or the acceptance threshold cannot be based on general rules, and the effect of these choices should be evaluated and tested in each study [?]. Thus, quality controls are achievable and indeed performed in many ABC based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. However, the rapidly increasing use of ABC should foster a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC, and instead hold for model-based methods in general. Many of these criticisms have already been well debated in the literature for a long time, but the flexibility offered by ABC to analyse very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space [?]. It is true that model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model, it may in some instances be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options [?]. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead [?]. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only a few models—subjectively chosen and probably all wrong—can be realistically considered, what insight can we hope to derive from their analysis [?]? As pointed out in [?], there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only rarely be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important in this context than the test of a statistical null hypothesis (see also Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” [?], which is connected to classical objections of Bayesian approaches [?].
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should if possible be defined based on known properties of the studied system, but may for practical applications necessitate an educated guess. However, theoretical results regarding a suitable (e.g., non-biased) choice of the prior distribution are available, which are based on the principle of maximum entropy [?].
We stress that the purpose of the analysis should be kept in mind when choosing the priors. In principle, uninformative flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. Bayes factors, however, are highly sensitive to the prior distribution of the parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was for example pointed out in [?] that part of the data had to be omitted in the ABC based analysis presented in [?]. Although a number of authors claim that large data sets are not a practical limitation [?], this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC based inference in phylogenetics [?], which may be a tractable approach for ABC based methods as well. It should still be kept in mind that any realistic model for a complex system is very likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC [?], which resulted in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated samples of the posterior [?], and relatively poor parallelizability [?].
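The combination of Metropolis-Hastings and ABC can be sketched in a few lines. The following Python code is a minimal illustration under assumed toy choices (a Gaussian model with unknown mean, the sample mean as summary, a flat prior, and a symmetric random-walk proposal); it is not the exact algorithm of the cited work. Rather than computing a likelihood ratio, a proposed move is accepted only if its simulation falls within the tolerance and the prior-ratio test passes.

```python
import random

random.seed(0)
n = 50
D = [random.gauss(1.0, 1) for _ in range(n)]
s_obs = sum(D) / n                     # summary statistic: sample mean

def prior_pdf(theta):
    # flat prior on [-3, 3]
    return 1.0 if -3 <= theta <= 3 else 0.0

eps, theta = 0.15, 0.0
chain = []
for _ in range(20000):
    prop = theta + random.gauss(0, 0.5)            # symmetric random-walk proposal
    s_sim = sum(random.gauss(prop, 1) for _ in range(n)) / n
    # move only if the simulated summary is close to the observed one AND
    # the Metropolis-Hastings test on the prior passes (proposal cancels)
    if abs(s_sim - s_obs) < eps and \
       random.random() < prior_pdf(prop) / max(prior_pdf(theta), 1e-12):
        theta = prop
    chain.append(theta)

post_mean = sum(chain[5000:]) / len(chain[5000:])  # discard burn-in
print(post_mean)
```

Note that the chain stays at its current state when a proposal is rejected, which produces the correlated posterior samples mentioned above.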
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting [?]. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively [?].
The use of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in [?]. The method assigns weights to the parameters according to how well simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model [?]. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in [?].
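The regression adjustment idea can be illustrated with a short sketch. The Python code below is a toy instance (Gaussian-mean model, sample mean as summary, Epanechnikov kernel weights, all assumed for illustration): accepted parameters from a deliberately loose rejection step are projected onto the observed summary using the fitted local-linear slope, which tightens the sample without rerunning simulations.

```python
import random

random.seed(0)
n = 50
D = [random.gauss(1.0, 1) for _ in range(n)]
s_obs = sum(D) / n                     # summary: sample mean
delta = 0.5                            # deliberately loose tolerance

# Step 1: plain rejection sampling with the loose tolerance
pairs = []
for _ in range(20000):
    theta = random.uniform(-3, 3)
    s = sum(random.gauss(theta, 1) for _ in range(n)) / n
    if abs(s - s_obs) < delta:
        pairs.append((theta, s))

# Step 2: Epanechnikov weights and weighted least-squares fit theta ~ a + b*s
w = [1 - ((s - s_obs) / delta) ** 2 for _, s in pairs]
sw = sum(w)
sbar = sum(wi * s for wi, (_, s) in zip(w, pairs)) / sw
tbar = sum(wi * t for wi, (t, _) in zip(w, pairs)) / sw
b = sum(wi * (s - sbar) * (t - tbar) for wi, (t, s) in zip(w, pairs)) / \
    sum(wi * (s - sbar) ** 2 for wi, (_, s) in zip(w, pairs))

# Step 3: correct accepted parameters in the direction of the observed summary
adjusted = [t - b * (s - s_obs) for t, s in pairs]
print(sum(adjusted) / len(adjusted))
```

The adjusted sample has a visibly smaller spread than the raw accepted parameters, which is the variance reduction the method aims at.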
=Outlook=
In the past the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, and novel appropriate methods must therefore be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction [3] or modularization. A second approach is a more guided search of the parameter space, e.g., by development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points. ABC only avoids the cost of computing the likelihood, but not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely improve the situation, but not resolve the curse-of-dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcoming many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, there are certain tasks, for instance model selection with ABC, that are inherently difficult. Also, open problems such as the convergence properties of the ABC based algorithms, as well as methods for determining summary statistics in the absence of sufficient ones, deserve more attention.
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
1. Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.
2. Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.
3. Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.
4. Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.
5. Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.
6. Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.
7. Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.
8. Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.
9. Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.
10. Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.
11. Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.
12. Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.
13. Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.
14. Busetto A, Buhmann J (2009) Stable Bayesian Parameter Estimation for Biological Dynamical Systems. IEEE Computer Society Press, pp. 148-157.
15. Busetto A, Ong C, Buhmann J (2009) Optimized Expected Information Gain for Nonlinear Dynamical Systems. Int. Conf. Proc. Series, Association for Computing Machinery (ACM), pp. 97-104.
16. Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.
17. Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.
18. Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.
19. Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.
20. Dawid AP (1984) Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society Series A 147: 278-292.
21. Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.
22. Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.
23. Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.
24. Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.
25. Del Moral P, Doucet A, Jasra A (2011) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, in press.
26. Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.
27. Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.
28. Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.
29. Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.
30. Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.
31. Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.
32. Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.
33. Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.
34. Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.
35. Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.
36. Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.
37. Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.
38. Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.
39. Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.
40. Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.
41. Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.
42. Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.
43. Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.
44. Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.
45. Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden Markov models with intractable likelihoods. arXiv:1103.5399v1 [math.ST] 28 Mar 2011.
46. Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. arXiv:1004.1112v2 [stat.ME] 13 Apr 2011.
47. Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:0811.3355.
48. Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article34.
49. Joyce P, Marjoram P (2008) Approximately sufficient statistics and bayesian computation. Stat Appl Genet Mol Biol 7: Article26.
50. Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.
51. Marin J-M, Pillai NS, Robert CP, Rousseau J (2011) Relevant statistics for Bayesian model choice. arXiv:1110.4700v1 [math.ST] 21 Oct 2011: 1-24.
52. Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in [?]). Since its advent <ref name="Rubin" />, the spread of ABC has prompted the scientific community to develop improved versions of the basic method, which have further increased the computational efficiency (e.g., see [?]).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography [?]. However, it was pointed out that a significant portion of the criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics [?]. A large part was also shown to originate from misunderstandings of the mathematical foundations and semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and its usage. However, fundamental and currently unsolved issues were exposed by the arguments as well. Concerns have lately also been raised within the ABC community [?]. Yet it might be difficult for many readers to differentiate ABC specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood [?], which motivates the use of ABC to circumvent this issue.
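As a minimal illustration of the proportionality above, the posterior can be computed numerically on a grid when the parameter is one-dimensional and the likelihood is cheap. The Python sketch below uses a hypothetical coin-toss example (not part of the text above): it evaluates the product of likelihood and prior over a grid and normalizes afterwards, which is exactly the evaluation of <math>p(D|\theta)</math> that becomes infeasible in the settings ABC is designed for.

```python
n, k = 10, 7                                        # assumed: 7 heads in 10 tosses
grid = [i / 100 for i in range(1, 100)]             # candidate theta values in (0, 1)
prior = [1.0] * len(grid)                           # flat prior p(theta)
lik = [t ** k * (1 - t) ** (n - k) for t in grid]   # likelihood p(D|theta)
unnorm = [l * p for l, p in zip(lik, prior)]        # p(D|theta) * p(theta)
Z = sum(unnorm)                                     # evidence, up to grid spacing
post = [u / Z for u in unnorm]                      # normalized posterior

mode = grid[max(range(len(grid)), key=lambda i: post[i])]
print(mode)   # posterior mode equals the MLE k/n = 0.7 under the flat prior
```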
==The ABC Rejection Algorithm==
All ABC based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, with the ABC rejection algorithm—the most basic form of ABC—a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math>, and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution, and, crucially, obtained without the need of explicitly computing the likelihood function.
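The rejection algorithm above can be sketched in a few lines of Python. The Gaussian toy model (unknown mean, known unit variance), the uniform prior range, and the tolerance value are illustrative assumptions; the Euclidean distance is taken on the sorted data vectors.

```python
import random

random.seed(0)
n = 10
theta_true = 2.0
D = sorted(random.gauss(theta_true, 1) for _ in range(n))   # observed data

def simulate(theta):
    # simulate a data set under the model for parameter theta
    return sorted(random.gauss(theta, 1) for _ in range(n))

def rho(d_hat, d):
    # Euclidean distance between the sorted data vectors
    return sum((a - b) ** 2 for a, b in zip(d_hat, d)) ** 0.5

eps = 3.0
accepted = []
for _ in range(20000):
    theta = random.uniform(-5, 5)        # sample from the prior
    if rho(simulate(theta), D) < eps:    # accept if the simulation is close to D
        accepted.append(theta)           # accepted draws approximate p(theta|D)

est = sum(accepted) / len(accepted)
print(len(accepted), est)
```

The accepted values concentrate around the generating parameter even though no likelihood was ever evaluated.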
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e. given the sufficient statistic, the parameter θ is irrelevant for the conditional distribution of the data [?]. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
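To make the summary-based acceptance criterion concrete, the following sketch replaces the raw data with a single statistic. In this assumed toy setting (Gaussian data with known unit variance), the sample mean happens to be sufficient for the unknown mean, so the dimension reduction costs no information.

```python
import random

random.seed(0)
n = 100
D = [random.gauss(1.0, 1) for _ in range(n)]
s_obs = sum(D) / n                  # S(D): the sample mean, sufficient here

eps = 0.1
accepted = []
for _ in range(5000):
    theta = random.uniform(-3, 3)   # draw from the prior
    s_sim = sum(random.gauss(theta, 1) for _ in range(n)) / n
    if abs(s_sim - s_obs) < eps:    # distance computed on summaries, not raw data
        accepted.append(theta)

est = sum(accepted) / len(accepted)
print(len(accepted), est)
```

Comparing a one-dimensional summary instead of a 100-dimensional data vector keeps the acceptance rate workable at a small tolerance.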
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that we need to marginalize over the uncertain parameters through integration to compute <math>B_{1,2}</math> in Eq. ?. The posterior ratio (which can be thought of as the support in favor of one model) of <math>M_1</math> compared to <math>M_2</math> given the data is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for the interpretation of the strength of evidence corresponding to different values of the Bayes factor was originally published in [?] (see also [?]), and has been used in a number of studies [?]. However, the conclusions of model comparison based on Bayes factors should be considered with caution, and we will later discuss some important ABC related concerns.
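In an ABC setting the Bayes factor is typically approximated by the ratio of acceptance rates under the two models when both are simulated under equal model priors. The Python sketch below is a toy illustration of this idea (the two Gaussian-mean models, the summary, and the tolerance are all assumptions for the example, not from the cited works).

```python
import random

random.seed(0)
n = 30
D = [random.gauss(0.0, 1) for _ in range(n)]   # data generated under M1
s_obs = sum(D) / n                             # summary: sample mean

def accept_rate(draw_theta, trials=5000, eps=0.05):
    # fraction of simulations whose summary falls within eps of the observed one
    hits = 0
    for _ in range(trials):
        theta = draw_theta()
        s = sum(random.gauss(theta, 1) for _ in range(n)) / n
        if abs(s - s_obs) < eps:
            hits += 1
    return hits / trials

r1 = accept_rate(lambda: 0.0)                    # M1: mean fixed at zero
r2 = accept_rate(lambda: random.uniform(-5, 5))  # M2: mean free, wide prior
print(r1, r2)   # with equal model priors, r1 / r2 approximates B_{1,2}
```

The wide-prior model wastes most of its simulations far from the data, so its evidence, and hence its acceptance rate, is much lower.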
==Quality Controls==
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in [?], such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
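A pseudo-observed data check of this kind can be sketched compactly. The Python code below is a minimal, assumed toy setup (Gaussian-mean model, sample mean as summary; the helper name `abc_posterior_mean` is illustrative): "true" parameters are drawn from the prior, pseudo-observed data sets are simulated from them, and the ABC estimator is rerun to measure how well it recovers the truth.

```python
import random

random.seed(0)
n = 30

def abc_posterior_mean(data, trials=2000, eps=0.25):
    # rerun the plain ABC rejection estimator on a given data set
    s_obs = sum(data) / n
    acc = []
    for _ in range(trials):
        theta = random.uniform(-3, 3)
        s = sum(random.gauss(theta, 1) for _ in range(n)) / n
        if abs(s - s_obs) < eps:
            acc.append(theta)
    return sum(acc) / len(acc) if acc else None

errors = []
for _ in range(10):
    theta_true = random.uniform(-2, 2)                      # "truth" from the prior
    pods = [random.gauss(theta_true, 1) for _ in range(n)]  # pseudo-observed data
    est = abc_posterior_mean(pods)
    if est is not None:
        errors.append(abs(est - theta_true))

mean_error = sum(errors) / len(errors)
print(mean_error)   # small mean error indicates sound parameter recovery
```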
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, the comparison of the posterior predictive distribution of summary statistics to the summary statistics observed was suggested in [?]. Beyond that, cross-validation techniques [?] and predictive checks [?] represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because then the posterior support of a particular model can appear overwhelmingly conclusive, even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions, with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously [?], and model inconsistency is detected from conflicting and codependent summaries. Another quality-control based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters [?]. The deviance information criterion is then used as a measure of model fit. Interestingly, it was also shown that the models preferred on the basis of this criterion can conflict with those supported by Bayes factors. For this reason it is useful to combine different methods for model selection to obtain robust conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus, <math>\epsilon</math> is instead set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; instead, other summary statistics are used, which introduces an additional bias. However, much of the recent criticism has neither been specific to ABC, nor relevant for ABC based analysis. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in [?] and [?]). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) [?]. This is a well-known phenomenon usually referred to as the curse-of-dimensionality [?]. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
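The exponential decay of the acceptance rate is easy to demonstrate empirically. In the following sketch (an assumed toy setting, not taken from the cited works), data are drawn uniformly on <math>[0,1]^d</math> and a simulation is accepted when every coordinate lies within a per-coordinate tolerance of the observed point, so the acceptance probability shrinks roughly like <math>(2\epsilon)^d</math>.

```python
import random

random.seed(0)
eps, trials = 0.2, 20000
targets = {d: [0.5] * d for d in (1, 2, 5)}   # "observed data" at the center

rates = {}
for d, target in targets.items():
    hits = sum(
        1 for _ in range(trials)
        if all(abs(random.random() - t) < eps for t in target)
    )
    rates[d] = hits / trials

print(rates)   # acceptance rate decays roughly like (2 * eps) ** d
```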
Although no computational method seems to be able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids [?], which could substantially reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect [?]. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate [?] (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes with the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> is trusted to approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance large enough for every point to be accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in [?]. Theoretical results for an upper <math>\epsilon</math>-dependent bound on the error in parameter estimates have recently been reported [?]. The accuracy of the posterior (defined as the expected quadratic loss) of ABC as a function of <math>\epsilon</math> has also been investigated in [?]. However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that should be investigated in greater detail. Methods to distinguish the error of this approximation from errors due to model mis-specification [?], in a way that is meaningful in the context of actual applications, would be valuable.
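The trade-off governed by the tolerance can be observed directly in a small experiment. The Python sketch below (toy Gaussian-mean model with the sample mean as summary, all assumed for illustration) measures the spread of the accepted parameter sample for a small and a large tolerance; as the tolerance grows, the accepted sample drifts from the posterior toward the wider prior.

```python
import random

random.seed(0)
n = 50
D = [random.gauss(0.0, 1) for _ in range(n)]
s_obs = sum(D) / n                 # summary: sample mean

def abc_sd(eps, trials=10000):
    # standard deviation of the accepted parameter sample for a given tolerance
    acc = []
    for _ in range(trials):
        theta = random.uniform(-3, 3)
        s = sum(random.gauss(theta, 1) for _ in range(n)) / n
        if abs(s - s_obs) < eps:
            acc.append(theta)
    m = sum(acc) / len(acc)
    return (sum((t - m) ** 2 for t in acc) / len(acc)) ** 0.5

sd_small, sd_large = abc_sd(0.1), abc_sd(1.0)
print(sd_small, sd_large)   # the spread widens toward the prior as eps grows
```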
Finally, statistical inference with a positive tolerance in ABC was given a theoretical justification in [?]: if noise with a given probability density function is assumed to corrupt the observed data, then ABC yields exact inference under this noise model. The asymptotic consistency of such “noisy ABC”, together with the asymptotic variance of the parameter estimates for a fixed tolerance, was established in [?]. Both results provide theoretical justification for ABC-based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, since they represent the maximum amount of information in the simplest possible form [?]. However, one must often resort to heuristics to identify sufficient statistics, and for many problems sufficiency is difficult to assess. Using non-sufficient statistics may inflate the posterior distribution, owing to the loss of information in the parameter estimation [?], and may also bias the discrimination between models.
An intuitive way to capture most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics [?]. A better strategy is instead to focus on the relevant statistics only, where relevancy depends on the whole inference problem, on the model used, and on the data at hand [?].
An algorithm has been proposed for identifying a representative subset of summary statistics by iteratively assessing whether an additional statistic introduces a meaningful modification of the posterior [?]. Another method [?] proceeds in two principal steps: first, a reference approximation of the posterior is constructed by minimizing the entropy; sets of candidate summaries are then evaluated by comparing the ABC-approximated posteriors to this reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. In contrast, the partial least squares regression approach uses information from all the candidate statistics, each weighted appropriately [?]. Recently, a method for constructing summaries in a semi-automatic manner has attracted considerable interest [?]. It is based on the observation that the choice of summary statistics that minimizes the quadratic loss of the parameter point estimates is the posterior mean of the parameters, which is therefore approximated in a pilot run of simulations.
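The semi-automatic construction can be sketched as follows: in a pilot run, parameters drawn from the prior are regressed on a pool of candidate statistics, and the fitted value, an approximation of the posterior mean, is then used as a one-dimensional summary. The toy model and the candidate pool below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

def candidates(data):
    """A hypothetical pool of candidate statistics, informative or not."""
    return np.array([data.mean(), data.std(), data.min(), data.max()])

# Pilot run: simulate (theta, data) pairs from the prior predictive.
thetas = rng.uniform(-5.0, 5.0, size=2000)
X = np.array([candidates(rng.normal(t, 1.0, size=50)) for t in thetas])
X1 = np.column_stack([np.ones(len(X)), X])         # add an intercept

# Regress theta on the candidates: the fitted value approximates the
# posterior mean E[theta | data] and serves as a single summary statistic.
beta, *_ = np.linalg.lstsq(X1, thetas, rcond=None)

def summary(data):
    return float(beta[0] + candidates(data) @ beta[1:])

obs = rng.normal(2.0, 1.0, size=50)                # data with "true" theta = 2
print(round(summary(obs), 1))
```

For this toy Normal model the regression effectively learns to weight the sample mean, so the constructed summary lands close to the generating parameter.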
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest [?], because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models and may also lead to incorrect model predictions. Importantly, none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic [?]. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form [?]
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which implies that <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that <math>B_{1,2}</math> and <math>B_{1,2}^s</math> may differ greatly if Eq. ? is not satisfied, as demonstrated with a small example model in [?] (previously discussed in [?] and [?]). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models [?]. However, it was also shown that any summary statistic that is sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can be used to rank the nested models [?].
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived [?], which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which actual data sets are compared, as in typical systems biology applications (e.g., see [?]), circumvents this problem. It is also doubtful whether the issue is truly specific to ABC, since importance sampling techniques suffer from the same problem [?].
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules; rather, the effects of these choices should be evaluated and tested in each study [?]. Quality controls of this kind are achievable and indeed performed in many ABC-based works, but for certain problems the assessment of the impact of the method-related parameters can be an overwhelming task. Nevertheless, the rapidly increasing use of ABC can be expected to yield a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider valid but not specific to ABC, as they hold for model-based methods in general. Many of these criticisms have been debated in the literature for a long time, but the flexibility that ABC offers for the analysis of very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space [?]. Indeed, model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model, it may in some instances be difficult to cover a large part of the hypothesis space.
An upper limit to the number of candidate models considered is typically set by the substantial effort required to define the models and to choose between many alternative options [?]. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, so that experience and prior knowledge are used instead [?]. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: a sensible characterization of complex systems will always necessitate a great deal of detective work and the use of expert knowledge from the problem domain.
But if only a few models, subjectively chosen and probably all wrong, can realistically be considered, what insight can we hope to derive from their analysis [?]? As pointed out in [?], there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only very rarely be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important here than the test of a statistical null hypothesis (see also Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” [?], which is connected to classical objections of Bayesian approaches [?].
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should, if possible, be defined based on known properties of the studied system, but in practical applications they may have to be set by an educated guess. Theoretical results regarding a suitable (e.g., unbiased) choice of the prior distribution are, however, available, based for example on the principle of maximum entropy [?].
We stress that the purpose of the analysis must be kept in mind when choosing the priors. In principle, uninformative flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. Bayes factors, however, are highly sensitive to the prior distribution of the parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in [?] that part of the data had to be omitted in the ABC-based analysis presented in [?]. Although a number of authors have argued that large data sets are not a practical limitation [?], this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. With increasing computational power, however, this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics [?], and a similar approach may be tractable for ABC-based methods. It should nevertheless be kept in mind that realistic models for complex systems are very likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it has been proposed to combine the Metropolis-Hastings algorithm with ABC [?], resulting in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated posterior samples [?], and relatively poor parallelizability [?].
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting [?]. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods over ABC-MCMC is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but are adjusted adaptively [?].
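A much simplified sketch of such a sequential scheme is given below, with a fixed rather than adaptive tolerance schedule and a Gaussian perturbation kernel; the toy model, prior, and all numerical settings are illustrative, not part of any published algorithm:

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate(theta, n=100):
    return rng.normal(theta, 1.0, size=n)          # toy model

def prior_pdf(theta):
    return 0.1 if -5.0 <= theta <= 5.0 else 0.0    # Uniform(-5, 5)

observed = simulate(1.0)                           # "true" theta = 1
s_obs = observed.mean()                            # summary statistic
schedule = [2.0, 1.0, 0.5, 0.2]                    # fixed decreasing tolerances
n_part = 200

# Generation 1: plain rejection at the loosest tolerance.
particles, weights = [], []
while len(particles) < n_part:
    th = rng.uniform(-5.0, 5.0)
    if abs(simulate(th).mean() - s_obs) < schedule[0]:
        particles.append(th)
        weights.append(1.0)
particles, weights = np.array(particles), np.array(weights)

for eps in schedule[1:]:
    tau = 2.0 * particles.std()                    # perturbation scale
    new_p, new_w = [], []
    while len(new_p) < n_part:
        # Resample a particle from the previous population and perturb it.
        th = rng.choice(particles, p=weights / weights.sum())
        th += rng.normal(0.0, tau)
        if prior_pdf(th) == 0.0:
            continue
        if abs(simulate(th).mean() - s_obs) < eps:
            # Importance weight corrects for the proposal mechanism.
            kern = np.exp(-0.5 * ((th - particles) / tau) ** 2)
            new_p.append(th)
            new_w.append(prior_pdf(th) / (weights * kern).sum())
    particles, weights = np.array(new_p), np.array(new_w)

print(round(float(np.average(particles, weights=weights)), 1))
```

Each generation reuses the previous population as a proposal, so the tightest tolerance is only ever applied to parameters that are already plausible, which is what makes the sequential scheme more efficient than plain rejection.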
The use of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in [?]. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The resulting regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement in the form of non-linear regression using a feed-forward neural network was suggested in [?]. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in [?].
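The local linear adjustment can be sketched for a one-dimensional summary as follows. The toy model, observed summary value, and tolerance are illustrative; the weights are Epanechnikov weights within the tolerance window, in the spirit of the weighted-regression approach described above:

```python
import numpy as np

rng = np.random.default_rng(5)

s_obs = 1.0                                        # observed summary (toy value)
eps = 0.8                                          # deliberately loose tolerance

# Rejection step: keep (theta, summary) pairs within the tolerance.
thetas, summs = [], []
while len(thetas) < 500:
    th = rng.uniform(-5.0, 5.0)                    # prior
    s = rng.normal(th, 1.0, size=50).mean()        # simulated summary
    if abs(s - s_obs) < eps:
        thetas.append(th)
        summs.append(s)
thetas, summs = np.array(thetas), np.array(summs)

# Epanechnikov weights: simulations closer to the observed summary count more.
u = (summs - s_obs) / eps
w = 1.0 - u ** 2

# Weighted linear regression of theta on the summary ...
X = np.column_stack([np.ones_like(summs), summs - s_obs])
beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * thetas))

# ... and projection of each accepted parameter onto the observed summary.
adjusted = thetas - beta[1] * (summs - s_obs)
print(np.std(adjusted) < np.std(thetas))
```

The adjusted sample is markedly tighter than the raw accepted sample, which is the variance reduction the regression step is designed to deliver even at a loose tolerance.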
=Outlook=
In the past, the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC, the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function it may be tempting to attack high-dimensional problems, but ABC methods do not yet address the additional issues encountered in such studies, and novel appropriate methods must therefore be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction [3] or modularization. A second approach is a more guided search of the parameter space, e.g., through the development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; note that ABC only avoids the cost of computing the likelihood, not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or combinations thereof, can in general merely improve the situation, but not resolve the curse-of-dimensionality.
The main sources of error in ABC-based statistical inference that we have identified are summarized in Table 1, together with possible solutions. A key to overcoming many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches to Bayesian inference in general. Thus, certain tasks, for instance model selection with ABC, remain inherently difficult. In addition, open problems such as the convergence properties of ABC-based algorithms, as well as methods for determining summary statistics in the absence of sufficient ones, deserve more attention.
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as an assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J. Stable Bayesian Parameter Estimation for Biological Dynamical Systems.; 2009. IEEE Computer Society Press pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J. Optimized Expected Information Gain for Nonlinear Dynamical Systems. Int. Conf. Proc. Series; 2009. Association for Computing Machinery (ACM) pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid A Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society Series A 1984: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011 (in press)) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and computing.</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<ref name="Csillery2010">Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.</ref>
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<ref name="Gelfand">Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.</ref>
<ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref>
<ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref>
<ref name="Excoffier">Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.</ref>
<ref name="Wegmann2010">Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.</ref>
<ref name="Cornuet">Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.</ref>
<ref name="Templeton2010b">Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.</ref>
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden markov models with intractable likelihoods. arXiv:11035399v1 [mathST] 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. ArXiv:10041112v2 [statME] 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:08113355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and bayesian computation. Stat Appl Genet Mol Biol 7: Article26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rosseau J (2011) Relevant statistics for Bayesian model choice. ArXiv:11104700v1 [mathST] 21 Oct 2011: 1-24.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.</ref>
</references>
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has rapidly increased in popularity over the last years and in particular for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain for ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, owing to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). Since its advent <ref name="Rubin" />, the spread of ABC has prompted the scientific community to develop improved versions of the basic method, which have further increased its computational efficiency (e.g., see <ref name="Marjoram" /><ref name="Sisson" /><ref name="Wegmann" />).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it has been pointed out that a significant portion of this criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part was also shown to originate from misunderstandings of the mathematical foundations and semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and its usage. Nevertheless, the arguments also exposed fundamental and currently unsolved issues. Concerns have lately been raised within the ABC community as well <ref name="Didelot" /><ref name="Robert" />[?]. Yet it may be difficult for many readers to differentiate ABC-specific criticisms from general ones, and well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC-based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, with the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is first sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter samples distributed according to the desired posterior distribution, obtained, crucially, without the need to explicitly compute the likelihood function.
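As an illustration, the rejection scheme above can be sketched in a few lines of code. The model, prior, distance measure, and all numerical values below are hypothetical choices for a toy example (a Normal likelihood with a uniform prior on its mean), not prescribed by the algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=50):
    """Toy model (illustrative): n draws from Normal(theta, 1)."""
    return rng.normal(theta, 1.0, size=n)

def abc_rejection(observed, n_samples=100, eps=4.0):
    """Basic ABC rejection: draw theta from the prior, simulate a data set,
    and keep theta whenever the simulation lies within eps of the data."""
    accepted = []
    while len(accepted) < n_samples:
        theta = rng.uniform(-10.0, 10.0)           # prior: Uniform(-10, 10)
        sim = simulate(theta, n=len(observed))
        # Euclidean distance between the order statistics of the two samples
        dist = np.linalg.norm(np.sort(sim) - np.sort(observed))
        if dist < eps:
            accepted.append(theta)
    return np.array(accepted)

observed = simulate(2.0)                           # data with "true" theta = 2
posterior = abc_rejection(observed)
print(round(posterior.mean(), 1))
```

Note that the likelihood is never evaluated: the accepted parameters concentrate around the generating value purely through simulation and comparison.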
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. ?, so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
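To illustrate, for a Normal model with unknown mean and variance the sample mean and standard deviation form a sufficient pair, so the acceptance criterion of Eq. ? can be evaluated on two numbers instead of the full data vector. The following sketch (toy model, with an illustrative prior and tolerance) modifies the rejection algorithm accordingly:

```python
import numpy as np

rng = np.random.default_rng(1)

def summaries(data):
    """Sample mean and standard deviation: a sufficient pair for a
    Normal model with unknown mean and variance."""
    return np.array([data.mean(), data.std()])

def abc_with_summaries(observed, n_samples=200, eps=0.2):
    """Rejection ABC comparing 2 summaries rather than 100 data values."""
    s_obs = summaries(observed)
    accepted = []
    while len(accepted) < n_samples:
        mu = rng.uniform(-5.0, 5.0)                # prior on the mean
        sim = rng.normal(mu, 1.0, size=len(observed))
        if np.linalg.norm(summaries(sim) - s_obs) < eps:
            accepted.append(mu)
    return np.array(accepted)

observed = rng.normal(1.5, 1.0, size=100)          # data with "true" mean 1.5
post = abc_with_summaries(observed)
print(round(post.mean(), 1))
```

Because the comparison is two-dimensional, a far tighter tolerance is affordable here than when full 100-dimensional data vectors are compared directly.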
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that to compute <math>B_{1,2}</math> in Eq. ?, we need to marginalize over the uncertain parameters through integration. The posterior ratio of <math>M_1</math> to <math>M_2</math> given the data (which can be thought of as the support in favor of one model over the other) is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
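The Bayes factor can be approximated with ABC by augmenting the rejection sampler with a model indicator drawn from the model prior; with equal model priors, the ratio of acceptance counts estimates <math>B_{1,2}</math> (computed on the summary statistic). The sketch below illustrates this for two hypothetical, fully specified toy models; all settings are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical fully specified candidate models for 50 i.i.d.
# observations: M1 = Normal(0, 1), M2 = Normal(1, 1). Keeping the models
# parameter-free isolates the model-choice mechanism itself.
def simulate(model, n=50):
    mu = 0.0 if model == 1 else 1.0
    return rng.normal(mu, 1.0, size=n)

observed = rng.normal(0.0, 1.0, size=50)   # data generated under M1
s_obs = observed.mean()

counts = {1: 0, 2: 0}
for _ in range(100_000):
    m = int(rng.integers(1, 3))            # equal model prior p(M1) = p(M2)
    if abs(simulate(m).mean() - s_obs) < 0.05:
        counts[m] += 1

# With equal model priors, the ratio of acceptance counts approximates the
# Bayes factor B_{1,2} computed on the summary statistic S(D) = mean(D).
print(counts)
```

Since the data were generated under <math>M_1</math>, acceptances for <math>M_1</math> should heavily outnumber those for <math>M_2</math>.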
A table for interpreting the strength of evidence conveyed by values of the Bayes factor was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />), and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, conclusions from model comparison based on Bayes factors should be treated with caution, and we discuss some important ABC-related concerns below.
==Quality Controls==
Quality control is an important part of ABC-based inference for assessing the validity and robustness of results and of the conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as quantifying the fraction of parameter variance explained by the summary statistics. A common class of methods assesses whether or not the inference yields valid results, regardless of the observational data. For instance, models are simulated for fixed parameter sets, typically drawn from the prior or posterior distributions, to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparing the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because the posterior support of a particular model can then appear overwhelmingly conclusive, even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
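As an illustration of the PODS-based check, the following sketch generates pseudo-observed data sets at known parameter values for a toy Normal model and measures how well a plain rejection-ABC posterior mean recovers them; the model and all settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy model: Normal(theta, 1) observations.
def simulate(theta, n=100):
    return rng.normal(theta, 1.0, size=n)

def abc_posterior_mean(data, eps=0.05, n_sims=20_000):
    # Plain rejection ABC on the sample mean; returns the posterior mean.
    s_obs = data.mean()
    accepted = [t for t in rng.uniform(-5, 5, size=n_sims)
                if abs(simulate(t).mean() - s_obs) < eps]
    return np.mean(accepted) if accepted else np.nan

# Quality control with pseudo-observed data sets (PODS): simulate data at
# known "true" parameter values, re-run the inference on each PODS, and
# gauge how well the true values are recovered.
true_thetas = rng.uniform(-2, 2, size=10)
errors = [abc_posterior_mean(simulate(t)) - t for t in true_thetas]
mae = float(np.nanmean(np.abs(errors)))
print("mean absolute error:", mae)
```

A large recovery error in such a controlled setting would flag a problem with the summaries, the tolerance, or the number of simulations before any real data are analyzed.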
Fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that agree with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />. The deviance information criterion is then used as a measure of model fit. Interestingly, the models preferred on the basis of this criterion can conflict with those supported by Bayes factors. For this reason, it is useful to combine different methods for model selection to obtain robust conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus, in practice, <math>\epsilon</math> is set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; instead, other summary statistics are used, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC-based analysis. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in [?] and [?]). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) [?]. This is a well-known phenomenon usually referred to as the curse-of-dimensionality [?]. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
Although no computational method seems able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids [?], which could substantially reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect [?]. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate [?] (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes with the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> should approximate the actual target distribution <math>p(\theta|D)</math> well. On the other hand, a tolerance large enough for every point to be accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in [?]. Theoretical results for an <math>\epsilon</math>-dependent upper bound on the error in parameter estimates have recently been reported [?]. The accuracy of the posterior (defined as the expected quadratic loss) delivered by ABC as a function of <math>\epsilon</math> has also been investigated [?]. However, the convergence of the distributions as <math>\epsilon</math> approaches zero, and how this depends on the distance measure used, is an important topic that deserves more detailed investigation. In particular, methods to disentangle the error introduced by this approximation from the errors due to model mis-specification [?] would be valuable in the context of actual applications.
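The role of the tolerance can be illustrated directly: in the toy Normal-mean sketch below (model, prior, and tolerances all hypothetical), the spread of the ABC sample interpolates between the prior (very large <math>\epsilon</math>) and the true posterior (small <math>\epsilon</math>).

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical toy model: Normal(theta, 1) observations.
def simulate(theta, n=100):
    return rng.normal(theta, 1.0, size=n)

def abc_sample(s_obs, eps, n_sims=50_000):
    # Plain rejection ABC on the sample mean with tolerance eps.
    thetas = rng.uniform(-5, 5, size=n_sims)
    s = np.array([simulate(t).mean() for t in thetas])
    return thetas[np.abs(s - s_obs) < eps]

s_obs = simulate(0.0).mean()
# Standard deviation of the ABC sample for decreasing tolerances: the
# largest eps accepts everything (prior sd ~ 2.9), the smallest approaches
# the true posterior sd (0.1 here).
spread = {eps: abc_sample(s_obs, eps).std() for eps in (5.0, 0.5, 0.05)}
print(spread)
```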
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in [?]. The idea is to add noise with a given probability density function to the observed data, since ABC then yields exact inference under the assumption of this model error. The asymptotic consistency of such “noisy ABC” was established in [?], together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC-based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, as defined in Eq. ?, are optimal for this purpose, representing the maximum amount of information in the simplest possible form [?]. However, one must often resort to heuristics to identify sufficient statistics, and sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the loss of information in the parameter estimation [?], and it may also bias the discrimination between models.
An intuitive way to capture most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics [?]. A better strategy is instead to focus on the relevant statistics only, relevance depending on the whole inference problem, on the model used, and on the data at hand [?].
An algorithm for identifying a representative subset of summary statistics has been proposed, which iteratively assesses whether an additional statistic introduces a meaningful modification of the posterior [?]. Another method, proposed in [?], consists of two principal steps. First, a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the posteriors computed with ABC to the reference posterior.
With both of these strategies, a subset of statistics is selected from a large set of candidate statistics. In contrast, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately [?]. Recently, a method for constructing summaries in a semi-automatic manner has attracted much interest [?]. It is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
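A minimal sketch of the pilot-run idea, under the assumption of a toy Normal model and a hand-picked set of candidate statistics: the parameters are linearly regressed on the candidate statistics, and the fitted value, an approximation of the posterior mean, then serves as a single scalar summary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical toy model: Normal(theta, 1) observations.
def simulate(theta, n=100):
    return rng.normal(theta, 1.0, size=n)

def candidates(data):
    # Hand-picked candidate statistics; only some carry information on theta.
    return np.array([data.mean(), data.std(), np.median(data), data.max()])

# Pilot run: simulate (theta, S) pairs and linearly regress theta on the
# candidate statistics, in the spirit of semi-automatic ABC.
thetas = rng.uniform(-5, 5, size=5_000)
X = np.array([candidates(simulate(t)) for t in thetas])
X1 = np.column_stack([np.ones(len(X)), X])          # add an intercept column
beta, *_ = np.linalg.lstsq(X1, thetas, rcond=None)

# The fitted value approximates the posterior mean E[theta | D] and is then
# used as a single scalar summary statistic in a subsequent ABC run.
def summary(data):
    return float(np.concatenate([[1.0], candidates(data)]) @ beta)

obs = simulate(1.0)
s_hat = summary(obs)
print(s_hat)
```

This compresses an arbitrary number of candidate statistics into one scalar per parameter, regardless of how informative the individual candidates are.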
Methods for the identification of summary statistics that also assess their influence on the approximation of the posterior would be of great interest [?]. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models and may also lead to incorrect model predictions. It should be kept in mind that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic [?]. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form [?]
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
in which case <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that there may be a huge difference between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> if Eq. ? is not satisfied, as demonstrated with a small example model in [?] (previously discussed in [?] and [?]). Crucially, it was shown that sufficiency for <math>M_1</math>, for <math>M_2</math>, or for both does not guarantee sufficiency for ranking the models [?]. However, it was also shown that any summary statistic that is sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can be used to rank the nested models [?].
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived [?], which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which actual data sets are compared, as in typical systems biology applications (e.g., see [?]), circumvents this problem. It is even doubtful whether the issue is truly ABC-specific, since importance sampling techniques suffer from the same problem [?].
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules; rather, the effect of these choices should be evaluated and tested in each study [?]. Quality controls are thus achievable and are indeed performed in many ABC-based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. Nevertheless, the rapidly increasing use of ABC should lead to a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC, and instead hold for model-based methods in general. Many of these criticisms have already been well debated in the literature for a long time, but the flexibility offered by ABC to analyse very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space [?]. Indeed, model-based studies often revolve around a small number of models, and, due to the high computational cost of evaluating a single model, it may in some instances be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options [?]. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead [?]. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and the use of expert knowledge from the problem domain.
But if only a few models, subjectively chosen and probably all wrong, can realistically be considered, what insight can we hope to derive from their analysis [?]? As pointed out in [?], there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can seldom be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important than the test of a statistical null hypothesis in this context (see also Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” [?], which connects to classical objections to Bayesian approaches [?].
With any computational method, it is necessary to constrain the investigated parameter ranges. The parameter ranges should, if possible, be defined based on known properties of the studied system, but practical applications may necessitate an educated guess. However, theoretical results regarding a suitable (e.g., unbiased) choice of the prior distribution are available, for example based on the principle of maximum entropy [?].
We stress that the purpose of the analysis must be kept in mind when choosing the priors. In principle, uninformative and flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. Bayes factors, however, are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in [?] that part of the data had to be omitted in the ABC-based analysis presented in [?]. Although a number of authors have argued that large data sets are not a practical limitation [?], this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics [?], and a similar approach may be tractable for ABC-based methods. It should still be kept in mind that realistic models of complex systems are likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC [?], which results in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated samples from the posterior [?], and relatively poor parallelizability [?].
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting [?]. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods over ABC-MCMC is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively [?].
The use of local linear weighted regression with ABC to reduce the variance of the posterior estimates was suggested in [?]. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The resulting regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of nonlinear regression using a feed-forward neural network model [?]. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was subsequently proposed [?].
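A minimal sketch of the local linear regression adjustment, for a toy Normal-mean model with hypothetical settings: accepted parameters are weighted with an Epanechnikov kernel and projected along the fitted regression line onto the observed summary.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical toy model: Normal(theta, 1) observations.
def simulate(theta, n=100):
    return rng.normal(theta, 1.0, size=n)

# Rejection step with a deliberately loose tolerance.
s_obs = simulate(1.5).mean()
thetas = rng.uniform(-5, 5, size=50_000)
s_sims = np.array([simulate(t).mean() for t in thetas])
keep = np.abs(s_sims - s_obs) < 0.5
t_acc, s_acc = thetas[keep], s_sims[keep]

# Epanechnikov kernel weights and a weighted local linear fit
# theta ~ a + b * (s - s_obs), in the spirit of the regression adjustment.
u = (s_acc - s_obs) / 0.5
w = 1.0 - u**2                                   # all positive since |u| < 1
A = np.column_stack([np.ones_like(s_acc), s_acc - s_obs])
Aw = A * w[:, None]                              # apply weights to the design
a, b = np.linalg.solve(Aw.T @ A, Aw.T @ t_acc)

# Project accepted parameters onto s_obs: theta* = theta - b (s_sim - s_obs).
t_adj = t_acc - b * (s_acc - s_obs)
print(t_adj.mean(), t_adj.std())
```

Even with the loose tolerance, the adjusted sample is much more concentrated than the raw accepted sample, which is the variance reduction the method aims for.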
=Outlook=
In the past, the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC, the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function, it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, so novel, appropriate methods must be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to reduce the scope of the problem through model reduction, e.g., dimension reduction [3] or modularization. A second approach is a more guided search of the parameter space, e.g., through the development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely improve the situation, but not resolve the curse-of-dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcome many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, there are certain tasks, for instance model selection with ABC, that are inherently difficult. Open problems, such as the convergence properties of ABC-based algorithms and methods for determining summary statistics in the absence of sufficient ones, also deserve more attention.
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as an assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J (2009) Stable Bayesian Parameter Estimation for Biological Dynamical Systems. IEEE Computer Society Press. pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J (2009) Optimized Expected Information Gain for Nonlinear Dynamical Systems. ACM International Conference Proceeding Series. pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid AP (1984) Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society Series A 147: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, in press.</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<ref name="Csillery2010">Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.</ref>
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<ref name="Gelfand">Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.</ref>
<ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref>
<ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref>
<ref name="Excoffier">Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.</ref>
<ref name="Wegmann2010">Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.</ref>
<ref name="Cornuet">Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.</ref>
<ref name="Templeton2010b">Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.</ref>
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden markov models with intractable likelihoods. arXiv:11035399v1 [mathST] 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. ArXiv:10041112v2 [statME] 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:08113355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and bayesian computation. Stat Appl Genet Mol Biol 7: Article26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rosseau J (2011) Relevant statistics for Bayesian model choice. ArXiv:11104700v1 [mathST] 21 Oct 2011: 1-24.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.</ref>
</references>
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has rapidly gained popularity over the last years, in particular for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain for ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biology, driven by the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). Since its advent in <ref name="Rubin" />, the spread of ABC has prompted the scientific community to develop improved versions of the basic method that further increase its computational efficiency (e.g., see <ref name="Marjoram" /><ref name="Sisson" /><ref name="Wegmann" />).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it has been pointed out that a significant portion of this criticism is not aimed at ABC specifically, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part was also shown to originate from misunderstandings of the mathematical foundations and semantics of Bayesian statistics, of the difference between a model and the underlying system, or between the ABC method and its usage. However, the arguments also exposed fundamental and currently unsolved issues. Concerns have lately been raised within the ABC community as well <ref name="Didelot" /><ref name="Robert" />[?]. Yet it may be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC-based methods approximate the likelihood function by simulations whose outcomes are compared to the observational data. More specifically, with the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is first sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter samples approximately distributed according to the desired posterior distribution, crucially obtained without the need to explicitly compute the likelihood function.
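The rejection scheme above can be sketched in a few lines of code. The following is a minimal illustration, not an implementation from the literature: the Gaussian toy model, the uniform prior, the number of observations, and the tolerance are all hypothetical choices made for this example.

```python
import math
import random

def abc_rejection(data, prior_sampler, simulate, distance, epsilon, n_samples):
    """Basic ABC rejection: keep prior draws whose simulated data set
    lies within tolerance epsilon of the observed data."""
    accepted = []
    while len(accepted) < n_samples:
        theta = prior_sampler()            # sample a parameter from the prior
        sim = simulate(theta)              # simulate a data set under the model
        if distance(sim, data) < epsilon:  # accept if rho(D_hat, D) < epsilon
            accepted.append(theta)
    return accepted

# Hypothetical toy model: Gaussian with unknown mean and known sd = 1.
random.seed(0)
n = 50
data = [random.gauss(2.0, 1.0) for _ in range(n)]

prior = lambda: random.uniform(-5.0, 5.0)
simulate = lambda mean: [random.gauss(mean, 1.0) for _ in range(n)]
euclidean = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

posterior_sample = abc_rejection(data, prior, simulate, euclidean,
                                 epsilon=10.0, n_samples=200)
estimate = sum(posterior_sample) / len(posterior_sample)
```

The accepted values should concentrate near the data-generating mean; shrinking the tolerance sharpens the approximation at the cost of a lower acceptance rate.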
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistics, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible to identify a finite set of sufficient statistics outside the exponential family of distributions. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
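As a sketch of the gain from summary statistics, the toy example below (hypothetical: a Gaussian model in which the sample mean is a sufficient statistic for the unknown mean) accepts parameters by comparing one-dimensional summaries instead of full 50-dimensional data sets.

```python
import random

# Hypothetical toy model: Gaussian with unknown mean and known sd = 1.
# The sample mean is sufficient for the mean, so comparing this single
# number instead of the full data set raises the acceptance rate
# without losing information about the parameter.
random.seed(1)
n = 50
data = [random.gauss(2.0, 1.0) for _ in range(n)]
s_obs = sum(data) / n                      # sufficient summary S(D)

accepted = []
trials = 0
while len(accepted) < 500:
    trials += 1
    theta = random.uniform(-5.0, 5.0)      # flat prior on the mean
    sim = [random.gauss(theta, 1.0) for _ in range(n)]
    s_sim = sum(sim) / n
    if abs(s_sim - s_obs) < 0.1:           # compare summaries, not raw data
        accepted.append(theta)

posterior_mean = sum(accepted) / len(accepted)
acceptance_rate = len(accepted) / trials
```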
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that computing <math>B_{1,2}</math> in Eq. ? requires marginalizing over the uncertain parameters through integration. The posterior ratio of <math>M_1</math> to <math>M_2</math> given the data (which can be thought of as the support in favor of one model over the other) is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
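A common way to approximate the Bayes factor with ABC is to treat the model index itself as a random quantity: draw a model from its prior, simulate, and accept or reject as usual; the ratio of acceptance counts then estimates the posterior ratio, which equals the Bayes factor under equal model priors. The sketch below uses two hypothetical fixed-parameter Gaussian models for simplicity; in general one would also draw each model's parameters from their prior.

```python
import random
random.seed(2)

n = 30
data = [random.gauss(0.0, 1.0) for _ in range(n)]
s_obs = sum(data) / n                      # summary statistic: sample mean

def simulate(model):
    # Hypothetical competing models: M1 = N(0,1), M2 = N(2,1),
    # both with fixed parameters for this illustration.
    mu = 0.0 if model == 1 else 2.0
    sim = [random.gauss(mu, 1.0) for _ in range(n)]
    return sum(sim) / n

counts = {1: 0, 2: 0}
for _ in range(20000):
    m = random.choice([1, 2])              # equal model priors
    if abs(simulate(m) - s_obs) < 0.2:
        counts[m] += 1

# With equal priors the posterior ratio equals the Bayes factor B_{1,2};
# max(..., 1) guards against division by zero when no M2 draw is accepted.
bayes_factor = counts[1] / max(counts[2], 1)
```

Since the data were generated under the first model, the estimated Bayes factor should strongly favor <math>M_1</math>.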
A table for interpreting the strength of evidence corresponding to values of the Bayes factor was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />) and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, conclusions from model comparison based on Bayes factors should be drawn with caution, and we will later discuss some important ABC-related concerns.
==Quality Controls==
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
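A minimal version of the PODS procedure is sketched below for a hypothetical Gaussian-mean toy model: "true" parameters are drawn from the prior, pseudo-observed data sets are simulated, the ABC inference is rerun on each, and the recovery of the true values is measured. All model and algorithm settings here are illustrative choices, not prescriptions.

```python
import random
random.seed(3)

n = 20

def abc_posterior_mean(observed, trials=4000, eps=0.3):
    """Summary-based ABC rejection returning a posterior-mean estimate."""
    s_obs = sum(observed) / n
    accepted = []
    for _ in range(trials):
        theta = random.uniform(-3.0, 3.0)                      # prior draw
        s_sim = sum(random.gauss(theta, 1.0) for _ in range(n)) / n
        if abs(s_sim - s_obs) < eps:
            accepted.append(theta)
    return sum(accepted) / len(accepted) if accepted else None

errors = []
for _ in range(20):
    theta_true = random.uniform(-3.0, 3.0)                     # "true" value from the prior
    pods = [random.gauss(theta_true, 1.0) for _ in range(n)]   # pseudo-observed data set
    estimate = abc_posterior_mean(pods)
    if estimate is not None:
        errors.append(abs(estimate - theta_true))

mean_error = sum(errors) / len(errors)
```

A small mean recovery error across the PODS supports, in this controlled setting, the validity of the chosen summary, tolerance, and simulation budget.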
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparing the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because the posterior support of a particular model can then appear overwhelmingly conclusive, even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues as to how to improve its structure or parametrization.
Fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />. The deviance information criterion is then used as a measure of model fit. Interestingly, the models preferred by this criterion can conflict with those supported by Bayes factors. For this reason it is useful to combine different methods for model selection to obtain robust conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result but would typically make computations prohibitively expensive. In practice, <math>\epsilon</math> is therefore set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available, and other summary statistics are used instead, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC-based analysis. This motivates a careful investigation and categorization of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in [?] and [?]). However, the probability of accepting a simulation for a given tolerance typically decreases exponentially with increasing dimensionality of the parameter space, due to the global acceptance criterion [?]. This is the well-known curse-of-dimensionality [?]. In practice the tolerance may be adjusted to mitigate this issue, leading to an increased acceptance rate at the price of a less accurate posterior distribution.
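The exponential decay of the acceptance rate can be illustrated directly. In the stylized sketch below each of <math>d</math> independent coordinates must individually fall within the tolerance, so the joint acceptance probability is roughly the per-coordinate probability raised to the power <math>d</math>; the identity-map "simulation" is a deliberate simplification for this illustration.

```python
import random
random.seed(4)

def acceptance_rate(d, trials=20000, eps=0.5):
    """Fraction of prior draws accepted under a global criterion in d dimensions."""
    hits = 0
    for _ in range(trials):
        theta = [random.uniform(-1.0, 1.0) for _ in range(d)]
        # The "simulation" is the identity map here; accept only if every
        # coordinate lies within eps of the hypothetical observed value
        # at the origin.
        if all(abs(t) < eps for t in theta):
            hits += 1
    return hits / trials

# Acceptance rates drop roughly as 0.5 ** d for this setup.
rates = [acceptance_rate(d) for d in (1, 2, 4, 8)]
```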
Although no computational method seems able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids [?], which could potentially reduce the simulation times for ABC substantially). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect [?]. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate [?] (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes at the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> should approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance that is large enough for every point to be accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in [?]. Theoretical results for an <math>\epsilon</math>-dependent upper bound on the error in parameter estimates have recently been reported [?]. The accuracy of the posterior (defined as the expected quadratic loss) obtained with ABC as a function of <math>\epsilon</math> has also been investigated [?]. However, the convergence of the distributions as <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that deserves more detailed investigation. In particular, methods to distinguish the error due to this approximation from errors due to model misspecification [?] would be valuable in the context of actual applications.
Finally, statistical inference with a positive tolerance in ABC was given a theoretical justification in [?]: if noise with a given probability density function is assumed to corrupt the observed data, then ABC yields exact inference under this noise model. The asymptotic consistency of such "noisy ABC", together with the asymptotic variance of the parameter estimates for a fixed tolerance, was established in [?]. Both results provide theoretical justification for ABC-based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, as they represent the maximum amount of information in the simplest possible form [?]. However, sufficient statistics often have to be identified heuristically, and their sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may inflate the posterior distribution due to the loss of information in the parameter estimation [?], and may also bias the discrimination between models.
An intuitive way to capture most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics [?]. A better strategy is instead to focus on the relevant statistics only, where relevancy depends on the whole inference problem, on the model used, and on the data at hand [?].
An algorithm has been proposed for identifying a representative subset of summary statistics by iteratively assessing whether an additional statistic introduces a meaningful modification of the posterior [?]. Another method decomposes into two principal steps [?]: first, a reference approximation of the posterior is constructed by minimizing the entropy; sets of candidate summaries are then evaluated by comparing the ABC-approximated posteriors to the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. On the other hand, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately [?]. Recently, a method for constructing summaries in a semi-automatic manner has attained much interest [?]. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
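The semi-automatic idea can be sketched as follows for a hypothetical Gaussian toy model: a pilot run of simulations is used to regress the parameter on a data feature (here just the sample mean), and the fitted value, an estimate of the posterior mean, serves as a learned one-dimensional summary statistic. This illustrates the general idea only, not any published implementation.

```python
import random
random.seed(5)

n = 25
pilot_theta, pilot_feat = [], []
for _ in range(2000):                      # pilot run of simulations
    theta = random.uniform(-3.0, 3.0)      # parameter from the prior
    sim = [random.gauss(theta, 1.0) for _ in range(n)]
    pilot_theta.append(theta)
    pilot_feat.append(sum(sim) / n)        # data feature: the sample mean

# Ordinary least squares fit of theta on the feature: theta ~ a + b * f.
mf = sum(pilot_feat) / len(pilot_feat)
mt = sum(pilot_theta) / len(pilot_theta)
b = (sum((f - mf) * (t - mt) for f, t in zip(pilot_feat, pilot_theta))
     / sum((f - mf) ** 2 for f in pilot_feat))
a = mt - b * mf

def summary(dataset):
    """Learned one-dimensional summary: the fitted posterior-mean estimate."""
    return a + b * (sum(dataset) / n)
```

The learned summary can then replace the raw data in the acceptance criterion of any of the ABC algorithms above.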
Methods for the identification of summary statistics that also assess their influence on the approximation of the posterior would be of great interest [?], because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models and may also lead to incorrect model predictions. Indeed, none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic [?]. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form [?]
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
in which case <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that <math>B_{1,2}</math> and <math>B_{1,2}^s</math> may differ substantially if Eq. ? is not satisfied, as demonstrated with a small example model in [?] (previously discussed in [?] and [?]). Crucially, it was shown that sufficiency for <math>M_1</math>, for <math>M_2</math>, or for both does not guarantee sufficiency for ranking the models [?]. However, any summary statistic that is sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can also be used to rank the nested models [?].
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived [?], which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which actual data sets are compared directly, as in typical systems biology applications (e.g., see [?]), circumvents this problem. It is even doubtful whether the issue is truly ABC-specific, since importance sampling techniques suffer from the same problem [?].
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules; the effect of these choices should instead be evaluated and tested in each study [?]. Quality controls are thus achievable, and indeed performed in many ABC-based studies, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. The rapidly increasing use of ABC should, however, foster a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider valid but not specific to ABC; they instead hold for model-based methods in general. Many of these criticisms have long been debated in the literature, but the flexibility offered by ABC to analyze very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space [?]. Indeed, model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model in some instances, it may then be difficult to cover a large part of the hypothesis space.
An upper limit to the number of candidate models considered is typically set by the substantial effort required to define the models and to choose between many alternative options [?]. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead [?]. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and the use of expert knowledge from the problem domain.
But if only a few models, subjectively chosen and probably all wrong, can realistically be considered, what insight can we hope to derive from their analysis [?]? As pointed out in [?], there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only rarely be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important here than the test of a statistical null hypothesis (see also Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” [?], which connects to classical objections to Bayesian approaches [?].
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should if possible be defined based on known properties of the studied system, but may for practical applications necessitate an educated guess. However, theoretical results regarding a suitable (e.g., non-biased) choice of the prior distribution are available, which are based on the principle of maximum entropy [?].
We stress that the purpose of the analysis must be kept in mind when choosing the priors. In principle, uninformative flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. Bayes factors, however, are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. For example, it was pointed out in [?] that part of the data had to be omitted in the ABC-based analysis presented in [?]. Although several authors claim that large data sets are not a practical limitation [?], this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. With increasing computational power, however, this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics [?], and a similar approach may be tractable for ABC-based methods. It should nonetheless be kept in mind that realistic models for complex systems are very likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview with recent developments of ABC. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC <ref name="Marjoram" />, resulting in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated samples from the posterior <ref name="Sisson" />, and relatively poor parallelizability <ref name="Bertorelle" />.
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting <ref name="Sisson" /><ref name="Beaumont2009" />. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively <ref name="DelMoral" />.
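The sequential idea can be sketched as follows, in the spirit of ABC-PMC: draws from the previous population are perturbed with a Gaussian kernel, filtered through a shrinking tolerance schedule, and reweighted by the ratio of the prior density to the kernel mixture. The Gaussian toy model, prior, tolerance schedule, and kernel width are all hypothetical choices for this illustration.

```python
import math
import random
random.seed(7)

n = 20
data = [random.gauss(0.5, 1.0) for _ in range(n)]
s_obs = sum(data) / n
prior_lo, prior_hi = -3.0, 3.0              # flat prior on the mean
N = 200                                     # population size per generation
sigma = 0.5                                 # perturbation kernel width

def summary(theta):
    return sum(random.gauss(theta, 1.0) for _ in range(n)) / n

def weighted_draw(pop, w):
    """Draw one member of pop with probability proportional to its weight."""
    r, acc = random.random() * sum(w), 0.0
    for th, wi in zip(pop, w):
        acc += wi
        if r <= acc:
            return th
    return pop[-1]

pop, w = [], []
for eps in (1.0, 0.5, 0.25):                # shrinking tolerance schedule
    new_pop, new_w = [], []
    while len(new_pop) < N:
        if not pop:                         # first generation: sample the prior
            theta = random.uniform(prior_lo, prior_hi)
        else:                               # later: perturb a weighted draw
            theta = weighted_draw(pop, w) + random.gauss(0.0, sigma)
            if not (prior_lo <= theta <= prior_hi):
                continue                    # zero prior density: reject
        if abs(summary(theta) - s_obs) < eps:
            if not pop:
                new_w.append(1.0)
            else:
                # Importance weight: (constant) prior density over the
                # kernel mixture of the previous weighted population.
                mix = sum(wi * math.exp(-(theta - th) ** 2 / (2 * sigma ** 2))
                          for th, wi in zip(pop, w))
                new_w.append(1.0 / mix)
            new_pop.append(theta)
    pop, w = new_pop, new_w

posterior_mean = sum(th * wi for th, wi in zip(pop, w)) / sum(w)
```

Each generation thus targets an intermediate distribution with a tighter tolerance, so the final weighted population approximates the ABC posterior for the smallest tolerance.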
The use of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in <ref name="Beaumont2002" />. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The resulting regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of nonlinear regression using a feed-forward neural network model <ref name="Blum2010" />. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in <ref name="Leuenberger2009" />.
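A sketch of the regression adjustment for a hypothetical Gaussian-mean toy model: accepted parameters are corrected towards the observed summary via theta* = theta - b (s_sim - s_obs), with the coefficient b fitted by ordinary least squares on the accepted pairs. The original method additionally applies smooth kernel weights to the accepted points, which are omitted here for brevity.

```python
import random
random.seed(6)

n = 25
data = [random.gauss(1.0, 1.0) for _ in range(n)]
s_obs = sum(data) / n

# ABC rejection with a generous tolerance, keeping (parameter, summary) pairs.
pairs = []
while len(pairs) < 300:
    theta = random.uniform(-3.0, 3.0)
    s = sum(random.gauss(theta, 1.0) for _ in range(n)) / n
    if abs(s - s_obs) < 0.5:
        pairs.append((theta, s))

# Ordinary least squares coefficient of theta on the summary.
ms = sum(s for _, s in pairs) / len(pairs)
mt = sum(t for t, _ in pairs) / len(pairs)
b = (sum((s - ms) * (t - mt) for t, s in pairs)
     / sum((s - ms) ** 2 for _, s in pairs))

# Correct each accepted parameter in the direction of the observed summary.
adjusted = [t - b * (s - s_obs) for t, s in pairs]

raw_mean = mt
adj_mean = sum(adjusted) / len(adjusted)
var_raw = sum((t - raw_mean) ** 2 for t, _ in pairs) / len(pairs)
var_adj = sum((t - adj_mean) ** 2 for t in adjusted) / len(adjusted)
```

The adjusted sample has a smaller variance than the raw accepted sample (an exact property of the least-squares correction), which is the variance reduction the method is designed to deliver.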
=Outlook=
In the past, the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC, the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function it may be tempting to attack high-dimensional problems, but ABC methods do not yet address the additional issues encountered in such studies, and novel appropriate methods must be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction [3] or modularization. A second approach is a more guided search of the parameter space, e.g., by developing new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely improve the situation, not resolve the curse-of-dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcome many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations it shares with approaches for Bayesian inference in general. Certain tasks, for instance model selection with ABC, thus remain inherently difficult. Also, open problems such as the convergence properties of ABC-based algorithms, as well as methods for determining summary statistics in the absence of sufficient ones, deserve more attention.
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J (2009) Stable Bayesian Parameter Estimation for Biological Dynamical Systems. IEEE Computer Society Press. pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J (2009) Optimized Expected Information Gain for Nonlinear Dynamical Systems. Int. Conf. Proc. Series, Association for Computing Machinery (ACM). pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid A (1984) Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society Series A 147: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing (in press).</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<ref name="Csillery2010">Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.</ref>
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<ref name="Gelfand">Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.</ref>
<ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref>
<ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref>
<ref name="Excoffier">Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.</ref>
<ref name="Wegmann2010">Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.</ref>
<ref name="Cornuet">Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.</ref>
<ref name="Templeton2010b">Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.</ref>
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden markov models with intractable likelihoods. arXiv:11035399v1 [mathST] 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. ArXiv:10041112v2 [statME] 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:08113355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article 34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and Bayesian computation. Stat Appl Genet Mol Biol 7: Article 26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rosseau J (2011) Relevant statistics for Bayesian model choice. ArXiv:11104700v1 [mathST] 21 Oct 2011: 1-24.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.</ref>
</references>
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has rapidly gained popularity in recent years, particularly for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). Since its advent <ref name="Rubin" />, the spread of ABC has prompted the scientific community to develop improved versions of the basic method, which have further increased its computational efficiency (e.g., see <ref name="Marjoram" /><ref name="Sisson" /><ref name="Wegmann" />).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it was pointed out that a significant portion of the criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part was also shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and its usage. However, the arguments also exposed fundamental and currently unsolved issues. Concerns have lately also been raised within the ABC community <ref name="Didelot" /><ref name="Robert" />[?]. Yet it may be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\quad\theta\in\Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
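For intuition, the update above can be carried out directly whenever the likelihood is available, which also makes clear exactly what ABC must approximate when it is not. The following sketch uses a discretized coin-bias model; the grid, prior, and data (7 heads in 10 flips) are purely illustrative choices of ours, not taken from any particular study:

```python
import numpy as np

# Discretized toy example: a coin with unknown bias theta, a uniform prior
# on a grid, and 7 heads observed in 10 flips.
theta = np.linspace(0.01, 0.99, 99)             # parameter grid
prior = np.ones_like(theta) / len(theta)
k, n = 7, 10
likelihood = theta**k * (1.0 - theta)**(n - k)  # binomial kernel (constant dropped)
unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()   # the evidence acts as normalizer

print(theta[np.argmax(posterior)])              # posterior mode at k/n = 0.7
```

Because the likelihood here is a one-line formula, the posterior is obtained exactly; ABC is needed precisely when no such formula can be evaluated.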
==The ABC Rejection Algorithm==
All ABC based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, with the ABC rejection algorithm—the most basic form of ABC—a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates approximately distributed according to the desired posterior distribution and, crucially, obtained without the need to explicitly evaluate the likelihood function.
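The rejection algorithm fits in a few lines of code. The sketch below is our own toy example: the Normal model with known variance, the uniform prior, the sorted-sample distance, and the tolerance value are all assumptions made for illustration, not prescriptions of the method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: the data are 50 draws from Normal(mu, 1), and we infer mu
# under a Uniform(-5, 5) prior.
D = rng.normal(1.5, 1.0, size=50)            # "observed" data

def simulate(mu):
    return rng.normal(mu, 1.0, size=50)

def rho(D_hat, D):
    # Euclidean distance between the sorted samples as a simple metric
    return np.linalg.norm(np.sort(D_hat) - np.sort(D))

eps = 5.0                                    # strictly positive tolerance
accepted = []
for _ in range(20000):
    mu = rng.uniform(-5, 5)                  # 1. sample from the prior
    if rho(simulate(mu), D) < eps:           # 2. simulate and compare
        accepted.append(mu)                  # 3. keep accepted parameters

print(len(accepted), np.mean(accepted))      # crude posterior sample for mu
```

The accepted values of <math>\mu</math> form an approximate posterior sample; shrinking the tolerance sharpens the approximation at the cost of a lower acceptance rate.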
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic <math>S(D)</math>, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
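To illustrate how summary statistics raise the acceptance rate, the sketch below compares two-dimensional summaries instead of 500-dimensional data sets; for the Normal model the pair (mean, standard deviation) happens to be sufficient. The model, priors, and tolerance are again our own toy choices:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(1.5, 1.0, size=500)      # observed data, dimension 500

def S(x):
    # (mean, standard deviation): sufficient for the Normal(mu, sigma) model
    return np.array([x.mean(), x.std()])

eps = 0.2
accepted = []
for _ in range(50000):
    mu = rng.uniform(-5, 5)             # priors on both parameters
    sigma = rng.uniform(0.1, 3.0)
    D_hat = rng.normal(mu, sigma, size=500)
    # compare 2-dimensional summaries instead of 500-dimensional data sets
    if np.linalg.norm(S(D_hat) - S(D)) < eps:
        accepted.append((mu, sigma))

print(len(accepted))
```

Matching the raw 500-dimensional data sets within a comparable tolerance would essentially never succeed, while matching the two summaries remains feasible.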
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that computing <math>B_{1,2}</math> in Eq. ? requires marginalizing over the uncertain parameters through integration. The posterior ratio (which can be thought of as the support in favor of one model) of <math>M_1</math> compared to <math>M_2</math> given the data is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for interpreting the strength of evidence associated with different values of the Bayes factor was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />), and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, conclusions from model comparisons based on Bayes factors should be treated with caution, and we will later discuss some important ABC-related concerns.
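In the ABC setting, the Bayes factor is often approximated by treating the model index itself as a random quantity: with equal model priors, the ratio of acceptance counts estimates <math>B_{1,2}</math>. The sketch below uses two toy models of our own choosing, a Poisson and a geometric model, summarized by mean and variance:

```python
import numpy as np

rng = np.random.default_rng(6)
D = rng.poisson(3.0, size=200)               # "observed" count data

def S(x):
    # mean and variance as low-dimensional summaries
    return np.array([x.mean(), x.var()])

eps = 1.0
counts = {1: 0, 2: 0}
for _ in range(20000):
    m = int(rng.integers(1, 3))              # model index from a uniform model prior
    if m == 1:
        lam = rng.uniform(0.0, 10.0)         # M1: Poisson(lam)
        D_hat = rng.poisson(lam, size=200)
    else:
        p = rng.uniform(0.05, 0.95)          # M2: Geometric(p) on {1, 2, ...}
        D_hat = rng.geometric(p, size=200)
    if np.linalg.norm(S(D_hat) - S(D)) < eps:
        counts[m] += 1

# With equal model priors, counts[1]/counts[2] approximates B_{1,2}; here the
# Poisson model should dominate, since the data are equidispersed.
print(counts)
```

As discussed later in this article, such summary-based Bayes factors are only trustworthy if the summaries are sufficient for discriminating between the models.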
==Quality Controls==
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
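A minimal version of such a pseudo-observed data check might look as follows; the Normal toy model, prior ranges, and tolerance are assumptions made for this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

def abc_posterior_mean(data, n_sims=5000, eps=0.3):
    # Minimal ABC rejection on the sample mean for a Normal(mu, 1) toy
    # model with a Uniform(-3, 3) prior (all assumptions of this sketch).
    acc = []
    for _ in range(n_sims):
        mu = rng.uniform(-3, 3)
        D_hat = rng.normal(mu, 1.0, size=len(data))
        if abs(D_hat.mean() - data.mean()) < eps:
            acc.append(mu)
    return np.mean(acc) if acc else np.nan

# Pseudo-observed data sets (PODS): "true" parameters drawn from the prior,
# so that estimation error can be measured in a controlled setting.
errors = []
for _ in range(20):
    mu_true = rng.uniform(-3, 3)
    pods = rng.normal(mu_true, 1.0, size=100)
    errors.append(abc_posterior_mean(pods) - mu_true)

print(np.nanmean(np.abs(errors)))            # small error suggests sane inference
```

Large or systematically signed errors across PODS would indicate that the chosen summaries, tolerance, or number of simulations are inadequate before any real data are analyzed.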
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparing the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because the posterior support of a particular model can then appear overwhelmingly conclusive, even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />. The deviance information criterion is then used as a measure of model fit. Interestingly, it was also shown that the models preferred on the basis of this criterion can conflict with those supported by Bayes factors. For this reason it is useful to combine different methods for model selection to obtain correct conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus, <math>\epsilon</math> is instead set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; instead, other summary statistics are used, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC based analysis. This motivates a careful investigation and categorization of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in <ref name="Ratmann" /> and <ref name="Francois" />). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) <ref name="Csillery" />. This is a well-known phenomenon usually referred to as the curse-of-dimensionality <ref name="Bellman" />. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
Although no computational method seems to be able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids <ref name="Gerstner" />, which could substantially reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect <ref name="Singer" />. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate <ref name="Templeton2009a" /> (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes at the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> is trusted to approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance that is large enough for every point to be accepted results in the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in [?]. Theoretical results for an upper, <math>\epsilon</math>-dependent bound for the error in parameter estimates have recently been reported [?]. The accuracy of the posterior (defined as the expected quadratic loss) of ABC as a function of <math>\epsilon</math> has also been investigated [?]. However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and its dependence on the distance measure used, is an important topic that should be investigated in greater detail. Methods to distinguish the error of this approximation from errors due to model mis-specification [?], in a way that is meaningful in the context of actual applications, would be valuable.
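The transition from posterior to prior as <math>\epsilon</math> grows can be observed directly in simulation. In the toy Normal model below (the model, prior, and tolerance schedule are our own illustrative assumptions), the spread of the accepted parameters increases from a posterior-like concentration toward the spread of the uniform prior:

```python
import numpy as np

rng = np.random.default_rng(5)
D = rng.normal(2.0, 1.0, size=100)           # toy observed data

def accepted_mus(eps, n_sims=20000):
    # Vectorized ABC rejection on the sample mean under a Uniform(-5, 5) prior
    mus = rng.uniform(-5, 5, size=n_sims)
    sims = rng.normal(mus[:, None], 1.0, size=(n_sims, 100))
    keep = np.abs(sims.mean(axis=1) - D.mean()) < eps
    return mus[keep]

# Small eps: samples concentrate near the generating value; an extremely
# large eps accepts everything, i.e. we simply recover the uniform prior.
for eps in [0.2, 1.0, 100.0]:
    a = accepted_mus(eps)
    print(eps, len(a), a.std())
```

The standard deviation of the accepted sample grows with <math>\epsilon</math>, approaching that of the Uniform(-5, 5) prior in the limit of accepting everything.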
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in [?]. The idea is to add noise, distributed according to a given probability density function, to the observed data; ABC then yields exact inference under the assumption of this model error. The asymptotic consistency of such “noisy ABC” was established in [?], together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, representing the maximum amount of information in the simplest possible form [?]. However, one must often resort to heuristics to identify sufficient statistics, and their sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the potential loss of information in the parameter estimation [?], and this may also bias the discrimination between models.
An intuitive idea to capture most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics [?]. Instead, a better strategy is to focus on the relevant statistics only, where relevance depends on the whole inference problem, on the model used, and on the data at hand [?].
An algorithm was proposed for identifying a representative subset of summary statistics, by iteratively assessing whether an additional statistic introduces a meaningful modification of the posterior [?]. Another method, proposed in [?], decomposes into two principal steps. First, a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the posteriors computed with ABC to the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. On the other hand, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately [?]. Recently, a method for constructing summaries in a semi-automatic manner has attained much interest [?]. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
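The semi-automatic idea can be sketched as follows: a pilot set of simulations is used to regress the parameter on a collection of candidate statistics, and the fitted linear predictor, an approximation of the posterior mean, then serves as a one-dimensional summary. The toy model and candidate statistics below are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

# Pilot run: simulate (theta, data) pairs, compute candidate statistics, and
# regress theta on them; the fitted linear predictor approximates E[theta | D]
# and is then used as a single summary statistic.
n_pilot = 2000
thetas = rng.uniform(0, 5, size=n_pilot)
stats = []
for t in thetas:
    x = rng.normal(t, 1.0, size=30)
    stats.append([x.mean(), np.median(x), x.std(), x.min(), x.max()])
X = np.column_stack([np.ones(n_pilot), np.array(stats)])

beta, *_ = np.linalg.lstsq(X, thetas, rcond=None)   # least-squares fit

def summary(x):
    s = np.array([1.0, x.mean(), np.median(x), x.std(), x.min(), x.max()])
    return s @ beta      # scalar summary approximating the posterior mean

x_obs = rng.normal(2.5, 1.0, size=30)
print(summary(x_obs))    # close to the underlying parameter 2.5
```

The learned scalar summary can then be plugged into the acceptance criterion of Eq. ? in place of the full set of candidate statistics.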
Methods for the identification of summary statistics that also assess their influence on the approximation of the posterior would be of great interest [?]. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic [?]. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form [?]
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which results in <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that there may be a huge difference between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> if Eq. ? is not satisfied, as was demonstrated with a small example model in [?] (previously discussed in [?] and in [?]). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models [?]. However, it was also shown that any summary statistic that is sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can also be used to rank the nested models [?].
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived [?], which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC based inference in which actual data sets are compared, as in typical systems biology applications (e.g., see [?]), circumvents this problem. It is even doubtful whether the issue is truly ABC specific, since importance sampling techniques suffer from the same problem [?].
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules; instead, the effect of these choices should be evaluated and tested in each study [?]. Thus, quality controls are achievable and indeed performed in many ABC based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. Nevertheless, the rapidly increasing use of ABC should foster a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC, and instead hold for model-based methods in general. Many of these criticisms have already been well debated in the literature for a long time, but the flexibility offered by ABC to analyse very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space [?]. It is true that model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model, it may in some instances be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options [?]. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead [?]. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only a few models—subjectively chosen and probably all wrong—can be realistically considered, what insight can we hope to derive from their analysis [?]? As pointed out in [?], there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can seldom be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important in this context than the test of a statistical null hypothesis (see also Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” [?], which is connected to classical objections of Bayesian approaches [?].
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should if possible be defined based on known properties of the studied system, but may for practical applications necessitate an educated guess. However, theoretical results regarding a suitable (e.g., non-biased) choice of the prior distribution are available, which are based on the principle of maximum entropy [?].
We stress that the purpose of the analysis is to be kept in mind when choosing the priors. In principle, uninformative and flat priors that exaggerate our subjective ignorance about the parameters may still yield reasonable parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in [?] that part of the data had to be omitted in the ABC-based analysis presented in [?]. Although a number of authors have argued that large data sets are not a practical limitation [?], this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially become less important. It has been demonstrated that parallel algorithms may yield significant speedups for MCMC-based inference in phylogenetics [?], and a similar approach may be tractable for ABC-based methods as well. It should still be kept in mind that realistic models for complex systems are very likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC <ref name="Marjoram" />, which resulted in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated posterior samples <ref name="Sisson" />, and relatively poor parallelizability <ref name="Bertorelle" />.
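The mechanics of ABC-MCMC can be illustrated with a minimal sketch. The toy problem below (inferring the mean of a Gaussian with a Gaussian prior, using the sample mean as summary), the tolerance, the step size, and the initialization are illustrative choices of ours, not prescriptions from <ref name="Marjoram" />. Note that the chain is started at a point with non-negligible acceptance probability, such as a draw previously accepted by rejection ABC.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data: 100 draws from N(2, 1); the unknown parameter is the mean.
data = rng.normal(2.0, 1.0, size=100)
s_obs = data.mean()  # summary statistic of the observed data

def prior_logpdf(theta):
    # N(0, 10^2) prior on the mean (unnormalized log-density)
    return -0.5 * (theta / 10.0) ** 2

def simulate_summary(theta):
    # Simulate a data set of the same size and return its summary
    return rng.normal(theta, 1.0, size=100).mean()

def abc_mcmc(n_iter=20_000, eps=0.05, step=0.5):
    theta = s_obs  # start at a point with non-negligible acceptance probability
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.normal()  # symmetric Gaussian proposal
        # Move only if the simulation is close to the observed summary AND
        # the Metropolis-Hastings test on the prior ratio passes.
        if abs(simulate_summary(prop) - s_obs) < eps:
            if np.log(rng.uniform()) < prior_logpdf(prop) - prior_logpdf(theta):
                theta = prop
        chain[i] = theta
    return chain

chain = abc_mcmc()
print(round(chain.mean(), 2))  # concentrates near the observed sample mean
```

Because the proposal is symmetric, the Metropolis-Hastings ratio reduces to the prior ratio; the likelihood never appears.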
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting <ref name="Sisson" /><ref name="Beaumont2009" />. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods over ABC-MCMC is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively <ref name="DelMoral" />.
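The sequential idea can likewise be sketched in a few lines. The code below follows the general population Monte Carlo scheme (rejection ABC at a loose tolerance, then repeated resampling, Gaussian perturbation, and importance reweighting at tighter tolerances); the toy model, the fixed tolerance schedule, and the kernel-width rule of thumb are illustrative simplifications rather than the exact algorithm of <ref name="Beaumont2009" />.

```python
import numpy as np

rng = np.random.default_rng(7)

data = rng.normal(2.0, 1.0, size=100)
s_obs = data.mean()

def sim(theta):
    # Summary (sample mean) of a simulated data set under theta
    return rng.normal(theta, 1.0, size=100).mean()

def prior_pdf(theta):
    return 0.1 if -5.0 < theta < 5.0 else 0.0  # Uniform(-5, 5)

N = 500
schedule = [1.0, 0.3, 0.1, 0.03]  # decreasing tolerances

# Generation 0: plain rejection ABC at the loosest tolerance.
particles = []
while len(particles) < N:
    th = rng.uniform(-5, 5)
    if abs(sim(th) - s_obs) < schedule[0]:
        particles.append(th)
particles = np.array(particles)
weights = np.full(N, 1.0 / N)

for eps in schedule[1:]:
    tau = 2.0 * np.sqrt(np.cov(particles, aweights=weights))  # kernel width
    new_p, new_w = np.empty(N), np.empty(N)
    for i in range(N):
        while True:
            # Resample from the previous population and perturb.
            th = rng.choice(particles, p=weights) + tau * rng.normal()
            if prior_pdf(th) > 0 and abs(sim(th) - s_obs) < eps:
                break
        # Importance weight corrects for sampling from the mixture
        # (the Gaussian normalizing constant cancels after renormalization).
        kern = np.exp(-0.5 * ((th - particles) / tau) ** 2)
        new_w[i] = prior_pdf(th) / np.sum(weights * kern)
        new_p[i] = th
    particles, weights = new_p, new_w / new_w.sum()

print(round(np.average(particles, weights=weights), 2))
```

Each generation only needs to bridge from the previous tolerance to a slightly tighter one, which is what makes the sequential scheme more efficient than rejection ABC at the final tolerance alone.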
The use of local linear weighted regression with ABC to reduce the variance of the posterior estimates was suggested in <ref name="Beaumont2002" />. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model <ref name="Blum2010" />. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in <ref name="Leuenberger2009" />.
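The linear adjustment of <ref name="Beaumont2002" /> can be sketched as follows. The toy model (a normal mean with the sample mean as summary), the flat prior, and the 2% acceptance quantile are illustrative assumptions of ours; the Epanechnikov weights and the correction <math>\theta^* = \theta - b\,(S(\hat{D})-S(D))</math> follow the original scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem: infer the mean of N(mu, 1); summary = sample mean of 50 draws.
data = rng.normal(3.0, 1.0, size=50)
s_obs = data.mean()

# Rejection step with a deliberately loose tolerance (keep the closest 2%).
n = 50_000
theta = rng.uniform(-10, 10, size=n)       # flat prior
s = rng.normal(theta, 1.0 / np.sqrt(50))   # sampling distribution of the mean
d = np.abs(s - s_obs)
eps = np.quantile(d, 0.02)
keep = d < eps
th, sk, dk = theta[keep], s[keep], d[keep]

# Epanechnikov weights: closer simulations count more.
w = 1.0 - (dk / eps) ** 2

# Weighted least squares of theta on (s - s_obs); beta[1] is the local slope.
X = np.column_stack([np.ones_like(sk), sk - s_obs])
beta, *_ = np.linalg.lstsq(X * np.sqrt(w)[:, None], th * np.sqrt(w), rcond=None)

# Adjust accepted draws toward the observed summary.
theta_adj = th - beta[1] * (sk - s_obs)
print(round(np.average(theta_adj, weights=w), 2))  # regression-adjusted mean
```

The adjustment removes the component of the parameter spread that is linearly explained by the mismatch between simulated and observed summaries, so a looser tolerance can be afforded at the rejection step.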
=Outlook=
In the past the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, and novel appropriate methods must therefore be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction <ref name="Csillery" /> or modularization. A second approach is a more guided search of the parameter space, e.g., by the development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC avoids only the cost of computing the likelihood, not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely alleviate the situation, but not resolve the curse-of-dimensionality.
The main error sources in ABC-based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcoming many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, certain tasks, for instance model selection with ABC, are inherently difficult. Moreover, open problems such as the convergence properties of ABC-based algorithms, as well as methods for determining summary statistics in the absence of sufficient ones, deserve more attention.
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as an assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J (2009) Stable Bayesian Parameter Estimation for Biological Dynamical Systems. IEEE Computer Society Press. pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J (2009) Optimized Expected Information Gain for Nonlinear Dynamical Systems. Int. Conf. Proc. Series, Association for Computing Machinery (ACM). pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid A (1984) Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society Series A 147: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, in press.</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<ref name="Csillery2010">Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.</ref>
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<ref name="Gelfand">Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.</ref>
<ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref>
<ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref>
<ref name="Excoffier">Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.</ref>
<ref name="Wegmann2010">Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.</ref>
<ref name="Cornuet">Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.</ref>
<ref name="Templeton2010b">Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.</ref>
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden Markov models with intractable likelihoods. arXiv:1103.5399v1 [math.ST] 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. arXiv:1004.1112v2 [stat.ME] 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:0811.3355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and bayesian computation. Stat Appl Genet Mol Biol 7: Article26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rousseau J (2011) Relevant statistics for Bayesian model choice. arXiv:1110.4700v1 [math.ST] 21 Oct 2011: 1-24.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.
</ref>
</references>
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has rapidly gained popularity in recent years, in particular for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain for ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). Since its advent in <ref name="Rubin" />, the spread of ABC has prompted the scientific community to develop improved versions of the basic method, which have further increased the computational efficiency (e.g., see <ref name="Marjoram" /><ref name="Sisson" /><ref name="Wegmann" />).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it was pointed out that a significant portion of the criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part was also shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and its usage. Nevertheless, fundamental and currently unsolved issues were exposed by the arguments as well. Concerns have lately also been raised within the ABC community <ref name="Didelot" /><ref name="Robert" />[?]. Yet it might be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC-based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, with the ABC rejection algorithm—the most basic form of ABC—a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math>, and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter samples that are approximately distributed according to the desired posterior distribution, and, crucially, obtained without the need to explicitly compute the likelihood function.
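As a minimal sketch of the rejection algorithm, consider the toy problem of inferring the mean of a Gaussian with known unit variance from five observations. The flat prior, the sorting of the data sets before taking the Euclidean distance, and the value of the tolerance are illustrative choices of ours, not part of the algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(2)

# Observed data: 5 draws from N(2, 1); the unknown parameter is the mean.
data = np.sort(rng.normal(2.0, 1.0, size=5))

def rho(sim, obs):
    # Euclidean distance between the sorted data sets
    return np.linalg.norm(np.sort(sim) - obs)

n, eps = 100_000, 2.5
accepted = []
for _ in range(n):
    theta = rng.uniform(-5, 5)             # sample from a flat prior
    sim = rng.normal(theta, 1.0, size=5)   # simulate a data set under theta
    if rho(sim, data) < eps:               # keep theta if the simulation is close
        accepted.append(theta)
accepted = np.array(accepted)
print(len(accepted), round(accepted.mean(), 2))
```

The accepted values form an approximate posterior sample; shrinking <math>\epsilon</math> sharpens the approximation at the cost of a lower acceptance rate.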
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e. given the sufficient statistic, the parameter θ is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
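For a concrete illustration, the sample mean is a sufficient statistic for the mean of a Gaussian with known variance, so the acceptance criterion can be applied to this one-dimensional summary instead of the full data set. The sketch below exploits the fact that the sampling distribution of the mean is known in this toy case; in general one would simulate a full data set and then compute <math>S(\hat{D})</math>.

```python
import numpy as np

rng = np.random.default_rng(3)

# Observed data: 100 draws from N(2, 1).
data = rng.normal(2.0, 1.0, size=100)
s_obs = data.mean()  # sufficient for the mean when the variance is known

n, eps = 100_000, 0.05
theta = rng.uniform(-5, 5, size=n)             # flat prior
# Summary of a simulated data set of size 100: its mean is N(theta, 1/100).
s_sim = rng.normal(theta, 1.0 / np.sqrt(100))
accepted = theta[np.abs(s_sim - s_obs) < eps]  # accept on the summary distance
print(len(accepted), round(accepted.mean(), 2))
```

Matching a one-dimensional summary accepts far more simulations than matching all 100 observations directly, at no loss of information here because the statistic is sufficient.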
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that we need to marginalize over the uncertain parameters through integration to compute <math>B_{1,2}</math> in Eq. ?. The posterior ratio (which can be thought of as the support in favor of one model) of <math>M_1</math> compared to <math>M_2</math> given the data is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
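With ABC, the Bayes factor can be approximated without evaluating either likelihood: simulate from both models, accept simulations close to the data, and, with equal model priors, take the ratio of the acceptance counts. The two Gaussian toy models below, the summary, and the tolerance are illustrative assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng(4)

# Observed data: 100 draws from N(1, 1); summary = sample mean.
data = rng.normal(1.0, 1.0, size=100)
s_obs = data.mean()

n, eps = 50_000, 0.1
# Model 1: fixed N(0, 1). Model 2: N(mu, 1) with prior mu ~ N(0, 2^2).
s1 = rng.normal(0.0, 0.1, size=n)        # sampling distribution of the mean
mu = rng.normal(0.0, 2.0, size=n)
s2 = rng.normal(mu, 0.1)

n1 = np.sum(np.abs(s1 - s_obs) < eps)    # acceptances under model 1
n2 = np.sum(np.abs(s2 - s_obs) < eps)    # acceptances under model 2
B12 = n1 / n2  # with equal model priors, this estimates the Bayes factor
print(n1, n2, round(B12, 3))
```

Here the observed mean lies far in the tail of model 1, so the estimated <math>B_{1,2}</math> is close to zero, favoring model 2.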
A table for interpreting the strength of evidence implied by different values of the Bayes factor was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />), and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, conclusions from model comparisons based on Bayes factors should be drawn with sober caution, and we will later discuss some important ABC-related concerns.
=Quality Controls=
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparing the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because then the posterior support for a particular model can appear overwhelmingly conclusive, even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions with respect to a comprehensive set of statistics, which need not be the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />. The deviance information criterion is then used as a measure of model fit. Interestingly, it was also shown that the models preferred based on this criterion can conflict with those supported by Bayes factors. For this reason it is useful to combine different methods for model selection to obtain correct conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus <math>\epsilon</math> is instead set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; other summary statistics are used instead, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant to ABC-based analysis. This motivates a careful investigation and categorization of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in <ref name="Ratmann" /> and <ref name="Francois" />). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) <ref name="Csillery" />. This is a well-known phenomenon usually referred to as the curse-of-dimensionality <ref name="Bellman" />. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
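The exponential decay of the acceptance rate with dimension can be demonstrated directly. The sketch below, an illustration of our own rather than a result from the cited literature, measures the fraction of points drawn uniformly from a hypercube that fall within a fixed Euclidean tolerance of a target point at the origin:

```python
import numpy as np

rng = np.random.default_rng(5)

n, eps = 200_000, 0.5
rates = []
for d in (1, 2, 5, 10):
    theta = rng.uniform(-1, 1, size=(n, d))              # "prior" samples
    rate = np.mean(np.linalg.norm(theta, axis=1) < eps)  # global acceptance
    rates.append(rate)
    print(d, rate)
```

The acceptance rate collapses from one half in one dimension to essentially zero in ten, which is the behavior underlying the curse-of-dimensionality for global acceptance criteria.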
Although no computational method seems able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids <ref name="Gerstner" />, which could potentially reduce the simulation times for ABC substantially). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect <ref name="Singer" />. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate <ref name="Templeton2009a" /> (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes with the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> is trusted to approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance that is large enough for every point to be accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in <ref name="Sisson" />. Theoretical results for an <math>\epsilon</math>-dependent upper bound on the error in parameter estimates have recently been reported <ref name="Dean" />. The accuracy of the posterior (defined as the expected quadratic loss) obtained with ABC as a function of <math>\epsilon</math> has also been investigated <ref name="Fearnhead" />. However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that should be investigated in greater detail. Methods to distinguish the error of this approximation from errors due to model mis-specification <ref name="Beaumont2010" />, in a way that is meaningful in the context of actual applications, would be valuable.
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in <ref name="Fearnhead" /><ref name="Wilkinson" />. The idea is to corrupt the observed data with additive noise drawn from a given probability density function, since ABC then yields exact inference under the assumption of this noise model. The asymptotic consistency of such “noisy ABC” was established in <ref name="Dean" />, together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC-based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, as defined in Eq. ?, are optimal for this purpose, representing the maximum amount of information in the simplest possible form <ref name="Csillery" />. However, one often has to resort to heuristics to identify sufficient statistics, and sufficiency can be difficult to assess for many problems. The use of non-sufficient statistics may lead to inflated posterior distributions, due to the potential loss of information in the parameter estimation <ref name="Csillery" />, and may also bias the discrimination between models.
An intuitive idea to capture most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics <ref name="Beaumont2010" /><ref name="Csillery" />. Instead, a better strategy is to focus on the relevant statistics only—relevancy depending on the whole inference problem, on the model used, and on the data at hand <ref name="Nunes" />.
An algorithm has been proposed for identifying a representative subset of summary statistics by iteratively assessing whether an additional statistic introduces a meaningful modification of the posterior <ref name="Joyce" />. Another approach decomposes into two principal steps <ref name="Nunes" />: first, a reference approximation of the posterior is constructed by minimizing its entropy; sets of candidate summaries are then evaluated by comparing the ABC-approximated posteriors with the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. In contrast, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately <ref name="Wegmann" />. Recently, a method for constructing summaries in a semi-automatic manner has attracted considerable interest <ref name="Fearnhead" />. It is based on the observation that the posterior mean of the parameters is the optimal choice of summary statistics when minimizing the quadratic loss of the parameter point estimates; the posterior mean is in turn approximated with a pilot run of simulations.
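The semi-automatic idea can be sketched as follows (an illustrative toy, with hypothetical feature choices): pilot simulations from the prior and the model are used to fit a linear regression of the parameter on candidate features of the data, and the fitted value, as an approximation of the posterior mean, then serves as the summary statistic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Pilot run: pairs (theta, D_hat) drawn from the prior and a toy normal model.
n_pilot, n_obs = 5000, 20
thetas = rng.uniform(0.0, 5.0, size=n_pilot)
data = rng.normal(thetas[:, None], 1.0, size=(n_pilot, n_obs))

# Candidate features of the raw data (a hypothetical, partly redundant choice).
features = np.column_stack([data.mean(axis=1), data.std(axis=1),
                            np.median(data, axis=1)])

# Regress theta on the features: the fitted value approximates the posterior
# mean E[theta | D], the optimal summary under quadratic loss.
X = np.column_stack([np.ones(n_pilot), features])
beta, *_ = np.linalg.lstsq(X, thetas, rcond=None)

def semi_automatic_summary(d):
    """Map a data set to its regression-based scalar summary."""
    f = np.array([1.0, d.mean(), d.std(), np.median(d)])
    return float(f @ beta)
```

The learned summary is then plugged into the usual ABC acceptance step in place of hand-picked statistics.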
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest <ref name="Majoram" />. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic [?]. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form [?]
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which implies that <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that <math>B_{1,2}</math> and <math>B_{1,2}^s</math> may differ substantially if Eq. ? is not satisfied, which was demonstrated with a small example model in [?] (previously discussed in [?] and in [?]). Crucially, it was shown that sufficiency for <math>M_1</math>, for <math>M_2</math>, or for both does not guarantee sufficiency for ranking the models [?]. However, it was also shown that any summary statistic that is sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can be used to rank the nested models [?].
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived [?], which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which actual data sets are compared directly, as in typical systems biology applications (e.g., see [?]), circumvents this problem. It is moreover doubtful whether the issue is truly ABC-specific, since importance sampling techniques suffer from the same problem [?].
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, or the acceptance threshold cannot be based on general rules, but their effect should be evaluated and tested in each study [?]. Thus, quality controls are achievable and indeed performed in many ABC-based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. However, the rapidly increasing use of ABC should yield a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC, and that instead hold for model-based methods in general. Many of these criticisms have long been debated in the literature, but the flexibility offered by ABC for analysing very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space [?]. Indeed, model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model in some instances, it may then be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options [?]. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead [?]. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only few models—subjectively chosen and probably all wrong—can be realistically considered, what insight can we hope to derive from their analysis [?]? As pointed out in [?], there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only rarely be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important in this context than the test of a statistical null hypothesis (see also Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” [?], which is connected to classical objections to Bayesian approaches [?].
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should if possible be defined based on known properties of the studied system, but may in practical applications require an educated guess. However, theoretical results regarding a suitable (e.g., non-biased) choice of the prior distribution are available, based on the principle of maximum entropy [?].
We stress that the purpose of the analysis should be kept in mind when choosing the priors. In principle, uninformative and flat priors, which exaggerate our subjective ignorance about the parameters, may still yield good parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in [?] that part of the data had to be omitted in the ABC-based analysis presented in [?]. Although a number of authors claim that large data sets are not a practical limitation [?], this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially be less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics [?], which may also be a tractable approach for ABC-based methods. It should still be kept in mind that realistic models for complex systems are very likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it has been proposed to combine the Metropolis-Hastings algorithm with ABC <ref name="Marjoram" />, which results in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated posterior samples <ref name="Sisson" />, and relatively poor parallelizability <ref name="Bertorelle" />.
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting <ref name="Sisson" /><ref name="Beaumont2009" />. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively <ref name="DelMoral" />.
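The sequential idea can be sketched as follows. This is a deliberately simplified toy (hypothetical model and tolerance schedule): a proper ABC-SMC/PMC algorithm additionally carries importance weights to correct for the perturbation kernel, which are omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(4)
s_obs = 2.0  # hypothetical observed summary statistic

def simulate_summary(theta):
    # Toy model: summary ~ N(theta, 0.5).
    return rng.normal(theta, 0.5)

def abc_smc(eps_schedule=(2.0, 1.0, 0.5, 0.2), n_particles=1000):
    # Generation 1: plain rejection from the prior Uniform(-5, 5).
    particles = []
    while len(particles) < n_particles:
        theta = rng.uniform(-5.0, 5.0)
        if abs(simulate_summary(theta) - s_obs) < eps_schedule[0]:
            particles.append(theta)
    particles = np.array(particles)
    # Later generations: perturb draws from the previous population and
    # re-accept under a progressively tighter tolerance.
    for eps in eps_schedule[1:]:
        new = []
        while len(new) < n_particles:
            theta = rng.choice(particles) + rng.normal(0.0, 0.3)
            if not (-5.0 <= theta <= 5.0):
                continue  # respect the prior support
            if abs(simulate_summary(theta) - s_obs) < eps:
                new.append(theta)
        particles = np.array(new)
    return particles

post = abc_smc()
```

Each generation starts from a population already concentrated near the posterior, so far fewer simulations are wasted than if the final, tight tolerance were applied to prior draws directly.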
The usage of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in <ref name="Beaumont2002" />. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model <ref name="Blum2010" />. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution has been proposed <ref name="Leuenberger2009" />.
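The local linear adjustment can be sketched as follows (a toy illustration with assumed model and kernel choices, not the original implementation): accepted parameters are corrected with a weighted regression of the parameter on the summary, shifting them toward the observed summary.

```python
import numpy as np

rng = np.random.default_rng(5)
s_obs = 1.0  # hypothetical observed summary statistic

# Rejection step: keep parameter/summary pairs with summaries near s_obs.
thetas = rng.uniform(-5.0, 5.0, size=200000)   # flat prior
sims = rng.normal(thetas, 1.0)                 # toy model: summary ~ N(theta, 1)
eps = 0.75
keep = np.abs(sims - s_obs) < eps
theta_acc, s_acc = thetas[keep], sims[keep]

# Epanechnikov-style weights: simulations closer to s_obs count more.
u = (s_acc - s_obs) / eps
w = 1.0 - u**2

# Weighted linear regression of theta on the summary, local to s_obs.
X = np.column_stack([np.ones(s_acc.size), s_acc - s_obs])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(X * sw[:, None], theta_acc * sw, rcond=None)

# Adjust the accepted parameters in the direction of the observed summary.
theta_adj = theta_acc - coef[1] * (s_acc - s_obs)
```

In this toy case the adjustment removes the extra spread induced by the finite tolerance, so `theta_adj` is more concentrated than the raw accepted sample.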
=Outlook=
In the past, the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC, the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function, it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, so novel appropriate methods must be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to reduce the scope of the problem through model reduction, e.g., dimension reduction <ref name="Csillery" /> or modularization. A second approach is a more guided search of the parameter space, e.g., through the development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely alleviate, but not resolve, the curse of dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcome many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, certain tasks, for instance model selection with ABC, are inherently difficult. Also, open problems such as the convergence properties of ABC-based algorithms, as well as methods for determining summary statistics in the absence of sufficient ones, deserve more attention.
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J (2009) Stable Bayesian Parameter Estimation for Biological Dynamical Systems. IEEE Computer Society Press, pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J (2009) Optimized Expected Information Gain for Nonlinear Dynamical Systems. ACM International Conference Proceeding Series, Association for Computing Machinery (ACM), pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid A (1984) Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society Series A 147: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing (in press).</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<ref name="Csillery2010">Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.</ref>
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<ref name="Gelfand">Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.</ref>
<ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref>
<ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref>
<ref name="Excoffier">Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.</ref>
<ref name="Wegmann2010">Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.</ref>
<ref name="Cornuet">Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.</ref>
<ref name="Templeton2010b">Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.</ref>
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden markov models with intractable likelihoods. arXiv:11035399v1 [mathST] 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. ArXiv:10041112v2 [statME] 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:08113355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and bayesian computation. Stat Appl Genet Mol Biol 7: Article26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rosseau J (2011) Relevant statistics for Bayesian model choice. ArXiv:11104700v1 [mathST] 21 Oct 2011: 1-24.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.
</ref>
</references>
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has rapidly increased in popularity over the last years and in particular for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain for ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). Since its introduction <ref name="Rubin" />, the spread of ABC has stimulated the scientific community to develop improved versions of the basic method, which have further increased the computational efficiency (e.g., see <ref name="Marjoram" /><ref name="Sisson" /><ref name="Wegmann" />).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it has been pointed out that a significant portion of this criticism is not aimed directly at ABC, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part has also been shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and its usage. However, the arguments have also exposed fundamental and currently unsolved issues. Concerns have lately been raised within the ABC community as well <ref name="Didelot" /><ref name="Robert" />[?]. Yet it might be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC-based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, with the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter samples distributed according to the desired posterior distribution, and, crucially, obtained without the need to explicitly compute the likelihood function.
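As a toy illustration of the rejection algorithm (not from the original article; the model, prior, and distance measure are illustrative assumptions), consider a normal model with unknown mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data: 50 draws from a normal model with true mean 2.
observed = rng.normal(loc=2.0, scale=1.0, size=50)

def simulate(theta, size=50):
    """Simulate a data set D_hat under the model for a given parameter."""
    return rng.normal(loc=theta, scale=1.0, size=size)

def abc_rejection(observed, n_draws=20000, eps=0.7):
    """ABC rejection: draw theta from the prior, accept it whenever the
    distance rho(D_hat, D) between simulated and observed data is below eps."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # flat prior over the parameter domain
        d_hat = simulate(theta)
        # Mean distance between the sorted samples as a simple rho(D_hat, D).
        rho = np.mean(np.abs(np.sort(d_hat) - np.sort(observed)))
        if rho < eps:
            accepted.append(theta)
    return np.array(accepted)

posterior_sample = abc_rejection(observed)
```

The accepted values approximate draws from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math>; shrinking <math>\epsilon</math> sharpens the approximation at the cost of a lower acceptance rate.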
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e. given the sufficient statistic, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. ?, so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
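Continuing the toy normal example (an illustrative sketch, not code from the article), the full data set can be replaced by a single summary statistic, here the sample mean, which is sufficient for the location of a normal model with known variance:

```python
import numpy as np

rng = np.random.default_rng(1)
observed = rng.normal(2.0, 1.0, size=200)

def summary(data):
    # For a normal model with known variance, the sample mean captures
    # all information about the location parameter (it is sufficient).
    return data.mean()

def abc_summary_rejection(observed, n_draws=20000, eps=0.1):
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)                     # flat prior
        s_sim = summary(rng.normal(theta, 1.0, size=200))
        if abs(s_sim - s_obs) < eps:                       # compare summaries
            accepted.append(theta)
    return np.array(accepted)

posterior_sample = abc_summary_rejection(observed)
```

Comparing scalar summaries instead of 200-dimensional data sets makes a small tolerance feasible, which is precisely the motivation for summary statistics in high-dimensional problems.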
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that we need to marginalize over the uncertain parameters through integration to compute <math>B_{1,2}</math> in Eq. ?. The posterior ratio (which can be thought of as the support in favor of one model) of <math>M_1</math> compared to <math>M_2</math> given the data is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
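As a hedged illustration (not from the article), for two models with fixed parameters the evidences reduce to likelihoods, and a Bayes factor on a summary statistic can be approximated by the ratio of ABC acceptance rates under the two models. The observed summary and the model choices below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
s_obs = 0.0  # hypothetical observed summary (e.g., a sample mean)

def acceptance_rate(mu, n=20000, eps=0.05, n_obs=100):
    # Fraction of simulations from a fixed-parameter model N(mu, 1) whose
    # sample-mean summary falls within eps of the observed summary.
    sims = rng.normal(mu, 1.0, size=(n, n_obs)).mean(axis=1)
    return np.mean(np.abs(sims - s_obs) < eps)

# With equal model priors p(M1) = p(M2), the ratio of acceptance rates
# approximates the Bayes factor computed on the summary statistic.
b12_abc = acceptance_rate(0.0) / acceptance_rate(0.2)
```

Note that this estimates the Bayes factor on the summary, which need not equal the Bayes factor on the full data unless the summary is sufficient for model comparison.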
A table for interpreting the strength of evidence corresponding to different values of the Bayes factor was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />), and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, the conclusions of model comparisons based on Bayes factors should be treated with sober caution, and we will later discuss some important ABC-related concerns.
=Quality Controls=
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparison of the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because the posterior support of a particular model can then appear overwhelmingly conclusive, even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues as to how to improve its structure or parametrization.
Fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, the estimation of the discrepancies between the observational data and the model predictions with respect to a comprehensive set of statistics, which need not be the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />; the deviance information criterion is then used as a measure of model fit. Notably, the models preferred based on this criterion can conflict with those supported by Bayes factors, and for this reason it is useful to combine different methods for model selection to obtain robust conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but typically makes computations prohibitively expensive. In practice, <math>\epsilon</math> is therefore set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available, and other summary statistics are used instead, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC-based analysis. This motivates a careful investigation and categorization of the validity and relevance of the arguments.
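For reference, the basic rejection scheme with a positive tolerance can be sketched as follows. This is a minimal toy example (Gaussian model, flat prior, sample mean as summary; all settings illustrative), not a definitive implementation:

```python
import random
import statistics

def abc_rejection(observed, simulate, prior_sample, distance, eps, n_sims):
    """Basic ABC rejection: keep parameters whose simulated data
    fall within tolerance eps of the observed data."""
    accepted = []
    for _ in range(n_sims):
        theta = prior_sample()          # draw a candidate from the prior
        simulated = simulate(theta)     # forward-simulate the model
        if distance(simulated, observed) < eps:
            accepted.append(theta)      # keep candidates close to the data
    return accepted

# Toy example: infer the mean of a Gaussian with known sd = 1.
random.seed(0)
observed = [random.gauss(2.0, 1.0) for _ in range(50)]

prior_sample = lambda: random.uniform(-5.0, 5.0)            # flat prior on mu
simulate = lambda mu: [random.gauss(mu, 1.0) for _ in range(50)]
# Summary statistic: the sample mean; distance: absolute difference.
distance = lambda sim, obs: abs(statistics.mean(sim) - statistics.mean(obs))

posterior = abc_rejection(observed, simulate, prior_sample, distance,
                          eps=0.1, n_sims=20000)
print(len(posterior), statistics.mean(posterior))
```

The accepted sample approximates the posterior of the mean; shrinking `eps` sharpens the approximation at the cost of a lower acceptance rate, which is exactly the trade-off discussed in this section.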
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in <ref name="Ratmann" /> and <ref name="Francois" />). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) <ref name="Csillery" />. This is a well-known phenomenon usually referred to as the curse-of-dimensionality <ref name="Bellman" />. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
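The exponential decay of the acceptance probability with the parameter dimension can be illustrated directly. In the toy setting below (a uniform model where each coordinate must independently match; all settings illustrative), each additional dimension multiplies the acceptance rate by roughly the per-dimension acceptance fraction:

```python
import random

def acceptance_rate(dim, eps, n_sims=20000):
    """Fraction of prior draws from U(-1, 1)^dim accepted when every
    coordinate must fall within eps of the observed value (here 0)."""
    rng = random.Random(dim)      # seeded per dimension for repeatability
    hits = 0
    for _ in range(n_sims):
        theta = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        if all(abs(t) < eps for t in theta):   # global acceptance criterion
            hits += 1
    return hits / n_sims

rates = {d: acceptance_rate(d, eps=0.5) for d in (1, 2, 4, 8)}
print(rates)   # roughly halved for each additional dimension
```

In practice the decay is what forces either a larger tolerance (a less accurate posterior) or far more simulations as the number of parameters grows.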
Although no computational method seems to be able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids <ref name="Gerstner" />, which could substantially reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect <ref name="Singer" />. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate <ref name="Templeton2009a" /> (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes at the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> should approximate the actual target distribution <math>p(\theta|D)</math> well. On the other hand, a tolerance that is large enough for every point to be accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in <ref name="Sisson" />. Theoretical results for an <math>\epsilon</math>-dependent upper bound on the error in parameter estimates have recently been reported <ref name="Dean" />, and the accuracy of the posterior (defined as the expected quadratic loss) obtained with ABC as a function of <math>\epsilon</math> has also been investigated <ref name="Fearnhead" />. However, the convergence of the distributions as <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that deserves more detailed investigation. In particular, methods to distinguish this approximation error from errors due to model mis-specification <ref name="Beaumont2010" /> would be valuable in the context of actual applications.
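The interpolation between prior and posterior as a function of the tolerance can be observed in a small experiment (a toy Gaussian-mean problem with illustrative settings): the spread of the accepted sample shrinks from the prior's spread toward the true posterior spread as <math>\epsilon</math> decreases.

```python
import random
import statistics

rng = random.Random(2)
n = 30
observed = [rng.gauss(0.0, 1.0) for _ in range(n)]
s_obs = statistics.mean(observed)

def abc_spread(eps, sims=20000):
    """Standard deviation of the ABC sample for a given tolerance:
    a huge eps recovers the prior, a small eps approaches the posterior."""
    accepted = []
    for _ in range(sims):
        mu = rng.uniform(-5.0, 5.0)
        sim_mean = statistics.mean([rng.gauss(mu, 1.0) for _ in range(n)])
        if abs(sim_mean - s_obs) < eps:
            accepted.append(mu)
    return statistics.stdev(accepted)

spreads = [abc_spread(e) for e in (5.0, 1.0, 0.2)]
print(spreads)   # shrinking toward the true posterior sd, 1/sqrt(30) ≈ 0.18
```

The first tolerance accepts essentially every prior draw, so the spread matches the flat prior; the smallest tolerance approaches the analytically known posterior spread, at the cost of far fewer accepted samples.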
Finally, statistical inference with a positive tolerance in ABC has been theoretically justified <ref name="Fearnhead" /><ref name="Wilkinson" />. The underlying idea is to view the tolerance as noise added to the observed data according to a given probability density function; ABC then yields exact inference under the assumption of this noise model. The asymptotic consistency of such “noisy ABC” was established in <ref name="Dean" />, together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC-based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, representing the maximum amount of information in the simplest possible form <ref name="Csillery" />. However, one usually has to resort to heuristics to identify sufficient statistics, and sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the potential loss of information in the parameter estimation <ref name="Csillery" />, and it may also bias the discrimination between models.
An intuitive idea to capture most information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics <ref name="Beaumont2010" /><ref name="Csillery" />. Instead, a better strategy is to focus on relevant statistics only, where relevancy depends on the whole inference problem, on the model used, and on the data at hand <ref name="Nunes" />.
An algorithm was proposed for identifying a representative subset of summary statistics, by iteratively assessing whether an additional statistic introduces a meaningful modification of the posterior <ref name="Joyce" />. Another method, which proceeds in two principal steps, was proposed in <ref name="Nunes" />. First, a reference approximation of the posterior is constructed by minimizing its entropy. Sets of candidate summaries are then evaluated by comparing the ABC-approximated posteriors to the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. In contrast, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately <ref name="Wegmann" />. Recently, a method for constructing summaries in a semi-automatic manner has attracted much interest <ref name="Fearnhead" />. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
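A toy sketch of this semi-automatic idea (a Gaussian-mean model with two hypothetical candidate statistics; the linear-regression pilot step stands in for the more general fits used in practice) could look as follows:

```python
import random
import statistics

def solve3(A, b):
    """Gauss-Jordan elimination for a 3x3 linear system."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(3):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][3] / M[i][i] for i in range(3)]

rng = random.Random(5)
simulate = lambda mu: [rng.gauss(mu, 1.0) for _ in range(25)]
cands = lambda d: [1.0, statistics.mean(d), max(d)]   # constant for intercept

# Pilot run: simulate (parameter, candidate summaries) pairs from the prior.
pilot = []
for _ in range(2000):
    mu = rng.uniform(-3.0, 3.0)
    pilot.append((mu, cands(simulate(mu))))

# Fit mu ~ b0 + b1*mean + b2*max via the normal equations (X^T X) b = X^T y;
# the fitted value approximates E[mu | data] and becomes the single summary.
XtX = [[sum(s[i] * s[j] for _, s in pilot) for j in range(3)] for i in range(3)]
Xty = [sum(mu * s[i] for mu, s in pilot) for i in range(3)]
beta = solve3(XtX, Xty)
summary = lambda d: sum(b * x for b, x in zip(beta, cands(d)))

# Rejection ABC using the constructed summary.
observed = simulate(1.5)                  # pretend mu = 1.5 is unknown
s_obs = summary(observed)
accepted = [mu for mu in (rng.uniform(-3.0, 3.0) for _ in range(20000))
            if abs(summary(simulate(mu)) - s_obs) < 0.1]
print(statistics.mean(accepted))
```

The pilot regression collapses all candidate statistics into a single, approximately optimal summary, so the acceptance criterion compares one number instead of a high-dimensional vector.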
Methods for identifying summary statistics that also assess their influence on the approximation of the posterior would be of great interest <ref name="Marjoram" />. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic <ref name="Didelot" /><ref name="Robert" />. Denoting the Bayes factor based on the summary statistic <math>S(D)</math> by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form <ref name="Didelot" />
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math> if and only if
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
in which case <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that <math>B_{1,2}</math> and <math>B_{1,2}^s</math> may differ substantially if Eq. ? is not satisfied, as demonstrated with a small example model in <ref name="Robert" /> (previously discussed in <ref name="Didelot" /> and in <ref name="Grelaud" />). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models <ref name="Didelot" />. However, it was also shown that any summary statistic that is sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can also be used to rank the nested models <ref name="Didelot" />.
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived <ref name="Marin" />, which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which actual data sets are compared directly, as in typical systems biology applications (e.g., see <ref name="Toni" />), circumvents this problem. It is even doubtful whether the issue is truly specific to ABC, since importance sampling techniques suffer from the same problem <ref name="Robert" />.
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules, so the effect of these choices should be evaluated and tested in each study [?]. Quality controls are thus achievable, and indeed performed in many ABC-based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. Nevertheless, the rapidly increasing use of ABC should foster a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC; they instead hold for model-based methods in general. Many of these criticisms have long been debated in the literature, but the flexibility that ABC offers for analysing very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space [?]. It is true that model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model, it may in some instances be difficult to cover a large part of the hypothesis space.
An upper limit to the number of candidate models considered is typically set by the substantial effort required to define the models and to choose between many alternative options [?]. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead [?]. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and the use of expert knowledge from the problem domain.
But if only a few models, subjectively chosen and probably all wrong, can realistically be considered, what insight can we hope to derive from their analysis [?]? As pointed out in [?], there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only rarely be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important in this context than a test of a statistical null hypothesis (see also Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” [?], which is connected to classical objections to Bayesian approaches [?].
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should, if possible, be defined based on known properties of the studied system, but may for practical applications require an educated guess. However, theoretical results regarding a suitable (e.g., unbiased) choice of the prior distribution are available, based on the principle of maximum entropy [?].
We stress that the purpose of the analysis must be kept in mind when choosing the priors. In principle, uninformative and flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in [?] that part of the data had to be omitted in the ABC-based analysis presented in [?]. Although a number of authors claim that large data sets are not a practical limitation [?], this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics [?], which may be a tractable approach for ABC-based methods as well. It should still be kept in mind that any realistic model of a complex system is likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC <ref name="Marjoram" />, which resulted in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated samples from the posterior <ref name="Sisson" />, and relatively poor parallelizability <ref name="Bertorelle" />.
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting <ref name="Sisson" /><ref name="Beaumont2009" />. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively <ref name="DelMoral" />.
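A much-simplified, illustrative sketch of the sequential idea (one parameter, flat prior, fixed Gaussian perturbation kernel, and a hand-picked tolerance schedule; the adaptive tolerance selection of <ref name="DelMoral" /> is omitted) could look as follows:

```python
import math
import random
import statistics

rng = random.Random(9)
n_obs = 25
observed = [rng.gauss(0.5, 1.0) for _ in range(n_obs)]
s_obs = statistics.mean(observed)

def distance(mu):
    """Distance between observed and simulated summary (the mean)."""
    sim = [rng.gauss(mu, 1.0) for _ in range(n_obs)]
    return abs(statistics.mean(sim) - s_obs)

def kernel(x, scale):
    """Unnormalized Gaussian perturbation density."""
    return math.exp(-0.5 * (x / scale) ** 2)

N, scale = 300, 0.5
eps_list = [1.0, 0.5, 0.25, 0.1]        # decreasing tolerance schedule

# Generation 0: plain rejection from the flat prior U(-4, 4).
pop, wts = [], []
while len(pop) < N:
    mu = rng.uniform(-4.0, 4.0)
    if distance(mu) < eps_list[0]:
        pop.append(mu)
        wts.append(1.0)

# Later generations: resample, perturb, accept, and reweight so that
# the weighted population still targets the (flat-prior) posterior.
for eps in eps_list[1:]:
    new_pop, new_wts = [], []
    while len(new_pop) < N:
        cand = rng.choices(pop, weights=wts)[0] + rng.gauss(0.0, scale)
        if not -4.0 <= cand <= 4.0:      # zero prior density: reject
            continue
        if distance(cand) < eps:
            denom = sum(w * kernel(cand - p, scale) for w, p in zip(wts, pop))
            new_pop.append(cand)
            new_wts.append(1.0 / denom)  # flat prior: weight = const / denom
    pop, wts = new_pop, new_wts

post_mean = sum(w * p for w, p in zip(wts, pop)) / sum(wts)
print(post_mean)
```

Each generation proposes from the previous population rather than from the prior, so far fewer simulations are wasted in regions of negligible posterior mass than with plain rejection at the final, tight tolerance.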
The use of local linear weighted regression with ABC to reduce the variance of the posterior estimates was suggested in <ref name="Beaumont2002" />. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model <ref name="Blum2010" />. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in <ref name="Leuenberger2009" />.
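A one-dimensional sketch of such a regression adjustment (a toy Gaussian model with Epanechnikov weights, in the spirit of <ref name="Beaumont2002" />; all settings are illustrative) is:

```python
import random
import statistics

rng = random.Random(11)
n_obs = 30
observed = [rng.gauss(1.0, 1.0) for _ in range(n_obs)]
s_obs = statistics.mean(observed)

# Rejection step with a deliberately loose tolerance.
eps = 0.5
pairs = []                       # accepted (parameter, summary) pairs
while len(pairs) < 2000:
    mu = rng.uniform(-4.0, 4.0)
    s = statistics.mean([rng.gauss(mu, 1.0) for _ in range(n_obs)])
    if abs(s - s_obs) < eps:
        pairs.append((mu, s))

# Weighted local linear regression of the parameter on the summary
# (Epanechnikov weights), then projection of each accepted parameter
# to the observed summary value.
w = [1.0 - ((s - s_obs) / eps) ** 2 for _, s in pairs]
sw = sum(w)
mu_bar = sum(wi * mu for wi, (mu, _) in zip(w, pairs)) / sw
s_bar = sum(wi * s for wi, (_, s) in zip(w, pairs)) / sw
slope = (sum(wi * (mu - mu_bar) * (s - s_bar) for wi, (mu, s) in zip(w, pairs))
         / sum(wi * (s - s_bar) ** 2 for wi, (_, s) in zip(w, pairs)))
adjusted = [mu - slope * (s - s_obs) for mu, s in pairs]

raw = [mu for mu, _ in pairs]
print(statistics.stdev(raw), statistics.stdev(adjusted))
```

The adjusted sample is noticeably tighter than the raw accepted sample, which is the point of the correction: a loose tolerance keeps the acceptance rate high, and the regression step removes much of the bias and extra spread that the loose tolerance introduced.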
=Outlook=
In the past, the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC, the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function it may be tempting to attack high-dimensional problems, but ABC methods do not yet address the additional issues encountered in such studies, and novel appropriate methods must therefore be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction <ref name="Csillery" /> or modularization. A second approach is a more guided search of the parameter space, e.g., by the development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, but not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely improve the situation, but not resolve the curse-of-dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcome many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, certain tasks, for instance model selection with ABC, are inherently difficult. Furthermore, open problems such as the convergence properties of ABC-based algorithms, as well as methods for determining summary statistics in the absence of sufficient ones, deserve more attention.
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J (2009) Stable Bayesian Parameter Estimation for Biological Dynamical Systems. IEEE Computer Society Press. pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J (2009) Optimized Expected Information Gain for Nonlinear Dynamical Systems. ACM International Conference Proceeding Series. Association for Computing Machinery (ACM). pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid AP (1984) Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society Series A 147: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011, in press) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing.</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<ref name="Csillery2010">Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.</ref>
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<ref name="Gelfand">Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.</ref>
<ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref>
<ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref>
<ref name="Excoffier">Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.</ref>
<ref name="Wegmann2010">Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.</ref>
<ref name="Cornuet">Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.</ref>
<ref name="Templeton2010b">Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.</ref>
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden Markov models with intractable likelihoods. arXiv:1103.5399v1 [math.ST], 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. arXiv:1004.1112v2 [stat.ME], 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:0811.3355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and bayesian computation. Stat Appl Genet Mol Biol 7: Article26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rousseau J (2011) Relevant statistics for Bayesian model choice. arXiv:1110.4700v1 [math.ST], 21 Oct 2011.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2009) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.</ref>
</references>
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has rapidly increased in popularity in recent years, in particular for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain for ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). Since its advent <ref name="Rubin" />, the spread of ABC has prompted the scientific community to develop improved versions of the basic method, which have further increased the computational efficiency (e.g., see <ref name="Marjoram" /><ref name="Sisson" /><ref name="Wegmann" />).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it was pointed out that a significant portion of the criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part was also shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or between the ABC method and the usage thereof. However, fundamental and currently unsolved issues were exposed by the arguments as well. Concerns have lately also been raised within the ABC community <ref name="Didelot" /><ref name="Robert" />[?]. Yet it might be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\quad\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC-based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, with the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math>, and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution, and, crucially, obtained without the need of explicitly computing the likelihood function.
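The rejection scheme can be sketched in a few lines of Python. The Gaussian toy model (unknown mean, known unit variance), the uniform prior, the sorted-sample distance, and the tolerance below are all illustrative assumptions, not part of the methods reviewed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: n draws from a Gaussian with unknown mean theta
# and known unit variance; the "observed" data are generated at theta = 2.
n = 50
D = rng.normal(2.0, 1.0, size=n)

def rho(D_hat, D_obs):
    # Euclidean distance between the sorted samples (one possible metric)
    return np.linalg.norm(np.sort(D_hat) - np.sort(D_obs)) / np.sqrt(len(D_obs))

eps = 0.5                                    # strictly positive tolerance
accepted = []
for _ in range(20000):
    theta = rng.uniform(-5.0, 5.0)           # sample theta from the prior
    D_hat = rng.normal(theta, 1.0, size=n)   # simulate a data set under the model
    if rho(D_hat, D) < eps:                  # accept if the simulation is close
        accepted.append(theta)

# "accepted" approximates draws from p(theta | rho(D_hat, D) < eps)
```

Decreasing the tolerance tightens the approximation of the posterior at the cost of fewer accepted samples.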
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
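As an illustration, for a Gaussian model with known variance the sample mean is a sufficient statistic for the unknown mean, and comparing this one-dimensional summary instead of the full data set keeps the acceptance rate manageable. A minimal sketch, assuming a toy Gaussian model and a uniform prior (both hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy Gaussian model with known unit variance; data generated at theta = 1.
n = 100
D = rng.normal(1.0, 1.0, size=n)

def S(x):
    # the sample mean is sufficient for the mean of a Gaussian (known variance)
    return np.mean(x)

eps = 0.05
accepted = []
for _ in range(30000):
    theta = rng.uniform(-5.0, 5.0)           # draw from the prior
    D_hat = rng.normal(theta, 1.0, size=n)   # simulate under the model
    if abs(S(D_hat) - S(D)) < eps:           # compare summaries, not raw data
        accepted.append(theta)
```

Because the summary here is sufficient, no information about <math>\theta</math> is lost by the dimension reduction; with a non-sufficient summary the accepted sample would approximate a flatter, less informative posterior.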
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that we need to marginalize over the uncertain parameters through integration to compute <math>B_{1,2}</math> in Eq. ?. The posterior ratio (which can be thought of as the support in favor of one model) of <math>M_1</math> compared to <math>M_2</math> given the data is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for the interpretation of the strength of evidence associated with values of the Bayes factor was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />), and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, conclusions of model comparison based on Bayes factors should be treated with caution, and we will later discuss some important ABC-related concerns.
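A minimal sketch of Bayes-factor approximation with ABC: for discrete data an exact match (tolerance zero) can be feasible, and with equal model priors the ratio of acceptance counts for the two models approximates <math>B_{1,2}</math>. The two fixed-parameter binomial models and the observed count below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two hypothetical fixed-parameter models for a count of successes in 20 trials:
# M1: Binomial(20, 0.5), M2: Binomial(20, 0.7). Observed count:
D = 15

N = 200000
m = rng.integers(1, 3, size=N)                 # model index from uniform model prior
q = np.where(m == 1, 0.5, 0.7)
D_hat = rng.binomial(20, q)                    # simulate under the sampled model
hits = (D_hat == D)                            # exact match: tolerance eps = 0
counts = {1: int(np.sum(hits & (m == 1))), 2: int(np.sum(hits & (m == 2)))}

# with equal model priors, the acceptance-count ratio approximates B_{1,2}
B12 = counts[1] / counts[2]
```

Here the exact Bayes factor is the ratio of the two binomial probabilities of observing 15 successes (about 0.08, favoring <math>M_2</math>), which the Monte Carlo ratio approaches as the number of simulations grows.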
=Quality Controls=
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
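The pseudo-observed data idea can be sketched as follows; the toy Gaussian model, the prior range, and the error summaries are illustrative assumptions rather than a prescribed protocol.

```python
import numpy as np

rng = np.random.default_rng(8)

# Draw "true" parameters from the prior, generate artificial pseudo-observed
# data sets (PODS), run the ABC estimator, and check recovery of the truth.
n = 40

def abc_estimate(data, n_sim=20000, eps=0.1):
    theta = rng.uniform(0.0, 4.0, size=n_sim)            # prior draws
    s_sim = rng.normal(theta, 1.0 / np.sqrt(n))          # simulated sample mean
    acc = theta[np.abs(s_sim - data.mean()) < eps]       # rejection step
    return acc.mean()

errors = []
for _ in range(50):
    theta_true = rng.uniform(0.0, 4.0)                   # known "true" parameter
    pod = rng.normal(theta_true, 1.0, size=n)            # pseudo-observed data set
    errors.append(abc_estimate(pod) - theta_true)

rmse = float(np.sqrt(np.mean(np.square(errors))))        # recovery quality
```

A small root-mean-square error and a near-zero mean error over many PODS indicate that the chosen summaries, tolerance, and simulation budget recover the parameters reliably in this controlled setting.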
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparison of the posterior predictive distribution of summary statistics to the summary statistics observed was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because the posterior support of a particular model can then appear overwhelmingly conclusive, even if all proposed models in fact are poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions, with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />. The deviance information criterion is then used as a measure of model fit. Interestingly, it was also shown that the models preferred based on this criterion can conflict with those supported by Bayes factors. For this reason it is useful to combine different methods for model selection to obtain robust conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus, <math>\epsilon</math> is instead set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; instead, other summary statistics are used, which introduces an additional bias. However, much of the recent criticism has neither been specific to ABC, nor relevant for ABC-based analysis. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in <ref name="Ratmann" /> and <ref name="Francois" />). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) <ref name="Csillery" />. This is a well-known phenomenon usually referred to as the curse-of-dimensionality <ref name="Bellman" />. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
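The exponential decay of the acceptance rate can be illustrated with a small Monte Carlo experiment (not a full ABC run): with a componentwise acceptance window of half-width <math>\epsilon</math> around an observed point, the fraction of uniform prior draws accepted falls as <math>(2\epsilon)^d</math>. The uniform prior on the unit cube is an assumption chosen to make the decay analytically transparent.

```python
import numpy as np

rng = np.random.default_rng(3)

# Fraction of uniform draws on [0, 1]^d landing within a componentwise window
# of half-width eps around the point (0.5, ..., 0.5): decays as (2 * eps)^d.
eps = 0.1
rates = {}
for d in (1, 2, 5):
    theta = rng.uniform(0.0, 1.0, size=(200000, d))
    close = np.all(np.abs(theta - 0.5) < eps, axis=1)
    rates[d] = close.mean()

# rates[1] is near 0.2, rates[2] near 0.04, rates[5] near 0.2**5
```

The same geometry drives the acceptance rate of ABC's global criterion, which is why high-dimensional parameter spaces force either a larger tolerance or vastly more simulations.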
Although no computational method seems to be able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids <ref name="Gerstner" />, which could substantially reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect <ref name="Singer" />. For certain problems it may therefore be difficult to know if the model is incorrect or if the explored region of the parameter space is inappropriate <ref name="Templeton2009a" /> (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes with the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> is trusted to approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance that is large enough for every point to be accepted results in the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in <ref name="Sisson" />. Theoretical results for an upper <math>\epsilon</math>-dependent bound for the error in parameter estimates have recently been reported <ref name="Dean" />. The accuracy of the posterior (defined as the expected quadratic loss) obtained with ABC as a function of <math>\epsilon</math> has also been investigated <ref name="Fearnhead" />. However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that should be investigated in greater detail. Methods to distinguish the error of this approximation from the errors due to model mis-specification <ref name="Beaumont2010" />, in the context of actual applications, would be valuable.
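The interpolation between posterior and prior as the tolerance grows can be checked in a conjugate toy example where the exact posterior is known. The single-observation Gaussian model below is an assumption for illustration: with a standard normal prior and a unit-variance likelihood, the exact posterior mean given an observation <math>D</math> is <math>D/2</math>.

```python
import numpy as np

rng = np.random.default_rng(4)

D = 1.5                # single observation; exact posterior mean is D / 2 = 0.75

def abc_mean(eps, n=400000):
    theta = rng.normal(0.0, 1.0, size=n)    # draws from the N(0, 1) prior
    D_hat = rng.normal(theta, 1.0)          # one simulated datum per draw
    return theta[np.abs(D_hat - D) < eps].mean()

m_small = abc_mean(0.1)    # small tolerance: close to the exact posterior mean
m_large = abc_mean(10.0)   # huge tolerance: essentially the prior mean (zero)
```

With a small tolerance the accepted sample concentrates near the analytic posterior mean, while a tolerance large enough to accept every simulation simply reproduces the prior.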
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in <ref name="Fearnhead" /><ref name="Wilkinson" />. The idea is to add noise from a given probability density function to the observed data, since ABC then yields exact inference under the assumption of this model error. The asymptotic consistency of such “noisy ABC” was established in <ref name="Dean" />, together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC-based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, representing the maximum amount of information in the simplest possible form <ref name="Csillery" />. However, one must often resort to heuristics to identify sufficient statistics, and sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the potential loss of information in the parameter estimation <ref name="Csillery" />, and this may also bias the discrimination between models.
An intuitive idea to capture most information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics <ref name="Beaumont2010" /><ref name="Csillery" />. Instead, a better strategy is to focus on the relevant statistics only, relevancy depending on the whole inference problem, on the model used, and on the data at hand <ref name="Nunes" />.
An algorithm was proposed for identifying a representative subset of summary statistics, by iteratively assessing if an additional statistic introduces a meaningful modification of the posterior <ref name="Joyce" />. Another method was proposed in <ref name="Nunes" />, which decomposes into two principal steps. First a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the posteriors computed with ABC to the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. On the other hand, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately <ref name="Wegmann" />. Recently, a method for constructing summaries in a semi-automatic manner has attracted much interest <ref name="Fearnhead" />. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
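The semi-automatic idea can be sketched as a pilot regression: simulate parameter/data pairs, regress the parameters on raw candidate features of the data, and use the fitted value (an estimate of the posterior mean) as a one-dimensional summary. The toy Gaussian model and the particular candidate features below are assumptions; a real application would use problem-specific features.

```python
import numpy as np

rng = np.random.default_rng(5)

# Pilot run: pairs (theta_i, simulated data set) under a toy Gaussian model.
n, n_pilot = 30, 2000
theta_pilot = rng.uniform(0.0, 4.0, size=n_pilot)
data_pilot = rng.normal(theta_pilot[:, None], 1.0, size=(n_pilot, n))

def features(x):
    x = np.atleast_2d(x)
    # candidate raw features of a data set (not assumed sufficient)
    return np.column_stack([x.mean(axis=1), x.std(axis=1), np.median(x, axis=1)])

# linear regression of theta on the features over the pilot simulations
X = np.column_stack([np.ones(n_pilot), features(data_pilot)])
beta, *_ = np.linalg.lstsq(X, theta_pilot, rcond=None)

def S(x):
    # learned one-dimensional summary: fitted approximation of E[theta | data]
    f = features(x)
    return (np.column_stack([np.ones(len(f)), f]) @ beta)[0]

D = rng.normal(2.0, 1.0, size=n)   # "observed" data generated at theta = 2
```

The learned summary `S` is then used in place of hand-picked statistics in the ABC acceptance step; for this toy model the regression essentially rediscovers the sample mean.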
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest <ref name="Marjoram" />. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic <ref name="Didelot" /><ref name="Robert" />. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form <ref name="Didelot" />
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which results in <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that there may be a huge difference between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> if Eq. ? is not satisfied, which was demonstrated with a small example model in <ref name="Robert" /> (previously discussed in <ref name="Didelot" /> and in <ref name="Grelaud" />). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models <ref name="Didelot" />. However, it was also shown that any sufficient summary statistic for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can also be used to rank the nested models <ref name="Didelot" />.
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived <ref name="Marin" />, which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which actual data sets are compared, as in typical systems biology applications (e.g., see <ref name="Toni" />), circumvents this problem. It is even doubtful whether the issue is truly ABC-specific, since importance sampling techniques suffer from the same problem <ref name="Robert" />.
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, or the acceptance threshold cannot be based on general rules, but the effect of these choices should be evaluated and tested in each study <ref name="Bertorelle" />. Thus, quality controls are achievable and indeed performed in many ABC-based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. However, the rapidly increasing use of ABC should lead to a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC, and instead hold for model-based methods in general. Many of these criticisms have already been well debated in the literature for a long time, but the flexibility offered by ABC to analyse very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space <ref name="Templeton2009a" />. Indeed, model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model, it may in some instances be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options <ref name="Bertorelle" />. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead <ref name="Csillery" />. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only a few models—subjectively chosen and probably all wrong—can be realistically considered, what insight can we hope to derive from their analysis <ref name="Templeton2009a" />? As pointed out in <ref name="Beaumont2010" />, there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses, that potentially hold true, can only seldom be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important than the test of a statistical null hypothesis in this context (see also Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” [?], which is connected to classical objections of Bayesian approaches [?].
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should if possible be defined based on known properties of the studied system, but may for practical applications necessitate an educated guess. However, theoretical results regarding a suitable (e.g., non-biased) choice of the prior distribution are available, which are based on the principle of maximum entropy [?].
We stress that the purpose of the analysis is to be kept in mind when choosing the priors. In principle, uninformative and flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in [?] that part of the data had to be omitted in the ABC-based analysis presented in [?]. Although a number of authors claim that large data sets are not a practical limitation [?], this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics [?], which may be a tractable approach for ABC-based methods as well. It should still be kept in mind that realistic models for complex systems are very likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC <ref name="Marjoram" />, which resulted in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty to ascertain convergence, correlated samples of the posterior <ref name="Sisson" />, or relatively poor parallelizability <ref name="Bertorelle" />.
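A minimal sketch of the ABC-MCMC idea, assuming a toy Gaussian model, a sample-mean summary, and a symmetric random-walk proposal. For simplicity the chain is initialized at a point known to be accepted; in practice an initial rejection-sampling phase would find such a starting point.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy Gaussian model with known unit variance; summary is the sample mean.
n = 50
D = rng.normal(2.0, 1.0, size=n)
s_obs = D.mean()
eps = 0.1

def log_prior(theta):
    return -0.5 * (theta / 10.0) ** 2       # wide N(0, 10^2) prior, up to a constant

theta = s_obs                               # initialize at an accepted point
chain = []
for _ in range(20000):
    prop = theta + rng.normal(0.0, 0.5)     # symmetric random-walk proposal
    D_hat = rng.normal(prop, 1.0, size=n)   # simulate under the proposal
    # move only if the simulation is close AND the prior ratio test passes;
    # the likelihood ratio of plain Metropolis-Hastings is never evaluated
    if abs(D_hat.mean() - s_obs) < eps and \
       np.log(rng.uniform()) < log_prior(prop) - log_prior(theta):
        theta = prop
    chain.append(theta)

post_mean = float(np.mean(chain[10000:]))   # discard the first half as burn-in
```

Because proposals are made locally around previously accepted points, far fewer simulations are wasted in regions of negligible posterior mass than with prior sampling, at the cost of correlated draws.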
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting <ref name="Sisson" /><ref name="Beaumont2009" />. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively <ref name="DelMoral" />.
The usage of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in <ref name="Beaumont2002" />. The method assigns weights to the parameters according to how well simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of observed summaries. The obtained regression coefficients are used to correct sampled parameters in the direction of observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model <ref name="Blum2010" />. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in <ref name="Leuenberger2009" />.
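The local linear adjustment can be sketched as follows; the toy model, the Epanechnikov-type kernel weights, and the tolerance are illustrative assumptions in the spirit of the regression-adjustment idea, not a definitive implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy Gaussian model; summary is the sample mean.
n = 25
D = rng.normal(1.0, 1.0, size=n)
s_obs = D.mean()

theta = rng.uniform(-5.0, 5.0, size=200000)        # prior draws
s_sim = rng.normal(theta, 1.0 / np.sqrt(n))        # simulated summaries

eps = 0.3
keep = np.abs(s_sim - s_obs) < eps                 # rejection step
t_acc, s_acc = theta[keep], s_sim[keep]

# kernel weights (larger for summaries closer to s_obs) and weighted fit
w = 1.0 - ((s_acc - s_obs) / eps) ** 2
sw = np.sqrt(w)
X = np.column_stack([np.ones(s_acc.size), s_acc - s_obs])
beta, *_ = np.linalg.lstsq(X * sw[:, None], t_acc * sw, rcond=None)

# shift accepted parameters along the fitted line toward s_obs:
# theta* = theta - b1 * (s_sim - s_obs)
t_adj = t_acc - beta[1] * (s_acc - s_obs)
```

The adjusted sample `t_adj` has a smaller variance than the raw accepted sample, reflecting the variance reduction that motivates the regression correction.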
=Outlook=
In the past the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, and novel appropriate methods must therefore be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction <ref name="Csillery" /> or modularization. A second approach is a more guided search of the parameter space, e.g., by the development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, but not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely improve the situation, but not resolve the curse-of-dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcome many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, there are certain tasks, for instance model selection with ABC, that are inherently difficult. Also, open problems such as the convergence properties of the ABC based algorithms, as well as methods for determining summary statistics in lack of sufficient ones, deserve more attention.
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J (2009) Stable Bayesian Parameter Estimation for Biological Dynamical Systems. IEEE Computer Society Press. pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J (2009) Optimized Expected Information Gain for Nonlinear Dynamical Systems. Int. Conf. Proc. Series, Association for Computing Machinery (ACM). pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid A (1984) Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society Series A: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, in press.</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<ref name="Csillery2010">Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.</ref>
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<ref name="Gelfand">Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.</ref>
<ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref>
<ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref>
<ref name="Excoffier">Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.</ref>
<ref name="Wegmann2010">Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.</ref>
<ref name="Cornuet">Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.</ref>
<ref name="Templeton2010b">Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.</ref>
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddell PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden Markov models with intractable likelihoods. arXiv:1103.5399v1 [math.ST], 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. arXiv:1004.1112v2 [stat.ME], 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:0811.3355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article 34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and Bayesian computation. Stat Appl Genet Mol Biol 7: Article 26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rousseau J (2011) Relevant statistics for Bayesian model choice. arXiv:1110.4700v1 [math.ST], 21 Oct 2011: 1-24.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.</ref>
</references>
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has rapidly grown in popularity in recent years, in particular for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). Since its introduction <ref name="Rubin" />, the spread of ABC has prompted the scientific community to develop improved versions of the basic method that further increase computational efficiency (e.g., see <ref name="Marjoram" /><ref name="Sisson" /><ref name="Wegmann" />).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it was pointed out that a significant portion of the criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part was also shown to originate from misunderstandings of the mathematical foundations and semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between an ABC method and its usage. However, the arguments also exposed fundamental and currently unsolved issues. Concerns have lately also been raised within the ABC community <ref name="Didelot" /><ref name="Robert" />[?]. Yet it might be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, with the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math>, and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution, and, crucially, obtained without the need to explicitly compute the likelihood function.
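The rejection algorithm above can be sketched in a few lines of code. The binomial model, uniform prior, tolerance, and all numerical values below are illustrative choices for the sketch, not taken from any particular study.

```python
import random

def abc_rejection(prior_sample, simulate, distance, observed, eps, n_keep):
    """Basic ABC rejection: keep parameter draws whose simulated data
    fall within tolerance eps of the observed data."""
    accepted = []
    while len(accepted) < n_keep:
        theta = prior_sample()        # draw a candidate from the prior
        d_hat = simulate(theta)       # simulate a data set under the model
        if distance(d_hat, observed) < eps:
            accepted.append(theta)    # theta is now a posterior sample
    return accepted

# Toy model: infer the success probability p of 100 Bernoulli trials.
random.seed(1)
n_trials, observed_successes = 100, 30
posterior = abc_rejection(
    prior_sample=lambda: random.random(),   # Uniform(0, 1) prior on p
    simulate=lambda p: sum(random.random() < p for _ in range(n_trials)),
    distance=lambda a, b: abs(a - b),
    observed=observed_successes,
    eps=2,
    n_keep=500,
)
posterior_mean = sum(posterior) / len(posterior)   # close to 30/100
```

Note that the likelihood is never evaluated: the accepted draws approximate the posterior purely through forward simulation.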
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e. given the sufficient statistic, the parameter θ is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
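As a concrete illustration (not from the article), consider i.i.d. draws from a Normal(μ, 1) model: the sample mean is a sufficient statistic for μ, so comparing means instead of the full 50-dimensional data sets loses no information while greatly improving the acceptance rate. All numerical settings below are arbitrary.

```python
import random
import statistics

def simulate(mu, n=50):
    """Simulate n i.i.d. observations from Normal(mu, 1)."""
    return [random.gauss(mu, 1.0) for _ in range(n)]

# For this model the sample mean is sufficient for mu, so the
# 50-dimensional data set can be compressed to a single number.
def summary(data):
    return statistics.mean(data)

random.seed(2)
observed = simulate(mu=1.5)   # pretend this is the observed data set
s_obs = summary(observed)

accepted = []
while len(accepted) < 300:
    mu = random.uniform(-5, 5)                      # flat prior on mu
    if abs(summary(simulate(mu)) - s_obs) < 0.1:    # tolerance on the summary
        accepted.append(mu)

abc_estimate = statistics.mean(accepted)   # close to the true mu = 1.5
```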
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that we need to marginalize over the uncertain parameters through integration to compute <math>B_{1,2}</math> in Eq. ?. The posterior ratio (which can be thought of as the support in favor of one model) of <math>M_1</math> compared to <math>M_2</math> given the data is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for interpreting the strength of evidence corresponding to values of the Bayes factor was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />), and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, conclusions of model comparison based on Bayes factors should be drawn with sober caution, and we will later discuss some important ABC related concerns.
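One simple way to approximate a Bayes factor with ABC, a minimal sketch rather than a prescribed method, is to estimate each model's evidence by its acceptance rate under prior simulation: for a shared tolerance, the probability of landing within ε of the data approximates the marginal likelihood up to a common constant, so the ratio of acceptance rates approximates the Bayes factor. The two toy models and all numbers below are assumptions made for the sketch.

```python
import random

def acceptance_rate(prior_sample, simulate, observed, eps, n_sims):
    """Fraction of prior simulations landing within eps of the data;
    for a shared eps this approximates the model evidence up to a
    common constant factor."""
    hits = sum(
        abs(simulate(prior_sample()) - observed) < eps
        for _ in range(n_sims)
    )
    return hits / n_sims

random.seed(3)
n, observed = 100, 65   # 65 successes out of 100 trials

# M1: success probability p ~ Uniform(0, 1); M2: a fair coin (p = 0.5).
simulate = lambda p: sum(random.random() < p for _ in range(n))
rate_m1 = acceptance_rate(lambda: random.random(), simulate, observed, 3, 10_000)
rate_m2 = acceptance_rate(lambda: 0.5, simulate, observed, 3, 10_000)

bayes_factor = rate_m1 / rate_m2   # > 1: the data favor the flexible model M1
```

Under equal prior model probabilities, this ratio is also the posterior ratio of the two models.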
==Quality Controls==
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
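A minimal sketch of such a check with pseudo-observed data sets (the binomial model and all numbers are hypothetical choices): simulate data at known "true" parameter values, run ABC on each pseudo-observed set, and verify that the true value is recovered.

```python
import random
import statistics

def abc_rejection(prior_sample, simulate, observed, eps, n_keep):
    """Plain ABC rejection sampler (scalar data, absolute distance)."""
    accepted = []
    while len(accepted) < n_keep:
        theta = prior_sample()
        if abs(simulate(theta) - observed) < eps:
            accepted.append(theta)
    return accepted

random.seed(5)
n = 50
simulate = lambda p: sum(random.random() < p for _ in range(n))

# Generate pseudo-observed data sets (PODS) at known "true" parameters
# and record how far the ABC posterior mean lands from each truth.
errors = []
for true_p in (0.2, 0.5, 0.8):
    pods = simulate(true_p)
    post = abc_rejection(lambda: random.random(), simulate, pods, 2, 200)
    errors.append(abs(statistics.mean(post) - true_p))
```

Small recovery errors across the parameter range suggest that the chosen tolerance and distance are adequate for this model; large or systematic errors would flag a problem with the inference setup.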
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparison of the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because then the posterior support of a particular model can appear overwhelmingly conclusive, even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions, with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />. The deviance information criterion is then used as a measure of model fit. Interestingly, it was also shown that the models preferred on the basis of this criterion can conflict with those supported by Bayes factors. For this reason, it is useful to combine different methods for model selection to obtain reliable conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. In practice, <math>\epsilon</math> is therefore set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available, and other summary statistics are used instead, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC based analysis. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in <ref name="Ratmann" /> and <ref name="Francois" />). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) <ref name="Csillery" />. This is a well-known phenomenon usually referred to as the curse-of-dimensionality <ref name="Bellman" />. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
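The exponential decay of the acceptance rate can be seen directly in a toy experiment (all values hypothetical): requiring every coordinate of a d-dimensional uniform "observation" to be matched within a per-coordinate tolerance succeeds with probability roughly 0.8^d here, so a global acceptance criterion becomes exponentially stricter as d grows.

```python
import random

random.seed(7)
eps, n_sims = 0.4, 20_000
observed_coord = 0.5   # each coordinate of the "observed" point

# Accept a simulated point only if *every* coordinate is within eps,
# mimicking a global acceptance criterion in d dimensions.
rates = []
for d in (1, 2, 4, 8):
    hits = sum(
        all(abs(random.random() - observed_coord) < eps for _ in range(d))
        for _ in range(n_sims)
    )
    rates.append(hits / n_sims)
# rates decay roughly like 0.8 ** d, i.e. geometrically in d
```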
Although no computational method seems able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids <ref name="Gerstner" />, which could potentially reduce the simulation times for ABC considerably). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect <ref name="Singer" />. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate <ref name="Templeton2009a" /> (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes at the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> is trusted to approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance large enough for every point to be accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in <ref name="Sisson" />. Theoretical results for an upper, <math>\epsilon</math>-dependent bound for the error in parameter estimates have recently been reported <ref name="Dean" />. The accuracy of the posterior (defined as the expected quadratic loss) of ABC as a function of <math>\epsilon</math> has also been investigated in <ref name="Fearnhead" />. However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that remains to be investigated in greater detail. In particular, methods to distinguish the error of this approximation from errors due to model mis-specification <ref name="Beaumont2010" />, in a way that would make sense in the context of actual applications, would be valuable.
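The effect of the tolerance can be made tangible with a one-observation toy model (all settings arbitrary, not taken from the cited studies): as ε shrinks, the accepted parameters concentrate around the posterior; as ε grows, they spread back toward the prior.

```python
import random
import statistics

random.seed(4)
OBSERVED = 2.0   # a single draw from Normal(mu, 1)

def abc_sample(eps, n_keep=400):
    """Accepted mu values at tolerance eps, under a flat Uniform(-10, 10)
    prior and a Normal(mu, 1) simulator for one observation."""
    kept = []
    while len(kept) < n_keep:
        mu = random.uniform(-10, 10)
        if abs(random.gauss(mu, 1.0) - OBSERVED) < eps:
            kept.append(mu)
    return kept

tight = abc_sample(eps=0.2)   # approximates the posterior, centered near 2
loose = abc_sample(eps=5.0)   # drifts back toward the flat prior
spread_tight = statistics.stdev(tight)
spread_loose = statistics.stdev(loose)
```

The larger spread at the loose tolerance is exactly the inflation toward the prior described above.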
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in <ref name="Fearnhead" /><ref name="Wilkinson" />. The idea is to add noise drawn from a given probability density function to the observed data, since ABC then yields exact inference under the assumption of this noise model. The asymptotic consistency of such “noisy ABC” was established in <ref name="Dean" />, together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, representing the maximum amount of information in the simplest possible form <ref name="Csillery" />. However, one is often left with heuristics to identify sufficient statistics, and sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the potential loss of information in the parameter estimation <ref name="Csillery" />, and this may also bias the discrimination between models.
An intuitive idea to capture most information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics <ref name="Beaumont2010" /><ref name="Csillery" />. Instead, a better strategy is to focus on the relevant statistics only, where relevancy depends on the whole inference problem, on the model used, and on the data at hand <ref name="Nunes" />.
An algorithm was proposed for identifying a representative subset of summary statistics, by iteratively assessing whether an additional statistic introduces a meaningful modification of the posterior <ref name="Joyce" />. Another method, proposed in <ref name="Nunes" />, decomposes into two principal steps. First, a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the posteriors computed with ABC to the reference posterior.
With both of these strategies, a subset of statistics is selected from a large set of candidate statistics. In contrast, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately <ref name="Wegmann" />. Recently, a method for constructing summaries in a semi-automatic manner has attracted much interest <ref name="Fearnhead" />. It is based on the observation that the choice of summary statistics minimizing the quadratic loss of the parameter point estimates is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
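The pilot-run idea can be sketched in a strongly simplified form. The semi-automatic approach regresses the parameter on many candidate statistics; the sketch below (model, single candidate statistic, and all numbers are illustrative assumptions) uses just one raw statistic, the sample median, and takes the fitted regression value as the constructed summary.

```python
import random
import statistics

random.seed(6)

def simulate(mu, n=20):
    """Simulate n observations from Normal(mu, 1)."""
    return [random.gauss(mu, 1.0) for _ in range(n)]

# Pilot run: draw mu from the prior, simulate, and record a raw
# candidate statistic (the sample median) alongside the true mu.
pilot_mu = [random.uniform(-5, 5) for _ in range(2000)]
pilot_stat = [statistics.median(simulate(mu)) for mu in pilot_mu]

# Regress mu on the candidate statistic; the fitted value estimates
# the posterior mean of mu and is then used as the summary.
mean_s, mean_m = statistics.mean(pilot_stat), statistics.mean(pilot_mu)
cov_sm = sum((s - mean_s) * (m - mean_m)
             for s, m in zip(pilot_stat, pilot_mu)) / (len(pilot_mu) - 1)
slope = cov_sm / statistics.variance(pilot_stat)
intercept = mean_m - slope * mean_s
summary = lambda data: intercept + slope * statistics.median(data)

# Use the constructed summary inside plain ABC rejection.
s_obs = summary(simulate(1.0))    # pretend mu = 1 generated the data
accepted = []
while len(accepted) < 200:
    mu = random.uniform(-5, 5)
    if abs(summary(simulate(mu)) - s_obs) < 0.1:
        accepted.append(mu)
```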
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest <ref name="Marjoram" />. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic <ref name="Didelot" /><ref name="Robert" />. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form <ref name="Didelot" />
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
in which case <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that there may be a large difference between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> if Eq. ? is not satisfied, as was demonstrated with a small example model in <ref name="Robert" /> (previously discussed in <ref name="Didelot" /> and in <ref name="Grelaud" />). Crucially, it was shown that sufficiency for <math>M_1</math>, for <math>M_2</math>, or for both does not guarantee sufficiency for ranking the models <ref name="Didelot" />. However, it was also shown that any summary statistic that is sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can also be used to rank the nested models <ref name="Didelot" />.
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived <ref name="Marin" />, which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC based inference in which actual data sets are compared directly, as in typical systems biology applications (e.g., see <ref name="Toni" />), circumvents this problem. It is also doubtful whether the issue is truly ABC-specific, since importance sampling techniques suffer from the same problem <ref name="Robert" />.
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules; rather, the effect of these choices should be evaluated and tested in each study <ref name="Bertorelle" />. Thus, quality controls are achievable and indeed performed in many ABC based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. Nevertheless, the rapidly increasing use of ABC should lead to a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC, and instead hold for model-based methods in general. Many of these criticisms have already been well debated in the literature for a long time, but the flexibility offered by ABC to analyse very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space <ref name="Templeton2009a" />. It is true that model-based studies often revolve around a small number of models, and, given the high computational cost of evaluating a single model in some instances, it may be difficult to cover a large part of the hypothesis space.
An upper limit on the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options <ref name="Bertorelle" />. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead <ref name="Csillery" />. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only a few models, subjectively chosen and probably all wrong, can be realistically considered, what insight can we hope to derive from their analysis <ref name="Templeton2009a" />? As pointed out in <ref name="Beaumont2010" />, there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only rarely be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important than the test of a statistical null hypothesis in this context (also see Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” <ref name="Templeton2010" />, which is connected to classical objections of Bayesian approaches <ref name="Beaumont2010b" />.
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should, if possible, be defined based on known properties of the studied system, but may in practical applications necessitate an educated guess. However, theoretical results on a suitable (e.g., unbiased) choice of the prior distribution are available, based on the principle of maximum entropy <ref name="Jaynes" />.
We stress that the purpose of the analysis must be kept in mind when choosing the priors. In principle, uninformative and flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of the parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
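The sensitivity of Bayes factors to the prior can be seen in a minimal analytic example (not from the text; all numbers are illustrative). Compare a point-hypothesis model M1 (a parameter fixed at 0) with an alternative M2 whose parameter has a Normal(0, tau^2) prior, for a single Normal(theta, 1) observation; the evidence of M2 is then Normal(x; 0, 1 + tau^2), so widening the prior alone drives the Bayes factor toward M1:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, var):
    """Density of a Normal(mu, var) distribution at x."""
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

x = 1.5  # a single hypothetical observation

# M1: theta fixed at 0, evidence = Normal(x; 0, 1).
# M2: theta ~ Normal(0, tau^2); marginalizing theta out analytically gives
#     evidence = Normal(x; 0, 1 + tau^2).
# The Bayes factor B12 in favor of M1 grows without bound as tau increases,
# even though the data are unchanged.
bf = {tau: normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 1.0 + tau ** 2)
      for tau in (1.0, 10.0, 100.0)}
```

The same data thus support M2 over M1 for a narrow prior but M1 over M2 for a very wide one, which is why conclusions based on Bayes factors should always be accompanied by a prior sensitivity check.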
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was for example pointed out in [?] that part of the data had to be omitted in the ABC based analysis presented in [?]. Although a number of authors have argued that large data sets are not a practical limitation [?], this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. With increasing computational power, however, this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC based inference in phylogenetics [?], and a similar approach may be tractable for ABC based methods as well. It should still be kept in mind that realistic models for complex systems are very likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC <ref name="Marjoram" />, which results in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated samples from the posterior <ref name="Sisson" />, and relatively poor parallelizability <ref name="Bertorelle" />.
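The ABC-MCMC idea can be sketched on a hypothetical toy model (n observations from Normal(theta, 1), uniform prior, sample mean as summary; all choices here are illustrative, not from the original paper). A random-walk proposal is accepted only if the simulated summary falls within the tolerance and the Metropolis-Hastings test on the prior succeeds; with a symmetric proposal, the proposal densities cancel:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical toy setting: n draws from Normal(theta, 1), summary = sample mean.
n = 100
D = rng.normal(2.0, 1.0, n)
s_obs = D.mean()

def prior_pdf(theta):
    return 0.1 if -5.0 <= theta <= 5.0 else 0.0   # Uniform(-5, 5) prior

# ABC-MCMC in the spirit of Marjoram et al.; the chain should be started near
# an accepted point from a pilot rejection run -- here we simply start at the
# observed summary for brevity.
eps, theta, chain = 0.1, s_obs, []
for _ in range(20000):
    prop = theta + rng.normal(0.0, 0.5)           # symmetric random-walk proposal
    sim = rng.normal(prop, 1.0, n).mean()         # simulate a summary at prop
    # Accept iff (i) simulation within tolerance and (ii) MH test on the prior.
    if abs(sim - s_obs) < eps and rng.random() < prior_pdf(prop) / prior_pdf(theta):
        theta = prop
    chain.append(theta)
```

The chain's correlated samples and the need for a sensible starting point illustrate the MCMC burdens mentioned above.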
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting <ref name="Sisson" /><ref name="Beaumont2009" />. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively <ref name="DelMoral" />.
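The sequential idea can be illustrated with a rough sketch on the same hypothetical toy model (Normal(theta, 1), sample mean as summary). Note that a full ABC-SMC/PMC algorithm additionally carries importance weights to correct for the resample-and-perturb proposal; they are omitted here for brevity, so this shows only the shrinking-tolerance mechanism, not a complete implementation:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
D = rng.normal(2.0, 1.0, n)
s_obs = D.mean()

def populate(proposal, eps, size=500):
    """Draw proposals until `size` particles pass the tolerance eps."""
    out = []
    while len(out) < size:
        theta = proposal()
        if -5.0 <= theta <= 5.0 and abs(rng.normal(theta, 1.0, n).mean() - s_obs) < eps:
            out.append(theta)
    return np.array(out)

# A decreasing sequence of tolerances moves the particle population from the
# prior toward the posterior; here the schedule is fixed, whereas adaptive
# schedules are used in practice.
particles = populate(lambda: rng.uniform(-5, 5), eps=2.0)   # start from the prior
for eps in (0.5, 0.1):
    pool = particles
    # Resample a previous particle and perturb it (importance weights omitted).
    particles = populate(lambda: rng.choice(pool) + rng.normal(0.0, 0.2), eps)
```

Each round yields an independent set of accepted particles, in contrast to the correlated samples of ABC-MCMC.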
The use of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in <ref name="Beaumont2002" />. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model <ref name="Blum2010" />. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in <ref name="Leuenberger2009" />.
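The regression adjustment can be sketched for a one-dimensional summary on the same hypothetical toy model (Normal(theta, 1), sample mean as summary; kernel and tolerance are illustrative choices). Accepted points are weighted with an Epanechnikov kernel on their distance to the observed summary, theta is regressed on the summary, and the fitted slope projects each accepted theta onto the observed summary:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
D = rng.normal(2.0, 1.0, n)
s_obs = D.mean()

# Rejection step with a deliberately loose tolerance.
eps = 0.5
thetas, summs = [], []
for t in rng.uniform(-5, 5, 50000):
    s = rng.normal(t, 1.0, n).mean()
    if abs(s - s_obs) < eps:
        thetas.append(t)
        summs.append(s)
thetas, summs = np.array(thetas), np.array(summs)

# Local linear weighted regression in the spirit of Beaumont et al. (2002):
# Epanechnikov weights, weighted least squares of theta on (s - s_obs),
# then correction of each theta in the direction of the observed summary.
w = 1.0 - (np.abs(summs - s_obs) / eps) ** 2        # Epanechnikov weights
X = np.column_stack([np.ones_like(summs), summs - s_obs])
A = X.T @ (w[:, None] * X)                          # weighted normal equations
b = X.T @ (w * thetas)
alpha, beta = np.linalg.solve(A, b)
adjusted = thetas - beta * (summs - s_obs)          # variance-reduced sample
```

The adjusted sample is noticeably tighter than the raw accepted sample, which is precisely the variance reduction the method aims for.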
=Outlook=
In the past the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, and novel appropriate methods must therefore be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction <ref name="Csillery" /> or modularization. A second approach is a more guided search of the parameter space, e.g., through the development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely alleviate, but not resolve, the curse-of-dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcome many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, there are certain tasks, for instance model selection with ABC, that are inherently difficult. Also, open problems such as the convergence properties of ABC based algorithms, as well as methods for determining summary statistics in the absence of sufficient ones, deserve more attention.
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as an assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J (2009) Stable Bayesian Parameter Estimation for Biological Dynamical Systems. IEEE Computer Society Press. pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J (2009) Optimized Expected Information Gain for Nonlinear Dynamical Systems. Int. Conf. Proc. Series, Association for Computing Machinery (ACM). pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid AP (1984) Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society, Series A 147: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011, in press) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing.</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<ref name="Csillery2010">Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.</ref>
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<ref name="Gelfand">Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.</ref>
<ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref>
<ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref>
<ref name="Excoffier">Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.</ref>
<ref name="Wegmann2010">Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.</ref>
<ref name="Cornuet">Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.</ref>
<ref name="Templeton2010b">Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.</ref>
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden markov models with intractable likelihoods. arXiv:11035399v1 [mathST] 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. ArXiv:10041112v2 [statME] 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:08113355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and bayesian computation. Stat Appl Genet Mol Biol 7: Article26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rosseau J (2011) Relevant statistics for Bayesian model choice. ArXiv:11104700v1 [mathST] 21 Oct 2011: 1-24.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.</ref>
</references>
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). Since its advent in <ref name="Rubin" />, ABC has spurred the scientific community to develop improved versions of the basic method, which have further increased the computational efficiency (e.g., see <ref name="Marjoram" /><ref name="Sisson" /><ref name="Wegmann" />).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it was pointed out that a significant portion of the criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part was also shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and its usage. However, the arguments also exposed fundamental and currently unsolved issues. Concerns have lately been raised within the ABC community as well <ref name="Didelot" /><ref name="Robert" />[?]. Yet it might be difficult for many readers to differentiate ABC specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, with the ABC rejection algorithm—the most basic form of ABC—a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math>, and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution, and, crucially, obtained without the need to explicitly compute the likelihood function.
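The rejection algorithm can be sketched on a hypothetical toy model (not from the text: five draws from Normal(theta, 1), a Uniform(-5, 5) prior, and a Euclidean distance on sorted data vectors; all of these are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: the observed data are n draws from Normal(theta, 1).
n = 5
theta_true = 2.0
D = rng.normal(theta_true, 1.0, n)                 # "observed" data

def simulate(theta):
    """Simulate one data set under the model for parameter theta."""
    return rng.normal(theta, 1.0, n)

def rho(D_hat, D_obs):
    # Euclidean distance between sorted data vectors -- one possible metric
    # for exchangeable observations.
    return np.linalg.norm(np.sort(D_hat) - np.sort(D_obs))

eps = 1.5                                          # strictly positive tolerance
accepted = []
for theta in rng.uniform(-5, 5, 20000):            # sample from the prior
    if rho(simulate(theta), D) < eps:              # keep theta if close to data
        accepted.append(theta)
# `accepted` is approximately a sample from p(theta | rho(D_hat, D) < eps),
# obtained without ever evaluating the likelihood.
```

Note that every step only requires the ability to simulate from the model, which is exactly what makes the algorithm likelihood-free.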
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e. given the sufficient statistic, the parameter θ is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
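For the toy Normal(theta, 1) model used above (an illustrative exponential-family example), the sample mean is in fact a sufficient statistic for theta, so the acceptance criterion on summaries loses nothing while collapsing the data to one dimension and boosting the acceptance rate:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy model: n draws from Normal(theta, 1); the sample mean is a
# sufficient summary statistic for theta in this exponential-family case.
n = 100
D = rng.normal(2.0, 1.0, n)

def S(x):
    return x.mean()                    # one-dimensional sufficient summary

eps = 0.1
# Accept theta whenever the simulated summary lies within eps of the observed
# summary, i.e., rho(S(D_hat), S(D)) < eps with rho the absolute difference.
accepted = [theta for theta in rng.uniform(-5, 5, 30000)
            if abs(S(rng.normal(theta, 1.0, n)) - S(D)) < eps]
```

With a 100-dimensional data vector, a comparably tight acceptance region on the raw data would essentially never be hit, which is the dimensionality problem the summaries are meant to address.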
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that computing <math>B_{1,2}</math> in Eq. ? requires marginalizing over the uncertain parameters through integration. The posterior ratio of <math>M_1</math> to <math>M_2</math> given the data (which can be thought of as the support in favor of one model over the other) is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for interpreting the strength of evidence corresponding to different values of the Bayes factor was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />), and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, conclusions from model comparison based on Bayes factors should be treated with caution, and we will later discuss some important ABC related concerns.
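One common likelihood-free way to approximate the Bayes factor is to treat the model index as an extra parameter drawn from the model prior and to estimate posterior model probabilities from acceptance frequencies. The sketch below uses a hypothetical pair of models (both Normal(theta, sigma) with different fixed sigma; summaries, tolerance, and data are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical comparison: M1 simulates Normal(theta, 1), M2 Normal(theta, 3),
# theta ~ Uniform(-5, 5) under both; the observed data actually come from M1.
n = 100
D = rng.normal(2.0, 1.0, n)
s_obs = np.array([D.mean(), D.std()])            # summaries: mean and std. dev.

def simulate_summaries(model):
    theta = rng.uniform(-5, 5)                   # parameter prior
    sigma = 1.0 if model == 1 else 3.0
    x = rng.normal(theta, sigma, n)
    return np.array([x.mean(), x.std()])

eps = 0.2
counts = {1: 0, 2: 0}
for _ in range(60000):
    m = 1 if rng.random() < 0.5 else 2           # equal model priors
    if np.linalg.norm(simulate_summaries(m) - s_obs) < eps:
        counts[m] += 1

# With equal model priors, the Bayes factor B12 is approximated by the ratio
# of acceptance frequencies of the two models.
B12 = counts[1] / max(counts[2], 1)
```

As discussed later in the text, such summary-based Bayes factor estimates are only trustworthy when the summaries are informative for model discrimination, which is a nontrivial requirement.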
=Quality Controls=
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
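The pseudo-observed data idea can be sketched as follows, again on the hypothetical Normal(theta, 1) toy model used throughout these examples (prior, tolerance, and the helper `abc_posterior_mean` are all illustrative): draw "true" parameters from the prior, simulate artificial data sets, and check how well ABC recovers the known values.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100

def abc_posterior_mean(data, trials=8000, eps=0.1):
    """Rejection ABC on the sample mean; returns the posterior mean estimate."""
    acc = [t for t in rng.uniform(-5, 5, trials)
           if abs(rng.normal(t, 1.0, n).mean() - data.mean()) < eps]
    return float(np.mean(acc)) if acc else None

# Pseudo-observed data sets (PODS): the true parameter is known by
# construction, so the estimation error can be measured directly.
errors = []
for _ in range(12):
    theta_true = rng.uniform(-4, 4)
    pods = rng.normal(theta_true, 1.0, n)        # one pseudo-observed data set
    est = abc_posterior_mean(pods)
    if est is not None:
        errors.append(est - theta_true)

bias = float(np.mean(errors))   # near zero if the inference is well calibrated
```

In a real study the same idea extends to coverage checks of credible intervals and to recovery of the true model among several candidates.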
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparison of the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because the posterior support of a particular model can then appear overwhelmingly conclusive, even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />. The deviance information criterion is then used as a measure of model fit. It was also shown that the models preferred based on this criterion can conflict with those supported by Bayes factors, and for this reason it is useful to combine different methods for model selection.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus, <math>\epsilon</math> is in practice set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available, and other summary statistics are used instead, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC based analysis. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in <ref name="Ratmann" /> and <ref name="Francois" />). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) <ref name="Csillery" />. This is a well-known phenomenon usually referred to as the curse-of-dimensionality <ref name="Bellman" />. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
Although no computational method seems to be able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids <ref name="Gerstner" />, which could potentially heavily reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect <ref name="Singer" />. For certain problems it may therefore be difficult to know if the model is incorrect or if the explored region of the parameter space is inappropriate <ref name="Templeton2009a" /> (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes with the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> is trusted to approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance that is large enough for every point to be accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in <ref name="Sisson" />. Theoretical results for an <math>\epsilon</math> dependent upper bound on the error in parameter estimates have recently been reported <ref name="Dean" />. The accuracy of the posterior (defined as the expected quadratic loss) obtained with ABC as a function of <math>\epsilon</math> has also been investigated <ref name="Fearnhead" />. However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that deserves more detailed investigation. Methods to distinguish the error of this approximation from errors due to model mis-specification <ref name="Beaumont2010" />, in the context of actual applications, would be valuable.
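The effect of the tolerance can be demonstrated numerically on the hypothetical Normal(theta, 1) toy model used in the earlier sketches (all settings illustrative): as <math>\epsilon</math> grows, the spread of the approximate posterior widens from near the true posterior toward the Uniform(-5, 5) prior.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
D = rng.normal(2.0, 1.0, n)
s_obs = D.mean()

def abc_sample(eps, trials=30000):
    """Rejection ABC on the sample mean for a given tolerance eps."""
    thetas = rng.uniform(-5, 5, trials)
    sims = np.array([rng.normal(t, 1.0, n).mean() for t in thetas])
    return thetas[np.abs(sims - s_obs) < eps]

# Small eps: close to the true posterior; huge eps: essentially the prior.
spreads = [abc_sample(eps).std() for eps in (0.1, 1.0, 5.0)]
```

The monotone growth of the posterior spread with the tolerance is exactly the bias-versus-cost trade-off described above.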
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in <ref name="Fearnhead" /><ref name="Wilkinson" />. The idea is to add noise drawn from a given probability density function to the observed data, since ABC then yields exact inference under the assumption of this observation noise. The asymptotic consistency of such “noisy ABC” was established in <ref name="Dean" />, together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, as defined in Eq. ?, are optimal for this purpose, since they represent the maximum amount of information in the simplest possible form <ref name="Csillery" />. However, one must often resort to heuristics, and sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the loss of information in the parameter estimation <ref name="Csillery" />, and may also bias the discrimination between models.
An intuitive idea to capture most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics <ref name="Beaumont2010" /><ref name="Csillery" />. A better strategy instead consists in focusing on the relevant statistics only, where relevancy depends on the whole inference problem, on the model used, and on the data at hand <ref name="Nunes" />.
An algorithm has been proposed for identifying a representative subset of summary statistics, by iteratively assessing whether an additional statistic introduces a meaningful modification of the posterior <ref name="Joyce" />. Another method <ref name="Nunes" /> decomposes into two principal steps: first, a reference approximation of the posterior is constructed by minimizing the entropy; sets of candidate summaries are then evaluated by comparing the ABC-approximated posteriors with the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. In contrast, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately <ref name="Wegmann" />. Recently, a method for constructing summaries in a semi-automatic manner has attracted much interest <ref name="Fearnhead" />. It is based on the observation that the choice of summary statistics minimizing the quadratic loss of the parameter point estimates is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
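The semi-automatic construction can be caricatured in a few lines. The sketch below is a deliberately minimal toy version written for this page (one candidate feature, plain least squares in place of the pilot regression described in the paper, a Gaussian model, and invented names): a summary is built by regressing the parameter on a data feature across a pilot run, so that the summary approximates the posterior mean.

```python
import random
import statistics

def ols(x, y):
    """Ordinary least squares slope and intercept."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

rng = random.Random(42)

def simulate(theta, n=30):
    """Toy model: n Gaussian draws with mean theta and unit variance."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

# Pilot run: draw (theta, data) pairs and regress theta on a candidate
# data feature; the fitted values approximate the posterior mean of theta,
# which is the loss-optimal summary in the semi-automatic construction.
pilot_thetas = [rng.uniform(-3, 3) for _ in range(500)]
pilot_feature = [statistics.mean(simulate(t)) for t in pilot_thetas]
slope, intercept = ols(pilot_feature, pilot_thetas)

def summary(data):
    """Semi-automatic summary: the regression prediction of theta."""
    return intercept + slope * statistics.mean(data)

# For data generated at theta = 1.5 the constructed summary should sit
# close to 1.5, since the sample mean is highly informative here.
print(summary(simulate(1.5)))
```

In realistic applications many candidate features would be regressed jointly, but the one-feature version suffices to show the mechanics of the approach.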
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest <ref name="Marjoram" />. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic <ref name="Didelot" /><ref name="Robert" />. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form <ref name="Didelot" />
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which implies that <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that <math>B_{1,2}</math> and <math>B_{1,2}^s</math> may differ substantially if Eq. ? is not satisfied, as demonstrated with a small example model in <ref name="Robert" /> (previously discussed in <ref name="Didelot" /> and <ref name="Grelaud" />). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models <ref name="Didelot" />. However, it was also shown that any summary statistic that is sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested is valid for ranking the nested models <ref name="Didelot" />.
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived <ref name="Marin" />, which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which actual data sets are compared, as in typical systems biology applications (e.g., see <ref name="Toni" />), circumvents this problem. It is even doubtful whether the issue is truly specific to ABC, since importance sampling techniques suffer from the same problem <ref name="Robert" />.
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules, and the effect of these choices should be evaluated and tested in each study <ref name="Bertorelle" />. Quality controls are thus achievable, and indeed performed in many ABC-based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. The rapidly increasing use of ABC can nonetheless be expected to foster a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC, and instead hold for model-based methods in general. Many of these criticisms have already been well debated in the literature for a long time, but the flexibility offered by ABC to analyse very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space <ref name="Templeton2009a" />. Indeed, model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model in some instances, it may then be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options <ref name="Bertorelle" />. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead <ref name="Csillery" />. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and the use of expert knowledge from the problem domain.
But if only a few models, subjectively chosen and probably all wrong, can be realistically considered, what insight can we hope to derive from their analysis <ref name="Templeton2009a" />? As pointed out in <ref name="Beaumont2010" />, there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only rarely be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important in this context than the test of a statistical null hypothesis (also see Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” <ref name="Templeton2010" />, which is connected to classical objections of Bayesian approaches <ref name="Beaumont2010b" />.
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should if possible be defined based on known properties of the studied system, but may for practical applications necessitate an educated guess. However, theoretical results regarding a suitable (e.g., non-biased) choice of the prior distribution are available, which are based on the principle of maximum entropy <ref name="Jaynes" />.
We stress that the purpose of the analysis must be kept in mind when choosing the priors. In principle, uninformative and flat priors, that exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. Bayes factors, however, are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was for example pointed out in <ref name="Templeton2009a" /> that part of the data had to be omitted in the ABC-based analysis presented in <ref name="Fagundes" />. Although a number of authors argue that large data sets are not a practical limitation <ref name="Bertorelle" /><ref name="Beaumont2010b" />, this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. With increasing computational power, however, this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics <ref name="Feng" />, which may be a tractable approach for ABC-based methods as well. It should still be kept in mind that any realistic model of a complex system is very likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC <ref name="Marjoram" />, which results in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated samples from the posterior <ref name="Sisson" />, and relatively poor parallelizability <ref name="Bertorelle" />.
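A minimal sketch of such an ABC-MCMC chain is given below, assuming a flat prior and a symmetric Gaussian proposal, so that the Metropolis-Hastings ratio reduces to the simulation check. The toy Gaussian model, the starting point, and all names are assumptions made for this illustration, not code from the cited work.

```python
import random

def abc_mcmc(observed_mean, n_steps=5000, eps=0.3, step=0.5, seed=0):
    """Toy ABC-MCMC chain for a Gaussian-mean model. With a flat prior and
    a symmetric Gaussian proposal, the Metropolis-Hastings ratio is one, so
    a move is accepted exactly when the simulated data pass the eps check."""
    rng = random.Random(seed)
    theta = observed_mean  # start from a plausible point to shorten burn-in
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        sim_mean = sum(rng.gauss(proposal, 1.0) for _ in range(20)) / 20
        # accept if the proposal lies inside the flat prior's support on
        # [-5, 5] and the simulation falls within eps of the observations
        if abs(proposal) <= 5 and abs(sim_mean - observed_mean) < eps:
            theta = proposal
        chain.append(theta)
    return chain

chain = abc_mcmc(observed_mean=2.0)
tail = chain[1000:]  # discard burn-in
print(sum(tail) / len(tail))  # settles near the observed mean of 2.0
```

Note that consecutive states repeat whenever a proposal is rejected, which is the source of the correlated posterior samples mentioned above.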
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting <ref name="Sisson" /><ref name="Beaumont2009" />. Their general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but are adjusted adaptively <ref name="DelMoral" />.
The usage of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in <ref name="Beaumont2002" />. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model <ref name="Blum2010" />. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in <ref name="Leuenberger2009" />.
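The regression adjustment can be sketched as follows. This is an unweighted caricature of the local linear approach, written for this page with a toy Gaussian model and invented names; the kernel weighting of the original method is omitted for brevity.

```python
import random
import statistics

def ols(x, y):
    """Ordinary least squares slope and intercept (unweighted sketch)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

rng = random.Random(7)
obs_summary = 2.0  # observed summary statistic (a sample mean, say)

# Rejection step with a deliberately loose tolerance.
thetas, summaries = [], []
while len(thetas) < 400:
    theta = rng.uniform(-5, 5)
    s = statistics.mean(rng.gauss(theta, 1.0) for _ in range(20))
    if abs(s - obs_summary) < 1.0:
        thetas.append(theta)
        summaries.append(s)

# Regression adjustment: regress theta on the summary among accepted
# points, then shift every theta as if its summary had hit obs_summary.
slope, _ = ols(summaries, thetas)
adjusted = [t - slope * (s - obs_summary) for t, s in zip(thetas, summaries)]

# The adjustment shrinks the spread caused by the loose tolerance.
print(statistics.stdev(adjusted) < statistics.stdev(thetas))  # True
```

The shrinkage illustrates the variance-reduction effect of the adjustment; it does not by itself guarantee consistency with the prior, which motivated the reformulation cited above.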
=Outlook=
In the past the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, and appropriate novel methods must therefore be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to reduce the scope of the problem through model reduction, e.g., dimension reduction <ref name="Csillery" /> or modularization. A second approach is a more guided search of the parameter space, e.g., through the development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, but not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely improve the situation, but not resolve the curse-of-dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcome many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, certain tasks, for instance model selection with ABC, are inherently difficult. Also, open problems such as the convergence properties of ABC-based algorithms, as well as methods for determining summary statistics in the absence of sufficient ones, deserve more attention.
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as an assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J. Stable Bayesian Parameter Estimation for Biological Dynamical Systems.; 2009. IEEE Computer Society Press pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J. Optimized Expected Information Gain for Nonlinear Dynamical Systems. Int. Conf. Proc. Series; 2009. Association for Computing Machinery (ACM) pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid A (1984) Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society Series A 147: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, in press.</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<ref name="Csillery2010">Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.</ref>
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<ref name="Gelfand">Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.</ref>
<ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref>
<ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref>
<ref name="Excoffier">Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.</ref>
<ref name="Wegmann2010">Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.</ref>
<ref name="Cornuet">Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.</ref>
<ref name="Templeton2010b">Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.</ref>
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden markov models with intractable likelihoods. arXiv:11035399v1 [mathST] 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. ArXiv:10041112v2 [statME] 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:08113355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and bayesian computation. Stat Appl Genet Mol Biol 7: Article26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rosseau J (2011) Relevant statistics for Bayesian model choice. ArXiv:11104700v1 [mathST] 21 Oct 2011: 1-24.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.</ref>
</references>
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). Since its advent in <ref name="Rubin" />, the spread of ABC has prompted the scientific community to develop improved versions of the basic method, which have further increased the computational efficiency (e.g., see <ref name="Marjoram" /><ref name="Sisson" /><ref name="Wegmann" />).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it has been pointed out that a significant portion of the criticism is not aimed directly at ABC, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part was also shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or between the ABC method and its usage. However, the arguments also exposed fundamental and currently unsolved issues. Concerns have lately been raised within the ABC community as well <ref name="Didelot" /><ref name="Robert" />[?]. Yet it might be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC-based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, with the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter samples that are approximately distributed according to the desired posterior distribution, obtained, crucially, without the need to explicitly compute the likelihood function.
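As an illustration of the steps above, a minimal, self-contained Python sketch of the rejection algorithm is given below. The Gaussian toy model, the flat prior, the choice of distance, and all function names are assumptions made for this example, not part of the published methods.

```python
import random

def abc_rejection(observed, simulate, prior_sample, distance, eps, n_accept):
    """ABC rejection: sample theta from the prior, simulate a data set
    D_hat under the model, and accept theta whenever rho(D_hat, D) < eps."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()
        if distance(simulate(theta), observed) < eps:
            accepted.append(theta)
    return accepted

def euclidean_sorted(a, b):
    """Euclidean distance between sorted samples (a crude choice of rho)."""
    return sum((x - y) ** 2 for x, y in zip(sorted(a), sorted(b))) ** 0.5

# Toy model: 10 Gaussian observations with unknown mean and unit variance.
rng = random.Random(3)
observed = [rng.gauss(2.0, 1.0) for _ in range(10)]
posterior = abc_rejection(
    observed=observed,
    simulate=lambda th: [rng.gauss(th, 1.0) for _ in range(10)],
    prior_sample=lambda: rng.uniform(-5, 5),  # flat prior on [-5, 5]
    distance=euclidean_sorted,
    eps=2.5,
    n_accept=200,
)
print(sum(posterior) / len(posterior))  # close to the true mean 2.0
```

Note that even a sensible distance on the raw data yields a low acceptance rate; this is precisely what motivates the summary statistics discussed next.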
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e. given the sufficient statistic, the parameter θ is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside the exponential family of distributions, to identify a set of sufficient statistics. Nevertheless, informative but possibly non-sufficient summary statistics are often used in applications of ABC methods.
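The value of sufficiency can be illustrated numerically. In the hypothetical sketch below (a Gaussian-mean toy model invented for this page), replacing the sufficient sample mean with a lossy summary, the first observation only, visibly inflates the ABC posterior.

```python
import random
import statistics

def abc_with_summary(summary, obs_value, eps, n_accept, seed):
    """Accept theta when |S(D_hat) - S(D)| < eps for the given summary S."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.uniform(-5, 5)                        # flat prior
        sim = [rng.gauss(theta, 1.0) for _ in range(30)]  # simulated data set
        if abs(summary(sim) - obs_value) < eps:
            accepted.append(theta)
    return accepted

# For a Gaussian with known variance, the sample mean is sufficient for the
# mean; the first observation alone is not and discards most information.
post_sufficient = abc_with_summary(statistics.mean, 2.0, 0.2, 300, seed=5)
post_lossy = abc_with_summary(lambda d: d[0], 2.0, 0.2, 300, seed=5)

# The information loss shows up as an inflated posterior spread.
print(statistics.stdev(post_lossy) > statistics.stdev(post_sufficient))  # True
```

The comparison makes concrete the warning above: non-sufficient statistics can still be informative, but the posterior they produce is wider than the one based on a sufficient statistic.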
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that the uncertain parameters are marginalized out through integration when computing <math>B_{1,2}</math> in Eq. ?. The posterior ratio of <math>M_1</math> to <math>M_2</math> given the data (which can be thought of as the support in favor of one model) is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for interpreting the strength of evidence corresponding to values of the Bayes factor was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />), and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, conclusions of model comparisons based on Bayes factors should be drawn with caution, and we will later discuss some important ABC-related concerns.
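When both models receive the same number of simulations and equal prior probability, the Bayes factor can be approximated by the ratio of ABC acceptance counts under each model. The sketch below is illustrative only; the two competing models (a fixed Normal(0, 1) versus Normal(<math>\theta</math>, 1) with a uniform prior on <math>\theta</math>) and the sample-mean summary are assumptions made for this example, not taken from the article.

```python
import random

def simulate_mean(theta, n, rng):
    # Summary statistic of a simulated data set: the sample mean of
    # n draws from Normal(theta, 1) (a toy model for illustration).
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n

def abc_model_choice(s_obs, n_obs, n_sims, epsilon, seed=0):
    """Approximate p(D|M_i) by each model's acceptance rate, so the ratio
    of acceptance counts estimates the Bayes factor B_{1,2} (valid when
    both models get the same number of simulations and equal prior
    probability)."""
    rng = random.Random(seed)
    acc = [0, 0]
    for _ in range(n_sims):
        # Model 1: theta fixed at 0.  Model 2: theta ~ Uniform(-5, 5).
        if abs(simulate_mean(0.0, n_obs, rng) - s_obs) < epsilon:
            acc[0] += 1
        if abs(simulate_mean(rng.uniform(-5.0, 5.0), n_obs, rng) - s_obs) < epsilon:
            acc[1] += 1
    return acc[0] / max(acc[1], 1)

# Observed summary generated under model 1, so B_{1,2} should exceed 1.
rng = random.Random(1)
n_obs = 50
s_obs = sum(rng.gauss(0.0, 1.0) for _ in range(n_obs)) / n_obs
b12 = abc_model_choice(s_obs, n_obs, n_sims=5000, epsilon=0.3)
```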
==Quality Controls==
Quality control is an important part of ABC-based inference for assessing the validity and robustness of the results and the conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as quantifying the fraction of parameter variance explained by the summary statistics. A common class of methods assesses whether or not the inference yields valid results, regardless of the observational data. For instance, models are simulated for fixed parameter sets, typically drawn from the prior or posterior distribution, to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
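A PODS-based check can be sketched as follows. The code is a toy illustration under assumed choices (Normal(<math>\theta</math>, 1) data, Uniform(-5, 5) prior, sample-mean summary, posterior mean as point estimate); none of these choices come from the article.

```python
import random
import statistics

def abc_posterior_mean(s_obs, n_obs, epsilon, n_keep, rng):
    # Point estimate from plain ABC rejection for the toy model
    # D ~ Normal(theta, 1), theta ~ Uniform(-5, 5), summary = sample mean.
    kept = []
    while len(kept) < n_keep:
        theta = rng.uniform(-5.0, 5.0)
        s = sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs
        if abs(s - s_obs) < epsilon:
            kept.append(theta)
    return statistics.mean(kept)

def pods_check(n_pods, n_obs, epsilon, rng):
    """Quality control with pseudo-observed data sets: draw 'true'
    parameters from the prior, simulate PODS, rerun the inference on each,
    and report the mean absolute estimation error -- a controlled-setting
    gauge of how well this ABC configuration recovers known parameters."""
    errors = []
    for _ in range(n_pods):
        theta_true = rng.uniform(-5.0, 5.0)
        s_pod = sum(rng.gauss(theta_true, 1.0) for _ in range(n_obs)) / n_obs
        est = abc_posterior_mean(s_pod, n_obs, epsilon, n_keep=50, rng=rng)
        errors.append(abs(est - theta_true))
    return statistics.mean(errors)

mae = pods_check(n_pods=20, n_obs=30, epsilon=0.3, rng=random.Random(0))
```

A small mean absolute error over the PODS indicates that the chosen tolerance and summary recover parameters well in this controlled setting.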
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparing the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because the posterior support of a particular model can then appear overwhelmingly conclusive, even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues as to how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions, with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />. The deviance information criterion is then used as a measure of model fit. Notably, it was also shown that the models preferred on the basis of this criterion can conflict with those supported by Bayes factors. For this reason it is useful to combine different methods for model selection to obtain robust conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required to apply ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eq. ? or ? yields an exact result, but would typically make computations prohibitively expensive. In practice, <math>\epsilon</math> is therefore set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available, and other summary statistics are used instead, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC-based analysis. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in <ref name="Ratmann" /> and <ref name="Francois" />). However, the probability of accepting a simulation for a given tolerance typically decreases exponentially with increasing dimensionality of the parameter space, due to the global acceptance criterion <ref name="Csillery" />. This is a well-known phenomenon usually referred to as the curse-of-dimensionality <ref name="Bellman" />. In practice, the tolerance may be adjusted to account for this issue, which can increase the acceptance rate at the price of a less accurate posterior distribution.
Although no computational method seems able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids <ref name="Gerstner" />, which could potentially reduce the simulation times for ABC substantially). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect <ref name="Singer" />. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate <ref name="Templeton2009a" /> (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes at the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> should approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance that is large enough for every point to be accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in <ref name="Sisson" />. Theoretical results for an <math>\epsilon</math>-dependent upper bound on the error in parameter estimates have recently been reported <ref name="Dean" />. The accuracy of the posterior (defined as the expected quadratic loss) obtained with ABC as a function of <math>\epsilon</math> has also been investigated <ref name="Fearnhead" />. However, the convergence of the distributions as <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that deserves investigation in greater detail. In particular, methods to distinguish the error due to this approximation from the error due to model mis-specification <ref name="Beaumont2010" /> would be valuable in the context of actual applications.
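The two limits, a small tolerance approximating the posterior and a very large tolerance reproducing the prior, can be demonstrated directly. The sketch below assumes a toy model (Normal(<math>\theta</math>, 1) data, Uniform(-5, 5) prior, sample-mean summary) chosen for illustration only.

```python
import random
import statistics

def abc_posterior(s_obs, n_obs, epsilon, n_keep, seed=0):
    """Sample from p(theta | rho < epsilon) for the toy model
    D ~ Normal(theta, 1), theta ~ Uniform(-5, 5), summary = sample mean."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_keep:
        theta = rng.uniform(-5.0, 5.0)
        s_sim = sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs
        if abs(s_sim - s_obs) < epsilon:
            kept.append(theta)
    return kept

s_obs = 0.0  # assumed observed summary, for illustration
tight = abc_posterior(s_obs, n_obs=25, epsilon=0.2, n_keep=300)
loose = abc_posterior(s_obs, n_obs=25, epsilon=100.0, n_keep=300)
# A tolerance large enough to accept every draw reproduces the prior, so
# the spread of `loose` approaches the prior standard deviation
# (10 / sqrt(12), roughly 2.89), while `tight` concentrates near s_obs.
```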
Finally, statistical inference with a positive tolerance in ABC has been theoretically justified <ref name="Fearnhead" /><ref name="Wilkinson" />. The idea is to view the tolerance as noise added to the observed data, with a given probability density function; ABC then yields exact inference under the assumption of this noise. The asymptotic consistency of such "noisy ABC" was established in <ref name="Dean" />, together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC-based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, as defined in Eq. ?, are optimal for this purpose, representing the maximum amount of information in the simplest possible form <ref name="Csillery" />. However, one often has to resort to heuristics to identify sufficient statistics, and sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the potential loss of information in the parameter estimation <ref name="Csillery" />, and it may also bias the discrimination between models.
An intuitive idea to capture most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appear to decrease rapidly with an increasing number of summary statistics <ref name="Beaumont2010" /><ref name="Csillery" />. Instead, a better strategy is to focus on the relevant statistics only, where relevancy depends on the whole inference problem, on the model used, and on the data at hand <ref name="Nunes" />.
An algorithm for identifying a representative subset of summary statistics was proposed, which iteratively assesses whether an additional statistic introduces a meaningful modification of the posterior <ref name="Joyce" />. Another method, consisting of two principal steps, was proposed in <ref name="Nunes" />. First, a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the posteriors computed with ABC to the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. In contrast, the partial least squares regression approach uses information from all the candidate statistics, each weighted appropriately <ref name="Wegmann" />. Recently, a method for constructing summaries in a semi-automatic manner has attracted much interest <ref name="Fearnhead" />. It is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
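The pilot-run idea can be sketched in miniature: simulate parameter/data pairs, regress the parameter on candidate features of the data, and use the fitted linear predictor as the summary statistic. The toy model (Normal(<math>\theta</math>, 1)) and the candidate features (sample mean and median) below are assumptions made for illustration, not prescribed by the method.

```python
import random
import statistics

def solve_normal_equations(x, y):
    # Ordinary least squares via the normal equations (X^T X) b = X^T y,
    # solved by Gaussian elimination; adequate for a handful of features.
    k = len(x[0])
    a = [[sum(r[i] * r[j] for r in x) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(x, y))] for i in range(k)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(i + 1, k):
            f = a[r][i] / a[i][i]
            a[r] = [v - f * w for v, w in zip(a[r], a[i])]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

def pilot_regression(n_pilot, n_obs, rng):
    """Semi-automatic summaries in miniature: run pilot simulations and
    regress theta on candidate data features; the fitted linear predictor
    of theta is then used as a single summary statistic."""
    rows, targets = [], []
    for _ in range(n_pilot):
        theta = rng.uniform(-5.0, 5.0)   # toy prior (an assumption)
        data = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
        # Candidate features: intercept, sample mean, sample median.
        rows.append([1.0, statistics.mean(data), statistics.median(data)])
        targets.append(theta)
    return solve_normal_equations(rows, targets)

coef = pilot_regression(n_pilot=500, n_obs=50, rng=random.Random(0))

def summary(data):
    # The learned summary: an estimate of the posterior mean of theta.
    return coef[0] + coef[1] * statistics.mean(data) + coef[2] * statistics.median(data)
```

The learned summary can then replace hand-picked statistics in the acceptance criterion of any ABC rejection scheme.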
Methods for the identification of summary statistics that also assess their influence on the approximation of the posterior would be of great interest <ref name="Marjoram" />. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic <ref name="Didelot" /><ref name="Robert" />. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form <ref name="Didelot" />
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
in which case <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that <math>B_{1,2}</math> and <math>B_{1,2}^s</math> may differ substantially if Eq. ? is not satisfied, as demonstrated with a small example model in <ref name="Robert" /> (previously discussed in <ref name="Didelot" /> and <ref name="Grelaud" />). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models <ref name="Didelot" />. However, it was also shown that any summary statistic that is sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can also be used to rank the nested models <ref name="Didelot" />.
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or can at least be approximated. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived <ref name="Marin" />, which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which actual data sets are compared directly, as in typical systems biology applications (e.g., see <ref name="Toni" />), circumvents this problem. It is even doubtful whether the issue is specific to ABC, since importance sampling techniques suffer from the same problem <ref name="Robert" />.
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can considerably impact its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules; the effect of these choices should be evaluated and tested in each study <ref name="Bertorelle" />. Quality controls are thus achievable and indeed performed in many ABC-based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. However, the rapidly increasing use of ABC should yield a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider valid, but not specific to ABC; rather, they hold for model-based methods in general. Many of these criticisms have long been debated in the literature, but the flexibility ABC offers for analysing very complex models makes them highly relevant here.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space <ref name="Templeton2009a" />. It is true that model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model, it may in some instances be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options <ref name="Bertorelle" />. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead <ref name="Csillery" />. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only a few models, subjectively chosen and probably all wrong, can realistically be considered, what insight can we hope to derive from their analysis <ref name="Templeton2009a" />? As pointed out in <ref name="Beaumont2010" />, there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can seldom be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is in this context far more important than the test of a statistical null hypothesis (see also Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” <ref name="Templeton2010" />, which is connected to classical objections of Bayesian approaches <ref name="Beaumont2010b" />.
With any computational method, it is necessary to constrain the investigated parameter ranges. The parameter ranges should, if possible, be defined based on known properties of the studied system, but may for practical applications necessitate an educated guess. However, theoretical results regarding a suitable (e.g., unbiased) choice of the prior distribution are available, for instance based on the principle of maximum entropy <ref name="Jaynes" />.
We stress that the purpose of the analysis needs to be kept in mind when choosing the priors. In principle, uninformative and flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in <ref name="Templeton2009a" /> that part of the data had to be omitted in the ABC-based analysis presented in <ref name="Fagundes" />. Although a number of authors claim that large data sets are not a practical limitation <ref name="Bertorelle" /><ref name="Beaumont2010b" />, this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power, this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics <ref name="Feng" />, which may be a tractable approach also for ABC-based methods. It should still be kept in mind that realistic models of complex systems are likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it has been proposed to combine the Metropolis-Hastings algorithm with ABC <ref name="Marjoram" />, which was reported to result in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of assessing convergence, correlated samples from the posterior <ref name="Sisson" />, and relatively poor parallelizability <ref name="Bertorelle" />.
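The combination can be sketched as follows. The sketch assumes a toy model (Normal(<math>\theta</math>, 1) data, flat prior on [-5, 5], sample-mean summary, symmetric Gaussian random walk); with a flat prior and a symmetric proposal, the Metropolis-Hastings acceptance ratio reduces to the indicator that the simulated summary lands within <math>\epsilon</math>.

```python
import random

def abc_mcmc(s_obs, n_obs, epsilon, n_steps, proposal_sd=0.5, seed=0):
    """ABC-MCMC sketch in the spirit of Marjoram et al.: a Metropolis-
    Hastings chain on theta in which the likelihood ratio is replaced by
    an indicator that the simulated summary is within epsilon of s_obs.
    Toy model (an assumption): D ~ Normal(theta, 1), flat prior on
    [-5, 5], summary = sample mean, symmetric Gaussian random walk."""
    rng = random.Random(seed)
    theta, chain = 0.0, []
    for _ in range(n_steps):
        cand = theta + rng.gauss(0.0, proposal_sd)
        if -5.0 <= cand <= 5.0:  # flat prior: only the support matters
            s_sim = sum(rng.gauss(cand, 1.0) for _ in range(n_obs)) / n_obs
            if abs(s_sim - s_obs) < epsilon:
                theta = cand  # accept; otherwise the chain stays put
        chain.append(theta)
    return chain

chain = abc_mcmc(s_obs=1.0, n_obs=30, epsilon=0.3, n_steps=3000)
```

After a burn-in period the chain explores the region of parameter space compatible with the observed summary.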
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting <ref name="Sisson" /><ref name="Beaumont2009" />. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively <ref name="DelMoral" />.
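A minimal particle version of this idea can be sketched as follows; the toy model, the fixed tolerance schedule, and the Gaussian perturbation kernel are all assumptions made for illustration, not details from the cited methods.

```python
import math
import random

def abc_smc(s_obs, n_obs, epsilons, n_particles, seed=0):
    """ABC-SMC sketch: a particle population is filtered through a
    decreasing sequence of tolerances; survivors are resampled by weight
    and perturbed.  Toy model (an assumption): D ~ Normal(theta, 1),
    theta ~ Uniform(-5, 5), summary = sample mean.  With a flat prior the
    importance weight reduces to the inverse of the perturbation-kernel
    mixture density over the previous population."""
    rng = random.Random(seed)
    kernel_sd = 0.5

    def simulate_summary(theta):
        return sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs

    particles, weights = None, None
    for eps in epsilons:
        new_particles, new_weights = [], []
        while len(new_particles) < n_particles:
            if particles is None:  # first generation: sample from the prior
                cand = rng.uniform(-5.0, 5.0)
            else:                  # later: resample a particle and perturb
                base = rng.choices(particles, weights=weights)[0]
                cand = base + rng.gauss(0.0, kernel_sd)
                if not -5.0 <= cand <= 5.0:
                    continue       # outside the prior support
            if abs(simulate_summary(cand) - s_obs) < eps:
                new_particles.append(cand)
                if particles is None:
                    new_weights.append(1.0)
                else:
                    # flat prior => weight ~ 1 / sum_j w_j K(cand | p_j)
                    dens = sum(w * math.exp(-0.5 * ((cand - p) / kernel_sd) ** 2)
                               for p, w in zip(particles, weights))
                    new_weights.append(1.0 / dens)
        particles, weights = new_particles, new_weights
    return particles, weights

particles, weights = abc_smc(s_obs=1.0, n_obs=30,
                             epsilons=[1.0, 0.5, 0.2], n_particles=200)
```

The final weighted population approximates the posterior at the smallest tolerance in the schedule.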
The use of local linear weighted regression with ABC to reduce the variance of the posterior estimates was suggested in <ref name="Beaumont2002" />. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The resulting regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of nonlinear regression using a feed-forward neural network model <ref name="Blum2010" />. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in <ref name="Leuenberger2009" />.
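For a one-dimensional summary, the regression adjustment can be sketched as follows; the toy model and the Epanechnikov kernel weights are illustrative assumptions (the kernel choice follows common practice, not a requirement of the method).

```python
import random

def abc_regression_adjusted(s_obs, n_obs, epsilon, n_keep, seed=0):
    """ABC rejection followed by a local linear regression adjustment in
    the spirit of Beaumont et al. (2002): accepted parameters are weighted
    by the distance of their summaries to s_obs (Epanechnikov kernel) and
    shifted along the fitted regression line theta ~ s onto s = s_obs.
    Toy model (an assumption): D ~ Normal(theta, 1), theta ~ Uniform(-5, 5),
    summary = sample mean."""
    rng = random.Random(seed)
    thetas, summaries = [], []
    while len(thetas) < n_keep:
        theta = rng.uniform(-5.0, 5.0)
        s = sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs
        if abs(s - s_obs) < epsilon:
            thetas.append(theta)
            summaries.append(s)
    # Epanechnikov weights: closer summaries count more.
    w = [1.0 - ((s - s_obs) / epsilon) ** 2 for s in summaries]
    sw = sum(w)
    xm = sum(wi * s for wi, s in zip(w, summaries)) / sw
    ym = sum(wi * t for wi, t in zip(w, thetas)) / sw
    slope = (sum(wi * (s - xm) * (t - ym)
                 for wi, s, t in zip(w, summaries, thetas))
             / sum(wi * (s - xm) ** 2 for wi, s in zip(w, summaries)))
    # Shift each accepted theta along the regression line to s = s_obs.
    return [t - slope * (s - s_obs) for t, s in zip(thetas, summaries)]

adjusted = abc_regression_adjusted(s_obs=1.5, n_obs=20, epsilon=0.5, n_keep=300)
```

The adjusted sample is centered on the observed summary with a smaller spread than the raw accepted parameters, which is the variance reduction the method aims for.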
=Outlook=
In the past, the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC, the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function, it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, and novel appropriate methods must therefore be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction <ref name="Csillery" /> or modularization. A second approach is a more guided search of the parameter space, e.g., through the development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely improve the situation, but not resolve the curse-of-dimensionality.
The main error sources in ABC-based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcoming many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, there are certain tasks, for instance model selection with ABC, that are inherently difficult. Also, open problems, such as the convergence properties of ABC-based algorithms and methods for determining summary statistics in the absence of sufficient ones, deserve more attention.
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J (2009) Stable Bayesian Parameter Estimation for Biological Dynamical Systems. IEEE Computer Society Press: 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J (2009) Optimized Expected Information Gain for Nonlinear Dynamical Systems. ACM Int. Conf. Proc. Series: 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid A (1984) Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society, Series A 147: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011, in press) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing.</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<!-- <ref name="Csillery2010">Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.</ref> -->
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<!-- <ref name="Gelfand">Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.</ref> -->
<!-- <ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref> -->
<!-- <ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref> -->
<!-- <ref name="Excoffier">Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.</ref> -->
<!-- <ref name="Wegmann2010">Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.</ref> -->
<!-- <ref name="Cornuet">Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.</ref> -->
<!-- <ref name="Templeton2010b">Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.</ref> -->
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden Markov models with intractable likelihoods. arXiv:11035399v1 [mathST] 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. ArXiv:10041112v2 [statME] 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:08113355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article 34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and bayesian computation. Stat Appl Genet Mol Biol 7: Article26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rosseau J (2011) Relevant statistics for Bayesian model choice. ArXiv:11104700v1 [mathST] 21 Oct 2011: 1-24.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.
</ref>
<references />
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. ABC has rapidly increased in popularity in recent years, in particular for the analysis of complex problems in biology. However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain for ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). Since its introduction <ref name="Rubin" />, ABC has prompted the scientific community to develop improved versions of the basic method that further increase computational efficiency (e.g., see <ref name="Marjoram" /><ref name="Sisson" /><ref name="Wegmann" />).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it has been pointed out that a significant portion of this criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part was also shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and the usage thereof. However, fundamental and currently unsolved issues were exposed by the arguments as well. Concerns have lately also been raised within the ABC community <ref name="Didelot" /><ref name="Robert" />. Yet it may be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, in the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution, crucially obtained without the need to explicitly compute the likelihood function.
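For concreteness, the rejection step can be sketched in Python on a hypothetical toy model (a Normal distribution with unknown mean; the prior range, tolerance, and distance below are illustrative choices, not part of the algorithm itself):

```python
import math
import random
import statistics

random.seed(1)

# Toy setup (illustrative): data from a Normal(mu, sigma) model with unknown mu.
true_mu, sigma, n = 3.0, 1.0, 50
data = sorted(random.gauss(true_mu, sigma) for _ in range(n))

def distance(sim, obs):
    # One possible metric rho: Euclidean distance between sorted data sets.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sorted(sim), obs)))

eps = 10.0          # tolerance; must be tuned for the problem at hand
accepted = []       # parameter points distributed approximately as the posterior
for _ in range(20000):
    theta = random.uniform(-10.0, 10.0)                # sample from the prior
    sim = [random.gauss(theta, sigma) for _ in range(n)]
    if distance(sim, data) < eps:                      # accept if rho(D_hat, D) < eps
        accepted.append(theta)

print(len(accepted), statistics.mean(accepted))
```

Shrinking the tolerance improves the approximation of the posterior at the cost of a lower acceptance rate.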
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
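To illustrate, in the toy case of Normal data with known variance (a hypothetical example), the sample mean is a sufficient statistic for the mean parameter, and acceptance reduces to a one-dimensional comparison regardless of the size of the data set:

```python
import random
import statistics

random.seed(2)

# Illustrative toy problem: for Normal data with known sigma, the sample mean
# is a sufficient statistic S(D) for the mean parameter mu.
sigma, n = 1.0, 50
data = [random.gauss(2.0, sigma) for _ in range(n)]
s_obs = statistics.mean(data)                      # S(D)

eps, accepted = 0.1, []
for _ in range(30000):
    mu = random.uniform(-5.0, 5.0)                 # sample from the prior
    sim = [random.gauss(mu, sigma) for _ in range(n)]
    if abs(statistics.mean(sim) - s_obs) < eps:    # rho(S(D_hat), S(D)) < eps
        accepted.append(mu)

print(len(accepted), statistics.mean(accepted))
```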
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that computing <math>B_{1,2}</math> in Eq. ? requires marginalizing over the uncertain parameters through integration. The posterior ratio (which can be thought of as the support in favor of one model) of <math>M_1</math> compared to <math>M_2</math> given the data is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
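In the ABC setting, with both models simulated equally often, the ratio of acceptance rates provides a simple estimator of the Bayes factor. The following hypothetical sketch uses two fixed-parameter models, so the integrals over <math>\Theta_1</math> and <math>\Theta_2</math> are trivial:

```python
import random
import statistics

random.seed(3)

# Hypothetical model choice: M1 = Normal(0, 1) vs. M2 = Normal(1, 1), with no
# free parameters, so the ratio of ABC acceptance rates estimates B_{1,2}.
n, eps = 40, 0.1
data_mean = statistics.mean(random.gauss(0.0, 1.0) for _ in range(n))  # data from M1

def acceptances(mu, trials=20000):
    hits = 0
    for _ in range(trials):
        s = statistics.mean(random.gauss(mu, 1.0) for _ in range(n))
        if abs(s - data_mean) < eps:
            hits += 1
    return hits

a1, a2 = acceptances(0.0), acceptances(1.0)
print(a1, a2)          # the ratio a1 / a2 estimates the Bayes factor B_{1,2}
```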
A table for interpreting the strength of evidence associated with values of the Bayes factor was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />), and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, conclusions of model comparison based on Bayes factors should be treated with caution, and we later discuss some important ABC related concerns.
=Quality Controls=
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
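A minimal version of such a PODS-based check can be sketched as follows, with a toy Normal-mean model standing in for a real application (all settings illustrative):

```python
import random
import statistics

random.seed(4)

# Quality-control sketch: draw "true" parameters, simulate pseudo-observed
# data sets (PODS), rerun the ABC estimator, and gauge how well it recovers
# the known values.
sigma, n, eps = 1.0, 40, 0.15

def abc_posterior_mean(obs_mean, draws=4000):
    acc = []
    for _ in range(draws):
        mu = random.uniform(-3.0, 3.0)                 # prior
        s = statistics.mean(random.gauss(mu, sigma) for _ in range(n))
        if abs(s - obs_mean) < eps:
            acc.append(mu)
    return statistics.mean(acc)

errors = []
for _ in range(20):                                    # 20 pseudo-observed data sets
    true_mu = random.uniform(-2.0, 2.0)
    pod = statistics.mean(random.gauss(true_mu, sigma) for _ in range(n))
    errors.append(abs(abc_posterior_mean(pod) - true_mu))

print(statistics.mean(errors))   # a small mean error indicates sound recovery
```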
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparison of the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because then the posterior support of a particular model can appear overwhelmingly conclusive, even if all proposed models in fact are poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions, with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />. The deviance information criterion is then used as a measure of model fit. Interestingly, it was also shown that the models preferred based on this criterion can conflict with those supported by Bayes factors. For this reason, it is useful to combine different methods for model selection to obtain robust conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus, <math>\epsilon</math> is instead set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; other summary statistics are used in their place, which introduces an additional bias. However, much of the recent criticism has neither been specific to ABC, nor relevant for ABC based analysis. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in <ref name="Ratmann" /> and <ref name="Francois" />). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) <ref name="Csillery" />. This is a well-known phenomenon usually referred to as the curse-of-dimensionality <ref name="Bellman" />. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
Although no computational method seems to be able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids <ref name="Gerstner" />, which could potentially heavily reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect <ref name="Singer" />. For certain problems it may therefore be difficult to know if the model is incorrect or if the explored region of the parameter space is inappropriate <ref name="Templeton2009a" /> (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes at the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> is trusted to approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance large enough for every point to be accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in <ref name="Sisson" />. Theoretical results for an upper, <math>\epsilon</math>-dependent bound on the error in parameter estimates have recently been reported <ref name="Dean" />. The accuracy of the posterior (defined as the expected quadratic loss) obtained with ABC as a function of <math>\epsilon</math> has also been investigated in <ref name="Fearnhead" />. However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that should be investigated in greater detail. In particular, methods to distinguish the error of this approximation from errors due to model mis-specification <ref name="Beaumont2010" /> would be valuable in the context of actual applications.
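The effect of the tolerance can also be probed empirically. In the following hypothetical sketch (toy Normal-mean model), a small tolerance yields a concentrated approximate posterior, while a tolerance large enough to accept every point simply returns the prior:

```python
import random
import statistics

random.seed(5)

# Toy illustration of p(theta | rho < eps) for two tolerances.
sigma, n = 1.0, 50
s_obs = statistics.mean(random.gauss(0.0, sigma) for _ in range(n))

def abc_sample(eps, draws=20000):
    out = []
    for _ in range(draws):
        mu = random.uniform(-3.0, 3.0)                 # prior
        s = statistics.mean(random.gauss(mu, sigma) for _ in range(n))
        if abs(s - s_obs) < eps:
            out.append(mu)
    return out

tight, loose = abc_sample(0.1), abc_sample(10.0)       # eps = 10 accepts everything
print(statistics.stdev(tight), statistics.stdev(loose))
```

The spread of the accepted sample grows with the tolerance; in the limit of a huge tolerance it matches the prior.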
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in <ref name="Fearnhead" /><ref name="Wilkinson" />. The idea is to add noise, drawn from a given probability density function, to the observed data, since ABC then yields exact inference under the assumption of this noise model. The asymptotic consistency of such “noisy ABC” was established in <ref name="Dean" />, together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, representing the maximum amount of information in the simplest possible form <ref name="Csillery" />. However, sufficient statistics can often only be identified heuristically, and their sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the potential loss of information in the parameter estimation <ref name="Csillery" />, and may also bias the discrimination between models.
An intuitive idea to capture most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics <ref name="Beaumont2010" /><ref name="Csillery" />. Instead, a better strategy is to focus on the relevant statistics only, relevancy depending on the whole inference problem, on the model used, and on the data at hand <ref name="Nunes" />.
An algorithm has been proposed for identifying a representative subset of summary statistics, by iteratively assessing whether an additional statistic introduces a meaningful modification of the posterior <ref name="Joyce" />. Another method, proposed in <ref name="Nunes" />, decomposes into two principal steps. First, a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the ABC-approximated posteriors to the reference posterior.
With both of these strategies, a subset of statistics is selected from a large set of candidate statistics. In contrast, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately <ref name="Wegmann" />. Recently, a method for constructing summaries in a semi-automatic manner has attracted considerable interest <ref name="Fearnhead" />. It is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest <ref name="Marjoram" />. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic <ref name="Didelot" /><ref name="Robert" />. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form <ref name="Didelot" />
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which implies that <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that <math>B_{1,2}</math> and <math>B_{1,2}^s</math> may differ greatly if Eq. ? is not satisfied, as demonstrated with a small example model in <ref name="Robert" /> (previously discussed in <ref name="Didelot" /> and in <ref name="Grelaud" />). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models <ref name="Didelot" />. However, it was also shown that any summary statistic sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can also be used to rank the nested models <ref name="Didelot" />.
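This pitfall can be illustrated with a hypothetical example: for two Normal models that differ only in their variance, the sample mean (sufficient for the mean under each model) carries essentially no information for discriminating between them, whereas the sample standard deviation does:

```python
import random
import statistics

random.seed(6)

# M1: Normal(0, 1) vs. M2: Normal(0, 2); the data are generated from M2.
n, eps = 40, 0.2
data = [random.gauss(0.0, 2.0) for _ in range(n)]
obs_sd = statistics.stdev(data)      # informative summary for this model choice
obs_mean = statistics.mean(data)     # insufficient summary for this model choice

def hits(model_sd, summary, obs, trials=10000):
    count = 0
    for _ in range(trials):
        sim = [random.gauss(0.0, model_sd) for _ in range(n)]
        if abs(summary(sim) - obs) < eps:
            count += 1
    return count

# Acceptance counts play the role of (unnormalized) model evidences.
sd_m1 = hits(1.0, statistics.stdev, obs_sd)
sd_m2 = hits(2.0, statistics.stdev, obs_sd)
mean_m1 = hits(1.0, statistics.mean, obs_mean)
mean_m2 = hits(2.0, statistics.mean, obs_mean)
print(sd_m1, sd_m2)        # the sd summary clearly favours M2
print(mean_m1, mean_m2)    # the mean summary barely separates the models
```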
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived <ref name="Marin" />, which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC based inference in which actual data sets are compared directly, as in typical systems biology applications (e.g., see <ref name="Toni" />), circumvents this problem. It is even doubtful whether the issue is truly ABC-specific, since importance sampling techniques suffer from the same problem <ref name="Robert" />.
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, or the acceptance threshold cannot be based on general rules, and the effect of these choices should be evaluated and tested in each study <ref name="Bertorelle" />. Thus, quality controls are achievable, and are indeed performed in many ABC based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. Nevertheless, the rapidly increasing use of ABC can be expected to foster a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC, and instead hold for model-based methods in general. Many of these criticisms have already been well debated in the literature for a long time, but the flexibility offered by ABC to analyse very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space <ref name="Templeton2009a" />. Indeed, model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model, it may in some instances be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options <ref name="Bertorelle" />. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead <ref name="Csillery" />. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only a few models, subjectively chosen and probably all wrong, can be realistically considered, what insight can we hope to derive from their analysis <ref name="Templeton2009a" />? As pointed out in <ref name="Beaumont2010" />, there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only seldom be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important in this context than the test of a statistical null hypothesis (also see Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” <ref name="Templeton2010" />, which is connected to classical objections of Bayesian approaches <ref name="Beaumont2010b" />.
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should if possible be defined based on known properties of the studied system, but may for practical applications necessitate an educated guess. However, theoretical results regarding a suitable (e.g., non-biased) choice of the prior distribution are available, which are based on the principle of maximum entropy <ref name="Jaynes" />.
We stress that the purpose of the analysis must be kept in mind when choosing the priors. In principle, uninformative and flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was for example pointed out in <ref name="Templeton2009a" /> that part of the data had to be omitted in the ABC based analysis presented in <ref name="Fagundes" />. Although a number of authors claim that large data sets are not a practical limitation <ref name="Bertorelle" /><ref name="Beaumont2010b" />, this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC based inference in phylogenetics <ref name="Feng" />, which may be a tractable approach also for ABC based methods. It should still be kept in mind that any realistic model for a complex system is likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC <ref name="Marjoram" />, resulting in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated samples of the posterior <ref name="Sisson" />, and relatively poor parallelizability <ref name="Bertorelle" />.
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting <ref name="Sisson" /><ref name="Beaumont2009" />. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but are adjusted adaptively <ref name="DelMoral" />.
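The idea of a tolerance schedule can be caricatured as follows (toy Normal-mean model; importance weights and kernel tuning, which a full ABC-PMC sampler requires, are deliberately omitted):

```python
import random
import statistics

random.seed(7)

# Caricature of ABC-SMC: a decreasing tolerance schedule moves a particle
# population from the prior toward the posterior. Importance weights are
# omitted for brevity, so this sketches the idea rather than a full sampler.
sigma, n = 1.0, 40
s_obs = statistics.mean(random.gauss(1.5, sigma) for _ in range(n))

particles = [random.uniform(-3.0, 3.0) for _ in range(200)]    # population 0 = prior
for eps in (1.0, 0.5, 0.2):                                    # tolerance schedule
    new = []
    while len(new) < 200:
        mu = random.gauss(random.choice(particles), 0.3)       # perturb a particle
        s = statistics.mean(random.gauss(mu, sigma) for _ in range(n))
        if abs(s - s_obs) < eps:
            new.append(mu)
    particles = new

print(statistics.mean(particles), statistics.stdev(particles))
```

Each round reuses the surviving population as the proposal for the next, tighter tolerance, which keeps the acceptance rate workable.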
The use of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in <ref name="Beaumont2002" />. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model <ref name="Blum2010" />. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in <ref name="Leuenberger2009" />.
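The adjustment idea can be sketched for a one-dimensional toy case; for brevity, the version below uses unweighted ordinary least squares rather than the weighted local regression of the original method:

```python
import random
import statistics

random.seed(8)

# Sketch of a (simplified, unweighted) regression adjustment: fit a linear
# relation between summary and parameter among the accepted pairs, then
# project each accepted parameter onto s = s_obs.
sigma, n = 1.0, 30
s_obs = statistics.mean(random.gauss(1.0, sigma) for _ in range(n))

pairs = []                                   # accepted (summary, parameter) pairs
for _ in range(20000):
    mu = random.uniform(-3.0, 3.0)
    s = statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    if abs(s - s_obs) < 0.5:                 # deliberately loose tolerance
        pairs.append((s, mu))

ss = [s for s, _ in pairs]
ts = [t for _, t in pairs]
s_bar, t_bar = statistics.mean(ss), statistics.mean(ts)
b = (sum((s - s_bar) * (t - t_bar) for s, t in pairs)
     / sum((s - s_bar) ** 2 for s in ss))
adjusted = [t - b * (s - s_obs) for s, t in pairs]

print(statistics.stdev(ts), statistics.stdev(adjusted))   # adjusted is tighter
```

The correction shrinks the spread that the loose tolerance introduced, at the cost of assuming an (approximately) linear summary-parameter relation near the observed summaries.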
=Outlook=
In the past, the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC, the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, and novel appropriate methods must therefore be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction <ref name="Csillery" /> or modularization. A second approach is a more guided search of the parameter space, e.g., through the development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely mitigate, but not resolve, the curse-of-dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcome many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, there are certain tasks, for instance model selection with ABC, that are inherently difficult. Open problems such as the convergence properties of ABC based algorithms, as well as methods for determining summary statistics when sufficient ones are lacking, also deserve more attention.
=Tables=
{| class="wikitable"
|+ Table 1: Error sources in ABC-based statistical inference
|-
! Error source
! Potential issue
! Solution
! Section
|-
| Non-zero tolerance ε
| The computed posterior distribution is biased.
| Theoretical/practical studies of the sensitivity of the posterior distribution to the tolerance. / Noisy ABC.
| ?
|-
| Non-sufficient statistics
| Inflated posterior distributions due to information loss.
| Automatic selection/semi-automatic identification of sufficient statistics. / Model validation checks (e.g., see [22]).
| ?
|-
| Small number of models / Mis-specified models
| The investigated models are not representative / lack predictive power.
| Careful selection of models./ Evaluation of the predictive power.
| ?
|-
| Priors and parameter ranges
| Conclusions may be sensitive to the choice of priors. / Model choice may be meaningless.
| Check sensitivity of Bayes factors to the choice of priors. / Some theoretical results regarding choice of priors are available. / Use alternative methods for model validation.
| ?
|-
| Curse-of-dimensionality
| Low acceptance rates. / Model errors cannot be distinguished from an insufficient exploration of the parameter space. / Risk of overfitting.
| Methods for model reduction if applicable. / Methods to speed up the parameter exploration. / Quality controls to detect overfitting.
| ?
|-
| Model ranking with summary statistics
| The computation of Bayes factors on summary statistics may not be related to the Bayes factors on the original data, and may therefore be meaningless.
| Only use summary statistics that fulfill the necessary and sufficient conditions to produce a consistent Bayesian model choice. / Use alternative methods for model validation.
| ?
|-
| Implementation
| Low protection against common assumptions in the simulation and inference process.
| Sanity checks of results. / Standardization of software.
| ?
|}
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as an assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J. Stable Bayesian Parameter Estimation for Biological Dynamical Systems.; 2009. IEEE Computer Society Press pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J. Optimized Expected Information Gain for Nonlinear Dynamical Systems. Int. Conf. Proc. Series; 2009. Association for Computing Machinery (ACM) pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid A (1984) Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society Series A 147: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, in press.</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden Markov models with intractable likelihoods. arXiv:1103.5399v1 [math.ST].</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. arXiv:1004.1112v2 [stat.ME].</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:0811.3355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article 34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and Bayesian computation. Stat Appl Genet Mol Biol 7: Article 26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rousseau J (2011) Relevant statistics for Bayesian model choice. arXiv:1110.4700v1 [math.ST].</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2009) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.</ref>
</references>
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). Since its advent in <ref name="Rubin" />, the spread of ABC has triggered the scientific community to develop improved versions of the basic method, which further increased the computational efficiency (e.g., see <ref name="Marjoram" /><ref name="Sisson" /><ref name="Wegmann" />).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it was pointed out that a significant portion of the criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part was also shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and its usage. Fundamental and currently unsolved issues were, however, exposed by the arguments as well. Concerns have lately also been raised within the ABC community <ref name="Didelot" /><ref name="Robert" />. Yet it might be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
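As a minimal numerical illustration of this proportionality (our own toy construction, not part of the original text; the Bernoulli model and flat prior are illustrative choices), the posterior can be obtained on a grid by normalizing the product of likelihood and prior:

```python
# Grid-based posterior for a Bernoulli success probability theta:
# posterior is proportional to likelihood * prior, normalized over the grid.
data = [1, 0, 1, 1, 0, 1, 1, 1]            # 6 successes in 8 trials (toy data)
grid = [i / 100 for i in range(1, 100)]    # theta values in (0, 1)

def likelihood(theta):
    k = sum(data)
    return theta ** k * (1 - theta) ** (len(data) - k)

prior = [1.0] * len(grid)                  # flat prior p(theta)
unnorm = [likelihood(t) * p for t, p in zip(grid, prior)]
z = sum(unnorm)                            # plays the role of the evidence p(D)
posterior = [u / z for u in unnorm]

post_mean = sum(t * p for t, p in zip(grid, posterior))
print(round(post_mean, 2))                 # close to the exact Beta(7, 3) mean, 0.7
```

When the likelihood can be evaluated pointwise, as here, this direct computation is feasible; ABC targets exactly the settings where it is not.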
==The ABC Rejection Algorithm==
All ABC-based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, with the ABC rejection algorithm (the most basic form of ABC), a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (the event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution and, crucially, obtained without the need to explicitly compute the likelihood function.
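The rejection scheme just described can be sketched in a few lines of code. The following is a minimal, hypothetical example: the Gaussian model, flat prior, and absolute-difference distance are our illustrative choices, not prescribed by ABC itself.

```python
import random
import statistics

def abc_rejection(observed, simulate, sample_prior, distance, epsilon, n_samples):
    """Keep prior draws whose simulated data fall within epsilon of the data."""
    accepted = []
    while len(accepted) < n_samples:
        theta = sample_prior()               # sample a parameter point from the prior
        simulated = simulate(theta)          # simulate a data set under the model
        if distance(simulated, observed) < epsilon:
            accepted.append(theta)           # theta is a draw from the ABC posterior
    return accepted

# Toy setting: infer the mean of a Gaussian with known unit variance,
# representing each data set by its sample mean.
random.seed(0)
observed = statistics.mean(random.gauss(2.0, 1.0) for _ in range(100))

posterior = abc_rejection(
    observed=observed,
    simulate=lambda th: statistics.mean(random.gauss(th, 1.0) for _ in range(100)),
    sample_prior=lambda: random.uniform(-5.0, 5.0),  # flat prior on [-5, 5]
    distance=lambda a, b: abs(a - b),
    epsilon=0.1,
    n_samples=200,
)
print(round(statistics.mean(posterior), 2))  # should lie near the true mean 2.0
```

Note that no likelihood is evaluated anywhere: the model enters only through the `simulate` function.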
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic <math>S(D)</math>, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
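A small simulation (our own toy construction, not from the original text) illustrates why summaries help: under the same tolerance, comparing 50-dimensional raw data sets almost never triggers an acceptance, while comparing two low-dimensional summaries frequently does.

```python
import random
import statistics

random.seed(1)
N = 50                                        # dimensionality of one data set
observed = [random.gauss(0.0, 1.0) for _ in range(N)]

def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def summaries(data):
    # Sample mean and standard deviation: sufficient for this Gaussian model.
    return (statistics.mean(data), statistics.pstdev(data))

eps = 1.0
raw_hits = summary_hits = 0
for _ in range(2000):
    sim = [random.gauss(0.0, 1.0) for _ in range(N)]   # simulate at the true value
    raw_hits += euclid(sim, observed) < eps            # compare raw data sets
    summary_hits += euclid(summaries(sim), summaries(observed)) < eps
print(raw_hits, summary_hits)                          # summaries accept far more often
```

The raw-data distance concentrates around a large value in high dimensions, so almost no simulation is accepted, whereas the two summaries preserve the relevant information in a space where the tolerance is meaningful.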
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that we need to marginalize over the uncertain parameters through integration to compute <math>B_{1,2}</math> in Eq. ?. The posterior ratio (which can be thought of as the support in favor of one model) of <math>M_1</math> compared to <math>M_2</math> given the data is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for the interpretation of the strength of evidence of Bayes factor values was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />), and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, the conclusions of model comparisons based on Bayes factors should be considered with caution, and we will later discuss some important ABC-related concerns.
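With equal model priors, the Bayes factor can be approximated by the ratio of ABC acceptance counts under the two models. The following sketch is entirely our own toy construction (the two binomial models and the observed count are hypothetical); because the data are discrete, an exact match corresponds to a tolerance of zero.

```python
import random

random.seed(2)
observed = 7                # observed count out of 20 trials (toy data)

def simulate(model):
    # Two competing models for a count: success probability 0.3 under
    # model 1 and 0.5 under model 2 (illustrative choices).
    p = 0.3 if model == 1 else 0.5
    return sum(random.random() < p for _ in range(20))

accepted = {1: 0, 2: 0}
for _ in range(20000):
    model = random.choice([1, 2])        # uniform prior over the two models
    if simulate(model) == observed:      # discrete data: exact match, epsilon = 0
        accepted[model] += 1

bayes_factor = accepted[1] / accepted[2]
print(round(bayes_factor, 1))            # Monte Carlo estimate of B_{1,2}
```

For this toy case the exact Bayes factor is the ratio of the two binomial likelihoods at the observed count, roughly 2.2, so the estimate can be checked directly.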
==Quality Controls==
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
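A minimal version of such a PODS check (entirely our own toy construction) draws "true" parameters from the prior, simulates pseudo-observed data from them, and measures how well ABC recovers the known values:

```python
import random
import statistics

random.seed(5)

def simulate(theta, n=25):
    # The toy model: report the sample mean of n Gaussian observations.
    return statistics.mean(random.gauss(theta, 1.0) for _ in range(n))

def abc_posterior_mean(observed, eps=0.2, n_acc=100):
    # ABC rejection, returning the posterior mean estimate of theta.
    accepted = []
    while len(accepted) < n_acc:
        theta = random.uniform(-3.0, 3.0)        # flat prior
        if abs(simulate(theta) - observed) < eps:
            accepted.append(theta)
    return statistics.mean(accepted)

errors = []
for _ in range(20):
    true_theta = random.uniform(-3.0, 3.0)       # known "true" parameter
    pods = simulate(true_theta)                  # pseudo-observed data set
    errors.append(abs(abc_posterior_mean(pods) - true_theta))

print(round(statistics.mean(errors), 2))         # average recovery error
```

A systematically large recovery error in such a controlled run would flag a problem with the tolerance, the summaries, or the prior before any real data are analyzed.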
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, the comparison of the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because then the posterior support for a particular model can appear overwhelmingly conclusive, even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues as to how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />. The deviance information criterion is then used as a measure of model fit. Notably, it was also shown that the models preferred on the basis of this criterion can conflict with those supported by Bayes factors. For this reason it is useful to combine different methods for model selection to obtain correct conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus, <math>\epsilon</math> is instead set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available, and other summary statistics are used instead, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC-based analysis. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in <ref name="Ratmann" /> and <ref name="Francois" />). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) <ref name="Csillery" />. This is a well-known phenomenon usually referred to as the curse-of-dimensionality <ref name="Bellman" />. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
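The decay of the acceptance rate is easy to reproduce in a toy experiment (our own illustration, not from the original text): with the "observed data" fixed at the origin and simulations uniform on a hypercube, the fraction of simulations falling within a fixed tolerance collapses as the dimension grows.

```python
import random

random.seed(3)

def acceptance_rate(dim, eps=0.5, trials=20000):
    # "Observed data" at the origin; simulated points uniform on [-1, 1]^dim.
    hits = 0
    for _ in range(trials):
        point = [random.uniform(-1.0, 1.0) for _ in range(dim)]
        if sum(x * x for x in point) ** 0.5 < eps:   # Euclidean distance < eps
            hits += 1
    return hits / trials

rates = {d: acceptance_rate(d) for d in (1, 2, 5, 10)}
print(rates)   # acceptance rate drops sharply as the dimension grows
```

The tolerance ball occupies an exponentially shrinking fraction of the volume of the cube, which is exactly the mechanism that starves ABC of accepted samples in high dimensions.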
Although no computational method seems to be able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids <ref name="Gerstner" />, which could potentially heavily reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect <ref name="Singer" />. For certain problems it may therefore be difficult to know if the model is incorrect or if the explored region of the parameter space is inappropriate <ref name="Templeton2009a" /> (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes at the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> should approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance that is large enough for every point to be accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in <ref name="Sisson" />. Theoretical results for an <math>\epsilon</math>-dependent upper bound for the error in parameter estimates have recently been reported <ref name="Dean" />. The accuracy of the posterior (defined as the expected quadratic loss) obtained with ABC as a function of <math>\epsilon</math> is also investigated in <ref name="Fearnhead" />. However, the convergence of the distributions as <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that should be investigated in greater detail. Methods to distinguish the error of this approximation from errors due to model mis-specification <ref name="Beaumont2010" />, in the context of actual applications, would be valuable.
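The effect of the tolerance can be seen directly in simulation (a toy Gaussian-mean example of our own): with a tight tolerance the accepted parameters concentrate near the data, while a loose tolerance widens the accepted set markedly toward the prior.

```python
import random
import statistics

random.seed(4)
observed = 1.0    # the observed sample mean (toy data)

def abc_posterior_sd(eps, n_acc=300):
    # Spread of the accepted parameter values for a given tolerance eps.
    accepted = []
    while len(accepted) < n_acc:
        theta = random.uniform(-5.0, 5.0)     # flat prior on [-5, 5]
        sim = statistics.mean(random.gauss(theta, 1.0) for _ in range(25))
        if abs(sim - observed) < eps:
            accepted.append(theta)
    return statistics.pstdev(accepted)

tight, loose = abc_posterior_sd(0.1), abc_posterior_sd(3.0)
print(round(tight, 2), round(loose, 2))       # tight spread << loose spread
```

In the limit of a tolerance so large that every draw is accepted, the accepted sample is simply the prior, matching the statement above.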
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in <ref name="Fearnhead" /><ref name="Wilkinson" />. The idea is to add noise, drawn from a given probability density function, to the observed data, since ABC then yields exact inference under the assumption of this noise model. The asymptotic consistency of such “noisy ABC” was established in <ref name="Dean" />, together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC-based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, representing the maximum amount of information in the simplest possible form <ref name="Csillery" />. However, one often has to resort to heuristics to identify sufficient statistics, and sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the potential loss of information in the parameter estimation <ref name="Csillery" />, and this may also bias the discrimination between models.
An intuitive idea to capture most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics <ref name="Beaumont2010" /><ref name="Csillery" />. Instead, a better strategy is to focus on the relevant statistics only, where relevance depends on the whole inference problem, on the model used, and on the data at hand <ref name="Nunes" />.
An algorithm was proposed for identifying a representative subset of summary statistics, by iteratively assessing if an additional statistic introduces a meaningful modification of the posterior <ref name="Joyce" />. Another method was proposed in <ref name="Nunes" />, which decomposes into two principal steps. First a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the posteriors computed with ABC to the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. On the other hand, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately <ref name="Wegmann" />. Recently, a method for constructing summaries in a semi-automatic manner has attained much interest <ref name="Fearnhead" />. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
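The core of the semi-automatic construction can be sketched as follows (a toy Gaussian example of our own; the candidate statistics and the regression design are illustrative, not those of any specific published implementation): a pilot run regresses the parameter on candidate statistics, and the fitted linear predictor then serves as the summary statistic.

```python
import random
import statistics

random.seed(6)

def candidates(data):
    # Candidate statistics: sample mean and sample variance.
    return [statistics.mean(data), statistics.pvariance(data)]

# Pilot run: (candidate statistics, parameter) pairs from prior and model.
pilot = []
for _ in range(500):
    theta = random.uniform(-2.0, 2.0)
    data = [random.gauss(theta, 1.0) for _ in range(20)]
    pilot.append((candidates(data), theta))

def solve3(a, rhs):
    # Gauss-Jordan elimination with partial pivoting for a 3x3 system.
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

# Least-squares fit of theta ~ b0 + b1 * mean + b2 * variance (normal equations).
X = [[1.0] + s for s, _ in pilot]
y = [t for _, t in pilot]
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(3)]
b = solve3(XtX, Xty)

def summary(data):
    # The fitted predictor of theta, used as S(D) in the acceptance rule.
    s = candidates(data)
    return b[0] + b[1] * s[0] + b[2] * s[1]

print([round(v, 2) for v in b])   # the weight on the mean should dominate
```

Since the sample mean is the informative statistic for a Gaussian mean, the regression essentially rediscovers it, assigning it a weight near one and the uninformative variance a weight near zero.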
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest <ref name="Marjoram" />. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic <ref name="Didelot" /><ref name="Robert" />. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form <ref name="Didelot" />
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which implies that <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that there may be a substantial difference between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> if Eq. ? is not satisfied, as was demonstrated with a small example model in <ref name="Robert" /> (previously discussed in <ref name="Didelot" /> and in <ref name="Grelaud" />). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models <ref name="Didelot" />. However, it was also shown that any summary statistic that is sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can also be used to rank the nested models <ref name="Didelot" />.
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived <ref name="Marin" />, which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which actual data sets are compared, as in typical systems biology applications (e.g., see <ref name="Toni" />), circumvents this problem. It is also doubtful whether the issue is truly specific to ABC, since importance sampling techniques suffer from the same problem <ref name="Robert" />.
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules; rather, the effect of these choices should be evaluated and tested in each study <ref name="Bertorelle" />. Quality controls are thus achievable, and indeed performed in many ABC-based works, but for certain problems the assessment of the impact of the method-related parameters can be an overwhelming task. Nevertheless, the rapidly increasing use of ABC should yield a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC, and instead hold for model-based methods in general. Many of these criticisms have already been well debated in the literature for a long time, but the flexibility offered by ABC to analyse very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space <ref name="Templeton2009a" />. Indeed, model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model in some instances, it may be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options <ref name="Bertorelle" />. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead <ref name="Csillery" />. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only a few models—subjectively chosen and probably all wrong—can realistically be considered, what insight can we hope to derive from their analysis <ref name="Templeton2009a" />? As pointed out in <ref name="Beaumont2010" />, there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only rarely be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important in this context than the test of a statistical null hypothesis (also see Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” <ref name="Templeton2010" />, which is connected to classical objections of Bayesian approaches <ref name="Beaumont2010b" />.
With any computational method, it is necessary to constrain the investigated parameter ranges. The parameter ranges should, if possible, be defined based on known properties of the studied system, but for practical applications they may necessitate an educated guess. Theoretical results on a suitable (e.g., unbiased) choice of the prior distribution are, however, available, based for example on the principle of maximum entropy <ref name="Jaynes" />.
We stress that the purpose of the analysis needs to be kept in mind when choosing the priors. In principle, uninformative and flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. Bayes factors, however, are highly sensitive to the prior distribution of parameters. Conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in <ref name="Templeton2009a" /> that part of the data had to be omitted in the ABC-based analysis presented in <ref name="Fagundes" />. Although a number of authors argue that large data sets are not a practical limitation <ref name="Bertorelle" /><ref name="Beaumont2010b" />, this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. With increasing computational power, however, this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics <ref name="Feng" />, which may be a tractable approach for ABC-based methods as well. It should still be kept in mind that realistic models of complex systems are likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC <ref name="Marjoram" />, which resulted in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated posterior samples <ref name="Sisson" />, and relatively poor parallelizability <ref name="Bertorelle" />.
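A minimal sketch of the ABC-MCMC idea, on a hypothetical conjugate-normal toy problem (the prior, the single observation, the tolerance, and the proposal scale are our assumptions, chosen so that the exact posterior N(0.5, 0.5) is known for comparison):

```python
import math
import random

random.seed(5)

# Toy model (our assumption): prior theta ~ N(0, 1), one observation
# y | theta ~ N(theta, 1), observed y_obs = 1.0.
# The exact posterior is N(0.5, 0.5), i.e. posterior mean 0.5.
y_obs, eps = 1.0, 0.1

def log_prior(theta):
    return -0.5 * theta * theta  # N(0, 1), up to an additive constant

theta, samples = 0.0, []
for _ in range(200_000):
    prop = theta + random.gauss(0.0, 0.5)   # symmetric random-walk proposal
    y_sim = random.gauss(prop, 1.0)         # simulate data at the proposal
    # The likelihood ratio of plain Metropolis-Hastings is replaced by the
    # indicator that the simulation hits the data within the tolerance.
    if abs(y_sim - y_obs) < eps and \
            random.random() < min(1.0, math.exp(log_prior(prop) - log_prior(theta))):
        theta = prop
    samples.append(theta)

post = samples[10_000:]                     # discard burn-in
post_mean = sum(post) / len(post)
print(round(post_mean, 2))
```

Note that the chain repeats its current state between acceptances, which is what produces the correlated posterior samples mentioned above.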
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting <ref name="Sisson" /><ref name="Beaumont2009" />. Their general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively <ref name="DelMoral" />.
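The sequential idea can be sketched as follows; the toy model, the fixed tolerance schedule, and the Gaussian perturbation kernel are illustrative assumptions, not prescriptions from the cited papers:

```python
import math
import random

random.seed(6)

# Toy conjugate model (our assumption): prior theta ~ N(0, 1), one
# observation y | theta ~ N(theta, 1), y_obs = 1.0; exact posterior N(0.5, 0.5).
y_obs = 1.0
epsilons = [1.0, 0.5, 0.2, 0.1]   # shrinking tolerances across populations
n_particles = 1000
tau = 0.5                          # sd of the Gaussian perturbation kernel

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

particles, weights = [], []
for t, eps in enumerate(epsilons):
    new_particles, new_weights = [], []
    while len(new_particles) < n_particles:
        if t == 0:
            theta = random.gauss(0.0, 1.0)                 # sample the prior
        else:
            base = random.choices(particles, weights)[0]   # resample
            theta = base + random.gauss(0.0, tau)          # perturb
        if abs(random.gauss(theta, 1.0) - y_obs) < eps:    # simulate + accept
            new_particles.append(theta)
            if t == 0:
                w = 1.0
            else:
                denom = sum(wj * normal_pdf(theta, pj, tau)
                            for pj, wj in zip(particles, weights))
                w = normal_pdf(theta, 0.0, 1.0) / denom    # prior / kernel mix
            new_weights.append(w)
    total = sum(new_weights)
    particles = new_particles
    weights = [w / total for w in new_weights]

post_mean = sum(p * w for p, w in zip(particles, weights))
print(round(post_mean, 2))
```

Each population is a weighted sample from an intermediate target; the final weighted particles approximate the posterior at the smallest tolerance.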
The usage of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in <ref name="Beaumont2002" />. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model <ref name="Blum2010" />. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in <ref name="Leuenberger2009" />.
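The following sketch illustrates the local linear regression adjustment on a hypothetical one-dimensional toy model (the prior, the observed summary, and the Epanechnikov weighting are our assumptions; the exact posterior mean here is 1.5 · 100/101 ≈ 1.485):

```python
import random

random.seed(7)

# Toy model (our assumption): prior theta ~ N(0, 10^2), summary
# s | theta ~ N(theta, 1), observed s_obs = 1.5.
s_obs, eps = 1.5, 3.0
thetas, summaries, weights = [], [], []
for _ in range(50_000):
    theta = random.gauss(0.0, 10.0)
    s = random.gauss(theta, 1.0)
    d = abs(s - s_obs)
    if d < eps:
        thetas.append(theta)
        summaries.append(s)
        weights.append(1.0 - (d / eps) ** 2)      # Epanechnikov kernel weight

# Weighted least-squares slope of theta on s within the tolerance window.
W = sum(weights)
t_bar = sum(w * t for w, t in zip(weights, thetas)) / W
s_bar = sum(w * s for w, s in zip(weights, summaries)) / W
cov = sum(w * (s - s_bar) * (t - t_bar)
          for w, s, t in zip(weights, summaries, thetas))
var = sum(w * (s - s_bar) ** 2 for w, s in zip(weights, summaries))
slope = cov / var

# Correct each accepted parameter in the direction of the observed summary.
adjusted = [t - slope * (s - s_obs) for t, s in zip(thetas, summaries)]
adj_mean = sum(w * t for w, t in zip(weights, adjusted)) / W
raw_sd = (sum(w * (t - t_bar) ** 2 for w, t in zip(weights, thetas)) / W) ** 0.5
adj_sd = (sum(w * (t - adj_mean) ** 2 for w, t in zip(weights, adjusted)) / W) ** 0.5
print(round(adj_mean, 2), round(raw_sd, 2), round(adj_sd, 2))
```

Even with a rather generous tolerance, the adjusted sample concentrates around the exact posterior mean with a markedly smaller spread than the raw accepted sample, which is the variance reduction the method aims for.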
=Outlook=
In the past, the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC, the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function, it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, so appropriate novel methods must be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction <ref name="Csillery" /> or modularization. A second approach is a more guided search of the parameter space, e.g., through the development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely mitigate, but not resolve, the curse-of-dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcome many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, there are certain tasks, for instance model selection with ABC, that are inherently difficult. Also, open problems such as the convergence properties of ABC-based algorithms, as well as methods for determining summary statistics when sufficient ones are lacking, deserve more attention.
=Tables=
{| class="wikitable"
|+ Table 1: Error sources in ABC-based statistical inference
|-
! Error source
! Potential issue
! Solution
! Section
|-
| Non-zero tolerance ε
| The computed posterior distribution is biased.
| Theoretical/practical studies of the sensitivity of the posterior distribution to the tolerance. / Noisy ABC.
| ?
|-
| Non-sufficient statistics
| Inflated posterior distributions due to information loss.
| Automatic selection/semi-automatic identification of sufficient statistics. / Model validation checks (e.g., see [22]).
| ?
|-
| Small nr of models/Mis-specified models
| The investigated models are not representative/lack predictive power.
| Careful selection of models./ Evaluation of the predictive power.
| ?
|-
| Priors and parameter ranges
| Conclusions may be sensitive to the choice of priors. / Model choice may be meaningless.
| Check sensitivity of Bayes factors to the choice of priors. / Some theoretical results regarding choice of priors are available. / Use alternative methods for model validation.
| ?
|-
| Curse-of-dimensionality
| Low acceptance rates. / Model errors cannot be distinguished from an insufficient exploration of the parameter space. / Risk of overfitting.
| Methods for model reduction if applicable. / Methods to speed up the parameter exploration. / Quality controls to detect overfitting.
| ?
|-
| Model ranking with summary statistics
| Bayes factors computed on summary statistics may bear no relation to the Bayes factors on the original data, and are then meaningless.
| Only use summary statistics that fulfill the necessary and sufficient conditions to produce a consistent Bayesian model choice. / Use alternative methods for model validation.
| ?
|-
| Implementation
| Low protection to common assumptions in the simulation and the inference process.
| Sanity checks of results. / Standardization of software.
| ?
|}
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as an assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J. Stable Bayesian Parameter Estimation for Biological Dynamical Systems.; 2009. IEEE Computer Society Press pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J. Optimized Expected Information Gain for Nonlinear Dynamical Systems. Int. Conf. Proc. Series; 2009. Association for Computing Machinery (ACM) pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid A Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society Series A 1984: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011 (in press)) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and computing.</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden markov models with intractable likelihoods. arXiv:11035399v1 [mathST] 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. ArXiv:10041112v2 [statME] 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:08113355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and bayesian computation. Stat Appl Genet Mol Biol 7: Article26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rosseau J (2011) Relevant statistics for Bayesian model choice. ArXiv:11104700v1 [mathST] 21 Oct 2011: 1-24.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.
</ref>
</references>
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in [[wp:Bayesian statistics|Bayesian statistics]]. ABC has rapidly increased in popularity over recent years, in particular for the analysis of complex problems arising in the [[wp:Biology|biological sciences]], such as population [[wp:Genetics|genetics]], [[wp:ecology|ecology]], [[wp:epidemiology|epidemiology]], and [[wp:systems biology|systems biology]] (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). However, although ABC seems to offer a promising computational speedup compared to conventional approaches, the scope of applications and the intrinsic limitations of ABC are still not fully understood.
ABC comprises a class of well-founded computational methods, but also one that is based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain for ABC exacerbates the challenges of parameter estimation and model selection.
=Introduction=
Approximate Bayesian computation (ABC) methods have rapidly established themselves as indispensable tools for statistical inference of highly complex biological models. Likelihood functions in those settings may be costly to evaluate, or even intractable. ABC performs Bayesian parameter estimation and model learning by approximating the likelihood function with stochastic simulations, and it is therefore applicable to any parametric model that can be simulated efficiently.
ABC has recently gained popularity in biological applications, due to the complexity of the modeled systems, and has been applied in areas such as population genetics, ecology, epidemiology, and systems biology (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). Since its advent in <ref name="Rubin" />, the spread of ABC has triggered the scientific community to develop improved versions of the basic method, which further increased the computational efficiency (e.g., see <ref name="Marjoram" /><ref name="Sisson" /><ref name="Wegmann" />).
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it has been pointed out that a significant portion of this criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part was also shown to originate from misunderstandings of the mathematical foundations and semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and its usage. Fundamental and currently unsolved issues were nonetheless exposed by the arguments, and concerns have lately also been raised within the ABC community itself <ref name="Didelot" /><ref name="Robert" />. Yet it may be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC-based methods approximate the likelihood function by simulations whose outcomes are compared to the observational data. More specifically, with the ABC rejection algorithm—the most basic form of ABC—a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution, obtained, crucially, without the need to explicitly compute the likelihood function.
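As an illustration, the rejection algorithm can be sketched in a few lines for a hypothetical conjugate-normal toy model (prior, observation, and tolerance are our assumptions), for which the exact posterior N(0.5, 0.5) is available for comparison:

```python
import random

random.seed(1)

# Toy model (our assumption): prior theta ~ N(0, 1),
# data y | theta ~ N(theta, 1), observed y_obs = 1.0.
# The exact posterior is N(0.5, 0.5), so the posterior mean is 0.5.
y_obs = 1.0
eps = 0.1            # tolerance
n_sims = 50_000

accepted = []
for _ in range(n_sims):
    theta = random.gauss(0.0, 1.0)       # sample a parameter from the prior
    y_sim = random.gauss(theta, 1.0)     # simulate a data set under theta
    if abs(y_sim - y_obs) < eps:         # distance rho is |y_sim - y_obs|
        accepted.append(theta)

post_mean = sum(accepted) / len(accepted)
print(len(accepted), round(post_mean, 2))
```

The accepted parameters are (approximately) draws from the posterior, and their mean is close to the exact value 0.5; no likelihood is ever evaluated.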
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. ?, so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside the exponential family of distributions, to identify a finite-dimensional set of sufficient statistics. Nevertheless, informative but possibly insufficient summary statistics are often used in applications approached with ABC methods.
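The effect of the summary choice can be illustrated on a toy Gaussian model (our assumption, not from the text): the sample mean is sufficient for the mean parameter, whereas the sample variance carries no information about it, so conditioning on the latter leaves the "posterior" close to the prior:

```python
import random
import statistics

random.seed(2)
n = 20
theta_true = 1.0
data = [random.gauss(theta_true, 1.0) for _ in range(n)]
s_mean, s_var = statistics.mean(data), statistics.variance(data)

def abc_posterior(summary, s_obs, eps, n_sims=50_000):
    """ABC rejection using a chosen summary statistic."""
    accepted = []
    for _ in range(n_sims):
        theta = random.gauss(0.0, 1.0)                     # prior N(0, 1)
        sim = [random.gauss(theta, 1.0) for _ in range(n)]
        if abs(summary(sim) - s_obs) < eps:
            accepted.append(theta)
    return accepted

# Sample mean: sufficient for theta; the ABC posterior approximates the
# exact conjugate posterior N(n*xbar/(n+1), 1/(n+1)), with sd ~ 0.218.
post_suff = abc_posterior(statistics.mean, s_mean, 0.05)
# Sample variance: carries no information about theta, so the resulting
# distribution stays close to the prior, i.e. it is inflated.
post_insuff = abc_posterior(statistics.variance, s_var, 0.2)

print(round(statistics.stdev(post_suff), 2),
      round(statistics.stdev(post_insuff), 2))
```

The spread of the summary-mean posterior matches the exact posterior standard deviation, while the summary-variance "posterior" remains almost as wide as the prior, showing the information loss caused by an insufficient summary.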
==Model Comparison with Bayes Factors==
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that computing <math>B_{1,2}</math> in Eq. ? requires marginalizing over the uncertain parameters through integration. The posterior ratio of <math>M_1</math> to <math>M_2</math> given the data (which can be thought of as the support in favor of one model over the other) is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for interpreting the strength of evidence conveyed by values of the Bayes factor was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />), and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, conclusions from model comparisons based on Bayes factors should be treated with caution, and we later discuss some important ABC-related concerns.
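For fully specified models with a tractable summary distribution, the ABC approximation of the Bayes factor can be checked directly. The sketch below (a toy setup of our own, with a hypothetical observed summary value) estimates the Bayes factor on the summary as a ratio of ABC acceptance counts and compares it to the closed-form value:

```python
import math
import random
import statistics

random.seed(3)
n = 10
s_obs = 0.2   # hypothetical observed sample mean of n data points

# Two fully specified toy models (our assumption): M1: x_i ~ N(0, 1) and
# M2: x_i ~ N(1, 1).  The sample mean is a sufficient summary here, and its
# density under each model is N(mu, 1/n), so the Bayes factor on the summary
# is available in closed form for comparison.
def summary_density(s, mu):
    var = 1.0 / n
    return math.exp(-0.5 * (s - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

b12_exact = summary_density(s_obs, 0.0) / summary_density(s_obs, 1.0)

def accept_count(mu, eps=0.05, n_sims=100_000):
    """ABC acceptance count when simulating from the model with mean mu."""
    count = 0
    for _ in range(n_sims):
        s_sim = statistics.mean([random.gauss(mu, 1.0) for _ in range(n)])
        if abs(s_sim - s_obs) < eps:
            count += 1
    return count

# With equal prior model probabilities and a shared tolerance, the ratio of
# acceptance counts approximates the Bayes factor on the summary statistic.
b12_abc = accept_count(0.0) / accept_count(1.0)
print(round(b12_exact, 1), round(b12_abc, 1))
```

The ABC estimate agrees with the exact value up to Monte Carlo noise and the small bias introduced by the non-zero tolerance, which is exactly the trade-off discussed above.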
=Quality Controls=
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
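A minimal version of such a pseudo-observed data check might look as follows (the conjugate-normal toy model and the use of a correlation score between true and recovered parameters are illustrative assumptions):

```python
import random

random.seed(8)

# Toy model (our assumption): prior theta ~ N(0, 1), data
# x_1..x_10 | theta ~ N(theta, 1), summary = sample mean.  For each
# pseudo-observed data set (POD) the true parameter is known, so we can
# gauge how well ABC recovers it.
n_obs, eps, n_sims = 10, 0.1, 4000

def abc_posterior_mean(s_obs):
    accepted = []
    for _ in range(n_sims):
        theta = random.gauss(0.0, 1.0)
        s_sim = sum(random.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs
        if abs(s_sim - s_obs) < eps:
            accepted.append(theta)
    # Fall back to the prior mean if nothing was accepted.
    return sum(accepted) / len(accepted) if accepted else 0.0

true_vals, estimates = [], []
for _ in range(30):                        # 30 pseudo-observed data sets
    theta_true = random.gauss(0.0, 1.0)    # drawn from the prior
    s_obs = sum(random.gauss(theta_true, 1.0) for _ in range(n_obs)) / n_obs
    true_vals.append(theta_true)
    estimates.append(abc_posterior_mean(s_obs))

# Correlation between true and recovered parameters as a crude quality score.
m_t = sum(true_vals) / len(true_vals)
m_e = sum(estimates) / len(estimates)
cov = sum((t - m_t) * (e - m_e) for t, e in zip(true_vals, estimates))
corr = cov / (sum((t - m_t) ** 2 for t in true_vals) ** 0.5
              * sum((e - m_e) ** 2 for e in estimates) ** 0.5)
print(round(corr, 2))
```

A high correlation (and, more generally, well-calibrated coverage of credible intervals over the PODS) indicates that the chosen tolerance, summary, and number of simulations allow the true mechanism to be recovered in this controlled setting.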
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparing the posterior predictive distribution of the summary statistics to the observed summary statistics was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because the posterior support of a particular model can then appear overwhelmingly conclusive even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions, with respect to a comprehensive set of statistics. These statistics need not be the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />. The deviance information criterion is then used as a measure of model fit. It was also shown that the models preferred on the basis of this criterion can conflict with those supported by Bayes factors. For this reason, it is useful to combine different methods for model selection to obtain robust conclusions.
=Example=
=Pitfalls and Controversies around ABC=
A number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eq. ? or Eq. ? yields an exact result, but would typically make computations prohibitively expensive. Thus, <math>\epsilon</math> is instead set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; other summary statistics are used instead, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor even relevant for ABC-based analysis. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
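The role of the tolerance <math>\epsilon</math> is easiest to see in a minimal sketch of the basic ABC rejection scheme. The Gaussian toy model, the flat prior on its mean, and the numerical settings below are illustrative assumptions, not taken from the text:

```python
import random
import statistics

def abc_rejection(observed, prior_sample, simulate, summary, eps, n_sims=20000, seed=1):
    """Plain ABC rejection: keep parameter draws whose simulated summary
    statistic lies within tolerance eps of the observed summary."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_sims):
        theta = prior_sample(rng)
        s_sim = summary(simulate(theta, rng))
        if abs(s_sim - s_obs) < eps:  # distance rho = |s_sim - s_obs|
            accepted.append(theta)
    return accepted

# Toy problem: infer the mean of a Gaussian with known standard deviation 1.
data_rng = random.Random(0)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(50)]
posterior = abc_rejection(
    observed,
    prior_sample=lambda rng: rng.uniform(-5.0, 5.0),        # flat prior on the mean
    simulate=lambda th, rng: [rng.gauss(th, 1.0) for _ in range(50)],
    summary=statistics.mean,                                # sufficient for this model
    eps=0.1,
)
print(len(posterior), statistics.mean(posterior))
```

Setting `eps=0` here would reject essentially every draw, which is the practical reason a positive tolerance, and hence a bias, is accepted.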
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in <ref name="Ratmann" /> and <ref name="Francois" />). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) <ref name="Csillery" />. This is a well-known phenomenon usually referred to as the curse-of-dimensionality <ref name="Bellman" />. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
Although no computational method seems to be able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids <ref name="Gerstner" />, which could substantially reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect <ref name="Singer" />. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate <ref name="Templeton2009a" /> (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes at the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> should approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance large enough that every point is accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was studied empirically in <ref name="Sisson" />. Theoretical results for an <math>\epsilon</math>-dependent upper bound on the error in parameter estimates have recently been reported <ref name="Dean" />. The accuracy of the posterior (defined as the expected quadratic loss) obtained with ABC as a function of <math>\epsilon</math> has also been investigated <ref name="Fearnhead" />. However, the convergence of the distributions as <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that should be investigated in greater detail. Methods to distinguish the error of this approximation from the errors due to model mis-specification <ref name="Beaumont2010" />, in the context of actual applications, would also be valuable.
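The two limiting cases above, a tolerance so large that the prior is returned and a small tolerance that concentrates on the posterior, can be reproduced empirically. The Gaussian toy model and the specific tolerance values below are illustrative assumptions, not from the text:

```python
import random
import statistics

rng = random.Random(7)
data = [rng.gauss(1.0, 1.0) for _ in range(40)]
s_obs = statistics.mean(data)

def abc_sample(eps, n_sims=5000):
    """Rejection ABC for the mean of the toy Gaussian model above."""
    accepted = []
    for _ in range(n_sims):
        th = rng.uniform(-4.0, 4.0)  # flat prior on the mean
        s = statistics.mean([rng.gauss(th, 1.0) for _ in range(40)])
        if abs(s - s_obs) < eps:
            accepted.append(th)
    return accepted

# Spread of the ABC sample as the tolerance shrinks: for a huge eps every
# draw is accepted and we recover the prior (sd about 8/sqrt(12) ~ 2.3);
# for a small eps the sample concentrates around the posterior.
spread = {eps: statistics.pstdev(abc_sample(eps)) for eps in (8.0, 1.0, 0.1)}
print(spread)
```

Plotting such spreads against <math>\epsilon</math> is a cheap, if informal, way to check how sensitive a given analysis is to the chosen tolerance.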
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in <ref name="Fearnhead" /><ref name="Wilkinson" />. The idea is to add noise drawn from a given probability density function to the observed data; ABC then yields exact inference under the assumption of this noise model. The asymptotic consistency of such “noisy ABC” was established in <ref name="Dean" />, together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC-based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, representing the maximum amount of information in the simplest possible form <ref name="Csillery" />. However, one often has to resort to heuristics to identify suitable statistics, and sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions, due to the potential loss of information in the parameter estimation <ref name="Csillery" />, and may also bias the discrimination between models.
An intuitive idea to capture most information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics <ref name="Beaumont2010" /><ref name="Csillery" />. Instead, a better strategy is to focus on the relevant statistics only, where relevancy depends on the whole inference problem, on the model used, and on the data at hand <ref name="Nunes" />.
An algorithm was proposed for identifying a representative subset of summary statistics, by iteratively assessing whether an additional statistic introduces a meaningful modification of the posterior <ref name="Joyce" />. Another method was proposed in <ref name="Nunes" />, which decomposes into two principal steps. First, a reference approximation of the posterior is constructed by minimizing its entropy. Sets of candidate summaries are then evaluated by comparing the posteriors computed with ABC to the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. On the other hand, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately <ref name="Wegmann" />. Recently, a method for constructing summaries in a semi-automatic manner has attained much interest <ref name="Fearnhead" />. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
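The semi-automatic idea, regressing the parameter on a set of candidate statistics in a pilot run and using the fitted linear predictor as a single summary, can be sketched as follows. The toy Gaussian model and the particular candidate statistics are illustrative assumptions, not from the text, and the sketch replaces the pilot ABC run of the original method with direct simulation from the prior for brevity:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(theta, n=30):
    # toy model (hypothetical): Gaussian data with unknown mean theta, sd 1
    return rng.normal(theta, 1.0, size=n)

def candidates(x):
    # a deliberately redundant set of candidate summary statistics
    return np.array([x.mean(), np.median(x), x.min(), x.max()])

# Pilot run: simulate (theta, statistics) pairs and fit a linear predictor
# of theta from the candidate statistics by least squares; the fitted
# predictor then serves as a single, low-dimensional summary statistic.
thetas = rng.uniform(-3.0, 3.0, size=1000)
S = np.array([candidates(simulate(t)) for t in thetas])
X = np.column_stack([np.ones(len(thetas)), S])
beta, *_ = np.linalg.lstsq(X, thetas, rcond=None)

def summary(x):
    return float(np.concatenate(([1.0], candidates(x))) @ beta)

obs = simulate(1.5)  # the learned summary approximates E[theta | data]
print("summary(obs) =", summary(obs))
```

In this toy case the regression essentially rediscovers the sample mean; the appeal of the approach is that the same recipe applies when no sufficient statistic is known.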
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest <ref name="Marjoram" />. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic <ref name="Didelot" /><ref name="Robert" />. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form <ref name="Didelot" />
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which implies that <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that <math>B_{1,2}</math> and <math>B_{1,2}^s</math> may differ substantially if Eq. ? is not satisfied, as demonstrated with a small example model in <ref name="Robert" /> (previously discussed in <ref name="Didelot" /> and in <ref name="Grelaud" />). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models <ref name="Didelot" />. However, it was also shown that any summary statistic that is sufficient for a model <math>M</math>, in which both <math>M_1</math> and <math>M_2</math> are nested, can also be used to rank the nested models <ref name="Didelot" />.
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived <ref name="Marin" />, which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced; ABC-based inference in which actual data sets are compared directly, as in typical systems biology applications (e.g., see <ref name="Toni" />), circumvents this problem. It is even doubtful whether the issue is truly ABC-specific, since importance sampling techniques suffer from the same problem <ref name="Robert" />.
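The gap between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> can be made concrete with a toy example of our own devising (not from the cited works): two zero-mean Gaussian models that differ only in variance, summarized by the sample mean. The sample mean carries essentially no information about the variance, so the summary-based Bayes factor stays near 1 while the full-data Bayes factor is decisive:

```python
import math
import random

rng = random.Random(9)
x = [rng.gauss(0.0, 1.0) for _ in range(100)]   # data actually generated under M1

def loglik(data, sigma):
    """Log-likelihood of iid N(0, sigma^2) data; the models have no free
    parameters, so this equals the log marginal likelihood."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2) - xi**2 / (2 * sigma**2)
               for xi in data)

# Full-data log Bayes factor for M1: N(0, 1) versus M2: N(0, 2^2)
logB_full = loglik(x, 1.0) - loglik(x, 2.0)

# Log Bayes factor based only on the sample mean, which is distributed
# N(0, sigma^2/n) under both models -- a poor summary for this comparison
n, s = len(x), sum(x) / len(x)
def log_density_of_mean(s, sigma):
    v = sigma**2 / n
    return -0.5 * math.log(2 * math.pi * v) - s**2 / (2 * v)
logB_summary = log_density_of_mean(s, 1.0) - log_density_of_mean(s, 2.0)
print(logB_full, logB_summary)
```

The sample mean is sufficient for the mean parameter within each model, yet carries almost no evidence for discriminating between the models, which is precisely the failure mode discussed above.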
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules; the effect of these choices should be evaluated and tested in each study <ref name="Bertorelle" />. Thus, quality controls are achievable, and indeed performed in many ABC-based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. However, the rapidly increasing use of ABC can be expected to foster a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC, and instead hold for model-based methods in general. Many of these criticisms have already been well debated in the literature for a long time, but the flexibility offered by ABC to analyse very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space <ref name="Templeton2009a" />. It is true that model-based studies often revolve around a small number of models, and since the computational cost of evaluating a single model can be high, it may then be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options <ref name="Bertorelle" />. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead <ref name="Csillery" />. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and the use of expert knowledge from the problem domain.
But if only a few models, subjectively chosen and probably all wrong, can realistically be considered, what insight can we hope to derive from their analysis <ref name="Templeton2009a" />? As pointed out in <ref name="Beaumont2010" />, there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only rarely be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important than the test of a statistical null hypothesis in this context (also see Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” <ref name="Templeton2010" />, which is connected to classical objections of Bayesian approaches <ref name="Beaumont2010b" />.
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should if possible be defined based on known properties of the studied system, but may for practical applications necessitate an educated guess. However, theoretical results regarding a suitable (e.g., non-biased) choice of the prior distribution are available, which are based on the principle of maximum entropy <ref name="Jaynes" />.
We stress that the purpose of the analysis must be kept in mind when choosing the priors. In principle, uninformative and flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. Bayes factors, however, are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was for example pointed out in <ref name="Templeton2009a" /> that part of the data had to be omitted in the ABC-based analysis presented in <ref name="Fagundes" />. Although a number of authors claim that large data sets are not a practical limitation <ref name="Bertorelle" /><ref name="Beaumont2010b" />, this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics <ref name="Feng" />, which may be a tractable approach also for ABC-based methods. It should still be kept in mind that any realistic model for a complex system is very likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC <ref name="Marjoram" />, which results in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlation among the samples from the posterior <ref name="Sisson" />, and relatively poor parallelizability <ref name="Bertorelle" />.
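The ABC-MCMC idea can be sketched as follows: a proposed parameter is accepted only if a data set simulated from it reproduces the observed summary within the tolerance, and if the usual Metropolis-Hastings ratio (which with a symmetric proposal reduces to the prior ratio) passes. The Gaussian toy model and all settings below are illustrative assumptions, not from the text:

```python
import random
import statistics

rng = random.Random(11)
data = [rng.gauss(0.5, 1.0) for _ in range(40)]
s_obs = statistics.mean(data)

def prior_pdf(th):
    return 1.0 if -4.0 <= th <= 4.0 else 0.0   # flat prior on the mean

def sim_summary(th):
    return statistics.mean([rng.gauss(th, 1.0) for _ in range(40)])

eps, theta, chain = 0.2, 0.0, []
for _ in range(5000):
    proposal = theta + rng.gauss(0.0, 0.5)     # symmetric random-walk proposal
    # Move only if the simulated summary lies within the tolerance AND the
    # Metropolis-Hastings ratio (here just the prior ratio) accepts.
    if (prior_pdf(proposal) > 0.0
            and abs(sim_summary(proposal) - s_obs) < eps
            and rng.random() < prior_pdf(proposal) / prior_pdf(theta)):
        theta = proposal
    chain.append(theta)
print("posterior mean ~", statistics.mean(chain[1000:]))
```

Because proposals are made locally around the current state, far fewer simulations are wasted in low-posterior regions than with plain rejection from the prior, which is the source of the improved acceptance rate.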
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting <ref name="Sisson" /><ref name="Beaumont2009" />. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively <ref name="DelMoral" />.
The usage of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in <ref name="Beaumont2002" />. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model <ref name="Blum2010" />. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in <ref name="Leuenberger2009" />.
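The regression-adjustment step can be sketched in a simplified form. The Gaussian toy model is an illustrative assumption, and uniform weights are used for brevity where the original method uses an Epanechnikov kernel:

```python
import numpy as np

rng = np.random.default_rng(5)
obs = rng.normal(2.0, 1.0, size=30)
s_obs = obs.mean()

# Rejection step with a deliberately loose tolerance, keeping the
# (theta, simulated summary) pairs of all accepted simulations.
thetas, summaries = [], []
for _ in range(20000):
    th = rng.uniform(-5.0, 5.0)
    s = rng.normal(th, 1.0, size=30).mean()
    if abs(s - s_obs) < 0.5:
        thetas.append(th)
        summaries.append(s)
thetas, summaries = np.array(thetas), np.array(summaries)

# Linear regression of theta on the summary, then shift each accepted
# theta along the fitted line to the observed summary.
X = np.column_stack([np.ones(len(summaries)), summaries - s_obs])
(intercept, slope), *_ = np.linalg.lstsq(X, thetas, rcond=None)
adjusted = thetas - slope * (summaries - s_obs)
print(np.std(thetas), "->", np.std(adjusted))   # the adjusted sample is tighter
```

The adjustment removes the extra spread introduced by the loose tolerance, which is why a larger acceptance window, and hence fewer simulations, can be used without a comparable loss of accuracy.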
=Outlook=
In the past, the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC, the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, so novel appropriate methods must be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction <ref name="Csillery" /> or modularization. A second approach is a more guided search of the parameter space, e.g., by developing new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, but not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely improve the situation, but not resolve the curse-of-dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcome many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, there are certain tasks, for instance model selection with ABC, that are inherently difficult. Also, open problems such as the convergence properties of the ABC based algorithms, as well as methods for determining summary statistics in lack of sufficient ones, deserve more attention.
=Tables=
{| class="wikitable"
|+ Table 1: Error sources in ABC-based statistical inference
|-
! Error source
! Potential issue
! Solution
! Section
|-
| Non-zero tolerance ε
| The computed posterior distribution is biased.
| Theoretical/practical studies of the sensitivity of the posterior distribution to the tolerance. / Noisy ABC.
| ?
|-
| Non-sufficient statistics
| Inflated posterior distributions due to information loss.
| Automatic selection/semi-automatic identification of sufficient statistics. / Model validation checks (e.g., see [22]).
| ?
|-
| Small number of models / Mis-specified models
| The investigated models are not representative / lack predictive power.
| Careful selection of models./ Evaluation of the predictive power.
| ?
|-
| Priors and parameter ranges
| Conclusions may be sensitive to the choice of priors. / Model choice may be meaningless.
| Check sensitivity of Bayes factors to the choice of priors. / Some theoretical results regarding choice of priors are available. / Use alternative methods for model validation.
| ?
|-
| Curse-of-dimensionality
| Low acceptance rates. / Model errors cannot be distinguished from an insufficient exploration of the parameter space. / Risk of overfitting.
| Methods for model reduction if applicable. / Methods to speed up the parameter exploration. / Quality controls to detect overfitting.
| ?
|-
| Model ranking with summary statistics
| The computation of Bayes factors on summary statistics may not be related to the Bayes factors on the original data, and therefore meaningless.
| Only use summary statistics that fulfill the necessary and sufficient conditions to produce a consistent Bayesian model choice. / Use alternative methods for model validation.
| ?
|-
| Implementation
| Low protection against common errors in the simulations and in the inference process.
| Sanity checks of results. / Standardization of software.
| ?
|}
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J (2009) Stable Bayesian Parameter Estimation for Biological Dynamical Systems. IEEE Computer Society Press, pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J (2009) Optimized Expected Information Gain for Nonlinear Dynamical Systems. ACM International Conference Proceeding Series, pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid AP (1984) Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society Series A 147: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, in press.</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<!-- <ref name="Csillery2010">Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.</ref> -->
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<!-- <ref name="Gelfand">Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.</ref> -->
<!-- <ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref> -->
<!-- <ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref> -->
<!-- <ref name="Excoffier">Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.</ref> -->
<!-- <ref name="Wegmann2010">Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.</ref> -->
<!-- <ref name="Cornuet">Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.</ref> -->
<!-- <ref name="Templeton2010b">Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.</ref> -->
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden Markov models with intractable likelihoods. arXiv:1103.5399v1 [math.ST], 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. arXiv:1004.1112v2 [stat.ME], 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:0811.3355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article 34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and Bayesian computation. Stat Appl Genet Mol Biol 7: Article 26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rousseau J (2011) Relevant statistics for Bayesian model choice. arXiv:1110.4700v1 [math.ST], 21 Oct 2011.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.</ref>
<references />
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in [[wp:Bayesian statistics|Bayesian statistics]]. ABC has rapidly gained popularity in recent years, particularly for the analysis of complex problems arising in the [[wp:Biology|biological sciences]], such as population [[wp:Genetics|genetics]], [[wp:ecology|ecology]], [[wp:epidemiology|epidemiology]], and [[wp:systems biology|systems biology]] (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). By offering considerable computational speed-up compared to conventional approaches, ABC methods widen the realm of statistical inference. Although they are well-founded statistical methods, ABC methods rely on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain of ABC exacerbates the challenges of [[wp:Estimation Theory|parameter estimation]] and [[wp:Model Selection|model selection]].
=Approximate Bayesian Computation=
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
In Eq. ? we note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
==The ABC Rejection Algorithm==
All ABC-based methods approximate the likelihood function by simulations whose outcomes are compared to the observed data. More specifically, in the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is first sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution, and, crucially, obtained without the need to explicitly compute the likelihood function.
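The rejection scheme above can be sketched in a few lines of code. The following is a minimal illustration (not part of the original text) using an assumed toy model in which a single observation is drawn from a normal distribution with unknown mean <math>\theta</math> and unit variance, with a uniform prior on <math>\theta</math>; here <math>\rho</math> is simply the absolute difference. All concrete values (prior range, tolerance, number of draws) are arbitrary choices for illustration.

```python
import random

random.seed(1)

def simulate(theta):
    """Toy model M: a single observation drawn from Normal(theta, 1)."""
    return random.gauss(theta, 1.0)

def abc_rejection(d_obs, n_draws=20000, eps=0.1):
    """Basic ABC rejection: sample theta from the prior U(-5, 5),
    simulate a data set, and keep theta whenever the simulated data
    fall within tolerance eps of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = random.uniform(-5.0, 5.0)       # draw from the prior
        if abs(simulate(theta) - d_obs) < eps:  # rho(D_hat, D) < eps
            accepted.append(theta)
    return accepted

posterior = abc_rejection(d_obs=1.0)
mean_est = sum(posterior) / len(posterior)
```

With a small tolerance, the accepted values of `theta` approximate draws from the posterior, even though the likelihood was never evaluated.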
==Sufficient Summary Statistics==
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic <math>S(D)</math>, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, outside the exponential family of distributions it is typically impossible to identify a finite-dimensional set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in ABC applications.
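To illustrate, consider again a toy normal model (an assumption for this sketch, not part of the original text), now with 50 observations: the sample mean is a sufficient statistic for the unknown mean, so accepting on the distance between sample means instead of between full data sets loses no information while greatly improving the acceptance rate. The prior and tolerance are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(2)

N = 50
theta_true = 2.0
d_obs = [random.gauss(theta_true, 1.0) for _ in range(N)]
s_obs = statistics.mean(d_obs)   # S(D): sufficient for theta in this model

accepted = []
for _ in range(20000):
    theta = random.uniform(-5.0, 5.0)    # prior draw
    s_sim = statistics.mean(random.gauss(theta, 1.0) for _ in range(N))
    if abs(s_sim - s_obs) < 0.05:        # rho(S(D_hat), S(D)) < eps
        accepted.append(theta)

post_mean = statistics.mean(accepted)
```

Comparing 50-dimensional data sets directly at this tolerance would accept essentially nothing; comparing the one-dimensional summary keeps the acceptance rate workable.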
==Model Comparison with Bayes Factors==
ABC can also be used to evaluate the relative plausibility of two models <math>M_1</math> and <math>M_2</math> whose likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that computing <math>B_{1,2}</math> in Eq. ? requires marginalizing over the uncertain parameters through integration. The posterior ratio of <math>M_1</math> to <math>M_2</math> given the data, which can be thought of as the support in favor of one model over the other, is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for interpreting the strength of evidence corresponding to values of the Bayes factor was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />), and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, conclusions of model comparison based on Bayes factors should be treated with caution, and we will later discuss some important ABC-related concerns.
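ABC can approximate the Bayes factor directly: with equal model priors, sample a model index at random, simulate under the sampled model, and record which models produce accepted simulations; the ratio of acceptance counts then estimates <math>B_{1,2}</math>. The sketch below is an illustration with two assumed toy models for positive-valued data (an exponential model and a uniform model) and arbitrary summary statistics and tolerance; none of these choices come from the original text.

```python
import random
import statistics
import math

random.seed(3)

N = 40
d_obs = [random.expovariate(1.0) for _ in range(N)]   # "observed" data

def summaries(data):
    """S(D): sample mean and sample standard deviation."""
    return (statistics.mean(data), statistics.stdev(data))

s_obs = summaries(d_obs)

def simulate(model):
    if model == 1:                      # M1: exponential, rate ~ U(0.1, 2)
        lam = random.uniform(0.1, 2.0)
        return [random.expovariate(lam) for _ in range(N)]
    b = random.uniform(0.5, 10.0)       # M2: uniform on (0, b), b ~ U(0.5, 10)
    return [random.uniform(0.0, b) for _ in range(N)]

counts = {1: 0, 2: 0}
for _ in range(20000):
    m = random.choice([1, 2])           # equal model priors p(M1) = p(M2)
    s_sim = summaries(simulate(m))
    if math.hypot(s_sim[0] - s_obs[0], s_sim[1] - s_obs[1]) < 0.3:
        counts[m] += 1

# With equal model priors, the ratio of acceptance counts estimates B_12.
b12_hat = counts[1] / max(counts[2], 1)
```

Since the data were generated from the exponential model, the estimated Bayes factor should favour <math>M_1</math>. Note that this estimate is computed on summary statistics and therefore inherits the caveats discussed later for model choice with summaries.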
=Quality Controls=
Quality control is an important part of ABC-based inference for assessing the validity and robustness of the results and the conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as quantifying the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models are simulated for fixed parameter sets, typically drawn from the prior or posterior distributions, to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
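A PODS-style check can be sketched as follows: parameters are drawn from the prior, pseudo-observed data sets are generated from them, and one verifies that the ABC procedure recovers the known values on average. The toy model, prior, and thresholds below are illustrative assumptions, not prescriptions from the original text.

```python
import random
import statistics

random.seed(4)

def abc_posterior_mean(d_obs, n_draws=5000, eps=0.2):
    """Rejection-ABC posterior mean for the mean of a Normal(theta, 1)
    toy model, using the sample mean as summary statistic."""
    s_obs = statistics.mean(d_obs)
    acc = []
    for _ in range(n_draws):
        theta = random.uniform(-3.0, 3.0)
        s_sim = statistics.mean(random.gauss(theta, 1.0) for _ in range(20))
        if abs(s_sim - s_obs) < eps:
            acc.append(theta)
    return statistics.mean(acc) if acc else None

# PODS check: draw "true" parameters from the prior, generate pseudo-observed
# data sets, and gauge how well ABC recovers the known parameter values.
errors = []
for _ in range(20):
    theta_true = random.uniform(-3.0, 3.0)
    pods = [random.gauss(theta_true, 1.0) for _ in range(20)]
    est = abc_posterior_mean(pods)
    if est is not None:
        errors.append(abs(est - theta_true))

mae = statistics.mean(errors)   # mean absolute recovery error
```

A small mean recovery error across many pseudo-observed data sets indicates that the chosen tolerance and summaries allow the true parameters to be recovered; a large error flags a problem with the setup before any real data are analysed.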
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparing the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because the posterior support of a particular model can then appear overwhelmingly conclusive, even if all of the proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Interestingly, fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions, with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />. The deviance information criterion is then used as a measure of model fit. It was also shown that the models preferred based on this criterion can conflict with those supported by Bayes factors. For this reason, it is useful to combine different methods for model selection to obtain robust conclusions.
=Example=
=Pitfalls and Controversies around ABC=
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it was pointed out that a significant portion of the criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part was also shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and its usage. However, fundamental and currently unsolved issues were exposed by the arguments as well. Concerns have lately also been raised within the ABC community <ref name="Didelot" /><ref name="Robert" />. Yet it might be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
As with all statistical methods, a number of assumptions and approximations are inherently required when applying ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus, <math>\epsilon</math> is instead set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; instead, other summary statistics are used, which introduces an additional bias. However, much of the recent criticism has neither been specific to ABC, nor relevant for ABC-based analysis. This motivates a careful investigation, and categorization, of the validity and relevance of the arguments.
==Curse-of-Dimensionality==
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in <ref name="Ratmann" /> and <ref name="Francois" />). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) <ref name="Csillery" />. This is a well-known phenomenon usually referred to as the curse-of-dimensionality <ref name="Bellman" />. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
Although no computational method seems able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids <ref name="Gerstner" />, which could substantially reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect <ref name="Singer" />. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate <ref name="Templeton2009a" /> (see also Section ?).
==Approximation of the Posterior==
A non-negligible <math>\epsilon</math> comes at the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> is trusted to approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance that is large enough for every point to be accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in <ref name="Sisson" />. Theoretical results for an <math>\epsilon</math>-dependent upper bound on the error in parameter estimates have recently been reported <ref name="Dean" />. The accuracy of the posterior (defined as the expected quadratic loss) obtained with ABC as a function of <math>\epsilon</math> has also been investigated <ref name="Fearnhead" />. However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that should be investigated in greater detail. Methods to distinguish the error of this approximation from errors due to model mis-specification <ref name="Beaumont2010" />, in a way that is meaningful in the context of actual applications, would be valuable.
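The interpolation between posterior and prior is easy to demonstrate empirically. The following sketch (an illustration on an assumed toy model, not from the original text) runs rejection ABC at two tolerances: a small one, for which the accepted sample approximates the posterior, and one so large that essentially every draw is accepted, recovering the prior.

```python
import random
import statistics

random.seed(5)

def abc_sample(eps, n_draws=20000):
    """Sample from p(theta | |D_hat - D| < eps) for a single observation
    D = 0 of a Normal(theta, 1) toy model with a U(-5, 5) prior."""
    acc = []
    for _ in range(n_draws):
        theta = random.uniform(-5.0, 5.0)
        if abs(random.gauss(theta, 1.0)) < eps:
            acc.append(theta)
    return acc

sd_small = statistics.stdev(abc_sample(0.2))   # near the true posterior spread
sd_large = statistics.stdev(abc_sample(10.0))  # everything accepted: the prior
```

At the small tolerance the spread of the accepted sample is close to that of the true posterior, while at the large tolerance it approaches the spread of the uniform prior, illustrating the bias incurred by loosening <math>\epsilon</math>.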
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in <ref name="Fearnhead" /><ref name="Wilkinson" />. The idea is to add noise from a given probability density function to the observed data, since ABC yields exact inference under the assumption of this noise model. The asymptotic consistency of such “noisy ABC” was established in <ref name="Dean" />, together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC-based approaches.
==Choice and Sufficiency of Summary Statistics==
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, as they represent the maximum amount of information in the simplest possible form <ref name="Csillery" />. However, one often has to resort to heuristics to identify sufficient statistics, and sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the potential loss of information in the parameter estimation <ref name="Csillery" />, and this may also bias the discrimination between models.
An intuitive idea to capture most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics <ref name="Beaumont2010" /><ref name="Csillery" />. Instead, a better strategy is to focus only on the relevant statistics, where relevance depends on the whole inference problem, on the model used, and on the data at hand <ref name="Nunes" />.
An algorithm has been proposed for identifying a representative subset of summary statistics, by iteratively assessing whether an additional statistic introduces a meaningful modification of the posterior <ref name="Joyce" />. Another method, proposed in <ref name="Nunes" />, decomposes into two principal steps. First, a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the posteriors computed with ABC to the reference posterior.
With both of these strategies, a subset of statistics is selected from a large set of candidate statistics. In contrast, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately <ref name="Wegmann" />. Recently, a method for constructing summaries in a semi-automatic manner has attracted much interest <ref name="Fearnhead" />. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated by a pilot run of simulations.
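The semi-automatic construction can be sketched as follows. In this illustration (not from the original text) the model, the prior, and the candidate summaries, sample mean and sample median, are arbitrary assumptions: a pilot run regresses the parameter on the candidate summaries by least squares, and the fitted linear combination then serves as a single constructed summary statistic.

```python
import random
import statistics
import math

random.seed(8)

def solve3(A, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

# Pilot run: draw theta from the prior, simulate, record candidate summaries.
rows, targets = [], []
for _ in range(3000):
    theta = random.uniform(-5.0, 5.0)
    data = [random.gauss(theta, 1.0) for _ in range(15)]
    rows.append([1.0, statistics.mean(data), statistics.median(data)])
    targets.append(theta)

# Least-squares fit of theta on the candidate summaries (normal equations);
# the fitted linear combination serves as the single constructed summary.
k = 3
A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
b = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
coef = solve3(A, b)

fitted = [sum(c, ) for c in []] if False else [
    sum(ci * ri for ci, ri in zip(coef, r)) for r in rows]
rmse = math.sqrt(sum((f - t) ** 2 for f, t in zip(fitted, targets)) / len(targets))
```

The fitted value approximates the posterior mean of the parameter and is then used as the summary statistic in a subsequent ABC run; a small root-mean-square prediction error in the pilot run indicates that the constructed summary is informative.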
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest <ref name="Marjoram" />. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
==Bayes Factor with ABC and Summary Statistics==
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic <ref name="Didelot" /><ref name="Robert" />. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form <ref name="Didelot" />
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which implies that <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that there may be a huge difference between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> if Eq. ? is not satisfied, as demonstrated with a small example model in <ref name="Robert" /> (previously discussed in <ref name="Didelot" /> and in <ref name="Grelaud" />). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models <ref name="Didelot" />. However, it was also shown that any summary statistic that is sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can also be used to rank the nested models <ref name="Didelot" />.
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived <ref name="Marin" />, which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which actual data sets are compared, as in typical systems biology applications (e.g., see <ref name="Toni" />), circumvents this problem. It is even doubtful whether the issue is truly ABC-specific, since importance sampling techniques suffer from the same problem <ref name="Robert" />.
==Indispensable Quality Controls==
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules; rather, the effect of these choices should be evaluated and tested in each study <ref name="Bertorelle" />. Thus, quality controls are achievable and indeed performed in many ABC-based studies, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. However, the rapidly increasing use of ABC should lead to a more thorough understanding of the limitations and applicability of the method.
==More General Criticisms==
We now turn to criticisms that we consider to be valid, but not specific to ABC, and instead hold for model-based methods in general. Many of these criticisms have already been well debated in the literature for a long time, but the flexibility offered by ABC to analyse very complex models makes them highly relevant.
===Small Number of Models===
Model-based methods have been criticized for not exhaustively covering the hypothesis space <ref name="Templeton2009a" />. It is true that model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model, it may in some instances be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options <ref name="Bertorelle" />. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead <ref name="Csillery" />. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only a few models, subjectively chosen and probably all wrong, can realistically be considered, what insight can we hope to derive from their analysis <ref name="Templeton2009a" />? As pointed out in <ref name="Beaumont2010" />, there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only rarely be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important in this context than the test of a statistical null hypothesis (see also Section ?).
===Prior Distribution and Parameter Ranges===
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” <ref name="Templeton2010" />, which is connected to classical objections of Bayesian approaches <ref name="Beaumont2010b" />.
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should if possible be defined based on known properties of the studied system, but may for practical applications necessitate an educated guess. However, theoretical results regarding a suitable (e.g., non-biased) choice of the prior distribution are available, which are based on the principle of maximum entropy <ref name="Jaynes" />.
We stress that the purpose of the analysis should be kept in mind when choosing the priors. In principle, uninformative and flat priors that exaggerate our subjective ignorance about the parameters may still yield good parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
===Large Data Sets===
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in <ref name="Templeton2009a" /> that part of the data had to be omitted in the ABC-based analysis presented in <ref name="Fagundes" />. Although a number of authors claim that large data sets are not a practical limitation <ref name="Bertorelle" /><ref name="Beaumont2010b" />, this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics <ref name="Feng" />, which may be a tractable approach also for ABC-based methods. It should still be kept in mind that any realistic model for a complex system is likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
=History=
==Beginnings==
==Recent Methodological Developments==
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC <ref name="Marjoram" />, which results in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated samples of the posterior <ref name="Sisson" />, and relatively poor parallelizability <ref name="Bertorelle" />.
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting <ref name="Sisson" /><ref name="Beaumont2009" />. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively <ref name="DelMoral" />.
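The sequential idea can be sketched as follows. This illustration (assumed toy model, arbitrary tolerance schedule and kernel scale; none of it from the original text) propagates a particle population through a decreasing sequence of tolerances, perturbing resampled particles with a Gaussian kernel and reweighting them by the ratio of prior to proposal density, in the spirit of ABC-PMC.

```python
import random
import statistics
import math

random.seed(7)

def summary(theta, n=20):
    """Sample-mean summary of n draws from a Normal(theta, 1) toy model."""
    return statistics.mean(random.gauss(theta, 1.0) for _ in range(n))

def abc_smc(s_obs, eps_schedule=(1.0, 0.5, 0.2), n_particles=500):
    # Generation 0: plain rejection sampling from the U(-5, 5) prior.
    particles, weights = [], []
    while len(particles) < n_particles:
        theta = random.uniform(-5.0, 5.0)
        if abs(summary(theta) - s_obs) < eps_schedule[0]:
            particles.append(theta)
            weights.append(1.0)
    for eps in eps_schedule[1:]:
        tau = 2.0 * statistics.stdev(particles)    # perturbation kernel scale
        new_p, new_w = [], []
        while len(new_p) < n_particles:
            theta0 = random.choices(particles, weights=weights)[0]
            theta = random.gauss(theta0, tau)      # perturb the resampled particle
            if not -5.0 <= theta <= 5.0:           # reject outside prior support
                continue
            if abs(summary(theta) - s_obs) < eps:
                # Importance weight: flat prior over the mixture proposal density.
                denom = sum(w * math.exp(-0.5 * ((theta - p) / tau) ** 2)
                            for p, w in zip(particles, weights))
                new_p.append(theta)
                new_w.append(1.0 / denom)
        particles, weights = new_p, new_w
    return particles, weights

particles, weights = abc_smc(s_obs=1.0)
post_mean = sum(w * p for p, w in zip(particles, weights)) / sum(weights)
```

Each generation concentrates the population closer to the posterior, so far fewer simulations are wasted at the tightest tolerance than plain rejection from the prior would require.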
The use of locally weighted linear regression with ABC to reduce the variance of the estimator was suggested in <ref name="Beaumont2002" />. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model <ref name="Blum2010" />. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in <ref name="Leuenberger2009" />.
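A minimal sketch of the local linear adjustment of <ref name="Beaumont2002" /> on an assumed toy model (one parameter, one summary statistic; all concrete values are illustrative): accepted parameters are weighted by an Epanechnikov kernel in the distance of their summaries to the observed one, a weighted regression of parameter on summary is fitted, and each accepted parameter is projected to the observed summary.

```python
import random
import statistics

random.seed(6)

# Rejection step on a toy model: Normal(theta, 1) data, sample-mean summary,
# U(-5, 5) prior, and a deliberately loose tolerance.
N = 25
d_obs = [random.gauss(1.5, 1.0) for _ in range(N)]
s_obs = statistics.mean(d_obs)
eps = 0.5

pairs = []                                  # (theta, simulated summary)
for _ in range(20000):
    theta = random.uniform(-5.0, 5.0)
    s = statistics.mean(random.gauss(theta, 1.0) for _ in range(N))
    if abs(s - s_obs) < eps:
        pairs.append((theta, s))

# Epanechnikov weights in the summary distance, then a weighted linear
# regression of theta on s; accepted thetas are corrected to s = s_obs.
w = [1.0 - ((s - s_obs) / eps) ** 2 for _, s in pairs]
sw = sum(w)
mean_t = sum(wi * t for wi, (t, _) in zip(w, pairs)) / sw
mean_s = sum(wi * s for wi, (_, s) in zip(w, pairs)) / sw
beta = (sum(wi * (t - mean_t) * (s - mean_s) for wi, (t, s) in zip(w, pairs))
        / sum(wi * (s - mean_s) ** 2 for wi, (_, s) in zip(w, pairs)))
adjusted = [t - beta * (s - s_obs) for t, s in pairs]
raw = [t for t, _ in pairs]
```

The adjusted sample is typically markedly tighter than the raw accepted sample, mimicking the effect of a much smaller tolerance without discarding simulations.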
=Outlook=
In the past, the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC, the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function, it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, so novel appropriate methods must be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction <ref name="Csillery" /> or modularization. A second approach is a more guided search of the parameter space, e.g., by developing new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely improve the situation, but not resolve the curse-of-dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcome many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, certain tasks, for instance model selection with ABC, are inherently difficult. In addition, open problems such as the convergence properties of ABC-based algorithms, as well as methods for determining summary statistics in the absence of sufficient ones, deserve more attention.
=Tables=
{| class="wikitable"
|+ Table 1: Error sources in ABC-based statistical inference
|-
! Error source
! Potential issue
! Solution
! Section
|-
| Non-zero tolerance ε
| The computed posterior distribution is biased.
| Theoretical/practical studies of the sensitivity of the posterior distribution to the tolerance. / Noisy ABC.
| ?
|-
| Non-sufficient statistics
| Inflated posterior distributions due to information loss.
| Automatic selection/semi-automatic identification of sufficient statistics. / Model validation checks (e.g., see [22]).
| ?
|-
| Small nr of models/Mis-specified models
| The investigated models are not representative/lack predictive power.
| Careful selection of models./ Evaluation of the predictive power.
| ?
|-
| Priors and parameter ranges
| Conclusions may be sensitive to the choice of priors. / Model choice may be meaningless.
| Check sensitivity of Bayes factors to the choice of priors. / Some theoretical results regarding choice of priors are available. / Use alternative methods for model validation.
| ?
|-
| Curse-of-dimensionality
| Low acceptance rates. / Model errors cannot be distinguished from an insufficient exploration of the parameter space. / Risk of overfitting.
| Methods for model reduction if applicable. / Methods to speed up the parameter exploration. / Quality controls to detect overfitting.
| ?
|-
| Model ranking with summary statistics
| Bayes factors computed on summary statistics may not be related to the Bayes factors on the original data, and may therefore be meaningless.
| Only use summary statistics that fulfill the necessary and sufficient conditions to produce a consistent Bayesian model choice. / Use alternative methods for model validation.
| ?
|-
| Implementation
| Low protection against violations of common assumptions in the simulation and inference process.
| Sanity checks of results. / Standardization of software.
| ?
|}
=Acknowledgements=
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
=References=
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J (2009) Stable Bayesian Parameter Estimation for Biological Dynamical Systems. IEEE Computer Society Press. pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J (2009) Optimized Expected Information Gain for Nonlinear Dynamical Systems. ACM Int. Conf. Proc. Series. pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid A (1984) Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society, Series A: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, in press.</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<!-- <ref name="Csillery2010">Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.</ref> -->
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<!-- <ref name="Gelfand">Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.</ref> -->
<!-- <ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref> -->
<!-- <ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref> -->
<!-- <ref name="Excoffier">Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.</ref> -->
<!-- <ref name="Wegmann2010">Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.</ref> -->
<!-- <ref name="Cornuet">Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.</ref> -->
<!-- <ref name="Templeton2010b">Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.</ref> -->
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden Markov models with intractable likelihoods. arXiv:1103.5399v1 [math.ST] 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. arXiv:1004.1112v2 [stat.ME] 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:0811.3355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article 34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and Bayesian computation. Stat Appl Genet Mol Biol 7: Article 26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rousseau J (2011) Relevant statistics for Bayesian model choice. arXiv:1110.4700v1 [math.ST] 21 Oct 2011: 1-24.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.</ref>
</references>
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in [[wp:Bayesian statistics|Bayesian statistics]]. ABC has rapidly increased in popularity over recent years, in particular for the analysis of complex problems arising in the [[wp:Biology|biological sciences]], such as population [[wp:Genetics|genetics]], [[wp:ecology|ecology]], [[wp:epidemiology|epidemiology]], and [[wp:systems biology|systems biology]] (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). By offering considerable computational speed-up compared to conventional approaches, ABC methods widen the realm of statistical inference. Although they are well-founded statistical methods, ABC methods are based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain of ABC exacerbates the challenges of [[wp:Estimation Theory|parameter estimation]] and [[wp:Model Selection|model selection]].
==Approximate Bayesian Computation==
===Motivation===
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
Note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
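To make the update rule concrete, the following sketch evaluates the unnormalized posterior on a discretized parameter domain and normalizes by the evidence. The coin-tossing model, grid size, and data are purely illustrative, not taken from the text:

```python
import numpy as np
from math import comb

# Hypothetical example: coin with unknown heads probability theta,
# observed data D = 7 heads out of 10 tosses.
theta = np.linspace(0.0, 1.0, 101)              # discretized parameter domain
prior = np.ones_like(theta) / len(theta)        # uniform prior p(theta)
likelihood = comb(10, 7) * theta**7 * (1 - theta)**3  # p(D|theta)

unnormalized = likelihood * prior               # p(D|theta) p(theta)
posterior = unnormalized / unnormalized.sum()   # divide by the evidence p(D)

# Maximum a posteriori estimate, close to the empirical frequency 7/10.
print(theta[np.argmax(posterior)])
```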
===The ABC Rejection Algorithm===
All ABC-based methods approximate the likelihood function by simulations that are compared to the observational data. More specifically, with the ABC rejection algorithm—the most basic form of ABC—a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution, crucially obtained without the need to explicitly compute the likelihood function.
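A minimal sketch of the rejection algorithm, assuming a hypothetical Gaussian model with unknown mean, a uniform prior on [-5, 5], and the Euclidean distance between sorted samples as the metric (all of these choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the "observed" data are 20 draws from N(mu=2, 1);
# the model simulates data sets for candidate values of mu.
D = rng.normal(2.0, 1.0, size=20)

def simulate(mu):
    return rng.normal(mu, 1.0, size=20)

def rho(D_hat, D):
    # Euclidean distance between the sorted samples, one possible metric
    return np.linalg.norm(np.sort(D_hat) - np.sort(D))

eps = 4.0          # strictly positive tolerance
accepted = []
for _ in range(20_000):
    mu = rng.uniform(-5, 5)        # sample a parameter point from the prior
    if rho(simulate(mu), D) < eps: # accept if the simulation is close to D
        accepted.append(mu)

# Accepted parameters approximate the posterior p(mu | D).
print(np.mean(accepted))
```

Shrinking the tolerance improves the approximation of the posterior at the cost of a lower acceptance rate, which is exactly the trade-off discussed in the sections below.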
===Sufficient Summary Statistics===
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside the exponential family of distributions, to identify a finite-dimensional set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications where ABC methods are employed.
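Replacing the raw data by summaries changes only the acceptance step of the rejection algorithm. In the sketch below (an illustrative Gaussian model again), the sample mean and standard deviation serve as low-dimensional summary statistics; for a Gaussian they are in fact sufficient:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative example: data from N(mu=2, 1) with known unit variance.
D = rng.normal(2.0, 1.0, size=200)

def S(x):
    # low-dimensional summaries: sample mean and standard deviation
    return np.array([x.mean(), x.std()])

s_obs = S(D)
eps = 0.2
accepted = []
for _ in range(50_000):
    mu = rng.uniform(-5, 5)                    # prior over mu
    D_hat = rng.normal(mu, 1.0, size=200)      # simulate under the model
    if np.linalg.norm(S(D_hat) - s_obs) < eps: # compare summaries, not raw data
        accepted.append(mu)

print(np.mean(accepted))  # close to the true value mu = 2
```

Comparing the two-dimensional summaries instead of the 200-dimensional raw data keeps the acceptance rate workable even though the data set itself is large.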
===Model Comparison with Bayes Factors===
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that computing <math>B_{1,2}</math> in Eq. ? requires marginalizing over the uncertain parameters through integration. The posterior ratio of <math>M_1</math> to <math>M_2</math> given the data (which can be thought of as the support in favor of one model over the other) is related to the Bayes factor by
<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
A table for interpreting the strength of evidence corresponding to different values of the Bayes factor was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />), and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, conclusions from model comparisons based on Bayes factors should be drawn with caution, and we discuss some important ABC-related concerns below.
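For intuition, the quantities in the Bayes-factor definition can be computed directly in a toy setting where the likelihood is tractable (so ABC itself is not needed). The models, priors, and data below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy comparison:
#   M1: x ~ N(mu, 1) with prior mu ~ Uniform(-2, 2)
#   M2: x ~ N(0, 1) with no free parameter
D = rng.normal(0.8, 1.0, size=50)

def loglik(mu):
    return -0.5 * np.sum((D - mu) ** 2) - 0.5 * len(D) * np.log(2 * np.pi)

# Evidence p(D|M1): marginalize over the prior by Monte Carlo averaging.
mus = rng.uniform(-2, 2, size=20_000)
ll = np.array([loglik(m) for m in mus])
log_ev1 = ll.max() + np.log(np.mean(np.exp(ll - ll.max())))  # stable log-mean-exp

log_ev2 = loglik(0.0)  # M2 has no parameter to integrate out

B12 = np.exp(log_ev1 - log_ev2)  # Bayes factor in favor of M1
print(B12)
```

Since the data were generated with a nonzero mean, the factor comes out well above 1, i.e., the evidence favors the model with a free mean parameter despite the marginalization penalty of its prior.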
==Quality Controls==
Quality control is an important part of ABC based inference for assessing the validity and robustness of results and conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models for fixed parameter sets, which are typically drawn from the prior or posterior distributions, are simulated to generate a large number of artificial pseudo-observed data sets (PODS). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
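A sketch of this idea, assuming a hypothetical Gaussian model and a simple mean-based ABC estimator (all settings are illustrative): "true" parameters are drawn from the prior, pseudo-observed data sets are simulated, and recovery of the known truth is checked:

```python
import numpy as np

rng = np.random.default_rng(3)

def abc_posterior_mean(D, rng, n_sim=4_000, eps=0.3):
    """Minimal ABC rejection estimator for the mean of N(mu, 1)."""
    accepted = []
    for _ in range(n_sim):
        mu = rng.uniform(-5, 5)
        D_hat = rng.normal(mu, 1.0, size=len(D))
        if abs(D_hat.mean() - D.mean()) < eps:
            accepted.append(mu)
    return np.mean(accepted) if accepted else np.nan

# Pseudo-observed data sets (PODS): draw "true" parameters from the
# prior, simulate data, and check how well ABC recovers them.
errors = []
for _ in range(20):
    mu_true = rng.uniform(-5, 5)
    pods = rng.normal(mu_true, 1.0, size=100)
    errors.append(abc_posterior_mean(pods, rng) - mu_true)

print(np.mean(np.abs(errors)))  # small error indicates sane recovery
```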
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparing the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because then the posterior support of a particular model can appear overwhelmingly conclusive, even if all proposed models are in fact poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have also recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />. The deviance information criterion is then used as a measure of model fit. Interestingly, the models preferred under this criterion can conflict with those supported by Bayes factors. For this reason it is useful to combine different methods for model selection to obtain reliable conclusions.
==Example==
==Pitfalls and Controversies around ABC==
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it has been pointed out that a significant portion of the criticism is not directly aimed at ABC, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part was also shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and its usage. However, fundamental and currently unsolved issues were exposed by these arguments as well. Concerns have lately also been raised within the ABC community <ref name="Didelot" /><ref name="Robert" />. Yet it might be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
As with all statistical methods, a number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus, instead, <math>\epsilon</math> is set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; instead, other summary statistics are used, which introduces an additional bias. However, much of the recent criticism has neither been specific to ABC, nor relevant for ABC-based analysis. This motivates a careful investigation and categorization of the validity and relevance of the arguments.
===Curse-of-Dimensionality===
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in <ref name="Ratmann" /> and <ref name="Francois" />). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) <ref name="Csillery" />. This is a well-known phenomenon usually referred to as the curse-of-dimensionality <ref name="Bellman" />. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
Although no computational method seems to be able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids <ref name="Gerstner" />, which could substantially reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect <ref name="Singer" />. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate <ref name="Templeton2009a" /> (see also Section ?).
===Approximation of the Posterior===
A non-negligible <math>\epsilon</math> comes at the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> is trusted to approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance large enough for every point to be accepted yields the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in <ref name="Sisson" />. Theoretical results for an <math>\epsilon</math>-dependent upper bound on the error in parameter estimates have recently been reported <ref name="Dean" />. The accuracy of the posterior (defined as the expected quadratic loss) delivered by ABC as a function of <math>\epsilon</math> has also been investigated <ref name="Fearnhead" />. However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that should be investigated in greater detail. In particular, methods that distinguish the error of this approximation from errors due to model mis-specification <ref name="Beaumont2010" />, and that would make sense in the context of actual applications, would be valuable.
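The two limiting regimes can be checked empirically. In the illustrative sketch below (a hypothetical Gaussian model with a mean-based distance), a small tolerance concentrates the accepted sample near the posterior, while a very large tolerance simply returns draws from the prior:

```python
import numpy as np

rng = np.random.default_rng(6)

# Observed data from N(mu=2, 1); uniform prior for mu on [-5, 5].
D = rng.normal(2.0, 1.0, size=50)

def abc_sample(eps, n=40_000):
    mus = rng.uniform(-5, 5, size=n)                    # prior draws
    sims = rng.normal(mus[:, None], 1.0, size=(n, 50))  # one data set per draw
    keep = np.abs(sims.mean(axis=1) - D.mean()) < eps
    return mus[keep]

tight, loose = abc_sample(0.1), abc_sample(100.0)
# tight concentrates near mu = 2; loose has the spread of the prior
# (standard deviation of Uniform(-5, 5) is about 2.89).
print(tight.std(), loose.std())
```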
Finally, statistical inference with a positive tolerance in ABC has been theoretically justified <ref name="Fearnhead" /><ref name="Wilkinson" />. The idea is to add noise, drawn from a given probability density function, to the observed data, since ABC yields exact inference under the assumption of this noise model. The asymptotic consistency of such “noisy ABC” was established in <ref name="Dean" />, together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC-based approaches.
===Choice and Sufficiency of Summary Statistics===
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, as they represent the maximum amount of information in the simplest possible form <ref name="Csillery" />. However, one must often resort to heuristics to identify sufficient statistics, and sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the potential loss of information in the parameter estimation <ref name="Csillery" />, and may also bias the discrimination between models.
An intuitive idea for capturing most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics <ref name="Beaumont2010" /><ref name="Csillery" />. Instead, a better strategy is to focus on the relevant statistics only—relevancy depending on the whole inference problem, on the model used, and on the data at hand <ref name="Nunes" />.
An algorithm was proposed for identifying a representative subset of summary statistics, by iteratively assessing if an additional statistic introduces a meaningful modification of the posterior <ref name="Joyce" />. Another method was proposed in <ref name="Nunes" />, which decomposes into two principal steps. First a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the posteriors computed with ABC to the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. In contrast, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately <ref name="Wegmann" />. Recently, a method for constructing summaries in a semi-automatic manner has attracted much interest <ref name="Fearnhead" />. It is based on the observation that the choice of summary statistics minimizing the quadratic loss of the parameter point estimates is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
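The idea behind the semi-automatic construction can be sketched as follows. The Gaussian model, the feature set, and the plain linear regression below are all illustrative stand-ins for the more general regression used in the cited work:

```python
import numpy as np

rng = np.random.default_rng(4)

# Pilot run for a hypothetical model x ~ N(theta, 1), theta ~ Uniform(-5, 5):
# regress theta on simple data features; the fitted linear predictor
# approximates the posterior mean E[theta | D] and serves as a
# one-dimensional summary statistic.
def features(x):
    return np.array([x.mean(), np.median(x), x.std()])

thetas = rng.uniform(-5, 5, size=2_000)
X = np.array([features(rng.normal(t, 1.0, size=100)) for t in thetas])
X1 = np.column_stack([np.ones(len(X)), X])       # add an intercept column
beta, *_ = np.linalg.lstsq(X1, thetas, rcond=None)

def S(x):
    # semi-automatic summary: estimated posterior mean of theta
    return np.concatenate([[1.0], features(x)]) @ beta

D = rng.normal(2.0, 1.0, size=100)
print(S(D))  # approximately E[theta | D], i.e., close to 2
```

The scalar S(x) can then replace hand-picked summaries in the acceptance criterion, reducing the comparison to a single well-motivated dimension per parameter.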
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest <ref name="Marjoram" />. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
===Bayes Factor with ABC and Summary Statistics===
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic <ref name="Didelot" /><ref name="Robert" />. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form <ref name="Didelot" />
<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which results in <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that there may be a huge difference between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> if Eq. ? is not satisfied, as demonstrated with a small example model in <ref name="Robert" /> (previously discussed in <ref name="Didelot" /> and in <ref name="Grelaud" />). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models <ref name="Didelot" />. However, it was also shown that any summary statistic sufficient for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can also be used to rank the nested models <ref name="Didelot" />.
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived <ref name="Marin" />, which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which actual data sets are compared directly, as in typical systems biology applications (e.g., see <ref name="Toni" />), circumvents this problem. It is even doubtful whether the issue is truly ABC-specific, since importance sampling techniques suffer from the same problem <ref name="Robert" />.
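The discrepancy between the two factors can be illustrated numerically. The construction below is hypothetical (not the example from the cited papers): two parameter-free models with equal mean and variance are indistinguishable through the sample mean, although the full-data Bayes factor separates them clearly:

```python
import numpy as np

rng = np.random.default_rng(5)

# Two fixed (parameter-free) models with equal mean and variance:
#   M1: standard normal;  M2: Laplace with scale b = 1/sqrt(2).
# By the CLT the sample mean is nearly identically distributed under
# both, so a "Bayes factor" on this summary is close to 1, while the
# full-data Bayes factor discriminates clearly.
n = 500
b = 1.0 / np.sqrt(2.0)
D = rng.laplace(0.0, b, size=n)  # data generated under M2

log_p1 = np.sum(-0.5 * D**2) - 0.5 * n * np.log(2 * np.pi)
log_p2 = np.sum(-np.abs(D) / b) - n * np.log(2 * b)
B12 = np.exp(log_p1 - log_p2)    # full-data Bayes factor, strongly favors M2

s_obs = D.mean()
h = 0.01                          # half-width for a crude density estimate
m1 = rng.standard_normal((10_000, n)).mean(axis=1)
m2 = rng.laplace(0.0, b, size=(10_000, n)).mean(axis=1)
B12_s = np.mean(np.abs(m1 - s_obs) < h) / np.mean(np.abs(m2 - s_obs) < h)

print(B12, B12_s)  # B12 far below 1, while B12_s stays near 1
```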
===Indispensable Quality Controls===
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules; rather, the effect of these choices should be evaluated and tested in each study <ref name="Bertorelle" />. Thus, quality controls are achievable and indeed performed in many ABC-based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. However, the rapidly increasing use of ABC should lead to a more thorough understanding of the limitations and applicability of the method.
===More General Criticisms===
We now turn to criticisms that we consider to be valid, but not specific to ABC; rather, they hold for model-based methods in general. Many of these criticisms have long been debated in the literature, but the flexibility offered by ABC to analyze very complex models makes them highly relevant.
====Small Number of Models====
Model-based methods have been criticized for not exhaustively covering the hypothesis space <ref name="Templeton2009a" />. Indeed, model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model, it may in some instances be difficult to cover a large part of the hypothesis space.
An upper limit to the number of candidate models considered is typically set by the substantial effort required to define the models and to choose between many alternative options <ref name="Bertorelle" />. It could be argued that this reflects the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead <ref name="Csillery" />. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and the use of expert knowledge from the problem domain.
But if only a few models (subjectively chosen, and probably all wrong) can realistically be considered, what insight can we hope to derive from their analysis <ref name="Templeton2009a" />? As pointed out in <ref name="Beaumont2010" />, there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can only rarely be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important in this context than the test of a statistical null hypothesis (see also Section ?).
====Prior Distribution and Parameter Ranges====
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” <ref name="Templeton2010" />, which is connected to classical objections to Bayesian approaches <ref name="Beaumont2010b" />.
With any computational method, it is necessary to constrain the investigated parameter ranges. The ranges should, if possible, be defined from known properties of the studied system, but in practical applications they may necessitate an educated guess. However, theoretical results regarding a suitable (e.g., unbiased) choice of the prior distribution are available, based on the principle of maximum entropy <ref name="Jaynes" />.
We stress that the purpose of the analysis must be kept in mind when choosing the priors. In principle, uninformative and flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
====Large Data Sets====
Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out in <ref name="Templeton2009a" /> that part of the data had to be omitted in the ABC-based analysis presented in <ref name="Fagundes" />. Although a number of authors claim that large data sets are not a practical limitation <ref name="Bertorelle" /><ref name="Beaumont2010b" />, this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics <ref name="Feng" />, which may be a tractable approach also for ABC-based methods. It should still be kept in mind that any realistic model of a complex system is likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method suitable for the particular application in question.
==History==
===Beginnings===
===Recent Methodological Developments===
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC <ref name="Marjoram" />, which resulted in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated samples from the posterior <ref name="Sisson" />, and relatively poor parallelizability <ref name="Bertorelle" />.
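The idea can be sketched for a toy Gaussian model; this is a minimal illustration in the spirit of the scheme in <ref name="Marjoram" />, not its exact formulation, and all function and variable names are ours. The usual likelihood ratio in the Metropolis-Hastings acceptance step is replaced by the indicator that a simulation lands within the tolerance of the data.

```python
import math, random

def abc_mcmc(observed, simulate, log_prior, distance, eps, theta0, n_steps, step=0.5):
    """ABC-MCMC sketch: Metropolis-Hastings where the likelihood ratio is
    replaced by an indicator that simulated data fall within eps of the data;
    the proposal is a symmetric Gaussian random walk."""
    theta, chain = theta0, []
    for _ in range(n_steps):
        prop = theta + random.gauss(0.0, step)
        lp_diff = log_prior(prop) - log_prior(theta)
        if (distance(simulate(prop), observed) < eps          # likelihood-free check
                and (lp_diff >= 0 or random.random() < math.exp(lp_diff))):
            theta = prop                                      # accept the move
        chain.append(theta)                                   # else keep current theta
    return chain

# Toy model: data is the mean of 20 Gaussian draws with unknown location theta.
random.seed(4)
chain = abc_mcmc(
    observed=1.0,
    simulate=lambda t: sum(random.gauss(t, 1.0) for _ in range(20)) / 20,
    log_prior=lambda t: 0.0 if -5.0 < t < 5.0 else float("-inf"),  # flat prior
    distance=lambda a, b: abs(a - b),
    eps=0.3, theta0=0.0, n_steps=3000,
)
burned = chain[500:]   # discard burn-in before summarizing
```

The chain concentrates near the observed mean, but consecutive samples are correlated, illustrating the burden mentioned above.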
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting <ref name="Sisson" /><ref name="Beaumont2009" />. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods over ABC-MCMC is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but can be adjusted adaptively <ref name="DelMoral" />.
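A much-simplified sequential sketch, again for a toy Gaussian model, conveys the mechanics: a weighted particle population is pushed through a decreasing (here fixed, not adaptive) tolerance schedule, with importance weights correcting for the perturbation kernel. This is an illustration of the general idea rather than the exact algorithms of <ref name="Sisson" /> or <ref name="Beaumont2009" />, and all names are ours.

```python
import math, random

def abc_smc(observed, simulate, prior_sample, prior_pdf, distance,
            eps_schedule, n_particles, kernel_sd=0.3):
    """ABC-SMC sketch: propagate a weighted particle population through a
    decreasing sequence of tolerances."""
    # Population 0: plain rejection sampling at the loosest tolerance.
    particles, weights = [], []
    while len(particles) < n_particles:
        theta = prior_sample()
        if distance(simulate(theta), observed) < eps_schedule[0]:
            particles.append(theta)
            weights.append(1.0)
    for eps in eps_schedule[1:]:
        total = sum(weights)
        new_particles, new_weights = [], []
        while len(new_particles) < n_particles:
            theta = random.choices(particles, weights=weights)[0]  # resample
            prop = theta + random.gauss(0.0, kernel_sd)            # perturb
            if prior_pdf(prop) == 0.0:
                continue
            if distance(simulate(prop), observed) < eps:
                # Importance weight: prior over the kernel-mixture proposal density.
                denom = sum(w * math.exp(-(prop - p) ** 2 / (2 * kernel_sd ** 2))
                            for p, w in zip(particles, weights)) / total
                new_particles.append(prop)
                new_weights.append(prior_pdf(prop) / denom)
        particles, weights = new_particles, new_weights
    return particles, weights

random.seed(5)
particles, weights = abc_smc(
    observed=0.8,
    simulate=lambda t: sum(random.gauss(t, 1.0) for _ in range(10)) / 10,
    prior_sample=lambda: random.uniform(-3.0, 3.0),
    prior_pdf=lambda t: 1.0 / 6.0 if -3.0 <= t <= 3.0 else 0.0,
    distance=lambda a, b: abs(a - b),
    eps_schedule=[1.0, 0.5, 0.25],
    n_particles=200,
)
post_mean = sum(w * p for w, p in zip(weights, particles)) / sum(weights)
```

Each tolerance level starts from a population already concentrated in plausible regions, which is what makes the final, tight tolerance affordable.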
The use of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in <ref name="Beaumont2002" />. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model <ref name="Blum2010" />. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in <ref name="Leuenberger2009" />.
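The adjustment can be sketched in one dimension: fit a weighted linear regression of the accepted parameters on the summary discrepancies, then project each parameter onto the observed summary. This is a simplified illustration of the idea in <ref name="Beaumont2002" /> (the Epanechnikov kernel weighting follows that spirit); function and variable names are ours.

```python
import random

def regression_adjust(thetas, summaries, s_obs, eps):
    """Local-linear regression adjustment, 1-D sketch: fit theta ~ a + b*(s - s_obs)
    with Epanechnikov weights, then set theta_i* = theta_i - b*(s_i - s_obs)."""
    w = [max(0.0, 1.0 - ((s - s_obs) / eps) ** 2) for s in summaries]  # kernel weights
    sw = sum(w)
    xbar = sum(wi * (s - s_obs) for wi, s in zip(w, summaries)) / sw
    ybar = sum(wi * t for wi, t in zip(w, thetas)) / sw
    cov = sum(wi * ((s - s_obs) - xbar) * (t - ybar)
              for wi, s, t in zip(w, summaries, thetas)) / sw
    var = sum(wi * ((s - s_obs) - xbar) ** 2 for wi, s in zip(w, summaries)) / sw
    b = cov / var                       # weighted regression slope
    return [t - b * (s - s_obs) for t, s in zip(thetas, summaries)]

# Toy accepted sample: the summary is strongly informative about theta.
random.seed(6)
pairs = []
while len(pairs) < 500:
    t = random.uniform(0.0, 2.0)
    s = t + random.gauss(0.0, 0.1)
    if abs(s - 1.0) < 1.0:              # crude acceptance at a loose tolerance
        pairs.append((t, s))
thetas = [p[0] for p in pairs]
ss = [p[1] for p in pairs]
adjusted = regression_adjust(thetas, ss, s_obs=1.0, eps=1.0)
```

The adjusted sample is far more concentrated around the parameter value implied by the observed summary than the raw accepted sample, which is the variance reduction the method is designed to deliver.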
==Outlook==
In the past, the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC, the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function, it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, so novel appropriate methods must be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction <ref name="Csillery" /> or modularization. A second approach is a more guided search of the parameter space, e.g., through the development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points; ABC only avoids the cost of computing the likelihood, not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely mitigate, but not resolve, the curse-of-dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcome many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches for Bayesian inference in general. Thus, there are certain tasks, for instance model selection with ABC, that are inherently difficult. In addition, open problems such as the convergence properties of ABC-based algorithms, as well as methods for determining summary statistics in the absence of sufficient ones, deserve more attention.
==Tables==
{| class="wikitable"
|+ Table 1: Error sources in ABC-based statistical inference
|-
! Error source
! Potential issue
! Solution
! Section
|-
| Non-zero tolerance ε
| The computed posterior distribution is biased.
| Theoretical/practical studies of the sensitivity of the posterior distribution to the tolerance. / Noisy ABC.
| ?
|-
| Non-sufficient statistics
| Inflated posterior distributions due to information loss.
| Automatic selection/semi-automatic identification of sufficient statistics. / Model validation checks (e.g., see [22]).
| ?
|-
| Small nr of models/Mis-specified models
| The investigated models are not representative/lack predictive power.
| Careful selection of models./ Evaluation of the predictive power.
| ?
|-
| Priors and parameter ranges
| Conclusions may be sensitive to the choice of priors. / Model choice may be meaningless.
| Check sensitivity of Bayes factors to the choice of priors. / Some theoretical results regarding choice of priors are available. / Use alternative methods for model validation.
| ?
|-
| Curse-of-dimensionality
| Low acceptance rates. / Model errors cannot be distinguished from an insufficient exploration of the parameter space. / Risk of overfitting.
| Methods for model reduction if applicable. / Methods to speed up the parameter exploration. / Quality controls to detect overfitting.
| ?
|-
| Model ranking with summary statistics
| The computation of Bayes factors on summary statistics may not be related to the Bayes factors on the original data, and therefore meaningless.
| Only use summary statistics that fulfill the necessary and sufficient conditions to produce a consistent Bayesian model choice. / Use alternative methods for model validation.
| ?
|-
| Implementation
| Low protection against violations of common assumptions in the simulations and in the inference process.
| Sanity checks of results. / Standardization of software.
| ?
|}
==Acknowledgements==
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
==References==
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J. Stable Bayesian Parameter Estimation for Biological Dynamical Systems.; 2009. IEEE Computer Society Press pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J. Optimized Expected Information Gain for Nonlinear Dynamical Systems. Int. Conf. Proc. Series; 2009. Association for Computing Machinery (ACM) pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid A Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society Series A 1984: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011 (in press)) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and computing.</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<!-- <ref name="Csillery2010">Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.</ref> -->
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<!-- <ref name="Gelfand">Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.</ref> -->
<!-- <ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref> -->
<!-- <ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref> -->
<!-- <ref name="Excoffier">Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.</ref> -->
<!-- <ref name="Wegmann2010">Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.</ref> -->
<!-- <ref name="Cornuet">Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.</ref> -->
<!-- <ref name="Templeton2010b">Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.</ref> -->
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden markov models with intractable likelihoods. arXiv:11035399v1 [mathST] 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. ArXiv:10041112v2 [statME] 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:08113355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and bayesian computation. Stat Appl Genet Mol Biol 7: Article26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rosseau J (2011) Relevant statistics for Bayesian model choice. ArXiv:11104700v1 [mathST] 21 Oct 2011: 1-24.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.
</ref>
</references>
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in [[wp:Bayesian statistics|Bayesian statistics]]. ABC has rapidly gained popularity over recent years, particularly for the analysis of complex problems arising in the [[wp:Biology|biological sciences]], for example in population [[wp:Genetics|genetics]], [[wp:ecology|ecology]], [[wp:epidemiology|epidemiology]], and [[wp:systems biology|systems biology]] (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). By offering considerable computational speed-up compared to conventional approaches, ABC methods widen the realm of statistical inference. Although they are well-founded statistical methods, ABC methods rest on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain of ABC exacerbates the challenges of [[wp:Estimation Theory|parameter estimation]] and [[wp:Model Selection|model selection]].
==Approximate Bayesian Computation==
===Motivation===
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
:<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
:<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
Note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
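To make the contrast concrete, the following sketch evaluates the posterior directly on a grid for a toy model where the likelihood *is* tractable (Gaussian i.i.d. data, Gaussian prior); ABC targets precisely the cases where the `log_likelihood` term below cannot be computed. The data values and all names here are illustrative, not taken from any reference.

```python
import math

# Toy data set; theta is the unknown Gaussian location parameter.
data = [1.2, 0.8, 1.5, 0.9, 1.1]

def log_likelihood(theta, sigma=1.0):
    # The quantity that ABC is designed to avoid evaluating.
    return sum(-0.5 * ((x - theta) / sigma) ** 2
               - 0.5 * math.log(2 * math.pi * sigma ** 2) for x in data)

def log_prior(theta, mu0=0.0, tau=2.0):
    return -0.5 * ((theta - mu0) / tau) ** 2 - 0.5 * math.log(2 * math.pi * tau ** 2)

step = 0.01
grid = [i * step for i in range(-300, 301)]                  # theta in [-3, 3]
unnorm = [math.exp(log_likelihood(t) + log_prior(t)) for t in grid]
evidence = sum(unnorm) * step                                # approximates p(D)
posterior = [u / evidence for u in unnorm]                   # p(theta|D) on the grid
post_mean = sum(t * p for t, p in zip(grid, posterior)) * step
```

For this conjugate Gaussian case the grid result matches the known closed-form posterior; when the likelihood cannot be written down, this direct route is closed and simulation-based approximation becomes necessary.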
===The ABC Rejection Algorithm===
All ABC-based methods approximate the likelihood function by simulations whose outcomes are compared to the observational data. More specifically, with the ABC rejection algorithm (the most basic form of ABC) a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
:<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution, crucially obtained without the need to explicitly compute the likelihood function.
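The rejection algorithm can be sketched in a few lines for a toy Gaussian model; this is a minimal illustration, and the model, prior, tolerance, and names are ours, not from the references.

```python
import random

def abc_rejection(observed, simulate, prior_sample, distance, eps, n_samples):
    """ABC rejection: keep parameter draws whose simulated data lie within eps
    of the observed data."""
    accepted = []
    for _ in range(n_samples):
        theta = prior_sample()             # 1. sample a parameter from the prior
        sim = simulate(theta)              # 2. simulate a data set under the model
        if distance(sim, observed) < eps:  # 3. accept if close to the data
            accepted.append(theta)
    return accepted                        # approximate posterior sample

# Toy model: the "data" is the mean of 20 Gaussian draws with unknown location.
random.seed(1)
observed = 1.3
posterior = abc_rejection(
    observed,
    simulate=lambda t: sum(random.gauss(t, 1.0) for _ in range(20)) / 20,
    prior_sample=lambda: random.uniform(-5.0, 5.0),   # flat prior on [-5, 5]
    distance=lambda a, b: abs(a - b),
    eps=0.2,
    n_samples=20000,
)
```

The accepted parameters concentrate around the observed value, and at no point was a likelihood evaluated; only the ability to simulate from the model was required.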
===Sufficient Summary Statistics===
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
:<math>p(D|S(D)) = p(D|S(D),\theta)</math>
i.e., given the sufficient statistic, the parameter <math>\theta</math> is irrelevant for the conditional distribution of the data <ref name="Didelot" />. Sufficient summary statistics can be used to replace the acceptance criterion in Eq. (?), so that <math>\theta</math> is accepted if
:<math>\rho(S(\hat{D}),S(D))<\epsilon</math>
As we elaborate below, it is typically impossible, outside of the exponential families, to identify a set of sufficient statistics. Nevertheless, informative, but possibly non-sufficient, summary statistics are often used in applications approached with ABC methods.
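As a toy illustration of the dimension reduction, the sketch below compares data sets through two summaries (sample mean and variance) instead of the full 50-dimensional data vector; for a Gaussian model these happen to be sufficient, but in general such summaries are merely informative. All names and numerical choices are ours.

```python
import random

def summaries(data):
    """Two low-dimensional summaries: sample mean and variance.
    Sufficient for a Gaussian location/scale model; only informative in general."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return (mean, var)

def distance(s1, s2):
    # Euclidean distance between summary vectors.
    return sum((a - b) ** 2 for a, b in zip(s1, s2)) ** 0.5

random.seed(2)
observed = [random.gauss(0.5, 1.0) for _ in range(50)]
s_obs = summaries(observed)

# Accept theta when the summaries of the simulated data set are close to the
# observed summaries, rather than comparing the raw 50-dimensional data sets.
accepted = []
for _ in range(5000):
    theta = random.uniform(-3.0, 3.0)
    sim = [random.gauss(theta, 1.0) for _ in range(50)]
    if distance(summaries(sim), s_obs) < 0.3:
        accepted.append(theta)
```

Comparing two numbers per simulation rather than fifty keeps the acceptance rate workable, at the cost of whatever information the summaries discard.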
===Model Comparison with Bayes Factors===
ABC can also be instrumental for the evaluation of the plausibility of two models <math>M_1</math> and <math>M_2</math> for which the likelihoods are intractable. A useful approach to compare the models is to compute the Bayes factor, which is defined as the ratio of the model evidences given the data <math>D</math>
:<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)} = \frac{\int_{\Theta_1}p(D|\theta,M_1)p(\theta|M_1)d\theta}{ \int_{\Theta_2}p(D|\theta,M_2)p(\theta|M_2)d\theta} </math>
where <math>\Theta_1</math> and <math>\Theta_2</math> are the parameter spaces of <math>M_1</math> and <math>M_2</math>, respectively. Note that we need to marginalize over the uncertain parameters through integration to compute <math>B_{1,2}</math> in Eq. ?. The posterior ratio (which can be thought of as the support in favor of one model) of <math>M_1</math> compared to <math>M_2</math> given the data is related to the Bayes factor by
:<math>\frac{p(M_1 |D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}</math>
Note that if <math>p(M_1)=p(M_2)</math>, the posterior ratio equals the Bayes factor.
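In the ABC setting, each model evidence can be approximated by the fraction of prior-predictive simulations that fall within the tolerance of the data, so a Bayes factor can be estimated as a ratio of acceptance rates. The sketch below illustrates this for two toy models differing only in their priors; all names and numbers are ours.

```python
import random

def abc_acceptance_rate(simulate, prior_sample, observed, eps, n):
    """Fraction of prior-predictive simulations within eps of the data; in the
    ABC approximation this is proportional to the model evidence p(D|M)."""
    hits = sum(1 for _ in range(n)
               if abs(simulate(prior_sample())) < eps
               if True)  # placeholder guard, see distance below
    return hits / n

random.seed(3)
observed = 0.1
n, eps = 50000, 0.2

def rate(prior_sample):
    hits = sum(1 for _ in range(n)
               if abs(random.gauss(prior_sample(), 0.5) - observed) < eps)
    return hits / n

# M1: Gaussian likelihood, prior U(-1, 1) concentrated near the data.
rate1 = rate(lambda: random.uniform(-1, 1))
# M2: same likelihood, much more diffuse prior U(-10, 10).
rate2 = rate(lambda: random.uniform(-10, 10))

bayes_factor = rate1 / rate2   # approximates p(D|M1) / p(D|M2)
```

As expected, the model whose prior predictions concentrate near the data receives a substantially larger evidence estimate; the same caveats about tolerance and summaries discussed later apply to this estimate.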
A table for interpreting the strength of evidence associated with values of the Bayes factor was originally published in <ref name="Jeffreys" /> (see also <ref name="Kass" />), and has been used in a number of studies <ref name="Didelot" /><ref name="Vyshemirsky" />. However, conclusions of model comparison based on Bayes factors should be drawn with sober caution, and we will later discuss some important ABC-related concerns.
==Quality Controls==
Quality control is an important part of ABC-based inference for assessing the validity and robustness of the results and the conclusions drawn from them. A number of heuristic approaches to quality control of ABC are listed in <ref name="Bertorelle" />, such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims to assess whether or not the inference yields valid results, regardless of the observational data. For instance, models are simulated for fixed parameter sets, typically drawn from the prior or posterior distributions, to generate a large number of artificial pseudo-observed data sets (PODs). In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC method recovers the true model mechanisms and parameter values.
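The POD idea can be sketched for the toy Gaussian model used above: draw "true" parameters, simulate artificial data from them, and check that the ABC estimator recovers the truth on average. This is a minimal illustration of the principle, with all names and settings ours.

```python
import random

def abc_posterior_mean(observed_mean, eps=0.3, n=3000):
    """Toy ABC rejection estimator of a Gaussian location parameter."""
    accepted = []
    for _ in range(n):
        t = random.uniform(-3.0, 3.0)
        sim = sum(random.gauss(t, 1.0) for _ in range(20)) / 20
        if abs(sim - observed_mean) < eps:
            accepted.append(t)
    return sum(accepted) / len(accepted) if accepted else None

# Quality control with pseudo-observed data sets (PODs): for known true
# parameters, the estimator should be close to the truth on average.
random.seed(7)
errors = []
for _ in range(20):
    true_theta = random.uniform(-2.0, 2.0)
    pod = sum(random.gauss(true_theta, 1.0) for _ in range(20)) / 20  # a POD
    estimate = abc_posterior_mean(pod)
    if estimate is not None:
        errors.append(estimate - true_theta)
bias = sum(errors) / len(errors)
```

A systematic bias or large spread in these recovery errors would flag a problem with the tolerance, the summaries, or the prior before any real data is analyzed.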
Another class of methods assesses whether the inference was successful in light of the given observational data. For example, comparison of the posterior predictive distribution of summary statistics to the observed summary statistics was suggested in <ref name="Bertorelle" />. Beyond that, cross-validation techniques <ref name="Arlot" /> and predictive checks <ref name="Dawid" /><ref name="Vehtari" /> represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because the posterior support of a particular model can then appear overwhelmingly conclusive, even if all proposed models in fact are poor representations of the stochastic system underlying the observational data. Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization.
Fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have also been proposed recently. ABC allows, by construction, estimation of the discrepancies between the observational data and the model predictions with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion in Eq. ?. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously <ref name="Ratmann" />, and model inconsistency is detected from conflicting and codependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviances of the posterior predictive distributions of summaries and parameters <ref name="Francois" />. The deviance information criterion is then used as a measure of model fit. Interestingly, it was also shown that the models preferred on the basis of this criterion can conflict with those supported by Bayes factors. For this reason it is useful to combine different methods for model selection to obtain correct conclusions.
==Example==
==Pitfalls and Controversies around ABC==
Sharp criticism has recently been directed at ABC methods, in particular within the field of phylogeography <ref name="Templeton2008" /><ref name="Templeton2009a" /><ref name="Templeton2009b" />. However, it has been pointed out that a significant portion of this criticism is not aimed directly at ABC, but more generally at methods rooted in Bayesian statistics <ref name="Beaumont2010" /><ref name="Berger" />. A large part was also shown to originate from misunderstandings of the mathematical foundations and the semantics of Bayesian statistics, of the difference between a model and the underlying system, or of the difference between the ABC method and its usage. However, the arguments also exposed fundamental and currently unsolved issues. Concerns have lately been raised within the ABC community as well <ref name="Didelot" /><ref name="Robert" />. Yet it might be difficult for many readers to differentiate ABC-specific criticisms from general ones, or well-founded criticisms from misconceptions.
As with all statistical methods, a number of assumptions and approximations are inherently required for the application of ABC-based methods to typical problems. For example, setting <math>\epsilon = 0</math> in Eqs. ? or ? yields an exact result, but would typically make computations prohibitively expensive. Thus <math>\epsilon</math> is instead set above zero, which introduces a bias. Likewise, sufficient statistics are typically not available; instead, other summary statistics are used, which introduces an additional bias. However, much of the recent criticism has been neither specific to ABC nor relevant for ABC-based analysis. This motivates a careful investigation and categorization of the validity and relevance of the arguments.
===Curse-of-Dimensionality===
In principle, ABC may be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting (e.g., see the model selection methods in <ref name="Ratmann" /> and <ref name="Francois" />). However, the probability of accepting a simulation for a given tolerance with ABC typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion) <ref name="Csillery" />. This is a well-known phenomenon usually referred to as the curse-of-dimensionality <ref name="Bellman" />. In practice the tolerance may be adjusted to account for this issue, which can lead to an increased acceptance rate at the price of a less accurate posterior distribution.
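The exponential decay of the acceptance rate with dimension is easy to demonstrate empirically; the sketch below uses a trivial deterministic "model" (the identity map) so that only the geometry of the global acceptance criterion matters. All names and numbers are ours.

```python
import random

def acceptance_rate(dim, eps=0.5, n=20000):
    """Fraction of uniform prior draws within eps (per coordinate) of a target
    point; with a global acceptance criterion this decays exponentially in dim."""
    target = [0.0] * dim
    hits = 0
    for _ in range(n):
        theta = [random.uniform(-1.0, 1.0) for _ in range(dim)]
        if all(abs(t - c) < eps for t, c in zip(theta, target)):
            hits += 1
    return hits / n

random.seed(8)
rates = [acceptance_rate(d) for d in (1, 2, 4, 8)]
```

With a per-coordinate acceptance probability of 0.5, the rate roughly halves with every added dimension, which is the curse-of-dimensionality in its simplest form.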
Although no computational method seems to be able to break the curse-of-dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids <ref name="Gerstner" />, which could potentially reduce the simulation times for ABC substantially). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect <ref name="Singer" />. For certain problems it may therefore be difficult to know whether the model is incorrect or whether the explored region of the parameter space is inappropriate <ref name="Templeton2009a" /> (see also Section ?).
===Approximation of the Posterior===
A non-negligible <math>\epsilon</math> comes at the price that we sample from <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> should approximate the actual target distribution <math>p(\theta|D)</math>. On the other hand, a tolerance that is large enough for every point to be accepted results in the prior distribution. The difference between <math>p(\theta|\rho(\hat{D},D)<\epsilon)</math> and <math>p(\theta|D)</math>, as a function of <math>\epsilon</math>, was empirically studied in <ref name="Sisson" />. Theoretical results for an <math>\epsilon</math>-dependent upper bound on the error in parameter estimates have recently been reported <ref name="Dean" />. The accuracy of the posterior (defined as the expected quadratic loss) obtained with ABC as a function of <math>\epsilon</math> has also been investigated <ref name="Fearnhead" />. However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that should be investigated in greater detail. Methods to distinguish the error of this approximation from the errors due to model mis-specification <ref name="Beaumont2010" />, in a way that is meaningful for actual applications, would be valuable.
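The two limiting cases can be made concrete with a toy rejection sampler: a tight tolerance concentrates the accepted parameters, while a tolerance large enough to accept every simulation simply returns the prior. A minimal sketch under an assumed toy model (one Gaussian datum, uniform prior; all names are illustrative):

```python
import random
import statistics

random.seed(1)
obs = 1.5  # observed datum, assumed to be drawn from N(theta, 1)

def abc_posterior(eps, n=20000):
    """Plain ABC rejection: draw theta from a Uniform(-5, 5) prior,
    simulate one datum, and keep theta if |sim - obs| < eps."""
    accepted = []
    for _ in range(n):
        theta = random.uniform(-5, 5)
        sim = random.gauss(theta, 1)
        if abs(sim - obs) < eps:
            accepted.append(theta)
    return accepted

tight = abc_posterior(eps=0.1)
loose = abc_posterior(eps=50.0)   # accepts every draw: recovers the prior

spread_tight = statistics.pstdev(tight)
spread_loose = statistics.pstdev(loose)  # close to the prior's spread
```

The loose-tolerance sample has roughly the spread of the Uniform(-5, 5) prior, whereas the tight-tolerance sample concentrates near the observation.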
Finally, statistical inference with a positive tolerance in ABC was theoretically justified in <ref name="Fearnhead" /><ref name="Wilkinson" />. The idea is to add noise to the observed data according to a given probability density function, since ABC then yields exact inference under the assumption of this noise model. The asymptotic consistency of such “noisy ABC” was established in <ref name="Dean" />, together with the asymptotic variance of the parameter estimates for a fixed tolerance. Both results provide theoretical justification for ABC-based approaches.
===Choice and Sufficiency of Summary Statistics===
Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Sufficient statistics, defined in Eq. ?, are optimal for this purpose, as they represent the maximum amount of information in the simplest possible form <ref name="Csillery" />. However, one often has to resort to heuristics to identify sufficient statistics, and sufficiency can be difficult to assess for many problems. Using non-sufficient statistics may lead to inflated posterior distributions due to the potential loss of information in the parameter estimation <ref name="Csillery" />, and may also bias the discrimination between models.
An intuitive idea to capture most of the information in <math>D</math> would be to use many statistics, but the accuracy and stability of ABC appears to decrease rapidly with an increasing number of summary statistics <ref name="Beaumont2010" /><ref name="Csillery" />. Instead, a better strategy is to focus on the relevant statistics only, where relevancy depends on the whole inference problem, on the model used, and on the data at hand <ref name="Nunes" />.
An algorithm was proposed for identifying a representative subset of summary statistics, by iteratively assessing whether an additional statistic introduces a meaningful modification of the posterior <ref name="Joyce" />. Another method, proposed in <ref name="Nunes" />, decomposes into two principal steps. First, a reference approximation of the posterior is constructed by minimizing the entropy. Sets of candidate summaries are then evaluated by comparing the posteriors computed with ABC to the reference posterior.
With both of these strategies a subset of statistics is selected from a large set of candidate statistics. By contrast, the partial least squares regression approach uses information from all the candidate statistics, each being weighted appropriately <ref name="Wegmann" />. Recently, a method for constructing summaries in a semi-automatic manner has attracted much interest <ref name="Fearnhead" />. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, is the posterior mean of the parameters, which is approximated with a pilot run of simulations.
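The semi-automatic idea can be sketched as follows: a pilot run of simulations is used to regress the parameter on candidate data features, and the fitted linear predictor then serves as a one-dimensional summary approximating the posterior mean. A toy sketch with a single feature and ordinary least squares (illustrative only, not the full procedure of <ref name="Fearnhead" />):

```python
import random

random.seed(5)

# Pilot run under an assumed toy model: theta ~ Uniform(0, 10),
# data = 5 draws from N(theta, 2); the candidate feature is the mean.
pilot = []
for _ in range(2000):
    theta = random.uniform(0, 10)
    d = [random.gauss(theta, 2) for _ in range(5)]
    pilot.append((theta, sum(d) / 5))

# Least-squares fit of theta on the feature.
xs = [x for _, x in pilot]
ts = [t for t, _ in pilot]
mx, mt = sum(xs) / len(xs), sum(ts) / len(ts)
b = (sum((x - mx) * (t - mt) for t, x in pilot)
     / sum((x - mx) ** 2 for x in xs))
a = mt - b * mx

def summary(d):
    """Semi-automatic summary: linear prediction of theta from the data."""
    return a + b * (sum(d) / len(d))
```

The fitted predictor is then used as the summary statistic in a subsequent ABC run; in the full method several features and all parameters are handled jointly.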
Methods for the identification of summary statistics that also assess the influence on the approximation of the posterior would be of great interest <ref name="Marjoram" />. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models, and may also lead to incorrect model predictions. It is essential to be aware that none of the methods above assesses the choice of summaries for the purpose of model selection.
===Bayes Factor with ABC and Summary Statistics===
Recent contributions have demonstrated that the combination of summary statistics and ABC for the discrimination between models can be problematic <ref name="Didelot" /><ref name="Robert" />. If we let the Bayes factor on the summary statistic <math>S(D)</math> be denoted by <math>B_{1,2}^s</math>, the relation between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> takes the form <ref name="Didelot" />
:<math>B_{1,2}=\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)}=\frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s</math>
In Eq. ? we see that a summary statistic <math>S(D)</math> is sufficient for comparing two models <math>M_1</math> and <math>M_2</math>, if and only if,
:<math>p(D|S(D),M_1)=p(D|S(D),M_2)</math>
which implies that <math>B_{1,2}=B_{1,2}^s</math>. It is also clear from Eq. ? that there may be a huge difference between <math>B_{1,2}</math> and <math>B_{1,2}^s</math> if Eq. ? is not satisfied, as demonstrated with a small example model in <ref name="Robert" /> (previously discussed in <ref name="Didelot" /> and in <ref name="Grelaud" />). Crucially, it was shown that sufficiency for <math>M_1</math>, <math>M_2</math>, or both does not guarantee sufficiency for ranking the models <ref name="Didelot" />. However, it was also shown that any sufficient summary statistic for a model <math>M</math> in which both <math>M_1</math> and <math>M_2</math> are nested can also be used to rank the nested models <ref name="Didelot" />.
The computation of Bayes factors on <math>S(D)</math> is therefore meaningless for model selection purposes, unless the ratio between the Bayes factors on <math>D</math> and <math>S(D)</math> is available, or at least possible to approximate. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived <ref name="Marin" />, which may be useful.
However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which actual data sets are compared, as in typical systems biology applications (e.g., see <ref name="Toni" />), circumvents this problem. It is even doubtful whether the issue is truly ABC-specific, since importance sampling techniques suffer from the same problem <ref name="Robert" />.
===Indispensable Quality Controls===
As the above makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, and the acceptance threshold cannot be based on general rules; instead, the effect of these choices should be evaluated and tested in each study <ref name="Bertorelle" />. Thus, quality controls are achievable and indeed performed in many ABC-based works, but for certain problems the assessment of the impact of the method-related parameters can unfortunately be an overwhelming task. However, the rapidly increasing use of ABC should lead to a more thorough understanding of the limitations and applicability of the method.
===More General Criticisms===
We now turn to criticisms that we consider to be valid, but not specific to ABC, and instead hold for model-based methods in general. Many of these criticisms have already been well debated in the literature for a long time, but the flexibility offered by ABC to analyse very complex models makes them highly relevant.
====Small Number of Models====
Model-based methods have been criticized for not exhaustively covering the hypothesis space <ref name="Templeton2009a" />. It is true that model-based studies often revolve around a small number of models, and due to the high computational cost of evaluating a single model, it may in some instances be difficult to cover a large part of the hypothesis space.
An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options <ref name="Bertorelle" />. It could be argued that this is due to the lack of a commonly accepted ABC-specific procedure for model construction, such that experience and prior knowledge are used instead <ref name="Csillery" />. However, although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.
But if only a few models—subjectively chosen and probably all wrong—can be realistically considered, what insight can we hope to derive from their analysis <ref name="Templeton2009a" />? As pointed out in <ref name="Beaumont2010" />, there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses, which potentially hold true, can only seldom be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important in this context than the test of a statistical null hypothesis (see also Section ?).
====Prior Distribution and Parameter Ranges====
The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators” <ref name="Templeton2010" />, which is connected to classical objections of Bayesian approaches <ref name="Beaumont2010b" />.
With any computational method it is necessary to constrain the investigated parameter ranges. The parameter ranges should if possible be defined based on known properties of the studied system, but may for practical applications necessitate an educated guess. However, theoretical results regarding a suitable (e.g., non-biased) choice of the prior distribution are available, which are based on the principle of maximum entropy <ref name="Jaynes" />.
We stress that the purpose of the analysis has to be kept in mind when choosing the priors. In principle, uninformative and flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of parameters, and conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of the conclusions to the choice of priors is carefully considered.
====Large Data Sets====
Large data sets may constitute a computational bottleneck for model-based methods. It was for example pointed out in <ref name="Templeton2009a" /> that part of the data had to be omitted in the ABC-based analysis presented in <ref name="Fagundes" />. Although a number of authors have claimed that large data sets are not a practical limitation <ref name="Bertorelle" /><ref name="Beaumont2010b" />, this depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, the number of observed variables or features, and the time or spatial resolution. However, with increasing computational power this issue will potentially become less important. It has been demonstrated that parallel algorithms may significantly speed up MCMC-based inference in phylogenetics <ref name="Feng" />, which may be a tractable approach also for ABC-based methods. It should still be kept in mind that any realistic model for a complex system is likely to require intensive computation, irrespective of the chosen method of inference, and that it is up to the user to select a method that is suitable for the particular application in question.
==History==
===Beginnings===
===Recent Methodological Developments===
We end this methodological overview of ABC with recent developments. Instead of sampling parameters for each simulation from the prior, it was proposed to combine the Metropolis-Hastings algorithm with ABC <ref name="Marjoram" />, which results in a higher acceptance rate for ABC-MCMC than for plain ABC. Naturally, this method inherits the general burdens of MCMC methods, such as the difficulty of ascertaining convergence, correlated samples from the posterior <ref name="Sisson" />, and relatively poor parallelizability <ref name="Bertorelle" />.
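The ABC-MCMC idea can be sketched as follows: instead of drawing each candidate from the prior, a random-walk proposal is made from the current state, and the move is accepted only if a fresh simulation falls within the tolerance. This is a toy sketch in the spirit of the algorithm, not a faithful reimplementation; with the flat prior and symmetric proposal assumed here, the Metropolis-Hastings ratio reduces to 1:

```python
import random

random.seed(2)
obs = 1.5   # observed datum, assumed drawn from N(theta, 1)
eps = 0.5   # tolerance

def in_prior_support(theta):
    return -5 <= theta <= 5  # flat prior on [-5, 5]

theta = 0.0
chain = []
for _ in range(5000):
    prop = theta + random.gauss(0, 0.5)  # symmetric random-walk proposal
    sim = random.gauss(prop, 1)          # one fresh simulation per proposal
    if in_prior_support(prop) and abs(sim - obs) < eps:
        theta = prop  # flat prior + symmetric proposal: MH ratio is 1
    chain.append(theta)
```

The resulting chain is correlated, which is exactly the burden of MCMC methods mentioned above; effective sample sizes are smaller than the chain length.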
Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting <ref name="Sisson" /><ref name="Beaumont2009" />. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but are adjusted adaptively <ref name="DelMoral" />.
The usage of local linear weighted regression with ABC to reduce the variance of the estimator was suggested in <ref name="Beaumont2002" />. The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones, and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct the sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of non-linear regression using a feed-forward neural network model <ref name="Blum2010" />. However, it was shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, and a reformulation of the regression adjustment that respects the prior distribution was proposed in <ref name="Leuenberger2009" />.
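The regression adjustment can be sketched in a few lines: regress the accepted parameters on the simulated summaries, then shift each parameter to the value the regression predicts at the observed summary. Below is a minimal unweighted variant under an assumed toy model (the full method uses local weighting; all names are illustrative):

```python
import random

random.seed(3)
s_obs = 1.5  # observed summary

# Toy pilot ABC output: pairs (theta, s) accepted near s_obs.
pairs = []
for _ in range(5000):
    theta = random.uniform(-5, 5)
    s = random.gauss(theta, 1)          # simulated summary
    if abs(s - s_obs) < 1.0:            # keep a neighbourhood of s_obs
        pairs.append((theta, s))

raw = [t for t, _ in pairs]

# Least-squares slope of theta on s, then shift each accepted theta
# to the regression prediction at s = s_obs (Beaumont et al. style,
# but without the local kernel weights).
n = len(pairs)
mean_t = sum(raw) / n
mean_s = sum(s for _, s in pairs) / n
beta = (sum((s - mean_s) * (t - mean_t) for t, s in pairs)
        / sum((s - mean_s) ** 2 for _, s in pairs))
adjusted = [t - beta * (s - s_obs) for t, s in pairs]
```

The adjusted sample has smaller in-sample variance than the raw accepted parameters whenever summaries and parameters are correlated, which is the variance reduction the method aims for.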
==Outlook==
In the past the evaluation of a single hypothesis constituted the computational bottleneck in many analyses. With the introduction of ABC the focus has shifted to the number of models and the size of the parameter space. With a faster evaluation of the likelihood function it may be tempting to attack high-dimensional problems. However, ABC methods do not yet address the additional issues encountered in such studies, and novel appropriate methods must therefore be developed. There are in principle four approaches available to tackle high-dimensional problems. The first is to cut the scope of the problem through model reduction, e.g., dimension reduction <ref name="Csillery" /> or modularization. A second approach is a more guided search of the parameter space, e.g., through the development of new methods in the same category as ABC-MCMC and ABC-SMC. The third approach is to speed up the evaluation of individual parameter points. ABC only avoids the cost of computing the likelihood, but not the cost of the model simulations required for comparison with the observational data. The fourth option is to resort to methods for polynomial interpolation on sparse grids to approximate the posterior. We finally note that these methods, or a combination thereof, can in general merely improve the situation, but not resolve the curse-of-dimensionality.
The main error sources in ABC based statistical inference that we have identified are summarized in Table 1, where we also suggest possible solutions. A key to overcome many of the obstacles is a careful implementation of ABC in concert with a sound assessment of the quality of the results, which should be external to the ABC implementation.
We conclude that ABC is an appropriate method for model-based inference, keeping in mind the limitations shared with approaches to Bayesian inference in general. Thus, there are certain tasks, for instance model selection with ABC, that are inherently difficult. Also, open problems such as the convergence properties of ABC-based algorithms, as well as methods for determining summary statistics in the absence of sufficient ones, deserve more attention.
==Tables==
{| class="wikitable"
|+ Table 1: Error sources in ABC-based statistical inference
|-
! Error source
! Potential issue
! Solution
! Section
|-
| Non-zero tolerance ε
| The computed posterior distribution is biased.
| Theoretical/practical studies of the sensitivity of the posterior distribution to the tolerance. / Noisy ABC.
| ?
|-
| Non-sufficient statistics
| Inflated posterior distributions due to information loss.
| Automatic selection/semi-automatic identification of sufficient statistics. / Model validation checks (e.g., see [22]).
| ?
|-
| Small number of models / Mis-specified models
| The investigated models are not representative / lack predictive power.
| Careful selection of models. / Evaluation of the predictive power.
| ?
|-
| Priors and parameter ranges
| Conclusions may be sensitive to the choice of priors. / Model choice may be meaningless.
| Check sensitivity of Bayes factors to the choice of priors. / Some theoretical results regarding choice of priors are available. / Use alternative methods for model validation.
| ?
|-
| Curse-of-dimensionality
| Low acceptance rates. / Model errors cannot be distinguished from an insufficient exploration of the parameter space. / Risk of overfitting.
| Methods for model reduction if applicable. / Methods to speed up the parameter exploration. / Quality controls to detect overfitting.
| ?
|-
| Model ranking with summary statistics
| The computed Bayes factors on summary statistics may not be related to the Bayes factors on the original data, and may therefore be meaningless.
| Only use summary statistics that fulfill the necessary and sufficient conditions to produce a consistent Bayesian model choice. / Use alternative methods for model validation.
| ?
|-
| Implementation
| Insufficient protection against common errors in the simulations and in the inference process.
| Sanity checks of results. / Standardization of software.
| ?
|}
==Acknowledgements==
We would like to thank Elias Zamora-Sillero, Sotiris Dimopoulos, Joachim M. Buhmann, and Joerg Stelling for useful discussions about various topics covered in this review. MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. This article started as an assignment for the graduate course ‘Reviews in Computational Biology’ (263-5151-00L) at ETH Zurich.
==References==
<references>
<ref name="Beaumont2010">Beaumont MA (2010) Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics 41: 379-406.</ref>
<ref name="Bertorelle">Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Molecular Ecology 19: 2609-2625.</ref>
<ref name="Csillery">Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution 25: 410-418.</ref>
<ref name="Rubin">Rubin DB (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics 12: 1151-1172.</ref>
<ref name="Marjoram">Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.</ref>
<ref name="Sisson">Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 104: 1760-1765.</ref>
<ref name="Wegmann">Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207-1218.</ref>
<ref name="Templeton2008">Templeton AR (2008) Nested clade analysis: an extensively validated method for strong phylogeographic inference. Molecular Ecology 17: 1877-1880.</ref>
<ref name="Templeton2009a">Templeton AR (2009) Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Molecular Ecology 18: 319-331.</ref>
<ref name="Templeton2009b">Templeton AR (2009) Why does a method that fails continue to be used? The answer. Evolution 63: 807-812.</ref>
<ref name="Berger">Berger JO, Fienberg SE, Raftery AE, Robert CP (2010) Incoherent phylogeographic inference. Proceedings of the National Academy of Sciences of the United States of America 107: E157-E157.</ref>
<ref name="Didelot">Didelot X, Everitt RG, Johansen AM, Lawson DJ (2011) Likelihood-free estimation of model evidence. Bayesian Analysis 6: 49-76.</ref>
<ref name="Robert">Robert CP, Cornuet J-M, Marin J-M, Pillai NS (2011) Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci U S A 108: 15112-15117.</ref>
<ref name="Busetto2009a">Busetto A, Buhmann J. Stable Bayesian Parameter Estimation for Biological Dynamical Systems.; 2009. IEEE Computer Society Press pp. 148-157.</ref>
<ref name="Busetto2009b">Busetto A, Ong C, Buhmann J. Optimized Expected Information Gain for Nonlinear Dynamical Systems. Int. Conf. Proc. Series; 2009. Association for Computing Machinery (ACM) pp. 97-104.</ref>
<ref name="Jeffreys">Jeffreys H (1961) Theory of probability: Clarendon Press, Oxford.</ref>
<ref name="Kass">Kass R, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.</ref>
<ref name="Vyshemirsky">Vyshemirsky V, Girolami MA (2008) Bayesian ranking of biochemical system models. Bioinformatics 24: 833-839.</ref>
<ref name="Arlot">Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistical surveys 4: 40-79.</ref>
<ref name="Dawid">Dawid A Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society Series A 1984: 278-292.</ref>
<ref name="Vehtari">Vehtari A, Lampinen J (2002) Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 14: 2439-2468.</ref>
<ref name="Ratmann">Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences of the United States of America 106: 10576-10581.</ref>
<ref name="Francois">Francois O, Laval G (2011) Deviance Information Criteria for Model Selection in Approximate Bayesian Computation. Stat Appl Genet Mol Biol 10: Article 33.</ref>
<ref name="Beaumont2009">Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2009) Adaptive approximate Bayesian computation. Biometrika 96: 983-990.</ref>
<ref name="DelMoral">Del Moral P, Doucet A, Jasra A (2011 (in press)) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and computing.</ref>
<ref name="Beaumont2002">Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.</ref>
<ref name="Blum2010">Blum M, Francois O (2010) Non-linear regression models for approximate Bayesian computation. Stat Comp 20: 63-73.</ref>
<ref name="Leuenberger2009">Leuenberger C, Wegmann D (2009) Bayesian Computation and Model Selection Without Likelihoods. Genetics 184: 243-252.</ref>
<ref name="Beaumont2010b">Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, et al. (2010) In defence of model-based inference in phylogeography. Molecular Ecology 19: 436-446.</ref>
<!-- <ref name="Csillery2010">Csilléry K, Blum MGB, Gaggiotti OE, Francois O (2010) Invalid arguments against ABC: Reply to AR Templeton. Trends in Ecology & Evolution 25: 490-491.</ref> -->
<ref name="Templeton2010">Templeton AR (2010) Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107: 6376-6381.</ref>
<ref name="Fagundes">Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104: 17614-17619.</ref>
<!-- <ref name="Gelfand">Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptotics and exact calculations. J R Statist Soc B 56: 501-514.</ref> -->
<!-- <ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref> -->
<!-- <ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref> -->
<!-- <ref name="Excoffier">Excoffier L, Foll M (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332-1334.</ref> -->
<!-- <ref name="Wegmann2010">Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.</ref> -->
<!-- <ref name="Cornuet">Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.</ref> -->
<!-- <ref name="Templeton2010b">Templeton AR (2010) Coalescent-based, maximum likelihood inference in phylogeography. Molecular Ecology 19: 431-435.</ref> -->
<ref name="Jaynes">Jaynes ET (1968) Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics 4.</ref>
<ref name="Feng">Feng X, Buell DA, Rose JR, Waddellb PJ (2003) Parallel Algorithms for Bayesian Phylogenetic Inference. Journal of Parallel and Distributed Computing 63: 707-718.</ref>
<ref name="Bellman">Bellman R (1961) Adaptive Control Processes: A Guided Tour: Princeton University Press.</ref>
<ref name="Gerstner">Gerstner T, Griebel M (2003) Dimension-Adaptive Tensor-Product Quadrature. Computing 71: 65-87.</ref>
<ref name="Singer">Singer AB, Taylor JW, Barton PI, Green WH (2006) Global dynamic optimization for parameter estimation in chemical kinetics. J Phys Chem A 110: 971-976.</ref>
<ref name="Dean">Dean TA, Singh SS, Jasra A, Peters GW (2011) Parameter estimation for hidden markov models with intractable likelihoods. arXiv:11035399v1 [mathST] 28 Mar 2011.</ref>
<ref name="Fearnhead">Fearnhead P, Prangle D (2011) Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC. ArXiv:10041112v2 [statME] 13 Apr 2011.</ref>
<ref name="Wilkinson">Wilkinson RD (2009) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:08113355.</ref>
<ref name="Nunes">Nunes MA, Balding DJ (2010) On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol 9: Article34.</ref>
<ref name="Joyce">Joyce P, Marjoram P (2008) Approximately sufficient statistics and bayesian computation. Stat Appl Genet Mol Biol 7: Article26.</ref>
<ref name="Grelaud">Grelaud A, Marin J-M, Robert C, Rodolphe F, Tally F (2009) Likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 3: 427-442.</ref>
<ref name="Marin">Marin J-M, Pillai NS, Robert CP, Rosseau J (2011) Relevant statistics for Bayesian model choice. ArXiv:11104700v1 [mathST] 21 Oct 2011: 1-24.</ref>
<ref name="Toni">Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M (2007) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.
</ref>
</references>
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in [[wp:Bayesian statistics|Bayesian statistics]]. ABC has rapidly increased in popularity over the last years, in particular for the analysis of complex problems arising in the [[wp:Biology|biological sciences]], e.g., in population [[wp:Genetics|genetics]], [[wp:ecology|ecology]], [[wp:epidemiology|epidemiology]], and [[wp:systems biology|systems biology]] (reviewed in <ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" />). By offering considerable computational speed-up compared to conventional approaches, ABC methods widen the realm of statistical inference. Although they are well-founded statistical methods, ABC methods are based on assumptions and approximations whose impact needs to be carefully assessed. Furthermore, by allowing the consideration of substantially more complex models, the wider application domain of ABC exacerbates the challenges of [[wp:Estimation Theory|parameter estimation]] and [[wp:Model Selection|model selection]].
==Approximate Bayesian Computation==
===Motivation===
When statistical uncertainty is cast into the form of a set of alternative hypotheses, Bayes’ theorem relates the conditional probability of a particular hypothesis <math>H</math> given data <math>D</math> to the probability of <math>D</math> given <math>H</math> following the rule
:<math>p(H|D) = \frac{p(D|H)p(H)}{p(D)}</math>
where <math>p(H|D)</math> denotes the posterior, <math>p(D|H)</math> the likelihood, <math>p(H)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the marginal likelihood). Assume that for a given model <math>M</math> the hypothesis space takes the form of a (possibly continuous) parameter domain <math>\Theta</math>. If we are only interested in the relative plausibility of different parameter values, the evidence constitutes a normalizing constant, which can be excluded from the analysis
:<math>p(\theta|D)\propto p(D|\theta)p(\theta),\theta\in \Theta</math>
Note that it is necessary to evaluate the likelihood <math>p(D|\theta)</math> in order to update the prior belief represented by <math>p(\theta)</math> to the corresponding posterior belief <math>p(\theta|D)</math>. However, for numerous applications it is computationally expensive, or even infeasible, to evaluate the likelihood <ref name="Busetto2009a" /><ref name="Busetto2009b" />, which motivates the use of ABC to circumvent this issue.
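When the likelihood can be evaluated, the proportionality above can be computed directly, for instance on a grid; ABC is needed precisely when this evaluation is infeasible. A toy grid evaluation for a coin-flip model with a flat prior (illustrative only):

```python
import math

# Coin-flip example: 7 heads in 10 tosses, Uniform(0, 1) prior on theta.
data_heads, n_tosses = 7, 10
grid = [i / 100 for i in range(1, 100)]  # interior grid over (0, 1)

def likelihood(theta):
    """Binomial likelihood p(D | theta)."""
    return (math.comb(n_tosses, data_heads)
            * theta ** data_heads * (1 - theta) ** (n_tosses - data_heads))

unnorm = [likelihood(th) * 1.0 for th in grid]  # flat prior: p(theta) = 1
z = sum(unnorm)                                  # normalizer over the grid
posterior = [u / z for u in unnorm]              # sums to 1 over grid points
```

With a flat prior the posterior mode coincides with the maximum-likelihood estimate, here 7/10; the point of ABC is to obtain such posteriors when `likelihood` cannot be written down or evaluated.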
===The ABC Rejection Algorithm===
All ABC-based methods approximate the likelihood function by simulations that are compared to the [[wp:Observational study|observational data]]. More specifically, with the ABC rejection algorithm—the most basic form of ABC—a set of parameter points is sampled from the prior distribution. Given a sampled parameter point <math>\theta</math>, a data set <math>\hat{D}</math> is simulated under model <math>M</math> and accepted with tolerance <math>\epsilon \ge 0</math> if
:<math>\rho (\hat{D},D)<\epsilon</math>
where the distance measure <math>\rho(\hat{D},D)</math> gives the deviation between <math>\hat{D}</math> and <math>D</math> for a given metric (e.g., the Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event <math>\hat{D}=D</math>) is negligible for all but trivial applications of ABC. The outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution, and, crucially, obtained without the need of explicitly computing the likelihood function.
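The algorithm described above can be written in a few lines. The toy model below (Gaussian data with unknown mean, uniform prior, distance between sample means) and all names are illustrative assumptions, not a prescribed implementation:

```python
import random
import statistics

random.seed(4)
obs = [random.gauss(2.0, 1) for _ in range(50)]  # "observed" data
s_obs = statistics.mean(obs)                      # summary of the data

def simulate(mu, n=50):
    """Generative model: n draws from N(mu, 1)."""
    return [random.gauss(mu, 1) for _ in range(n)]

accepted = []
for _ in range(10000):
    mu = random.uniform(-5, 5)             # sample from the prior
    s_sim = statistics.mean(simulate(mu))  # summarise the simulation
    if abs(s_sim - s_obs) < 0.2:           # tolerance criterion
        accepted.append(mu)                # keep the parameter point

estimate = statistics.mean(accepted)       # posterior mean estimate
```

The accepted values approximate the posterior of <math>\mu</math> without a single likelihood evaluation; only forward simulations and a distance are needed.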
===Sufficient Summary Statistics===
The probability of generating a data set <math>\hat{D}</math> with a small distance to <math>D</math> typically decreases with the dimensionality of the observational data. A common approach to improve the acceptance rate of ABC algorithms is to replace <math>D</math> with a set of lower dimensional summary statistics <math>S(D)</math>, which are selected to capture the relevant information in <math>D</math>. Summary statistics are said to be sufficient for the model parameters <math>\theta</math> if all information in <math>D</math> about <math>\theta</math> is captured by <math>S(D)</math>. Formally, this corresponds to the relation
:<math>p(D|S(D)) = p(D|S(D),\theta)</math>