A Review of fMRI Simulation Studies

Simulation studies that validate statistical techniques for fMRI data are challenging due to the complexity of the data. Therefore, it is not surprising that no common data generating process is available (i.e. several models can be found to model BOLD activation and noise). Based on a literature search, a database of simulation studies was compiled. The information in this database was analysed and critically evaluated focusing on the parameters in the simulation design, the adopted model to generate fMRI data, and on how the simulation studies are reported. Our literature analysis demonstrates that many fMRI simulation studies do not report a thorough experimental design and almost consistently ignore crucial knowledge on how fMRI data are acquired. Advice is provided on how the quality of fMRI simulation studies can be improved.


Introduction
Twenty years ago, functional magnetic resonance imaging (fMRI) was established as a method to measure brain activity [1,2]. In these past twenty years, this technique has been used increasingly and has pioneered the search to map and connect the brain that has caused a world-wide collaboration of scientists from different disciplines. Engineers and physicists are intrigued by the acquisition of the fMRI data, while physicians and psychologists are challenged to adapt their behavioural experimental protocols to the scanner environment. Last but not least, the analysis of fMRI data has been, and still is, a topic of numerous discussions among statisticians. The latter are confronted with the fact that the data acquired through fMRI have no ground truth. This ground truth is needed to ensure validation of the statistical methods that are used to analyse the data and to assess statistical properties such as sensitivity, specificity, bias and robustness. Great efforts to establish this ground truth have gone into the development of mechanical models [3], while direct measuring of the neural activity with intracranial EEG (iEEG) offers another solution [4]. However, for most studies iEEG may not be feasible and simulations may be the only realistic approach to establish the ground truth of fMRI data.
NeuroImage, one of the flagship journals in the neuroimaging community, celebrated the 20th anniversary of the first fMRI publications with a special volume that consisted of 103 reviews about the early beginnings, developments in acquisition, software, processing and methodology, and prospectives for the future of fMRI [5]. Although the advances in statistical methods for fMRI data are discussed in several of these reviews, simulations an sich are not mentioned. In general, it appears that simulation studies are still not standard practice for fMRI methods validation. A possible explanation is that it can be quite challenging to simulate fMRI data. Not only is the coupling between the neural activity and the Blood Oxygenation Dependent Level (BOLD) not completely understood [6], fMRI data are also characterised by a great deal of noise coming from multiple sources [7]. Consequently, no common data generating process for fMRI data is available and the data generation in fMRI simulation studies is mostly defined ad hoc.
The goal of this review is to provide an overview of the most common data generation methods used in fMRI simulation studies. An established and accepted data generating process does not yet exist and therefore an investigation of the existing published models is called for. In particular, the validity of these data generating methods is analysed and the overall reporting and conduct of fMRI simulation studies is critically reviewed. The rest of the paper is organised as follows: In the Methods section the article selection criteria are reported that were applied to establish a database of fMRI simulation studies literature, and the focus points of the article evaluation are discussed. The Results section focuses on different aspects of the simulation studies, namely, the goals of the studies, the experimental design under investigation, the simulation parameters and the data generation models. Finally, in the Discussion, best practice recommendations are provided to increase the reliability and generalisability of fMRI simulation studies.

Article selection
Articles were selected from the Web of Science database using the following query: ''fmri AND simulation AND (statistics OR data analysis)''. By excluding articles labelled as reviews or proceedings, this search resulted in 318 hits (Result as of January, 1st 2013). All these articles were manually inspected on content and relevance. This screening resulted in excluding articles based on the following criteria: the conducted simulations were for another modality (e.g. PET, EEG, MEG, …); no time series were simulated (e.g. inference methods are often validated on simulated statistical maps); non-human fMRI was simulated; and no simulation study was conducted (e.g. papers presenting simulation software). After exclusion, the remaining 119 articles were taken into account in this analysis. Full bibliographic details of our sample can be found in the Supporting Information (Table S1). These articles were published in 39 peer-reviewed academic journals (Table 1) over a period of 16 years (Figure 1). In this sample, most simulation studies were published in NeuroImage (37), Human Brain Mapping (11), IEEE Transactions on Medical Imaging (10), Magnetic Resonance Imaging (7), IEEE Transaction on Biomedical Engineering (6) and the Journal of Magnetic Resonance Imaging (6).
During our article selection, we focused on simulation studies conducted to validate or compare analysis procedures for BOLD-fMRI data. In order to perform this validation, a data generating process results in artificial data that reflect to some degree the characteristics of real measured fMRI data. From a statistical perspective, scanning parameters that influence magnetic properties of the data (e.g. flip angle) are of less importance since they mainly have an effect on the signal-to-noise ratio. For instance, when these scanning parameters are optimised, the baseline signal might increase while the noise level decreases. The crucial aspect is to determine the components in the data that are expected to have an effect on the data analysis and model these components while generating the simulated fMRI data.

Article evaluation
In the present study, we analysed the sections describing the simulation study for the selected papers. Where necessary the appendices or supplementary materials were also included and whenever there was still missing information after screening these sections, the whole paper was searched for this information. Only the reported methodology was evaluated (i.e. no authors were contacted for more information). There might be a discrepancy between the conducted and reported simulation studies (e.g. not all details are mentioned), however, to ensure reproducible science all critical elements should be reported. It may not always be feasible to report everything in the main text, but academic journals allow for crucial content to be described in appendices or through online supplementary materials. For each study we evaluated the goal of the simulation study, the simulation parameters and the data generating process. In the case that multiple simulation studies were present in the article, this information was retrieved from the most complex case that was described. In the Results section, summarised results are presented. For a detailed results list on the individual study level, the reader is referred to Table 2.

Study goals
Simulation studies are conducted to evaluate statistical models based on a given experimental design. For each article we assessed which statistical technique was validated. Six categories of statistical models were distinguished (see Figure 2, left panel). Most simulation studies are conducted for signal decomposition models like Principal Components Analysis (PCA), Independent Component Analysis (ICA) and Wavelet analysis. This group of methods is closely followed by General Linear Model (GLM) analysis, Likelihood Ratio Tests (LRT) and t-tests. 11.8% of the simulation studies investigate properties of classification techniques using for example Support Vector Machines or cluster analysis. Methods that are less represented in our sample are connectivity analyses or preprocessing methods like motion correction and spatial smoothing. All studies that did not use any of the previous methods were gathered in a ''rest'' category. In this category are included, for example, HRF estimation methods, spatio-temporal models, bootstrapping and nonparametric techniques.

Experimental design
The methods described above are validated using a given experimental design (Table 3a). The majority of simulation studies report using a block design for the generation of the BOLD activity. When this design is not used, modelled activation is based on an event-related design or it concerns a resting-state study.

Simulation parameters
The general goal of a simulation study is to research a certain outcome (e.g. power, bias, …) under several conditions (e.g. noise level, HRF variability, …). The most common method to achieve this goal is by conducting a Monte Carlo experiment. The simulation reports in our database were evaluated on the dimensions of the simulated data, the number of replications and parameter variation.
Data dimensions. fMRI data have in essence four dimensions (i.e. coordinates in an xyz-space and time). However, the majority of articles in our sample published results for 3D data where time series are simulated for all voxels in a single slice (Table 3b), while one fifth considered full 4D fMRI data. On the other hand, many of the articles reported simulating fMRI time series only with no spatial context. In this case, mass-univariate techniques were mostly evaluated that also regard fMRI data as being multiple measurements of single time series. A very small proportion considered two-dimensional data. This was reported exclusively in an ICA validation context, where the fMRI data are organised as voxels | timepoints.
Replications. The overall majority of the selected articles considered single-subject data, while the remaining articles simulated data for multiple subjects. In these last studies, the number of subjects that was simulated corresponded typically with sample sizes reported in real fMRI studies (e.g. 4 to 20 subjects) and the data for these subjects were mostly simulated once (with a few notable exceptions, see Figure 3, right panel). For the singlesubject simulation studies, the number of repetitions was higher in the majority of the studies, while about one third of the articles reported 1 replication of the simulated data for each setting of the manipulated parameters. It should be noted that simulating 3D or 4D datasets without any spatial correlations is equal to the simulation of fMRI time series with n replications where n is the number of voxels. This was true for 22 of the 37 studies that reported using 1 replication. However, for the remaining studies conclusions are based on 1 realisation of the data. Two studies reported simulating time series just once for each setting of the simulation parameters. Parameter variation. Other possible parameters taken into account in the simulations were, for example, strength of the modelled activation, number of time points, noise level, repetition time (TR), etc. The relevance of a simulation study is highly dependant on the representativeness of these chosen parameter values. To ensure that the parameters are characteristic for fMRI data, it is recommended that a range of values is evaluated. Additionally, a justification is expected on why specific values of certain parameters are chosen. In our sample both requirements were assessed (see Table 4 for an overview). A study was classified as using varying parameters as soon as more than one value of a specific parameter was considered. Whenever a reason for choosing a specific parameter value was reported, the simulation study was evaluated as positive on the justification of the chosen parameters. About one third of the studies reported a variation in the values and gave a justification for their choices. Frequently reported variations were several noise levels and activation strengths that were taken into account. As for the choices of the values, authors mostly justified these as being realistic values in real fMRI data or being estimated from real data. However, about one third of the studies reporting variation of the parameters did not give any justification, ten percent did justify the choice of the parameter values but only used one specific value for each parameter, while one fifth of the studies in our sample did neither.

Data generation models
Of all simulation studies investigated, 84% were pure synthetic simulations while the other 16% adopted a hybrid simulation strategy. In hybrid simulations, a resting-state dataset is acquired and synthetically generated activation is added to these data. As such, knowledge of the ground truth is assured while the noise is representative for real data. However, manipulation of the noise in the simulated fMRI data is not possible and replicating the data will be a costly process. Therefore, in most simulations the fMRI data are generated completely artificially.
All synthetic simulation studies adopted an additive data generation model (e.g. [8]) in which three main components can be distinguished: (1) a baseline signal, (2) BOLD activation and (3) noise. However, half of the studies did not report using a baseline for the data, so we could assume that this is zero for these studies. For the other half, 47% used a static baseline, for example a constant when simulating time series and a template slice or volume that was repeated for each time point in the case of simulating 3D or 4D fMRI data. A minority of the studies (3%) used a varying baseline, meaning that the baseline values were varied over time, e.g. to model thermal shifts [9].
BOLD response. An important component in the simulated fMRI data is the BOLD response because this signal defines the ground truth in the simulation studies. Despite the fact that the coupling between the neural activation and the BOLD response is still not completely understood [6], several models are available to generate a haemodynamic response function (HRF). See Figure 4 (left) for an overview of the models used in the selected articles. Those methods are, for example, a gamma function [10,11], a difference of two gamma functions, also known as the canonical HRF [12,13] or the Balloon model [14,15]. The different shapes of these models are illustrated in Figure 4 (right). Nevertheless, one third of the reported simulation studies disregarded any BOLD characteristics and chose a square wave (i.e. a boxcar function) to represent the BOLD activation in the simulated fMRI data. When no experimental task was simulated, resting-state activation was predominantly modelled as a set of sinusoidal functions, although a few of the selected studies did not simulate any BOLD activation. The shape of the HRF varies immensely from brain region to brain region and also from subject to subject. One fifth of the simulation studies reported modelling this variation in the HRF parameters, while the majority considered a fixed HRF in all simulations (Table 4b).
Noise model. Noise is not only characteristic for fMRI data but also ensures generalisability of the conclusions based on simulations. All simulation studies incorporated some noise generating process (see Figure 5, left panel, for an overview). The vast majority of the synthetic simulation studies (i.e. 75%) selected the noise randomly from a Gaussian distribution. An additional 9% added also some drift function to this noise, while about 7% of the studies considered a skewed noise distribution (e.g. Rician or super Gaussian distribution). The remainder of the studies used a very specific noise model (for example by adding physiological noise, using a uniform distribution or adding motion correlated noise), because they focused on the effects of these noise sources. fMRI noise is also known to be spatially and temporally correlated. However, the majority of the selected articles did not report modelling any correlations in the noise (Table 3c). Temporal correlation was almost exclusively modelled as an auto-regressive autocorrelation process. Typically this process was of order 1, but there are exceptions that used a model order of 3 or 4. Spatial correlations were typically created by spatial smoothing of the generated noise. A small fraction of the studies modeled both spatial and temporal correlations.

Discussion
Whenever statistical models are validated based on simulations, the model that is used for the data generation is of utmost importance. In this paper, a survey was conducted to list currently used data generation models. Based on 119 research articles we described the simulation type, use and justification of simulation parameters and the different components in the fMRI data generating process. The survey results showed that current fMRI simulation studies sometimes lack a thorough experimental manipulation. The parameters in the simulation study (e.g. noise level, TR, HRF delay, etc.) are not always varied, while representative values of some of these parameters are not known. Using the results from iEEG could guide many of these parameter choices and make simulation more realistic in general. Further, the  Table 2. Detailed results for the analysis of the fMRI simulation database, the ID numbers refer to the references in Table S1.        number of replications is a major topic of concern. We observed that the conclusions of some of the simulation studies were based on only one replication of the random data generating process. When a simulation study is used to evaluate the expectation of random variables, the external validity of the study is threatened if only a few replications of the data generating process are used.

Model-based versus data-based simulation
In about 60% of the synthetic simulation studies, the fMRI data were generated based on a model similar to the model being validated (e.g. generating time series from a VAR model to evaluate Granger causality). As such, the simulation is entirely model-based and the assumptions of the model under investigation are completely met. Consequently, the conclusions of these simulation studies give only partial information on the applicability of these models as an analysis tool for fMRI data, since fMRI data generally do not meet the assumptions of most statistical models. A better practice would be to start from the data themselves and to define a data generating process that models the different sources that are present in fMRI data. By using data-based simulations, the properties of the analysis techniques can be assessed in more realistic circumstances.
In this context, it should be noted that the data generating process used in most current simulation studies is not compatible with the knowledge on how fMRI data are constructed. For instance, it is well-known that the BOLD response is the result of a haemodynamic coupling to neural activity. Although the precise dynamics are perhaps still debatable, there is consensus about the BOLD signal being a delayed response with varying dynamics over the brain regions and between subjects. However, about one third of the reported simulation studies in our database did not model any of these characteristics and used a simple boxcar function to distinguish stimulus induced activation from rest ( Figure 4). About the same number did model the slow emergence of the BOLD signal by using a canonical HRF, but only a small fraction (i.e. two studies) did also model BOLD nonlinearities by means of the Balloon model. In the case of spontaneous neural activation (for example in resting-state studies), BOLD fluctuations were mostly modelled through sinusoidal functions with frequencies that are commonly observed in resting-state studies. However, describing these spontaneous fluctuations by sinusoids stems from the tradition to use ICA to analyse these data and is again more compatible with the model under investigation than being representative for the data. Further, variability of the BOLD response was taken into account only in about one fifth of the simulation studies (Table 4b). With regard to modelling BOLD activation, in a data-based simulation context at least some form of HRF should be used that takes into account the basic characteristics of the BOLD signal, while any variation of the parameters of this model will enhance the generalisability of the simulation results.    The generation of fMRI noise alsocauses a discrepancy between simulated and real fMRI data. The noise in fMRI consists of several sources [7,16], for example thermal noise, motion related noise, physiological noise and task-related noise. Nevertheless, the vast majority of simulation studies investigated here have only used a white Gaussian noise model to generate fMRI noise ignoring its multiple-source character. In some cases, spatial or temporal correlations are added. Again, this noise model is consistent with many of the statistical models for fMRI data (e.g. GLM). Unfortunately, the Gaussian noise model only accounts for a fraction of the noise in real data. One solution is to use hybrid simulations in which using real noise acquired in a resting-state study increases the realistic character of the simulated data. However, it is impossible to manipulate noise related parameters and unwanted activation in resting-state data can influence the simulation results. Moreover, multiple replications (i.e. acquiring resting-state data from multiple subjects) are costly. Perhaps the better solution is to model more than only Gaussian noise (i.e. thermal noise) and also include, as has been demonstrated in several simulation studies, motion noise, physiological noise, signal drift, etc. In some simulation studies, the results will not be altered under a full noise model. It may not always be necessary to include all noise sources (e.g. if a certain noise source is removed or the influence of a source is assumed to be equal in all conditions), but this should be motivated at least. To assure generalisability of the simulation results, a more complex noise model, compared to the one that is generally adopted now, might be imperative.

Recommendations for simulating fMRI data
Based on these results we present some recommendations to improve the reliability and generalisability of fMRI simulation studies.

All parameters for which a value is chosen in the simulation
experiments should be thoroughly justified. If a single value is not agreed upon, a range of values should be evaluated (see [8,[17][18][19] for some examples). 2. The conditions in the simulation study, (e.g. statistical model, parameter values,…), have to be combined in an experimental design. The construction of this experimental design is essential [20]. Factors that can be considered in the experiment are, for example, variations of parameter levels, analysis methods and number of replications. The most complete design is the fullfactorial design, although there might be reasons to adopt fractional designs. Based on the experimental design, the simulation experiment will have external validity (i.e. the results can be generalised beyond a given experiment). 3. A Monte Carlo experiment has to be repeated to exclude random influences on the simulation results. Therefore, a sufficient number of replications of the experiment has to be performed. In the case of time series simulations, at least 10000 replications might be necessary, while for the simulation of 3D or 4D fMRI data a total of 100 might be enough. In general, the more replications, the better. For example, [19] generated 10000 replications of 3D datasets, and [17] simulated 4D multi-subject datasets to represent twin data using 500 replications of each paired dataset. In practice, this number can be limited due to time or computational constraints. When in doubt, the convergence of the results should be tested. 4. The simulated task-related activation signal should reflect known properties of the BOLD response. This includes, but is not limited to, response delay, nonlinearities and inter-region and -subject variability. Either the canonical HRF or the Balloon model can be used (see [21] for an example using the Balloon model). 5. fMRI noise is partially white (i.e. system noise) and this part can be modelled by random Gaussian noise. However, in  addition one should account for (residual) motions, heart rate and respiratory rate fluctuations, task-related noise and spatial and temporal correlations (see, for example, [8,22,23]). 6. If either the BOLD model or the noise model is simplified, this should be duly motivated.

Conclusion
The use of simulation studies to validate statistical techniques for fMRI data should be highly encouraged, because simulation experiments are a fast and cheap tool to assess the quality and applicability of the analysis techniques. However, our survey of the fMRI simulation literature raised several concerns with respect to simulation studies as they are conducted now. The observed decrease in the number of fMRI simulation studies in recent years is troubling. Furthermore, it was demonstrated that the data generating process used to simulate fMRI data is often modelbased and parameter variation in the data generating process is not implemented in a standard manner.
A possible reason for the absence of a common fMRI data generation model might be the lack of established software packages. Current simulation studies are mainly conducted using in-house software routines that have no common programming language and are not widely available. Recently, developments to fill this gap have resulted in the release of software packages that provide a flexible and fast framework for fMRI simulations [24,25]. Using these software packages can be an important step in the right direction. Additionally, by taking into account the different sources present in fMRI data and adopting a complete simulation design with sufficient replications, conclusions from fMRI simulation studies can be expected to be more reliable.
Researchers that conduct fMRI simulation studies are encouraged to consider the recommendations presented in this paper in order to increase the reliability and generalisability of the conclusions from simulation studies.

Supporting Information
Table S1 Bibliographic details of the selected articles. (PDF)