Animal models of chemotherapy-induced peripheral neuropathy: a machine-assisted systematic review and meta-analysis

Background and aims Chemotherapy-induced peripheral neuropathy (CIPN) can be a severely disabling side-effect of commonly used cancer chemotherapeutics, requiring cessation or dose reduction and impacting on survival and quality of life. Our aim was to conduct a systematic review and meta-analysis of research using animal models of CIPN to inform robust experimental design. Methods We systematically searched 5 online databases (PubMed, Web of Science, Biosis Citation Index, Biosis Previews and Embase) in September 2012 to identify publications reporting in vivo CIPN modelling. Because of the number of publications and the high accrual rate of new studies, we ran an updated search in November 2015, using machine-learning and text mining to identify relevant studies. All data were abstracted by two independent reviewers. For each comparison we calculated a standardised mean difference effect size and then combined effects in a random effects meta-analysis. The impact of study design factors and of the reporting of measures to reduce the risk of bias was assessed. We ran power analyses for the most commonly reported behavioural tests. Results 341 publications were included. The majority (84%) of studies reported using male animals to model CIPN; the most commonly reported strain was Sprague Dawley rat. In modelling experiments, vincristine was associated with the greatest increase in pain-related behaviour (−3.22 SD [−3.88; −2.56], n=152, p=0). The most commonly reported outcome measure was evoked limb withdrawal to mechanical monofilaments. Pain-related complex behaviours were rarely reported. The number of animals required to obtain 80% power with a significance level of 0.05 varied substantially across behavioural tests. Overall, studies were at moderate risk of bias, with modest reporting of measures to reduce the risk of bias.
Conclusions Here we provide a comprehensive summary of the field of animal models of CIPN and inform robust experimental design by highlighting measures to increase the internal and external validity of studies using animal models of CIPN. Power calculations and other factors, such as clinical relevance, should inform the choice of outcome measure in study design.


Introduction
Certain commonly used and effective cancer chemotherapeutic agents are neurotoxic to peripheral nerves. Treatment with these agents can result in a distal symmetrical sensory polyneuropathy, and consequent dose limitation or reduction. Chemotherapy-induced peripheral neuropathy (CIPN) is a disabling side-effect, known to impair daily function and diminish quality of life [32]. Commonly used chemotherapeutic agents reported to cause neurotoxic effects include platinum derivatives, taxanes [40], vinca alkaloids, epothilones and newer agents (e.g. thalidomide and bortezomib) [48]. The predominant sensory phenotype in patients treated with oxaliplatin or docetaxel is sensory loss affecting both upper and lower extremities. This includes bilateral paraesthesiae, often described as numbness or tingling. Other less common sensory symptoms include burning pain, pricking pain and cold allodynia [44]. CIPN can present clinically in two distinct forms: an acute, chemotherapy dose-related, and often dose-limiting, polyneuropathy, which resolves in many patients once chemotherapy ceases, although in some patients it persists and in others symptoms develop only after treatment has finished; and a chronic, often painful, distal sensory neuropathy, still present in 33% of patients one year after completion of treatment [39].
The pathogenesis of CIPN is thought to involve diverse mechanisms depending on the chemotherapeutic agent used. These mechanisms include chemotoxicity to the dorsal root ganglion (DRG) and to DNA within the DRG cell body, alterations to transmembrane receptors and channels, interference with microtubular structure, damage to mitochondria, myelin degeneration and modulation of ion channels [2]. It is proposed that such mechanisms lead to the "dying-back" axonal degeneration characteristic of many sensory polyneuropathies.
Animal models of CIPN are used to investigate the pathophysiology of CIPN and to test potential therapies [19]. Administration of chemotherapeutic agents can lead to the development of a sensory neuropathy, and behavioural tests are used to assess outcomes such as evoked pain and locomotor activity. Behavioural tests that attempt to measure the sensory paraesthesiae observed in patients are less commonly used [19].
Systematic review involves systematically locating and appraising all evidence relevant to a pre-defined research question, to provide a complete and unbiased summary of available evidence, and can be an invaluable tool to help make sense of the vast scientific literature.
Our aim was to provide a systematic overview of research in the field of in vivo animal modelling of CIPN, with a focus on the reporting of pain-related behavioural outcome measures. The ultimate aim was to provide useful information for preclinical researchers to guide improvements in in vivo modelling.

Methods
This review forms part of a larger review of all in vivo models of neuropathic pain; the full protocol can be found at www.dcn.ed.ac.uk/camarades/research.html#protocols. All methods were pre-specified in the study protocol.

Search Strategy
We originally searched 5 online databases (PubMed, Web of Science, Biosis Citation Index, Biosis Previews and Embase) systematically in September 2012 to identify publications reporting in vivo modelling of CIPN with a pain-related behavioural outcome measure. The search terms used for each database are detailed in Appendix 1. Search results were limited to animal studies using search filters, adapted for use in all databases [11; 22]. Because of the large number of publications and the high accrual rate of new studies, we ran an updated search in November 2015, using machine-learning and text mining. This updated search included 4 online databases (PubMed, Web of Science, Biosis Citation Index and Embase) and used an updated animal filter [10]; Biosis Previews was no longer available.

Machine-learning and text mining
We used machine-learning to facilitate the screening of publications reporting animal models of CIPN.
The screening stage of a systematic review involves categorising records identified by the search as either 'Included' or 'Excluded', and was performed by two independent reviewers. In our original search of 33,184 unique publications, the screening stage took 18 person-months in total. The publications from our original search (with include/exclude decisions based on initial screening) were used as a training set for machine-learning: 13 classifiers were created and applied to the updated search (11,880 unique publications). We used the metrics of sensitivity (criterion: >95%) and specificity to evaluate the performance of machine-learning, and used these measures to choose the classifier that performed best on our data set. Sensitivity was defined as the proportion of included publications correctly identified by the machine-learning algorithm, and specificity as the proportion of excluded publications correctly identified (taking the human reviewers' final decision as correct). A 10% random sample of the machine-learning decisions for the updated search, generated by applying a random number generator to the unique publication identifiers, was checked for inclusion/exclusion by two independent investigators. Text mining was then applied to the publications identified by machine-learning to identify studies reporting animal models of CIPN: briefly, this involved searching for specific chemotherapy terms within the titles and abstracts of the identified publications, which were then screened for inclusion by two independent reviewers.
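As a concrete sketch of these screening metrics, the snippet below computes sensitivity, specificity and precision from a confusion matrix. The counts are hypothetical, chosen only so that the outputs roughly match the performance reported later in the Results (97% sensitivity, 67% specificity, 56% precision); they are not the review's actual data.

```python
def screening_metrics(tp, fp, tn, fn):
    """Classifier performance, taking the human reviewers' final
    include/exclude decision as ground truth."""
    sensitivity = tp / (tp + fn)   # included papers correctly identified
    specificity = tn / (tn + fp)   # excluded papers correctly identified
    precision = tp / (tp + fp)     # identified papers that were truly relevant
    return sensitivity, specificity, precision

# Hypothetical confusion-matrix counts, for illustration only:
sens, spec, prec = screening_metrics(tp=97, fp=76, tn=155, fn=3)
```

With these counts the classifier would pass the pre-specified >95% sensitivity criterion, while the lower specificity means the human screeners still have some irrelevant records to exclude.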

Inclusion and exclusion criteria
Studies were included in this analysis if they reported: the use of an animal model of neuropathy induced by administration of a chemotherapeutic agent; testing of a pain-related behavioural outcome measure (whether in model characterisation or pharmacological intervention studies) with a suitable control; and the number of animals per group, together with the mean and its variance (standard error of the mean (SEM) or standard deviation (SD)).
Studies that did not report testing for a pain-related behavioural outcome measure or that reported administration of an intervention before model induction, administration of cotreatments, transgenic studies or in vitro studies were excluded from this analysis.

Measures to reduce the risk of bias and measures of reporting
We assessed methodological quality by examining the reporting of 5 measures to reduce the risk of bias: blinded assessment of outcome, random allocation to group, allocation concealment, animal exclusions and a sample size calculation. We also assessed the reporting of a statement of potential conflicts of interest and of compliance with animal welfare regulations [27; 29].

Data abstraction
Data were abstracted to the CAMARADES Data Manager (Microsoft Access). For all included studies, we abstracted details of: publication, animal husbandry, animal species and strain, model, intervention, and other experimental details (Table 1). Data presented graphically were abstracted using digital ruler software (Universal Desktop Ruler, AVPSoft.com, or Adobe ruler) to determine values. Where multiple time points were presented, we abstracted the time point showing the greatest difference between model and control groups, or the greatest difference between treatment and control groups. If the type of variance (e.g. SEM or SD) was not reported, it was recorded as SEM, as this provides a more conservative estimate and avoids the study being given undue weight in the meta-analysis. All data were abstracted by two independent reviewers.
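The conservatism of recording an ambiguous error bar as SEM follows from the relationship SD = SEM × √n: treating the value as SEM implies a larger standard deviation, and hence a smaller standardised effect size. A minimal sketch, with illustrative values:

```python
import math

def sd_from_sem(sem, n):
    """Recover a standard deviation from a standard error of the mean."""
    return sem * math.sqrt(n)

# An ambiguous error bar of 0.5 over a group of n = 10 animals:
ambiguous_error, n = 0.5, 10
sd_if_sem = sd_from_sem(ambiguous_error, n)  # ~1.58 if the bar is an SEM
sd_if_sd = ambiguous_error                   # 0.5 if the bar is an SD
# The larger implied SD shrinks the standardised mean difference,
# giving the more conservative effect estimate described above.
```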

Data reconciliation
Publication-level data abstracted by the two independent reviewers were compared, and any discrepancies were checked and corrected; the same was done for outcome-level data. Individual comparison effect sizes were calculated for both reviewers' data sets, and those that differed by ≥10% were checked and corrected. Where individual comparisons differed by <10%, we took the mean of the two effect sizes and errors (SEM/SD).

Data analysis
We assessed the impact of animal species in stratified meta-analysis. We specified a priori that if species accounted for a significant proportion of heterogeneity each species would be analysed separately. If species did not account for a significant proportion of heterogeneity then all species (e.g. mouse and rat) would be analysed together.
We analysed all behavioural outcome measures reported. Behavioural outcome measures were split into pain-related and other behavioural outcome measures, and then grouped by subtype of outcome measure (full list of behavioural outcome measures and behavioural tests can be found in Tables 2 and 3).
Pain-related outcome measures included evoked limb withdrawal to stimuli (mechanical stimuli/heat/cold/dynamic mechanical touch), evoked limb withdrawal or vocalisation to pressure stimuli, evoked tail withdrawal to stimuli (cold/heat/pressure), and complex behaviour. Other outcome measures included assessment of locomotor function, memory, reward, and attention. Behaviours were then nested by subtype for analysis. For each comparison we calculated a standardised mean difference (SMD) effect size, and combined these using a random effects meta-analysis with restricted maximum-likelihood (REML) estimation of heterogeneity. When a single control group served multiple model or treatment groups, we adjusted for this by dividing the number of animals in the control group by the number of treatment groups served. The Hartung and Knapp method was used to adjust test statistics and confidence intervals, calculating the confidence interval as: effect size ± t(0.975, k−1) × standard error. The impact of study design factors (sex, species, therapeutic intervention, therapeutic intervention dose, time of administration, methods used to induce the model including the chemotherapeutic agent, dose and route of administration, and type of outcome measure) and of study quality (as measured by reporting of measures to reduce bias and reporting measures) was assessed with stratified meta-analysis. The impact of time to assessment (defined as the interval between the first administration of chemotherapeutic agent and outcome measurement) and time to intervention administration (defined as the interval between the first administration of chemotherapeutic agent and administration of the intervention) was assessed using meta-regression. Stratified meta-analysis and meta-regression were performed using a Meta-analysis Online Platform (code available at https://github.com/qianyingw/meta-analysis-app).
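The effect size and pooling steps above can be sketched as follows. Two deliberate simplifications are flagged because they are assumptions rather than the paper's exact method: between-study heterogeneity is estimated with the simpler DerSimonian-Laird estimator instead of REML, and all study values are invented for illustration. The Hartung-Knapp t-based interval matches the formula given above.

```python
import numpy as np
from scipy import stats

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardised mean difference with Hedges' small-sample correction."""
    sd_pooled = np.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                        / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sd_pooled
    j = 1 - 3 / (4 * (n_t + n_c) - 9)        # small-sample correction factor
    g = j * d
    var_g = (n_t + n_c) / (n_t * n_c) + g**2 / (2 * (n_t + n_c))
    return g, var_g

def random_effects_hk(effects, variances):
    """Random-effects pooling (DerSimonian-Laird tau^2) with a
    Hartung-Knapp t-based 95% confidence interval."""
    y = np.asarray(effects, float)
    v = np.asarray(variances, float)
    k = len(y)
    w = 1 / v                                 # fixed-effect weights
    mu_fe = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu_fe)**2)            # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)        # between-study variance
    w_star = 1 / (v + tau2)                   # random-effects weights
    mu = np.sum(w_star * y) / np.sum(w_star)
    # Hartung-Knapp variance estimator; CI = mu +/- t(0.975, k-1) * SE
    var_hk = np.sum(w_star * (y - mu)**2) / ((k - 1) * np.sum(w_star))
    half = stats.t.ppf(0.975, k - 1) * np.sqrt(var_hk)
    return mu, (mu - half, mu + half)
```

For a shared control group split across several treatment arms, the control-group n fed into `hedges_g` would first be divided by the number of arms served, as described above.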
A recent study from our group indicated that stratified meta-analysis of SMD effect size estimates has low statistical power to detect the effect of a variable of interest. This means that although we may not have had sufficient power to detect every true effect, we can have greater confidence that any significant results observed are true [47].
We applied a Holm-Bonferroni correction as follows: study design in the modelling experiments (p<0.01), study design in the intervention experiments (p<0.007), and the reporting of measures to reduce the risk of bias and of measures of reporting (p<0.007).

Power analysis
To guide sample size estimation, we performed power calculations for the six most commonly reported behavioural tests. To do this, we ranked the observed mean difference effect sizes and pooled standard deviations, and used the 20th, 50th and 80th centiles of each to calculate the number of animals required in hypothetical treatment and control groups.
Calculations were based on the two sample two sided t-test, with 80% power and an alpha value of 0.05.
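As a sketch of this calculation, the snippet below solves for group size given a mean difference and pooled standard deviation (statsmodels' `TTestIndPower` implements power analysis for the two-sample, two-sided t-test; the input values here are illustrative, not the review's centile estimates).

```python
import math
from statsmodels.stats.power import TTestIndPower

def animals_per_group(mean_diff, pooled_sd, alpha=0.05, power=0.80):
    """Animals needed per group for a two-sample, two-sided t-test."""
    d = abs(mean_diff) / pooled_sd            # standardised effect size
    n = TTestIndPower().solve_power(effect_size=d, alpha=alpha,
                                    power=power, ratio=1.0,
                                    alternative='two-sided')
    return math.ceil(n)                       # round up to whole animals

# Illustrative: a difference of one pooled SD (d = 1) needs ~17 animals
# per group; halving the effect size roughly quadruples the requirement.
n_d1 = animals_per_group(mean_diff=1.0, pooled_sd=1.0)
n_d05 = animals_per_group(mean_diff=0.5, pooled_sd=1.0)
```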

Publication bias
Potential publication bias was assessed by visual inspection of funnel plots for asymmetry; Duval and Tweedie's trim and fill analysis [12; 49] and Egger's regression analysis [13] for small study effects were then performed. These analyses were run separately for modelling individual comparisons and intervention individual comparisons, for both pain-related and other outcome measures.

Ranking of intervention efficacy
In a clinical systematic review of neuropathic pain [16], selected analgesic agents were ranked according to efficacy, as measured by the Number Needed to Treat (NNT) for 50% pain relief. Where preclinical studies included in this review reported use of these agents, or analogues of them, the interventions were ranked according to SMD effect size for attenuation of pain-related behaviour. The correlation between clinical and preclinical rank was assessed by Spearman's rank correlation coefficient.
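The rank comparison can be sketched with `scipy.stats.spearmanr`; the ranks below are hypothetical placeholders for six shared drugs, not the review's actual data.

```python
from scipy.stats import spearmanr

# Hypothetical ranks: clinical rank from NNT for 50% pain relief,
# preclinical rank from SMD for attenuation of pain-related behaviour.
clinical_rank = [1, 2, 3, 4, 5, 6]
preclinical_rank = [4, 1, 6, 2, 5, 3]

rho, p_value = spearmanr(clinical_rank, preclinical_rank)
# A rho near 0 with a large p-value would indicate no detectable
# agreement between the clinical and preclinical rankings.
```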

Identification of Publications
Of the 33,184 unique publications using in vivo models of neuropathic pain that were identified by our original systematic search (September 2012), 181 were identified by independent reviewers as reporting CIPN, Figure 1.
Of the 11,880 unique publications identified by the updated search (November 2015), 6,108 were identified using machine-learning as reporting CIPN. In the random 10% sample of screened publications (n = 1,188), the machine-learning approach with the best fit had a screening performance of 97% sensitivity, 67% specificity, and 56% precision. Of the 359 studies identified by text mining as reporting animal models of CIPN, 160 met our inclusion criteria, as determined by two independent reviewers.
A total of 341 unique publications were identified from the two searches and included in this review. The rate of new publications per year is shown in Figure 2. Details of the 341 publications included in this study are available in Appendix 2.

Outcome Measures
Across the 341 included publications, the most commonly reported pain-related outcome measure was evoked limb withdrawal to mechanical stimuli, most commonly assessed using mechanical monofilaments (Figure 3, Table 2). The most frequently reported other behavioural outcome measures were those assessing locomotor function, with the rotarod apparatus used in the majority of cases (Figure 4, Table 3).

Interventions
A total of 307 different interventions were tested (Figure 5). The majority of interventions (80%) were tested in only one publication; the most commonly reported interventions were gabapentin, morphine and pregabalin, reported in 26, 23 and 12 publications, respectively.

Risk of bias
The reporting of measures to reduce the risk of bias was moderate across all included CIPN studies (n = 341): 51.3% (175) reported blinded assessment of outcome, 28.2% (96) reported randomisation to group, 17.6% (60) reported animal exclusions, 2.1% (7) reported the use of a sample size calculation, and 1.5% (5) reported allocation concealment. In terms of measures of reporting, 49.6% (168) of studies reported a conflict of interest statement and 96.8% (330) reported compliance with animal welfare regulations.
The details of the methods used to implement randomisation and blinding, and the methods and assumptions behind sample size calculations, were rarely reported: six publications reported that animals were randomly allocated to experimental groups using randomly generated number sequences, and two reported that this was done by block randomisation. One publication reported that randomisation was performed by "picking animals randomly from a cage", which is known not to be a valid form of randomisation [43]. Nine publications reported that blinded assessment of outcome was achieved by using a different experimenter to perform the assessment, and two reported that a group code was used. One study reported that allocation concealment was achieved using a coded system.
Methods of sample size calculation were reported by five publications: three used published or previous results from the group and two performed a pilot study to inform sample size calculations.

Modelling Experiments
This systematic review identified twelve different chemotherapeutic agents used to model CIPN in animals (Table 4). The chemotherapeutic agent used accounted for a significant proportion of the heterogeneity observed (p=0). Vincristine was associated with the greatest increase in pain-related behaviour (Figure 6b).
Sex accounted for a significant proportion of the heterogeneity (p=0). Studies that reported using both males and females reported an increase in pain-related behaviour compared to studies that did not report the sex of animals used or used only male or female animals ( Figure 6c).
The time to assessment did not account for a significant proportion of heterogeneity (p=1).
A post hoc analysis found that strain of animal accounted for a significant proportion of the heterogeneity (p=0, Figure 6d); the most commonly reported strain was Sprague Dawley rat. Sex (p=0.70) and time to intervention administration (p=1.00) did not account for a significant proportion of the heterogeneity in modelling experiments using other behavioural outcomes.
A post hoc analysis found that strain accounted for a significant proportion of the heterogeneity (p=5.3×10⁻⁴, Figure 7c).

Power analysis
The number of animals required to obtain an 80% power with a significance level of 0.05 varied substantially across the behavioural tests. For each of mechanical monofilaments, Randall-Selitto paw pressure test, Electronic "von Frey", acetone test/ethylchloride spray, cold plate, and Plantar Test (Hargreave's method), we calculated the number of animals required in both model and sham groups.
When both the mean difference effect size and the pooled standard deviation were at the 50th centile, the smallest requirement was 3 animals per group (Randall-Selitto paw pressure test and electronic "von Frey").

Reporting of blinded assessment of outcome (p=0) and of animal exclusions (p=6×10⁻⁸) accounted for a significant proportion of the heterogeneity, with failure to report these measures associated with greater estimates of effect. Reporting of randomisation, allocation concealment and sample size calculation did not account for a significant proportion of the heterogeneity (Figure 9).
Regarding reporting measures, compliance with animal welfare regulations accounted for a significant proportion of the heterogeneity (p=9.9×10⁻⁶), with failure to report this associated with a decreased estimate of effect (Figure 10). However, reporting of a conflict of interest statement did not account for a significant proportion of the heterogeneity (Table 5, Figure 10).
In modelling experiments reporting other behavioural outcome measures, reporting of randomisation, blinding, allocation concealment, sample size calculation, or animal exclusions did not account for a significant proportion of the heterogeneity (Figure 9). Sex of animal did not account for a significant proportion of the heterogeneity (p=0.35).
Time to assessment accounted for a significant proportion of the heterogeneity, with a longer interval associated with greater attenuation of pain-related behaviour (p=1×10⁻³, Figure 14c). However, time of intervention administration did not account for a significant proportion of the heterogeneity (p=1).
A post hoc analysis found strain of animal accounted for a significant proportion of the heterogeneity (p=0, Figure 15). The most commonly reported were Sprague Dawley rats (n=763).

Ranking of drug efficacy
A post hoc analysis was conducted to compare the ranking of drugs common to a clinical systematic review [15] and our preclinical systematic review. In the clinical systematic review, NNTs were reported for 28 drugs, covering drug classes or individual drugs. Seventeen of these drugs were also reported by animal studies included in our preclinical systematic review. Spearman's rank correlation found no correlation between clinical and preclinical rank (rs = −0.0446, p = 0.8652; Table 7 and Figure 16).

In intervention experiments using other behavioural outcome measures, strain accounted for a significant proportion of the heterogeneity (p=0.001, Figure 18b).

Power analysis
In intervention studies, the number of animals required to obtain 80% power with a significance level of 0.05 varied substantially across pain-related behavioural tests. For each of mechanical monofilaments, Randall-Selitto paw pressure test, Electronic "von Frey", acetone test/ethylchloride spray, cold plate and Plantar Test (Hargreave's method), we calculated the number of animals required in model and sham groups.
When both the mean difference effect size and the pooled standard deviation were at the 50th centile, the number of animals required ranged from 4 (Randall-Selitto paw pressure test/Hargreave's thermal test) to 10 (mechanical monofilaments), Figure 19. Keeping the mean difference effect size at the 50th centile and increasing the pooled standard deviation to the 80th centile increased the number of animals required per group across all behavioural tests, and for some tests more than others: from 9 (Hargreave's) to 104 (mechanical monofilaments). Reducing the mean difference effect size to the 20th centile with the pooled standard deviation at the 50th centile dramatically increased the number of animals required per group, from 21 (Hargreave's) to 584 (mechanical monofilaments), Figure 19.

Both reporting of compliance with animal welfare regulations (p=2.8×10⁻³) and reporting of a conflict of interest statement (p=6×10⁻³) accounted for a significant proportion of the heterogeneity (Figure 10). Failure to report this information was associated with decreased estimates of effect (Table 8).
In intervention studies using other behavioural outcome measures, reporting of randomisation, allocation concealment, animal exclusions or sample size calculation did not account for a significant proportion of the heterogeneity. Blinded assessment of outcome did account for a significant proportion of the heterogeneity (p=0.0044), with studies that did not report blinded assessment of outcome associated with larger effect sizes (Figure 9).
Reporting of a conflict of interest statement or of compliance with animal welfare regulations did not account for a significant proportion of the heterogeneity (Table 9).

Discussion
The results of our systematic review indicate a high and rising rate of publication of reports using animal models of CIPN, reflected in the large number of new studies identified when the search was updated three years later in 2015. This high publication accrual rate is not unique to this field but is the case across clinical [7] and preclinical research, making it challenging for researchers and consumers of research to keep up to date with the literature in their field. This is the first systematic review of preclinical models of pain to use machine-learning and text mining, and the first to demonstrate the usefulness of these automation tools in this field.

Misalignment between animal models and the clinical population
In the clinic, the chemotherapeutic agents included in this analysis are frequently used to treat female cancer patients. However, the majority of studies (285/341, 84%) identified by this systematic review used only male animals to model CIPN, reducing the external validity of findings from these models to the clinical population.
We must also consider whether we are modelling the same condition as that observed in the clinic. Acute CIPN is estimated to affect 68.1% of patients within the first month of chemotherapy cessation, 60% at 3 months, and 30% at 6 months [39]. Severe acute CIPN may require dose reduction or cessation of chemotherapy [5]; however, many patients' symptoms improve after chemotherapy cessation. Taxane chemotherapeutic agents led to CIPN in up to 80% of exposed patients 2 years after treatment [18], and oxaliplatin chemotherapy led to peripheral neurotoxicity in 79.2% of patients at 25 months post treatment [34]. A long-term study of oxaliplatin showed treatment was associated with peripheral neurotoxicity at 6 years of follow-up [25]. Consideration should therefore be given to whether the models of CIPN identified in this review reflect the acute or the chronic clinical condition.

To address the misalignment between outcome measures used to assess pain in patients in clinical trials and those commonly reported in animal models of CIPN, we have called for the development of sensory profiling for rodent models of neuropathy that reflects the clinical methods [35].
Others have shown that external validity can be increased by using multi-centre studies to create more heterogeneous study samples, an approach that may be useful in pain modelling [46].

Internal validity of studies using animal models of CIPN
There was moderate reporting of measures to reduce the risk of bias.
Statistical modelling and meta-analysis have demonstrated that the exclusion of animals can distort true effects, where random loss of samples decreased statistical power and biased removal dramatically increased the probability of false positive results [20]. It has been shown in other research fields that efficacy is lower in studies that report measures to reduce the risk of bias [8; 28; 36; 45].
The details of methods used to implement randomisation and blinding and the methods and assumptions for sample size calculation were rarely reported. These are important to understand the quality of these procedures, as opposed to mere reporting. If methods and assumptions were reported this would allow assessment of the quality of these procedures using a tool such as Jadad scoring [24].

Publication bias
Publication bias analysis suggested that taking the many imputed missing studies into account dramatically reduced the global effect sizes in all data sets except the smallest.

Limitations
Conducting a systematic review is time- and resource-intensive, and the rate of publication means systematic reviews rapidly become outdated. This review is limited in that the most recent information included was identified in November 2015. We propose that the present systematic review form a "baseline" review, which can be updated and developed into a living systematic review, i.e. a systematic review that is continually updated as new evidence becomes available [15]. As new online platforms and tools for machine-learning and automation become available, preclinical living systematic reviews become more feasible [42]. Living systematic review guidelines have recently been published [14] and Cochrane has also launched pilot living systematic reviews [3; 41].
The machine-learning algorithm based on our initial screening had a high sensitivity (97%) and medium specificity (67%). High sensitivity captures most relevant literature and has a low risk of missing relevant literature. An algorithm with lower specificity is more likely to falsely identify included studies (i.e. more irrelevant studies are identified as included), compared to an algorithm with high specificity. With a specificity of 67%, our algorithm did falsely identify some irrelevant studies, which resulted in the two independent human screeners having more studies to exclude when it came to data abstraction. We believe that this balance between sensitivity and specificity was appropriate as this reduced the risk of missing relevant studies.
In our meta-analysis, we grouped together behavioural outcome measures that measure the same underlying biology. For example, of the experiments that reported using the grip test, five reported that the test was used to measure grip strength, and one reported that it was used to measure muscle hyperalgesia [23]. For this reason, we grouped all grip test outcome measures together as non-pain-related behavioural outcome measures. It is possible that the same or similar tests could be used and the same measurements reported as different outcomes; this is one of the challenges of analysing published data.
We only included studies where the intervention drug was administered after or at the same time as the chemotherapeutic agent. Future literature reviews may look at those drug interventions given before chemotherapeutic agents to determine if drug intervention at this time point can effectively prevent CIPN.
Unfortunately, the reporting of measures to reduce the risk of bias was moderate in the studies included in this systematic review, which limits what we can infer from the results.
We hope this review will act to highlight this issue in the field of CIPN in vivo modelling.
Systematic reviews of animal experiments in other research areas have revealed low reporting of these measures, and a negative impact of failure to report them, across in vivo fields as diverse as modelling of stroke, intracerebral haemorrhage, Parkinson's disease, multiple sclerosis and bone cancer pain [9; 17; 30; 36; 37; 45]. This has driven change, influencing the development of reporting guidelines [26], pain-modelling-specific guidelines [4] and the editorial policy of Nature Publishing Group [1]. Encouragingly, after an initial review of the efficacy of Interleukin-1 receptor antagonist in animal models of stroke highlighted low reporting of measures to reduce the risk of bias [6], a subsequent review identified increased reporting of these measures [30], increasing the validity and reliability of those results. We anticipate that there will be a similar improvement in studies reporting the use of animal models of CIPN. We propose that once studies achieve sufficient quality, it will be possible to use a GRADE-type analysis or process to rate the certainty of the evidence from animal studies [21]. The measures to reduce the risk of bias whose reporting we have assessed are largely derived from what is known to be important in clinical trials, and the extent to which these measures impact upon the findings of animal studies has yet to be fully elucidated. However, reporting of these measures allows users of research to make informed judgments about findings. Finally, it should be noted that there are likely to be other measures that are important in animal studies that we have not considered.

Figure legends:

Impact of study design in modelling experiments using other behavioural outcomes. The size of the squares represents the number of nested comparisons that contribute to that data point and the value N represents the number of animals that contribute to that data point. a) Outcome measure accounted for a significant proportion of the heterogeneity. b) Chemotherapeutic agent accounted for a significant proportion of the heterogeneity. c) Strain accounted for a significant proportion of the heterogeneity.

Figure 14. Impact of study design in intervention experiments using pain-related behavioural outcomes. The size of the squares represents the number of nested comparisons that contribute to that data point and the value N represents the number of animals that contribute to that data point. a) Outcome measure accounted for a significant proportion of the heterogeneity. b) Chemotherapeutic agent accounted for a significant proportion of the heterogeneity. c) Time of assessment accounted for a significant proportion of the heterogeneity.

Figure 15. In intervention experiments using pain-related behavioural outcomes, strain accounted for a significant proportion of the heterogeneity. The size of the squares represents the number of nested comparisons that contribute to that data point and the value N represents the number of animals that contribute to that data point.

Figure 16. Rank order of clinical and preclinical drugs. A Spearman's correlation was run to assess the relationship between clinical and preclinical rank of 17 drugs. There was no correlation between clinical and preclinical rank (rs = −0.0446, p = 0.8652).

Figure 17. In intervention experiments using pain-related behavioural outcomes, intervention accounted for a significant proportion of the heterogeneity. The size of the squares represents the number of nested comparisons that contribute to that data point and the value N represents the number of animals that contribute to that data point.