Stem Cell Transplantation in Traumatic Spinal Cord Injury: A Systematic Review and Meta-Analysis of Animal Studies

A systematic analysis of the literature shows that stem cell implantation can improve function in animal models of spinal cord injury, depending on the methods used.


Introduction
Stem cells, from which all tissues can be generated, offer the potential to reconstitute tissues damaged by injury and disease. However, realising this potential will demand a detailed knowledge of the genetic and internal environmental cues that specify a cell's type, location, and interaction with its neighbours. It will also require a thorough understanding of stem cell behaviour in the context of lesioned or damaged tissues.
Stem cell transplantation was pioneered in the 1950s using haematopoietic stem cells to repopulate the bone marrow in patients with cancers of the blood and bone marrow [1]. Such is the success of this approach that an estimated 50,000 of these transplants are performed each year [2]. As understanding of stem cell biology has increased, so too has the ambition for restoring more complex tissues. In animal models, hepatocytes derived from stem cells can be engrafted into the damaged liver [3], and lineage-specific stem cells can repair damaged cornea [4,5]. Recent studies also demonstrate the generation of artificial tissues with key features of complex solid organs including blood vessels [6], heart [7][8][9], lung [10], and kidney [11]. Even in the CNS, where the breadth of cell types and the complexity of their interactions are maximal, stem cell implants appear able to integrate into the existing circuitry [12][13][14]. In patients, lineage-specific stem cells have been reported to show efficacy in the regeneration of craniofacial bones [15] and of damaged cornea [5].
Integration into the host environment and tissue reconstruction are not the only potentially relevant biological effects of stem cells.
There is now considerable preclinical literature on the possible benefits of stem-cell-based therapies following traumatic spinal cord injury. Stem cells may assist recovery through limitation of secondary injury, re-myelination, formation of new neuronal connections, and alteration of the inhibitory environment. However, it is unclear which type of cells and from what source are best to implant, how many are needed, whether immunosuppression should be used, and whether the implanted cells need to be modified to enhance particular desirable characteristics. It is also unclear whether the magnitude of integrative and protective effects is large enough to be potentially clinically meaningful. We also do not know whether reports of efficacy in animal models are potentially biased in favour of positive results.
Here, we report a systematic review, meta-analysis, and meta-regression of data from controlled in vivo studies testing the efficacy of stem cells as a treatment in animal models of spinal cord injury. Our objectives are (i) to establish a summary estimate of the efficacy of stem cells in animal models of traumatic spinal cord injury, (ii) to ascertain the conditions under which animal experiments demonstrate greatest efficacy, and (iii) to determine any effect of study quality on reported efficacy.

Study Characteristics
Electronic searching identified 156 full publications that met our prespecified inclusion criteria (Table S1). Forty-five different stem cell types had been investigated, of which over a third were derived from adult rats. The duration of experiments following the induction of SCI ranged from 7 d to 6 mo.
One publication [32] with two individual comparisons involving 36 animals reported the effect of autologous bone marrow stromal cells on motor score. We included this publication in the overall assessment of the prevalence of the reporting of measures taken by the original authors to reduce the risk of bias in their experiments. However, because this was the only paper to report the effects of autologous (rather than allogeneic) stem cells, we did not analyse this further, focussing instead on allogeneic stem cells.
One hundred and fifty-five publications reported the effect of allogeneic stem cells in 317 individual comparisons; 380 different motor outcomes were reported, and because more than one motor outcome was reported for some individual comparisons, we nested these (see Methods) into 312 individual comparisons involving 5,628 animals (Figure 1A). Six different tests were used to assess motor score: the Basso, Beattie and Bresnahan locomotor rating scale (BBB; [33]), the Basso mouse scale (BMS; [34]), the Tarlov scale [35], the forelimb placing test [36], the staircase test [37], and the mouse hind limb motor score [38]. Sixty-one sensory outcomes were reported; we excluded six outcomes that tested sensation in unaffected limbs. In 10 outcomes that used the same test at different intensities in the same cohort of animals, we only included the median intensity. Therefore, we report data on sensory outcome from 45 experiments nested into 24 comparisons using 473 animals (Figure 1B). In 18 cohorts both motor and sensory outcomes were reported.

Risk of Bias
We describe the study quality checklist items reported by each included publication in Table S2. All studies included in this analysis came from peer-reviewed papers; while we identified a number of potentially relevant abstracts, none of these reported data in sufficient detail to be included. One hundred and eleven of 156 publications (71%) reported compliance with animal welfare regulations, and 25 (16%) reported whether or not a conflict of interest existed.
Allocation concealment was reported in 14 of 156 publications (9%). Random allocation to treatment group (72, 46%) and blinded assessment of outcome (72, 46%) were reported more frequently in these publications than in the modelling of other neurological disorders [39][40][41][42], but the reporting of a sample size calculation (less than 1%) was consistent with the proportions observed elsewhere (Table 1). No publication reported all four of these measures to minimise bias.
There were only sufficient data to assess publication bias in studies using allogeneic stem cells where outcome was measured as a motor score. Small-study bias was suggested by asymmetry of the funnel plot (Figure 2A) and by Egger regression (Figure 2B), but not by Trim and Fill.

Meta-Analysis
As expected, our search identified a diverse range of experiments. There was substantial between-study heterogeneity for studies using allogeneic stem cells both where outcome was measured as a motor score [heterogeneity (χ²) = 9,735, 311 degrees of freedom (df), p < 10⁻⁹⁹; effect size, 27

Author Summary
Spinal cord injury is an important cause of disability in young adults, and stem cells have been proposed as a possible treatment. Here we systematically assess the evidence in the scientific literature for the effectiveness of stem-cell-based therapies in animal models of spinal cord injury. More studies reported effects on the ability to move ("motor outcomes") than on sensation ("sensory outcomes"). Overall, treatment improves both sensory and motor outcomes, and for sensory outcome there was a dose-response effect (which suggests an underlying biological basis). Although more measures were taken to reduce the risk of bias than in other areas of translational neuroscience, unblinded studies tended to overstate the effectiveness of the treatment. The variability observed between the studies is not explained by differences in the stem cells used, but does seem to depend on the different injury models used to emulate human spinal cord injury. This suggests that the mechanism of injury should be an important consideration in the design of future clinical trials. Furthermore, open questions arise about the use of immunosuppressive drugs, and efficacy in female animals; these should be addressed before proceeding to clinical trial.
for a significant proportion of the between-study heterogeneity in studies reporting a change in motor score (Table 2). More influence was apparent for factors related to the lesion model than those related to stem cell biology. There was no detectable effect of stem cell dose, derivation (adult or embryonic), manipulation in culture (genetic, growth factor, antibiotic), number of passages in culture, method of stem cell selection prior to implantation, route of administration, frequency of administration, the presence or absence of a supporting scaffold, time of assessment, anaesthetic used, or temperature regulation during surgery.
The neurobehavioural test used (Figure 3A) accounted for most of the observed heterogeneity (adjusted R² = 12.2%, p < 0.00001). Seventy percent of the data (228 comparisons, 4,042 animals) was obtained using the BBB locomotor rating scale and suggested an improvement in outcome of 26.7% (95% CI, 23.9-29.4). Other tests contributed at most 3.5% of the data; the BMS (10 comparisons, 196 animals) gave results similar to those observed using the BBB scale (24.5%, 11.2-37.7), while the Tarlov (9 comparisons, 200 animals) and forelimb placing tests (5 comparisons, 76 animals) suggested larger effects (73.1%, 57.5-88.7 and 47.9%, 18.8-77.1, respectively). The staircase (1 comparison, 12 animals) and mouse hind limb motor score (3 comparisons, 49 animals) tests reported no significant overall effects. Where multiple tests were used (in 20% of animals) the detected effect size was not different to when BBB or BMS were used alone.
Location of injury (Figure 3B) accounted for 10.6% (adjusted R², p < 0.00001) of the observed heterogeneity, with larger improvements detected with the most caudal (low thoracic and lumbar) spinal cord lesions compared with other locations.
The approach used to induce injury had a smaller but significant effect (adjusted R² = 3.4%, p < 0.01, Figure 3E). Efficacy was highest with treatment strategies using cell lines (7 comparisons, 131 animals) rather than primary cells, and amongst primary cells those derived from mice were the least effective (Figure 3F, adjusted R² = 4.3%, p < 0.005).
Motor score subanalyses. A large proportion of the data (115 comparisons, 2,165 animals) were obtained from rats implanted with allogeneic stem cells, after injury created with an impactor, at the midthoracic level and assessed by the BBB test, where the sex of the animal was explicitly stated. This large and experimentally homogeneous subset of the data was analysed separately to establish whether a clearer picture of the key determinants of stem cell biology and implantation emerged.
Heterogeneity was reduced from 9,735 (χ²) over 312 individual comparisons to 1,420 over 115 comparisons, confirming the validity of this approach. As in the full analysis, stem cell dose, number of passages during culture, the presence of additional antibiotics or growth factors in the culture medium, selection methodology, the use of adult or embryonic stem cells and the species of origin, route of administration, presence of a supporting scaffold, and prior differentiation or transfection of the stem cells had no significant effect.
In this subpopulation of comparisons (Table 3) the anaesthetic used accounted for a high proportion of the heterogeneity (adjusted R² = 16.3%, p < 0.001). Isoflurane was infrequently used (3 comparisons, 47 animals) and was associated with the largest improvement in outcome. Of the most commonly used anaesthetics, chloral hydrate [21 comparisons, 417 animals, 33.0% (16.0-50.1)] was associated with the largest effect size (Figure 4A). The interval from lesioning to outcome assessment accounted for 11.0% of the heterogeneity, such that absolute effect size fell by 1.7% for every additional week of delay to outcome assessment. The presence of immunosuppression also accounted for a large proportion of the heterogeneity in this constrained dataset (adjusted R² = 10.4%, p < 0.01); both cyclosporine A and FK506 substantially reduced the benefit derived from stem cells (Figure 4B). BBB scores were lower in experiments where other tests had also been reported [22 comparisons, 473 animals; Figure 4C, adjusted R² = 5.0%, p < 0.02]. There was no impact of whether stem cells were given once, at multiple times, or by continuous infusion; the sex of the animals; or the reporting of randomisation, allocation concealment, or blinded assessment of outcome.
A second subanalysis of the motor dataset was performed to examine whether restriction of the analysis to higher quality studies appreciably altered the results. This analysis was hampered by the paucity of truly high-quality data. None of the contributing papers reported each of four key measures of internal validity (randomisation, blinded assessment of outcome, allocation concealment, and sample size calculation), and only 20 individual comparisons came from papers describing three of the four. As a compromise we analysed the 25% of the motor dataset that reported having both randomisation and blinding.
Restricting the analysis in this way reduced the number of animals assessed from 5,628 to 1,466 and heterogeneity fell from 9,735 to 945 (χ²). Despite this, the key features of both the full and the subanalysis are the same. The characteristics of the animal model still have more impact than the type of cells implanted (Tables 2 and 4).
Immunosuppression no longer has an effect on heterogeneity, and the effect size in animals immunosuppressed with cyclosporine A (mean, 24.3; 95% CI, 13.2-35.3) is essentially the same as in animals where immune suppression is not used (mean, 24.9; 95% CI, 18.3-31.6). Allocation concealment emerges as significant, though not in the expected direction. The type of cell culture medium and the type of cell manipulation prior to implantation also begin to have an impact, but it should be noted that in both cases it is the experiments where the precise conditions are "unknown" that report the greatest effect. In the subanalysis, the mean number of cells implanted is substantially lower than in the full analysis (6.3 × 10⁵ versus 7.4 × 10⁸), and a dose-response relationship is evident.
Sensory score in experiments using allogeneic stem cells. While motor behaviour was relatively unaffected by most factors specific to stem cell biology, the reverse was true for studies reporting a change in sensory outcome (Table 5).
Of the five study characteristics accounting for a significant proportion of the between-study heterogeneity, the type of manipulation in culture had the largest effect (adjusted R² = 61.3%, p < 0.005). Prior differentiation was associated with larger effect sizes, while transfection was associated with smaller effects (Figure 5A). The number of cells administered had a clear dose-response effect (adjusted R² = 31.7%, p < 0.02; Figure 5B). Studies that delivered cells intravenously were associated with significantly larger effects than studies transplanting the cells directly into the lesion area of the spinal cord (adjusted R² = 19.2%, p < 0.05) (Figure 5C).
As with the motor score subanalysis, the anaesthetic agent had a large effect (adjusted R² = 42.8%, p < 0.05). The use of isoflurane to induce anaesthesia in three individual comparisons was associated with substantial additional benefit compared to other methods of anaesthesia (Figure 5D). All studies assessed sensory outcome in either all male or all female cohorts, with studies using female animals appearing to offer no benefit (Figure 5E; adjusted R² = 21.5%, p < 0.05).

Discussion
Systematic review and meta-analysis have helped identify biases within clinical trials [49], providing an impetus to improve standards [50]. This approach offers similar benefits for animal studies [28,41,51] by describing the impact of biological and experimental factors on reported efficacy in a systematic and transparent summary of all available data. This allows judgement of the extent to which conclusions are at risk of bias [52]. In this study we apply these techniques to provide a detailed systematic analysis of the animal literature describing stem-cell-based therapies in spinal cord injury.
Overall, treatment with allogeneic stem cells improves both motor and sensory outcome after spinal cord injury by around 25%, but with important differences between the two datasets. Because of the amount of data, conclusions relating to motor outcome (5,628 animals) are probably more robust than those relating to sensory outcomes (473 animals). For both outcomes there was a broad range of experimental approaches, reflected in the high levels of heterogeneity seen. This is typical for systematic reviews of animal studies and supports our choice of a random effects model; our summary estimates should therefore be considered as the average efficacy rather than the best estimate of a single "true" efficacy. Interestingly, improvement in sensory outcome seems to be sensitive to differences in factors relating to treatment (i.e., stem cell biology), while motor outcome appears to be more sensitive to factors relating to the lesion and the outcome measure used, and to be less dependent on the biological features of the stem cells used.
Evidence supporting a dose-response relationship for sensory outcome suggests the presence of a biologically plausible effect. We observed that prior differentiation of the implanted cells was associated with larger effects. Where the influence of cell differentiation was formally studied, a relationship with outcome was observed [53]. This suggests that optimal efficacy might be seen when cells have some lineage specificity but before final cell type commitment has occurred. For sensory outcome, studies where cells were delivered intravenously, rather than directly into the injured spinal cord, were associated with significantly larger effects. This suggests either that systemic changes may mediate the effects of stem cells or that local implantation may create additional injury that masks the benefit provided by stem cells. We did not see a dose-response relationship for motor outcomes, even where we limited our analysis to a more homogeneous subset of experiments. It may be that there is no dose-response effect or that the doses used in these experiments were all large enough to generate maximal responses. Where dose response was formally studied the authors found increasing benefit from doses as low as 10,000 implanted cells [54], and the median number of implanted cells in comparisons reporting motor outcomes was 250,000.
Immunosuppression with cyclosporine A was associated with increased efficacy in a systematic review of stem cells in focal cerebral ischaemia [28], and it is therefore interesting that in spinal cord injury both cyclosporine A and FK506 are associated with reduced efficacy. This suggests that any beneficial effect of immunosuppressants in promoting the survival of transplanted cells is outweighed by other factors, such as effects on stem cell biology or intrinsic repair mechanisms. Unfortunately, because of the univariate nature of our analyses we are unable to determine a "benefit-risk ratio" for the use of immunosuppression. However, there are studies that indicate that bone-marrow-derived stem cells are able to produce compartmentalised inflammatory lesions [55,56]. The mechanisms behind this observation are not understood, yet there are rising concerns that unwanted inflammatory-driven side effects, such as neuropathic pain, might limit the "usefulness" of gained motor function.
For motor outcome, the neurobehavioural test used (Figure 3A) accounted for most of the observed heterogeneity. The BBB locomotor rating scale was used in 70% of animals. In the more focussed analysis of rat allogeneic, midthoracic impact injury, using BBB as an outcome, studies that used other behavioural tests in addition to the BBB reported smaller effect sizes for the BBB. This may be a manifestation of outcome reporting bias; if the outcome on the BBB is smaller than expected, investigators might also report the outcome on other tests where the effect was larger; if the effect measured using the BBB was considered "sufficient," there might be less motivation also to report outcomes using other measures, particularly if these were smaller than seen using the BBB.
Overall, there was no improvement in motor outcome where this was assessed using the staircase or mouse hind limb motor score tests. However, these accounted for a small proportion of the overall dataset, and so these results should be interpreted with caution. Efficacy was strongly associated with both the location of and the methodology used to create the injury. The largest effect was seen with lower thoracic and lumbar lesions and when the spinal cord was lesioned by hemisection or transection rather than contusion or compression.
The use of isoflurane anaesthesia at SCI induction was associated with substantial improvement in sensory outcome; in the overall motor analysis, there was no effect, but in the more homogeneous restricted analysis, isoflurane was again associated with substantially larger effects. Again, this contrasts with findings in focal cerebral ischaemia and suggests that, despite interest in a general paradigm of "neuroprotection," these conditions are in certain respects biologically very different. However, these findings are based on a small number of individual comparisons and should be interpreted with caution.
The sex of the experimental animal accounted for a large proportion of the observed heterogeneity in both the sensory and motor analyses. For the motor analyses, this seems to reflect the influence of abnormally high effect sizes reported in studies where either the sex of the animals used was not reported or where "both sexes" were used. For sensory outcome, studies using male animals led to significantly higher estimates of effect with no clear benefit detected in female animals.
Thirty percent of animals in our dataset were treated with stem cells at the time of injury. Although this may be helpful in the biological assessment of stem cell therapies, it is of limited clinical relevance. The time of administration, although important with regard to translation to a clinical setting, had no significant impact on the effects reported. This appears to be somewhat unlikely, and our findings may mask different efficacies of different stem cell approaches at different times, with those with more neuroprotective characteristics perhaps being more effective when given early, and those with more influence on neuroregeneration and repair being more effective when given late.
We found that the prevalence of reporting of randomisation and blinded assessment of outcome was higher than that reported in the modelling of other neurological disorders, suggesting more rigour in the conduct of these studies [39][40][41][42]. Other markers of internal validity, such as sample size calculations, were rarely reported (Table 1). The lack of an a priori sample size calculation increases the risk that group sizes were increased during the experiment, in light of analysis showing borderline nonsignificant results; this is an important potential source of bias. It is of course possible that some authors had taken measures to reduce bias but did not report them; this underlines the importance of reporting guidelines [57,58].
For the larger motor dataset, publication bias (Figure 2B) and failure to report blinding (Figure 3H) were both associated with a significant overestimation of overall effect size; there was no apparent impact of a failure to report randomisation. In the Egger regression (Figure 2B) removal of the two most extreme data points did not change the interpretation that publication bias was present (not shown).
Stratification of the data to determine the effect of the above facets of experimentation is desirable. However, no publication reported all four of randomisation, blinded assessment of outcome, allocation concealment, and a sample size calculation, and only 20 individual comparisons came from papers describing three of the four. Therefore, we subanalysed the 25% of the motor dataset that reported both randomisation and blinding.
In this subanalysis the characteristics of the animal model still have more impact than the type of cells implanted. There were some differences from the full analysis, but the reductionist approach of this subanalysis raises the possibility that these might be false positives due to loss of power. The type of cell culture medium and type of cell manipulation prior to implantation appear to have an impact, but it should be noted that in both cases it is the experiments where the precise conditions are "unknown" that report the greatest effect.
There is no obvious biological explanation for this. It may be that a failure to report such details is a surrogate indication that such work is generally of lower quality, and therefore at greater risk of bias.
Immunosuppression is no longer identified as accounting for a significant proportion of the heterogeneity. Indeed, the effect size in cyclosporine-A-treated animals (mean, 24.3; 95% CI, 13.2-35.3) is essentially the same as in animals where no immune suppression was used (mean, 24.9; 95% CI, 18.3-31.6). This appears to confirm that immune suppression offers no advantage in experiments using allogeneic implants to treat SCI.
Intriguingly, in the subanalysis a dose-response relationship does emerge. As the mean number of cells implanted is 6.3 × 10⁵ rather than 7.4 × 10⁸ in the full motor dataset, this is consistent with the hypothesis that such an effect was previously masked by a ceiling effect.
Limitations of our approach. Firstly, we were only able to include data from studies in the public domain and, for motor outcome at least, there is evidence of a publication bias in favour of studies with large effect sizes. Further, we found some evidence (in the motor BBB subanalysis) consistent with selective reporting of outcomes within individual publications. The true effect sizes are therefore likely to be lower than reported here. Secondly, for both study quality and study design features, we relied on published information. Where relevant information was not available (the sex of a cohort of animals, or the taking of measures to reduce bias), we have either analysed these as not known or inferred that things that were not reported did not occur. Thirdly, we present a series of univariate analyses; multivariate meta-regression or stepwise partitioning of heterogeneity might provide more robust insights, but these techniques are not well established. Similarly, for continuous variables, the meta-regressions reported here assumed a linear relationship between the independent and dependent variables, and it is likely that this represents an oversimplification, at least for some independent variables. Fourthly, we have observed the experiments of others rather than conducted experiments of our own, and this observational research should be considered as hypothesis generating only. Finally, we limited our analysis to neurobehavioural outcomes; the greater benefit seen in hemisected and transected lesions compared with compressive or contusional injuries may have important histological correlates, and this is worthy of further exploration.
In conclusion, stem cells appear to have substantial efficacy in animal models of traumatic SCI. Effects on sensory outcome appear more dependent on facets of stem cell biology; motor outcome appears to be more dependent on features of the animal modelling and the outcome scale used.

Methods
The study protocol is available at www.camarades.info/index_files/Protocols.html. A completed PRISMA checklist and flow diagram for this systematic literature review can be found in Text S1.

Definitions
We define a "publication" as a discrete piece of work (including abstracts); each publication may report data from a number of experiments. Each experiment may describe outcome in a number of different experimental cohorts, and the contrast between outcomes in a single treatment cohort with that in a control cohort we define as an "individual comparison." We define "nesting" as combining the effect sizes from different functional outcomes measured in the same cohort of animals to give a single summary estimate of effect in that individual comparison (a nested individual comparison).

Systematic Review
Using prespecified inclusion and exclusion criteria we identified all publications reporting relevant experiments (see below) by searching (December 2011) three electronic databases (PubMed, EMBASE, and ISI Web of Science) using the search strategy "(stem cell OR stem OR haematopoietic OR mesenchymal) AND (spinal cord injury OR hemisection OR contusion injury OR dorsal column injury OR complete transection OR corticospinal tract injury)", with search results limited to those indexed as describing animal experiments.

Inclusion and Exclusion Criteria
Two investigators (A.A. and E.S.) independently reviewed retrieved publications. We included experiments where functional outcome in a group of animals exposed to traumatic spinal cord injury and treated with allogeneic or autologous stem cells was compared with functional outcome in a control group of animals. We excluded individual comparisons that did not report (or where we could not calculate) the number of animals, the mean outcome, or its variance in each group. We excluded experiments where interventions such as growth factors were used to mobilise endogenous stem cells or where nontraumatic models of spinal cord injury were used.

Data Extraction
From each individual comparison we extracted data for reported outcomes. This included extraction of mean and variance data from each cohort exposed to an intervention (controls and active therapy) and from sham cohorts of normal (unlesioned and untreated) animals, and by imputation where the performance of a normal animal could be imputed from the description of the scoring scale. Stem cells were characterised as "autologous" where cells were extracted from an animal, might be manipulated in some way, then returned to the same animal; or "allogeneic" where embryonic or adult cells derived from a different animal were administered to a recipient animal. Where a publication reported more than one experiment, or where an experiment reported more than one individual comparison (for instance, increasing numbers of stem cells transplanted), we considered these separately and extracted data for each, correcting the weighting of these studies in meta-analysis to reflect the number of experimental groups served by each control group. Where different functional outcomes were reported in a single cohort of animals, we combined these outcomes using fixed effects meta-analysis (nesting), to give a summary estimate of functional outcome in that cohort, described here as a comparison. Where a test involved exposing the animal to increasing intensities of the same stimulus (for instance, in allodynia testing), we used data for the median intensity. For sensory tests, only data for stimulation distal to the lesion were included. Where functional outcome was measured at different times, we extracted data for the last time point reported.
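The nesting step described above can be illustrated with a minimal Python sketch. It assumes inverse-variance weighting, which is the usual form of fixed effects meta-analysis but is not spelled out in the text; the function name `nest_outcomes` and the numbers are hypothetical and chosen only for illustration.

```python
import math


def nest_outcomes(effects, standard_errors):
    """Pool several outcomes from one cohort into a single estimate.

    Assumption: inverse-variance fixed-effects weights (w = 1 / SE^2).
    Returns (pooled_effect, pooled_standard_error).
    """
    if not effects or len(effects) != len(standard_errors):
        raise ValueError("need one standard error per effect size")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * es for w, es in zip(weights, effects)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


# Made-up values: two motor tests reported for the same cohort of animals,
# expressed as percentage improvement with their standard errors.
pooled_es, pooled_se = nest_outcomes([30.0, 20.0], [5.0, 10.0])
print(f"nested effect size: {pooled_es:.1f}% (SE {pooled_se:.2f})")
```

The more precise outcome dominates the nested estimate, so a cohort contributes a single comparison to the meta-analysis regardless of how many tests were reported for it.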
Study quality was assessed using a checklist adapted from good laboratory practice guidelines for in vivo stroke modelling [59] and the CAMARADES quality checklist [60]. The checklist comprised (i) publication in a peer-reviewed journal, (ii) statements describing control of temperature, (iii) randomisation to treatment group, (iv) allocation concealment, (v) blinded assessment of outcome, (vi) avoidance of anaesthetics with known marked intrinsic neuroprotective properties, (vii) sample size calculation, (viii) compliance with animal welfare regulations, and (ix) whether the authors declared any potential conflict of interest.

Analysis
For each individual comparison, we calculate a normalised effect size (normalised mean difference) as the percentage improvement ("+" sign) or worsening ("−" sign) of outcome in the treatment group using the following formula:
$$\mathrm{ES} = 100\% \times \frac{(\bar{x}_{c} - \bar{x}_{sham}) - (\bar{x}_{rx} - \bar{x}_{sham})}{\bar{x}_{c} - \bar{x}_{sham}}$$
where $\bar{x}_{c}$ and $\bar{x}_{rx}$ are the mean reported outcomes in the control and treatment group, respectively, and $\bar{x}_{sham}$ is the mean outcome for a normal (unlesioned and untreated) animal. In this calculation, the score achieved by the sham animals acts as the "fixed zero value" or baseline allowing the difference between the sham and treatment groups to be expressed as a ratio. This ratio takes into account differences in the "direction" of individual neurobehavioural scales.
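As a worked illustration, the sketch below computes a normalised effect size in Python. It assumes the formulation 100 × [(x̄c − x̄sham) − (x̄rx − x̄sham)] / (x̄c − x̄sham), which matches the definitions given in the text; the function name and the group means are hypothetical.

```python
def normalised_effect_size(mean_control, mean_treatment, mean_sham):
    """Percentage improvement (+) or worsening (-) relative to the control deficit.

    Assumed formulation: the sham score is the baseline, and the change in
    deficit is expressed as a fraction of the deficit in the control group.
    """
    deficit_control = mean_control - mean_sham
    deficit_treated = mean_treatment - mean_sham
    if deficit_control == 0:
        raise ValueError("control group shows no deficit relative to sham")
    return 100.0 * (deficit_control - deficit_treated) / deficit_control


# Scale where a higher score is better (e.g. a 0-21 locomotor scale, sham = 21).
higher_is_better = normalised_effect_size(
    mean_control=8.0, mean_treatment=11.0, mean_sham=21.0
)

# Hypothetical deficit score where a lower score is better (sham = 0).
lower_is_better = normalised_effect_size(
    mean_control=10.0, mean_treatment=7.0, mean_sham=0.0
)

print(f"{higher_is_better:.1f}% and {lower_is_better:.1f}% improvement")
```

Both examples give a positive value because the treated group lies closer to the sham score than the control group does, whichever way the scale runs.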
Its corresponding standard error was calculated using:
$$\mathrm{SE} = \sqrt{\frac{SD_{c^{*}}^{2}}{n_{c}} + \frac{SD_{rx^{*}}^{2}}{n_{rx}}}$$
where $n_{c}$ refers to the number of animals in the control group and $n_{rx}$ refers to the number of animals in the treatment group. $SD_{c^{*}}$ and $SD_{rx^{*}}$ are the normalised standard deviations for the control and treatment group, respectively. These were calculated using the formulae:
$$SD_{c^{*}} = 100 \times \frac{SD_{c}}{\bar{x}_{c} - \bar{x}_{sham}}, \qquad SD_{rx^{*}} = 100 \times \frac{SD_{rx}}{\bar{x}_{c} - \bar{x}_{sham}}$$
where $SD_{c}$ and $SD_{rx}$ are the reported standard deviation for the control and treatment group, respectively. We then used DerSimonian and Laird random effects weighted mean difference meta-analysis to calculate a summary estimate of effect size; results are presented as the percentage improvement in outcome and its 95% confidence intervals. The variability of the outcomes assessed is presented as the heterogeneity statistic (χ²) with n − 1 degrees of freedom.
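The standard error and the random effects pooling can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: both standard deviations are assumed to be rescaled by the control deficit, the pooling follows the standard DerSimonian and Laird method-of-moments estimator, and all function names and input values are hypothetical.

```python
import math


def normalised_se(sd_control, sd_treatment, n_control, n_treatment,
                  mean_control, mean_sham):
    """Standard error of a normalised effect size, in percentage points.

    Assumption: both reported SDs are rescaled by the control deficit
    |mean_control - mean_sham| before being combined.
    """
    deficit = abs(mean_control - mean_sham)
    if deficit == 0:
        raise ValueError("control group shows no deficit relative to sham")
    sd_c_norm = 100.0 * sd_control / deficit
    sd_rx_norm = 100.0 * sd_treatment / deficit
    return math.sqrt(sd_c_norm ** 2 / n_control + sd_rx_norm ** 2 / n_treatment)


def dersimonian_laird(effects, standard_errors):
    """Random-effects summary: (pooled, ci_low, ci_high, Q, tau^2)."""
    k = len(effects)
    if k < 2 or k != len(standard_errors):
        raise ValueError("need at least two comparisons with matching SEs")
    w = [1.0 / se ** 2 for se in standard_errors]
    sum_w = sum(w)
    fixed = sum(wi * es for wi, es in zip(w, effects)) / sum_w
    # Heterogeneity statistic Q, referred to chi-squared with k - 1 df.
    q = sum(wi * (es - fixed) ** 2 for wi, es in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in standard_errors]
    pooled = sum(wi * es for wi, es in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled, q, tau2


# Made-up inputs for illustration only.
example_se = normalised_se(3.0, 3.0, 10, 10, mean_control=8.0, mean_sham=21.0)
pooled, ci_low, ci_high, q, tau2 = dersimonian_laird(
    [10.0, 30.0, 50.0], [5.0, 5.0, 5.0]
)
print(f"SE = {example_se:.2f}; pooled = {pooled:.1f}% "
      f"(95% CI {ci_low:.1f} to {ci_high:.1f}); Q = {q:.1f}, tau^2 = {tau2:.1f}")
```

When Q is large relative to its degrees of freedom, the between-study variance widens the confidence interval, which is why the summary estimate is best read as an average across heterogeneous experiments.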
The analysis was stratified according to (i) the approach to stem cell therapy (allogeneic, autologous, embryonic, source of cells, ex vivo manipulation), (ii) biological factors (number of cells, time and route of administration, time of assessment of outcome), (iii) aspects of study design (anaesthesia, species of animal, immunosuppression, model and severity of spinal cord injury), and (iv) elements of study quality.
The extent to which study design characteristics explained differences between studies was assessed using meta-regression with the metareg function of STATA/SE10, and the significance level was set at p < 0.05. The meta-regression was univariate rather than multivariate, and we calculated adjusted R² values (a measure of how much residual heterogeneity is explained by the model) to estimate the proportion of the observed variability in effect size for a group of experiments that is explained by variation in the independent variable in question [61].
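To make the adjusted R² concrete, the following Python sketch estimates it for a single categorical study characteristic. It does not reproduce the metareg routine: as a simplification it swaps in a method-of-moments estimate of the between-study variance, and it assumes that adjusted R² is the proportional reduction in that variance once the covariate is fitted. The function names, labels, and data are hypothetical.

```python
def residual_tau2(effects, standard_errors, labels):
    """Method-of-moments between-study variance after fitting one mean per group."""
    groups = {}
    for es, se, label in zip(effects, standard_errors, labels):
        groups.setdefault(label, []).append((es, 1.0 / se ** 2))
    q_resid = 0.0
    c = 0.0
    for members in groups.values():
        sum_w = sum(w for _, w in members)
        group_mean = sum(w * es for es, w in members) / sum_w
        q_resid += sum(w * (es - group_mean) ** 2 for es, w in members)
        c += sum_w - sum(w ** 2 for _, w in members) / sum_w
    df = len(effects) - len(groups)
    if df <= 0 or c <= 0:
        raise ValueError("too few comparisons per group to estimate tau^2")
    return max(0.0, (q_resid - df) / c)


def adjusted_r2(effects, standard_errors, labels):
    """Percentage of between-study variance explained by the covariate."""
    tau2_null = residual_tau2(effects, standard_errors, ["all"] * len(effects))
    tau2_model = residual_tau2(effects, standard_errors, labels)
    if tau2_null == 0:
        raise ValueError("no between-study heterogeneity to explain")
    return 100.0 * (tau2_null - tau2_model) / tau2_null


# Made-up data: two hypothetical lesion models, "A" and "B".
r2 = adjusted_r2(
    effects=[10.0, 12.0, 30.0, 32.0],
    standard_errors=[1.0, 1.0, 1.0, 1.0],
    labels=["A", "A", "B", "B"],
)
print(f"adjusted R^2 = {r2:.1f}%")
```

In this toy dataset nearly all of the variability lies between the two groups, so the covariate explains almost all of the heterogeneity; with real data the value can be small or even negative.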
We sought evidence of publication bias using a funnel plot, Egger regression, and Trim and Fill [62]. A detailed description of the statistical methods used for meta-analysis and meta-regression can be found in [63].
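The Egger regression mentioned above can be sketched in Python as an ordinary least squares fit of the standardised effect (effect size divided by its standard error) on precision (the reciprocal of the standard error). This is the textbook form of the method rather than the exact procedure used here; the function name and data are hypothetical, and the significance test on the intercept is omitted for brevity.

```python
def egger_regression(effects, standard_errors):
    """Regress ES/SE on 1/SE; returns (intercept, slope).

    An intercept far from zero indicates funnel plot asymmetry, i.e. small
    (imprecise) studies reporting systematically different effects.
    """
    if len(effects) != len(standard_errors) or len(effects) < 3:
        raise ValueError("need at least three comparisons with matching SEs")
    x = [1.0 / se for se in standard_errors]
    y = [es / se for es, se in zip(effects, standard_errors)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all standard errors are identical")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data in which less precise studies report larger effects:
# effect = 20 + 3 * SE, so the expected intercept is 3 and the slope is 20.
ses = [2.0, 4.0, 5.0, 10.0]
effects = [20.0 + 3.0 * se for se in ses]
intercept, slope = egger_regression(effects, ses)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Here the slope recovers the effect that an infinitely precise study would report, while the non-zero intercept captures the extra effect attributable to small studies.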

Supporting Information
Table S1 Included studies. First author, publication year, stem cell used, species of host animal, number of animals, number of cells, time of treatment in relation to injury, anaesthetic used, type of injury, route of delivery, and outcome measure reported for studies included in the review. (DOCX)