Preclinical trials in Alzheimer’s disease: Sample size and effect size for behavioural and neuropathological outcomes in 5xFAD mice

Mahvish Faisal; Jana Aid; Bekzod Nodirov; Benjamin Lee; Miriam A. Hickey

doi:10.1371/journal.pone.0281003

Abstract

5xFAD transgenic (TG) mice are used widely in AD preclinical trials; however, data on sample sizes are largely unaddressed. We therefore performed estimates of sample sizes and effect sizes for typical behavioural and neuropathological outcome measures in TG 5xFAD mice, based upon data from single-sex (female) groups. Group-size estimates to detect normalisation of TG body weight to WT littermate levels at 5.5m of age were N = 9–15 depending upon algorithm. However, by 1 year of age, group sizes were small (N = 1 to <6), likely reflecting the large difference between genotypes at this age. To detect normalisation of TG open-field hyperactivity to WT levels at 13-14m, group sizes were also small (N = 6–8). Cued learning in the Morris water maze (MWM) was normal in Young TG mice (5m of age). Mild deficits were noted during MWM spatial learning and memory. MWM reversal learning and memory revealed greater impairment, and groups of up to 22 TG mice were estimated to detect normalisation to WT performance. In contrast, Aged TG mice (tested between 13 and 14m) failed to complete the visual learning (non-spatial) phase of MWM learning, likely due to a failure to recognise the platform as an escape. Estimates of group size to detect normalisation of this severe impairment were small (N = 6–9, depending upon algorithm). Other cognitive tests including spontaneous and forced alternation and novel-object recognition either failed to reveal deficits in TG mice or deficits were negligible. For neuropathological outcomes, plaque load, astrocytosis and microgliosis in frontal cortex and hippocampus were quantified in TG mice aged 2m, 4m and 6m. Sample-size estimates were ≤9 to detect the equivalent of a reduction in plaque load to the level of 2m-old TG mice or the equivalent of normalisation of neuroinflammation outcomes. However, for a smaller effect size of 30%, larger groups of up to 21 mice were estimated. In light of published guidelines on preclinical trial design, these data may be used to provide provisional sample sizes and optimise preclinical trials in 5xFAD TG mice.

Citation: Faisal M, Aid J, Nodirov B, Lee B, Hickey MA (2023) Preclinical trials in Alzheimer’s disease: Sample size and effect size for behavioural and neuropathological outcomes in 5xFAD mice. PLoS ONE 18(4): e0281003. https://doi.org/10.1371/journal.pone.0281003

Editor: Thomas H. Burne, University of Queensland, AUSTRALIA

Received: September 8, 2022; Accepted: January 13, 2023; Published: April 10, 2023

Copyright: © 2023 Faisal et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: Funded by Estonian Research Council under the framework of EuroNanoMed III JTC 2018 project name: “CurcumAGE”. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

By 2050, cases of dementia are expected to almost triple from 2019 levels [1, 2]. Alzheimer’s disease (AD) is the leading cause of dementia [3] and the highest-ranking cause of disability-adjusted life years of the neurodegenerative diseases [4]. Although many possible treatments for AD are in development [5, 6], drug approval rates are typically low for the nervous system [7] and a recent approval of Aducanumab [8] has proven controversial [9, 10].

Preclinical trials in animal models are critical for the development of new drugs and also validation of disease mechanisms. Recent editorials and reviews have highlighted several important issues to address in rodent preclinical trials including trial design, trial registration and transparent reporting [11–13]. Specific areas to address include e.g., blinding, randomisation, inclusion/exclusion criteria, prior design of statistical analyses and sample size, and resources such as the Experimental Design Assistant have been developed to assist [14, 15].

Sample-size estimations are required for ethical approval and enable appropriate statistical power to detect an expected treatment effect. However, many preclinical therapeutic trials in AD transgenic mice show no sample-size calculation [16]. The authors suggested this may underlie the typically small sample sizes used (N<10) [16], and indeed, “small-study effects” have been noted in AD preclinical research [17]. Similar issues with respect to group sizes and sample-size calculations have been noted in preclinical trials in rodent models of Parkinson’s disease [18] and in rodent fear-conditioning [19]. Underpowered preclinical studies were identified as the single most important factor that contributed to failure of a therapeutic in a clinical trial in human ALS patients [20], and high-impact guidelines, including ARRIVE guidelines, recommend that sample-size estimates be conducted for preclinical testing [21, 22]. Nevertheless, an expected effect size is difficult to predict and the (clinical) significance of a particular effect size is difficult to estimate [23, 24]. Here, we have focused chiefly upon normalisation to wildtype levels, the assumed largest expected effect size for a potential therapeutic, and also a smaller effect size of 30% improvement in TG outcomes where possible. This smaller effect size may be relevant for a test agent of unknown efficacy [25]. We provide effect sizes for each outcome, together with its power, and then use several different, freely available resources for our sample size estimates.

5xFAD mice are a very well-known mouse model of Alzheimer’s disease [26]. They have been used by many researchers worldwide to study basic mechanisms underlying AD pathophysiology and also in preclinical trials. Our aim was to estimate sample sizes required to detect different effect sizes using outcomes from behavioural and neuropathological assays at ages where amyloid load is well-characterised [26]. The behavioural and neuropathological assays used are well-known and characterised for mice (automated open field [27, 28]; spontaneous alternation [29]; forced alternation [30, 31]; novel object recognition [32]; Morris water maze [33]; neuropathology [26, 34–36]) but reports of their relative ability to detect cognitive deficits in 5xFAD mice are inconsistent in the field. Indeed, recent data suggest that the most robust behavioural finding in 5xFAD mice is an increase in activity, rather than a change in cognition [37].

Assessments of relative efficacy of well-used tests are becoming more common in AD preclinical research [37–39] but sample sizes remain unclear. The experiments outlined here use standard protocols to provide provisional sample sizes to detect treatment effects in 5xFAD mice, one of the most widely used mouse models of AD.

Materials and methods

Mouse husbandry

Male transgenic (TG) 5xFAD mice (034840-JAX; B6SJL-Tg(APPSwFlLon,PSEN1*M146L*L286V)6799Vas/Mmjax) and female B6SJLF1 mice were purchased from The Jackson Laboratory (Bar Harbor, ME). Mice were group-housed with ad libitum access to food (V1534-300, ssniff Spezialdiäten GmbH) and water (reverse osmosis-treated and UV-sterilised), and lights were on from 7am to 7pm. Pups were weaned at approximately 3 weeks of age and group-housed in same-sex, mixed-genotype, mixed-litter cages of 8–10 animals, with one cage holding N = 4 (N = 2 TG, N = 2 WT). Our paper adheres to ARRIVE guidelines [40]. Authorisation to perform experiments was provided by the Estonian Animal Welfare authorisation committee, licence numbers 175 and 189, according to the EU Directive 2010/63/EU.

Genotyping

Mice were genotyped for the PSEN transgene, wildtype (WT) APP gene and also the phosphodiesterase-6b retinal degeneration-1 (Pde6brd1) allele by PCR (see Table 1 for primers, obtained from Tag Copenhagen, Frederiksberg, DK) using tail samples obtained from neonates between the ages of P3 and P10. TG/mutant and wildtype controls and a water control were run in every PCR. DreamTaq PCR Master Mix (2X) (ThermoFisher Scientific) was used for PCR. Mice that were homozygous recessive for Pde6brd1 were not used. Cycling conditions for AD status: 95°C for 5 min, followed by 40 cycles of 94°C for 45s, 48°C for 30s and 72°C for 90s, followed by a final extension at 72°C for 10 min. Cycling conditions for Pde6brd1 status: 94°C for 2 min, followed by 28 cycles of 94°C for 15s, 57°C for 15s and 72°C for 10s, with a final extension at 72°C for 2 min.

Table 1. Primer sequences.

https://doi.org/10.1371/journal.pone.0281003.t001

Cohorts tested

All behavioural analyses were conducted during the light phase of the light cycle (between 9am and 2pm). Male mice were not used because male transgenics exhibit a delayed disease progression compared with female transgenic mice [26]. Mice exhibiting stereotypy (continuous backflips or continuous jumping in the cage) or weighing less than 16g were excluded from testing (N = 2 TG). Mice were handled to reduce anxiety and were weighed regularly. Behavioural testing was conducted blinded to genotype. For all behavioural testing, mice were acclimatised to the testing room for 20–30 min. For video-based automated analysis of behaviour, white, beige and grey fur was coloured using human hair spray (dark brown or black, L’Oréal).

Young cohort.

Female transgenic and wild-type mice were tested together (Young cohort: N = 14 WT, N = 16 TG). Mice were tested for spontaneous activity in an open field (136±2d mean ± sem; all mice on same day), spontaneous alternation in a Y-maze (age 137±2d; all mice on same day), novel object recognition memory (140±3d; mice divided over two days), spatial memory in the Morris water maze (all mice tested together over a 3-week period, see protocol below, beginning at 148±2d and ending at 169±2d) and forced alternation (189±3d; testing divided over three days).

Aged cohort.

Female mice were tested, together (Aged cohort: WT N = 17, TG N = 12). Mice were tested for spontaneous activity in an open field (406±4d mean ± sem; all mice on same day), for spatial memory in the Morris water maze (all mice tested, together, over a 4-day period, see protocol below, beginning at 435±4d and ending at 438±4d) and for spontaneous alternation (442±4d; all mice on same day).

Behavioural testing

Open field.

Spontaneous activity in a novel environment was analysed using Noldus Phenotyper cages equipped with Ethovision XT V11 (Noldus, Wageningen, Netherlands). Mice were placed into the Phenotyper cages singly and their activity recorded for 1 hour. Outcome measures, generated automatically by the software, included distance travelled and speed. For N = 2 mice per genotype in the Young cohort, no data was collected for more than 30% of timebins and they were not included in analyses.

Spontaneous alternation.

Mice were placed into the centre of the maze (3 arms at a 120° angle from each other, arms 30cm long and 10cm wide; opaque walls, 10cm high, and opaque floor) and their behaviour video-recorded for 5 minutes [39] or 8 minutes [26, 44, 45] for subsequent analysis. Visual cues in the form of room furniture, constant position of the experimenters etc., were available to the mice. An entry was defined as when the hindquarters entered an arm. The number of entries, the arm entered, triplets (e.g., ABC, CBA, ACB) and working memory errors (re-entries within a triplet, i.e., unsuccessful triplets) were quantified from videos by a blinded observer. The apparatus was cleaned with 70% ethanol and dried thoroughly between tests. N = 1 TG mouse from the Young cohort and N = 2 TG mice from the Aged cohort did not reach the entry number threshold of 10 and were not included in analyses.
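The scoring described above can be sketched in a few lines of Python. This is an illustration only, not the authors' scoring procedure: the function name score_alternation, the use of overlapping triplets and the percent-alternation summary are assumptions that are not stated in the text.

```python
def score_alternation(entries, min_entries=10):
    """Score a spontaneous-alternation session from the ordered arm entries.

    `entries` is a string or list of arm labels, e.g. "ABCAB".
    Overlapping triplets are assumed (hypothetical choice, not from the text).
    """
    if len(entries) < min_entries:
        return None  # below the entry threshold of 10: mouse is excluded
    # Every run of three consecutive entries counts as one triplet.
    triplets = [entries[i:i + 3] for i in range(len(entries) - 2)]
    # A triplet is successful when all three arms differ (e.g. ABC, CBA, ACB).
    successful = sum(1 for t in triplets if len(set(t)) == 3)
    errors = len(triplets) - successful  # re-entries within a triplet
    return {
        "entries": len(entries),
        "triplets": len(triplets),
        "successful": successful,
        "errors": errors,
        "percent_alternation": 100.0 * successful / len(triplets),
    }
```

For example, the sequence "ABCABABCAB" contains eight overlapping triplets, two of which (ABA and BAB) contain a re-entry.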

Forced alternation.

A T-maze, constructed from black Perspex walls and a white floor, was used (stem: 30cm long, 10cm wide, 20cm high; distal arms each 30cm long, 10cm wide and 20cm high; Pleksiklaas OÜ, Tartu, EE). Animals were placed into the start (home, stem of the T) arm and allowed to explore the apparatus freely over a period of 5 minutes. One of the distal arms was blocked off during this phase (pseudorandomly assigned per mouse using Excel rand function). After 1 hour, the mouse was placed back into the start arm, and again allowed to explore freely over a period of 5 minutes (all arms now available for exploration). Visual cues in the form of room furniture, constant position of the experimenters etc., were available to the mice. Lighting was set to 20–40 lux at the centre of the maze. The apparatus was cleaned with 70% ethanol and dried thoroughly between tests. The total distance travelled (using ezTrack [46]), preference index (entries into novel arm / total entries) and difference index (entries into novel arm minus entries into familiar arm) were quantified and compared. A threshold of 10 or more entries was required for behavioural data to be used in the analysis. Only 7 WT mice (out of 14) and 5 TG mice (out of 16) achieved this threshold.
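As a minimal sketch of the two indices and the inclusion threshold defined above, assuming entry counts have already been scored from video: the function name forced_alternation_indices is hypothetical, and total_entries is passed in explicitly because the text does not specify whether entries into the start arm count towards the total.

```python
def forced_alternation_indices(novel_entries, familiar_entries,
                               total_entries, min_entries=10):
    """Return the preference and difference indices for one mouse.

    Returns None when the mouse made fewer than `min_entries` entries,
    mirroring the threshold of 10 entries described in the text.
    """
    if total_entries < min_entries:
        return None
    return {
        # entries into novel arm / total entries
        "preference_index": novel_entries / total_entries,
        # entries into novel arm minus entries into familiar arm
        "difference_index": novel_entries - familiar_entries,
    }
```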

Novel-object recognition.

Testing was as per [32, 47]. The testing arena was a box with black walls and white floor made of Perspex (25cm x 25cm x 30cm high; Pleksiklaas OÜ, Tartu, EE). Mice were placed into the box on day 1, with no objects, for habituation (5 minutes). Behaviour was video recorded for analysis. On day 2, familiarisation and testing took place. During familiarisation, a pair of identical objects was pseudorandomly assigned to each mouse (using Excel rand function) and placed into the box near the northwest and northeast corners. The mouse was then placed into the south end of the box facing away from the objects and behaviour recorded over a period of 5 minutes [32, 47]. Two sets of objects were used for novel object recognition: one set of identical objects were small, round, lidded jars filled with sand with duct tape around them to provide texture; the second set of identical objects were small, green, glass, hexagonal candle holders. We found no difference in baseline exploration of objects, no intrinsic preference for either set of objects and no genotype-dependent exploration of either set of objects. Three hours later, during the testing phase, the final identical member of the object triplet (familiar object) was placed into one of the corners (randomly assigned) and one novel object (a jar if familiarised with the candle holder; a candle holder if familiarised with the jar) placed in the other corner. The mouse was then placed into the box at the south end, facing away from the objects, and behaviour recorded over a period of 5 minutes. Light was approximately 20 lux at the centre of the box. The apparatus and objects were cleaned with 70% ethanol and dried thoroughly between tests. Behaviour was analysed from videos by a blinded observer. Exploration was defined as when the mouse sniffed the object or touched it while looking at it at a distance of 2cm or less between mouse and object. Climbing was not considered exploration [47]. A threshold of 20s exploration during familiarisation was required for a mouse to be included in data analysis [47]; N = 2 WT and N = 1 TG did not reach threshold. Outcome measures included preference score (time spent exploring novel object/total time spent exploring), difference score (time spent exploring novel object minus time spent exploring familiar object) and discrimination index (difference score/total time spent exploring objects) [32]. This test was not examined in the Aged cohort.
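The three outcome measures and the inclusion threshold follow directly from the exploration times, as the short Python sketch below shows. The function names nor_scores and meets_threshold are hypothetical and the input times are assumed to be in seconds; this is not the authors' analysis code.

```python
def meets_threshold(familiarisation_exploration_s, threshold_s=20.0):
    """True if the mouse explored for at least 20s during familiarisation."""
    return familiarisation_exploration_s >= threshold_s


def nor_scores(novel_s, familiar_s):
    """Compute novel-object recognition scores from test-phase times (s)."""
    total = novel_s + familiar_s
    if total <= 0:
        raise ValueError("no exploration recorded in the test phase")
    difference = novel_s - familiar_s
    return {
        "preference_score": novel_s / total,       # novel / total
        "difference_score": difference,            # novel minus familiar
        "discrimination_index": difference / total,
    }
```

With equal exploration of both objects, the preference score is 0.5 and the discrimination index is 0, which gives the natural reference values when testing for a preference.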

Morris water maze: Young cohort.

The water maze (diameter 140 cm, height 45 cm) was filled with water (22°C) that was then coloured using white tempera paint to obscure the platform position. Visual cues were placed surrounding the maze. The trial length for all trials was 60s. If an animal failed to find the platform within 60s, it was guided gently to it. For trials where a platform was present, the platform was placed at 1cm below the surface of the water and threshold for successful location of the platform was 5.2s on the platform. Mice were trained to stay on the platform for 10-15s before being removed and placed into a warmed cage. Inter-trial intervals were approximately 10 minutes. The order of testing was the same for individual mice within a cage, but order of cages was changed daily.

For cued learning, mice were trained (4 trials per day over a period of 5d) to locate a submerged platform marked with a highly salient visual cue. Starting positions and platform positions varied with trial, according to [33]. Despite extensive guidance, N = 1 WT mouse failed to demonstrate an ability to find the cued platform; data from this mouse was not included in the analyses.

Spatial learning began on the 8th day. For spatial learning, mice were trained over 4 trials per day, for 6 days, to find a submerged platform that remained fixed in the southwest position. Starting positions varied with trial according to [33] (with the addition of D6, starting positions trial 1: SE, trial 2: NW, trial 3: E, trial 4: N). On the 7th day of spatial learning, the platform was removed, and mice placed in the pool at a novel start-site (NE) for probe testing.

Reversal learning began on the 15th day. The mice were given 4 trials per day over a period of 6 days to learn a new fixed platform position (NE). Starting positions varied with trial according to [33] (with the addition of D6, starting positions trial 1: NW, trial 2: W, trial 3: SE, trial 4: S). On the 7th day of reversal learning, the platform was removed, and mice placed in the pool at a novel start-site (SW) for probe testing.

Morris water maze: Aged cohort.

Protocols were as per guidelines provided in Vorhees and Williams [33] and as described above. However, we had to make several adjustments to assist the frail TG mice, including 1) raising the water temperature to 23–24°C, 2) lowering the platform position to 2cm below the water surface to enable the TG mice to climb on, 3) increasing the inter-trial interval to approximately 60 minutes, and 4) lowering the number of trials per day (4 trials on day 1, then 3 trials per day on days 2–4). Finally, mice were tested over 4 days of cued learning (visual learning) only. Spatial learning and reversal learning phases were not conducted as transgenic mice did not achieve sufficient success during cued learning. N = 1 WT showed thigmotaxis (swam following the wall and consistently less than 5cm from wall [33]) and was removed from the analysis.

Morris water maze: Analysis.

For analysis, Ethovision XT V8 (Noldus, Wageningen, Netherlands) was used to calculate latencies to find platform, quadrant occupancies, Gallagher’s proximity, velocity, distance, time on platform and frequency of platform crossings.

Pathological analysis

A series of mice at 2, 4 and 6 months of age were analysed (N = 3–4 [42, 48, 49] female wildtype and transgenic littermates per age). Mice were euthanised by cervical dislocation and decapitation. Brains were dissected out and divided into hemispheres and one hemisphere was placed in fresh 4% paraformaldehyde for post-fixing. Samples for post-fixing were incubated at 4°C, with rocking, for 48-72hrs. Samples were then placed in 30% sucrose for a further 48-72hrs and then briefly washed with 0.01M PBS, excess liquid dried off and then they were snap frozen in liquid nitrogen. Samples were stored at minus 80°C. Serial sagittal cryosections (40μm) were taken and placed in cryoprotectant and stored at minus 20°C until processing. No plaques are observed in non-TG mice from this line of mice [26] and so WT mice were included in astrocyte and microglial analyses only.

Congo red staining.

Three sections per mouse, approximately -2.0mm lateral from bregma, were stained for Congo red as previously described [35]. Briefly, sections were washed in 0.01M TB for 3 x 5 minutes and then mounted onto gelatin-coated glass slides and dried overnight. On the following day, sections were washed in dH₂O (30s) then placed in saturated NaCl (NaCl is added to 80% EtOH while stirring until a layer of approximately 5mm is obtained) for 20 minutes. Slides were then placed in Congo red solution for 30 minutes (0.2% Congo red in saturated NaCl, filtered prior to use). Slides were then brought through dehydration steps (8 dips in 95% ethanol, 3 x 5 minutes in xylene) and coverslipped. Photomicrographs were taken at x20 using cellSens Entry, V2.2 software (Olympus Life Science, Center Valley, Pennsylvania) on an Olympus IX70 microscope. Images were quantified by a blinded observer for size and number of plaques per field of view at the hippocampus (DG, CA1, CA2/3, subiculum) and frontal cortex. Briefly, the number of plaques was quantified using cell counter in ImageJ (FIJI), and for plaque size, a grid was placed on the image and the area of any plaque contacting lines on the grid (1500μm², random offset) was quantified in ImageJ (FIJI) to a maximum number of 10 per image (1 image per region of interest per section).

Fluorojade C staining.

For Fluorojade C (FJC) staining [36], two sections per mouse, approximately -2.0mm lateral from bregma, were mounted onto gelatin-coated slides, dried overnight and then washed in 0.01M PBS for 1 min then incubated in KMnO₄ (0.06% in 0.01M PBS) for 20mins and incubated in FJC for 20 minutes (0.0001% in 0.01M PBS + 0.1% acetic acid). Sections were then washed in 0.01M PBS for 3x1min, dehydrated and defatted for coverslipping. Photomicrographs (1 image per region of interest per section) were taken at x10 using Zen software on an LSM 780 confocal microscope (ex 488nm, em 505-550nm). Images were batch-processed in ImageJ (FIJI) using the Intermodes thresholding algorithm, followed by despeckling and then analysis of particles greater than 5μm².

Immunocytochemistry.

Free-floating sections were processed according to standard protocols. Two lateral (2.6mm lateral of Bregma) and two medial (1.5mm lateral of Bregma) sections were used per mouse.

For IBA1, sections were washed (3 x 5mins 0.01M PBS) and then endogenous peroxidases inactivated (1% H₂O₂ in 0.5% Triton X-100 in PBS; 20 min). Sections were then blocked (5% donkey serum (Jackson laboratories) in 0.5% TX-100 in 0.01M PBS; 30 minutes) and incubated in primary antibody overnight (Abcam Iba1 ab5076; 1:1000 in blocking solution). On the following day, sections were washed (3 x 5mins 0.01M PBS) and incubated in secondary antibody (donkey anti-goat 705-065-147 Jackson ImmunoResearch; 1:200 in blocking solution) for 2 hours. Following washing (3 x 5mins 0.01M PBS), sections were incubated in Vectastain Elite ABC Reagent in PBS containing 0.2% Triton X-100 for 2 h, according to manufacturer instructions. Sections were washed (3 x 5mins 0.01M PBS) and then developed in 0.03% 3,3′-diaminobenzidine tetrahydrochloride containing 0.0006% H₂O₂ in 0.05 M Tris buffer, pH 7.6. Development was monitored carefully and then sections washed in 0.01M TB for 4 x 5 minutes. Sections were then mounted onto gelatin-coated slides, dehydrated and defatted and then coverslipped for photography. Control sections that were not exposed to primary antibody were run in parallel: no staining was observed in these sections.

For GFAP, sections were washed (3 x 5mins 0.01M PBS) and then blocked (5% goat serum (Jackson laboratories) in 0.5% TX-100 in 0.01M PBS; 30 minutes) and incubated in primary antibody overnight (GFAP, Sigma Aldrich FLJ45472; 1:500 in blocking solution). On the following day, sections were washed (3 x 5mins 0.01M PBS) and incubated in secondary antibody (goat anti-rabbit 111-585-003 Jackson ImmunoResearch; 1:500 in blocking solution) for 2 hours. Following washing (3 x 5mins 0.01M PBS), sections were incubated in Hoechst (1μg/ml) for 10 minutes, washed 1 x 5mins in 0.01M TB and then mounted onto gelatin-coated slides and coverslipped using aqueous mounting medium. Control sections that were not exposed to primary antibody were run in parallel and showed no staining.

Image analysis for immunohistochemistry.

IBA1. Photomicrographs of subiculum and frontal cortex layers V-VIa were taken at x20 using cellSens Entry, V2.2 software (Olympus Life Science, Center Valley, Pennsylvania) on an Olympus IX70 microscope. To ensure consistency, all pictures were taken using the same settings, having calibrated brightness across the field of view and white-balanced the camera. For analysis, images were processed as previously published with small modifications [34]. Briefly, using ImageJ batch processing, images were FFT bandpass filtered and converted to grayscale, brightness and contrast were adjusted automatically, an unsharp mask was run twice, and images were then despeckled and converted to binary using the RenyiEntropy algorithm for automated thresholding. The final images were then despeckled, and the close and the remove outliers plugins were used to close objects and smooth final objects. All particles greater than 30μm² and away from edges were measured, and mean particle size per mouse used to generate group means.

GFAP. Z-stack photomicrographs through the depth of the subiculum and of frontal somatosensory cortex layers V and VIa were taken using an LSM780 confocal microscope (x20; 0.92 x 0.92 x 2.04μm per pixel; frame size: 472.33 x 472.33 μm). Images were batch-processed in ImageJ. Briefly, colours were split and the GFAP channel made into maximum-intensity projections, converted to 8-bit, auto-thresholded (Maximum Entropy algorithm) and percent area per image quantified to generate means per mouse and then group means.

Statistical analyses

All analyses were conducted blinded to genotype and then the code broken for generating graphs and for statistics. Individual data points as well as mean values ± standard error of the mean are shown where possible. The significance threshold (alpha) was set to 0.05. Body weights from Young and Aged cohorts were combined for analysis and analysed using a mixed-effects ANOVA with Geisser-Greenhouse’s epsilon followed by Šídák’s multiple comparisons tests. To compare one factor between two separate groups, unpaired T-tests were used (e.g., open field total distance moved or velocity over 1 hour). Where one factor was compared over time within a particular group, 1-way ANOVAs with repeated measures were used followed by appropriate post-hoc testing. Two-way ANOVAs with Geisser-Greenhouse’s epsilon were used to compare data where there were two factors (e.g., genotype and time; for example, Gallagher’s proximity over days of learning in the Morris water maze) and were followed by appropriate post-hoc tests. In the case of missing data, mixed-effects ANOVAs were used instead. Post-hoc tests were Šídák’s multiple comparisons tests for between-group analyses and Tukey’s multiple comparisons tests for within-group analyses. Three-way ANOVAs with repeated measures, followed by Tukey’s multiple comparisons test, were used for GFAP and IBA1 immunostaining data. For novel object recognition, one-sample T-tests were used to compare performances to theoretical values. To compare the mean number of trials showing “bumps” per mouse per group on the final day of MWM testing in the Aged cohort, a Mann-Whitney U test was used because the data did not meet the assumptions of parametric testing.

GraphPad Prism V9.3.1 was used for the statistical analyses for basic outcome measures; a mouse was considered to be the experimental unit. ClinCalc [50] was used for post-hoc power estimations, and Cohen’s d was calculated as per [51]. Sample-size estimations were based upon two-tailed hypotheses, and power was set to 80% except where noted. Matlab [52], ClinCalc [53], BioMath [54] and G-Power [55] were used to determine sample sizes. Sample-size estimates assumed the same group size (1:1) and standard deviation. Power calculations, effect sizes and sample-size calculations were performed only where robust between-genotype effects were observed. Data are available within S1 File.
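To make the logic of these estimates concrete, the following Python sketch computes Cohen's d from two groups and the group size needed for a two-tailed comparison at 80% power with 1:1 allocation and a common standard deviation. It is an illustration under stated assumptions, not a reproduction of the Matlab, ClinCalc, BioMath or G-Power calculations: the pooled-SD form of Cohen's d and the normal approximation are assumptions, and the function names cohens_d and n_per_group are hypothetical. Tools that use the t-distribution will typically return slightly larger groups, which may contribute to differences between algorithms.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled standard deviation (assumed form)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b)
                     / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


def n_per_group(d, alpha=0.05, power=0.80):
    """Group size for a two-tailed, two-group comparison (normal approx.).

    Assumes equal group sizes (1:1) and equal standard deviations.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / abs(d)) ** 2)
```

Under this approximation, a standardised effect of d = 1 requires 16 animals per group and d = 2 requires 4, which illustrates why outcomes with large genotype differences yield small group-size estimates.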

Results

Body weight

Reproducing previous data [37, 38, 56, 57], body weights in 5xFAD females failed to increase as much as those of WT littermates and began to differ significantly from WTs by approximately 4 months (Fig 1; body weights from Young and Aged cohorts combined for analysis; age x genotype interaction: F(11,416) = 14.9, p<0.0001). WT females continued to gain weight throughout, with AUC analysis showing peak weight at 13m (approximately 30g). TG female peak weight occurred at 8m (approximately 23g). The effect size of weight differences between WT and TG mice was higher for 1-year-old mice (Table 2). At 5.5m, 9–15 mice (depending upon algorithm used to calculate sample size) would be required to detect a treatment effect of bringing weights back to WT levels (Table 3). The theoretical sample sizes required to detect a treatment effect of bringing weight back to WT levels are very small at 1 year of age (1 to <6), likely reflecting this larger difference between genotypes (Table 3). Thus, group sizes for this endpoint at 1 year of age are relatively small if the expected effect size for a particular agent is large (>30%).

Fig 1. Body weights of WT and TG mice.

Body weights from Young (N = 14 WT, N = 16 TG) and Aged (N = 17 WT, N = 12 TG) cohorts were combined for analysis and analysed using mixed-effects ANOVA followed by Šídák’s multiple comparisons tests. WT mice gain weight throughout but 5xFAD TG mice reach maximum weight by 8m. The colours of the symbols used for TG mice show whether their weights are similar to their WT littermates (light grey) or are significantly different from their WT littermates, with increasingly darker shades denoting increasingly significant differences. Data are group means ± sem; error bars smaller than the symbol are not shown.

https://doi.org/10.1371/journal.pone.0281003.g001

Table 2. Effect size and post-hoc power of pathological, behavioural and general health outcomes.

https://doi.org/10.1371/journal.pone.0281003.t002

Table 3. Group sizes required to detect treatment effects for neuropathological, behavioural or general health outcome measures.

https://doi.org/10.1371/journal.pone.0281003.t003

Behavioural endpoints: Activity

The Young cohort showed no differences in horizontal activity (Fig 2A) or speed (Fig 2B). We noted a high within-group variability in these young mice (distance SD/mean: WT = 37%, TG = 33% versus 11–12%, respectively, at 1 year of age). Sample-size estimations were not calculated.
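The within-group variability quoted above is the ratio of standard deviation to mean (coefficient of variation). A minimal Python sketch of that calculation follows; the function name coefficient_of_variation is hypothetical and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return SD/mean as a percentage, using the sample SD (assumed)."""
    return 100.0 * stdev(values) / mean(values)
```

Under a normal approximation, the group size needed to detect a fixed absolute difference scales with the square of the standard deviation, so higher within-group variability inflates sample-size requirements considerably.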

Fig 2. Spontaneous activity in a novel environment.

Activity in a novel environment (open field activity) over a period of 1 hour. No difference was detected between genotypes in the Young cohort and we note a large within-group variability at this age (A, B). C-F: By 13m of age, 5xFAD TG mice are hyperactive. Graphs C and D show per-minute analyses (symbols are of group means±sem), but no individual timepoints were significantly different between genotypes. E, F: Analysis of total activity over the 1-hour period was a robust and more useful outcome measure for detecting treatment effects than per-minute analysis. A, B, E, F: Individual mice are shown as black filled circles (TG) or open circles (WT) with lines depicting mean ± sem. C, D: mean ± sem shown. Group sizes: Young: N = 12 WT, N = 14 TG; Aged: N = 17 WT, N = 12 TG.

https://doi.org/10.1371/journal.pone.0281003.g002

In the Aged cohort, TG mice showed increased horizontal activity (Fig 2C, effect of genotype F(1,27) = 9.6, p<0.01) and increased speed of movement (Fig 2D, effect of genotype F(1,27) = 7.8, p<0.01). However, post-hoc tests showed no significant differences between genotypes at any individual “per minute” timepoint (Fig 2C and 2D; genotype x time interaction F(59,1588) = 1.0, ns for distance and F(59,1588) = 1.0, ns for speed), suggesting that analysis of short timepoints is not optimal for determining treatment effects. When analysing total activities using T-tests, differences were as robust as the genotype factor from the ANOVAs (Fig 2E and 2F; p<0.01 for each outcome measure, WT versus TG). Post-hoc power, per se, can be problematic because, statistically, it remains possible that the null hypothesis is correct [58]. However, as our data reproduce recent findings from others [37], the null hypothesis of there being no difference between genotypes is less likely. Post-hoc power for the difference in distance travelled is high (Table 2) and only 6–8 mice would be required to detect a treatment effect of bringing distance travelled to WT levels (Table 3).