Impact of a progressive stepped care approach in an improving access to psychological therapies service: An observational study

England’s national Improving Access to Psychological Therapies (IAPT) programme advocates stepped care as its organizational model for delivering psychological therapies for common mental health problems. Evidence regarding the efficacy of stepped care as a service delivery model is limited, and both research and routine practice show heterogeneity of definition and differences in model implementation; hence comparing outcomes in terms of model effectiveness is difficult. Despite sound evidence of the efficacy of low intensity interventions, the continued use of a stratified care model appears to perpetuate the notion that severity and complexity should only be treated by a high intensity intervention. Yet no psychotherapy treatment has been found to be superior to another, and not enough is known about what works for whom to aid treatment matching decisions. In the absence of an understanding of the precise treatment factors optimal for recovery, it may be useful to better understand the impact of a service delivery model, and whether different models achieve different outcomes. This study aims to contribute to the discussion regarding the definition and delivery of stepped care, and explores the impact on clinical outcomes where different types of stepped care have been implemented within the same service. An observational cohort study analysed retrospective data (n = 16,723) over a 4 year period in a single IAPT service, where delivery changed from one type of stepped care model to another. We compared the outcomes of treatment completers under a stratified care model and a progression care model. We also explored the assumption that patients who score severe on psychological measures, and are therefore potentially complex, would achieve better outcomes in a stratified model. Outcomes in each model type were compared, alongside baseline factor variables.
A significant association was observed between a recovery outcome and model type, with patients 1.5 times more likely to recover in the progression delivery model. The potential implications are that with a progression stepped care model of service delivery, more patients can be treated with a lower intensity intervention, even with initial severe presentations, ensuring that only those that need high intensity CBT or equivalent are stepped up. This could provide services with an effective clinical model that is efficient and potentially more cost effective.


Introduction
Depression and anxiety disorders are common mental health problems; one in six people is likely to suffer with a mental health problem at some point in their lifetime, with impacts on employment, relationships and general quality of life [1]. Investing in treatment for mental illness with a talking therapy may pay for itself through an increase in people working and a reduction in the cost to the state [2,3]. The evidence base for the efficacy of psychological interventions in common mental disorders is well-established; however, less is known about how to implement them. In addition, the notion that severe and complex presentations require a more intensive treatment pervades service model design and clinical guidance. This is despite growing evidence of the efficacy of low intensity interventions [4]; not enough is known about what exactly works for whom.
The Improving Access to Psychological Therapies (IAPT) programme in England is a national service transformation and expansion development in place since 2008 [1]. By increasing resources and training to develop skill, it aims to increase access to psychological therapies for common mental disorders such as depression and anxiety on a large scale. Its goals are major clinical benefits, and also economic benefits in terms of return to work. Prescribed data is collated locally and nationally, which is useful for evaluation. The IAPT programme broadly takes a 'stepped care' approach. The stepped care model [5] is an organising framework of treatment and intensity recommended by the National Institute for Health and Clinical Excellence (NICE). NICE is a Non Departmental Public Body (NDPB) which is responsible for producing guidelines and recommendations for treatment using well researched evidence. The stepped care NICE guidelines set out which evidence based psychological treatments are recommended for which disorders. The National IAPT implementation guidelines [6] outline the expected treatment delivery including expected number of sessions to which disorder and it is assumed that all IAPT services are structured to follow this recommendation. The existing literature regarding stepped care defines it to be a service delivery system recommending the least intrusive intervention first, starting with low intensity treatments [7], with step up if no progress. Whilst the NICE guidelines do recommend least intrusive intervention first, they also recommend a higher intensity treatment in the first instance for more complex and severe presentations and specific disorders such as post-traumatic stress disorder (PTSD) and social anxiety, which is more akin to a stratified model. 
The stratified model is a form of stepped care in which the assessing therapist chooses a lower or higher intensity treatment, in contrast to a progressive stepped care approach, in which low intensity treatment is provided first. IAPT guidance recommends that service delivery follows NICE guidelines [6], under which a range of prescribed treatments is offered at step 2 (low intensity) and step 3 (high intensity) for specific disorders. It is assumed that all IAPT sites follow the national IAPT implementation guidelines and deliver a version of stepped care, but with variation across services in the emphasis placed on stratified and progressive models. One study demonstrates this with an evaluation of the development of four mental health service sites [8]. Each site developed a slightly different interpretation of stepped care; demand and capacity were seen to influence service design, as was clinician bias affecting pathway flow. Systems for psychological treatment delivery are less well-researched than the therapies themselves, and evidence of the effectiveness of stepped care is limited, particularly when a comparison of stratified and progressive models is sought. Two recent systematic reviews [9,10] found considerable heterogeneity between studies in types of treatment and method of step up. Both concluded that, in general, stepped care can be seen as effective when compared with usual care. Limitations included substantial heterogeneity between studies and underpowered samples.
Underpinning the apparent preference in the literature, routine practice and clinical guidelines for a stratified or mixed model of stepped care appears to be the notion that the more complex and severe the presentation, the higher the intensity of treatment needed. This is not supported by evidence. In a recent meta-analysis [4], severe presentations had as much clinical gain with low intensity treatments as those less severe. Another study compared outcomes of low and high intensity treatment and found no difference in baseline scores between low and high intensity treatments [11]. Similarly, a different study [12] found initial scores were not an influencing factor on a service achieving low or high recovery rates. A systematic review [10] found considerable variation of patient severity and symptom chronicity and no clear trends that related chronicity/severity to clinical outcome. Furthermore, the approach of stratifying care by severity assumes we may know 'what works for whom', whereas a recent network meta-analysis found no therapy modality to be superior to another [13].
In designing or evolving psychological therapy service delivery for common mental disorders, it would be extremely useful to have systematic evidence on how a progressive model of stepped care compares to a stratified model. Examining this question via a randomised controlled trial is challenging, perhaps requiring a cluster method and very large samples. However, the opportunity now exists in England to examine a large observational dataset of patient characteristics and outcomes within IAPT services to study associations between service models and outcomes.
This paper describes an observational study using retrospective data in routine practice. The study compares service delivery of stratified care and progressive stepped care within a single IAPT service, taking the opportunity of a natural experiment due to model change over time. Two main research questions are addressed:
1. What is the relationship between clinical outcomes of depression and anxiety and service delivery model for adults treated in an IAPT service?
2. What is the relationship between the clinical outcomes for moderate to severe anxiety and depression, and service delivery model?
The relationship with outcome and model and baseline variables is also explored.

Ethics statement
A study proposal outlining the methodology, and that the retrospective data would be anonymised at source was submitted to Durham University ethics committee and was approved. The proposal was also discussed and approved by Tees, Esk and Wear Valleys NHS Foundation Trust Research and Development Department (R&D) and the Trust's Quality Assurance group.

Study site
The study site is a large IAPT service in the North East of England with, at the time, around 90 clinical therapists, either trainee or qualified as Psychological Wellbeing Practitioners (low intensity interventions, step 2) or High Intensity Therapists (e.g. cognitive behavioural therapy or interpersonal therapy delivered at step 3). IAPT services across England generally deliver interventions at steps 2 and 3 of the stepped care model outlined by NICE [5]. During the first two years of delivery, the study site operated mainly with a stratified model of stepped care. The decision making point for treatment level occurred after assessment. Those deemed severe or complex by the assessing clinician were placed straight onto step 3 high intensity waiting lists, and those initially treated at step 2 were stepped up if needed. By the end of year one the waiting lists at step 3 were unacceptably long, with underused capacity at step 2. A decision was made during year 2 to move towards a progression model of stepped care, informed by a growing body of evidence regarding the efficacy of low intensity interventions, even with severe presentations [4]. This offered a potentially efficient, cost effective and evidence based solution to the growing waiting list at step 3. A pragmatic decision was taken regarding presentations of PTSD and social anxiety, given the waiting list issues at the time. It was believed that offering psycho-educative support through step 2 low intensity interventions in the first instance, with step up where needed, would be more helpful than receiving no treatment or waiting a long time, and would aid clinical risk management.
To make the analysis of model delivery more robust, years 2 and 4 were isolated to minimise the effect of poorer data input in year 1, while systems were being refined, and of the model changeover in year 3.
As with all IAPT services, the study site submits throughput and outcome data for national evaluation. A minimum dataset (MDS) is collected at each assessment and intervention session, comprising a number of psychological measures and demographic information, including ethnicity, disability and employment status as defined by the national IAPT programme [14]. The data are evaluated against a specific set of key performance indicators (KPIs). Clinical recovery is defined through movement from above to below certain scores on two psychological measures, the PHQ-9 and GAD-7. A total of 16,723 patients were included in the initial analysis. These were patients who had been assessed, had completed treatment and had been discharged by the service within the four year period. The definition of completed treatment used in this study (apart from a comparator in discharge reasons) was that used for the national IAPT KPIs. Completed treatment is defined as two or more separate treatment sessions.

Design
An observational retrospective cohort design was used. Four years of routine data were extracted for the purposes of specific evaluation. Descriptive analysis was undertaken on the whole dataset (n = 16,723). Year 2 and 4 data were then isolated (n = 8578) to control for the effect of contaminated or less robust data from service set up and system refinement during year 1, and model delivery changeover in year 3. Year 2 (n = 3932) represents a stratified model of delivery, and year 4 (n = 4646) a progression model. Subset cohorts were also selected using moderate to severe initial scores on the psychological measures (years 2 and 4: PHQ-9 moderate to severe initial scores, n = 5440; GAD-7 moderate to severe initial scores, n = 7398).
As described earlier, treatments offered by IAPT services are prescribed by the national IAPT implementation programme, which uses NICE guidelines. The effectiveness of such treatments is therefore already well researched and agreed by NICE, and measuring it is not within the scope of this study. Types of psychological treatment delivered were not a controlled variable in this study. Within the progression model, some patients potentially with PTSD or social anxiety may have received a step 2 intervention in the first instance. These patients may have been difficult to identify initially because, at the time of the study, the service needed to improve its initial disorder coding within the clinical database. It is also possible that a definite diagnosis of these presentations occurs at a later stage of therapeutic intervention, when the patient discloses at a later session rather than at initial assessment.
As therapists employed within IAPT are required to be trained and qualified in specific therapeutic treatment delivery, there is the assumption that NICE and IAPT guidelines as described earlier are adhered to. The service in this study ensured regular clinical supervision and clinical records audit as part of clinical governance to provide assurance regarding treatment fidelity. Whilst it is accepted that there may be some variation of fidelity to treatment model, this study concerns routine practice data, with the likelihood of co-morbid presentations and therefore it would be expected that treatment will appropriately vary to reflect this. The outcomes of IAPT services against NICE compliance regarding treatment has already been measured [15] and found compliance with NICE guidance was associated with higher recovery rates. This study is not a randomised controlled trial (RCT) and it is not within the scope to measure treatment fidelity.
In terms of treatment dosage, the service within this study delivered manualised sessions at step 2, e.g. 4 sessions for telephone guided self-help or psycho education course, and 6-8 sessions face to face guided self-help.
Step 3 is less prescriptive, advocating the clinician to follow the NICE guidelines of between 12-16 sessions on average.

Psychological measures and outcomes
The minimum data set (MDS) includes a number of measures; however, for the purposes of this study the data from the specific phobia questions and the work and social adjustment scale were not analysed. The specific measures used in this study are the PHQ-9 and GAD-7. The PHQ-9 is a nine question scale that measures the frequency of depression symptoms. Each question is scored from 0 ("not at all" bothered by the problem) to 3 (bothered "nearly every day"), giving a total score from 0 to 27. The reliability and validity of the PHQ-9 in terms of measuring depression is good [16]. The GAD-7 is a seven question scale that measures the frequency of anxiety symptoms, with a total score from 0 to 21. The reliability and validity of the GAD-7 in terms of measuring general anxiety symptoms is good, and satisfactory with more specific disorders such as social phobia or obsessive compulsive disorder [17]. The scales are used in every clinical session and the scores at the first and last sessions are used to measure outcome. The KPI defines recovery for IAPT as patients scoring above clinical caseness at the first session on at least one measure, and below caseness on both measures at the last session (caseness = 10 on PHQ-9, 8 on GAD-7). Reliable clinical change was defined in the same way as in another large IAPT study [15], which states: "Patients were deemed to have reliably recovered if they scored above the clinical cut-off on the PHQ-9 and/or the GAD-7 at initial assessment, they showed reliable improvement during treatment, and they scored below the clinical cut-offs on both the PHQ-9 and the GAD-7 at the end of treatment. Reliable improvement was assessed using Jacobson and Truax (1991) reliable change criteria. The measure of reliability used for the PHQ-9 and the GAD-7 was Cronbachs α, taken from the validation studies of the measures (Kroenke et al., 2001; Spitzer et al., 2006)."
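The recovery rule and the Jacobson and Truax reliable change criterion described above can be sketched in code. This is a minimal illustration under stated assumptions, not the service's or the national programme's implementation: the function names are hypothetical, a score at or above the stated cut-off is assumed to count as caseness, and the standard deviation and reliability values in the usage lines are illustrative placeholders rather than figures from this study.

```python
import math

# Caseness cut-offs as stated in the text. Treating "at or above" the
# cut-off as caseness is an assumption of this sketch.
PHQ9_CASENESS = 10
GAD7_CASENESS = 8


def is_case(phq9, gad7):
    """True if the patient is at caseness on at least one measure."""
    return phq9 >= PHQ9_CASENESS or gad7 >= GAD7_CASENESS


def recovered(first_phq9, first_gad7, last_phq9, last_gad7):
    """KPI-style recovery: a case at first session, below caseness on both at last."""
    return is_case(first_phq9, first_gad7) and not is_case(last_phq9, last_gad7)


def reliable_change_threshold(sd, reliability, z=1.96):
    """Jacobson and Truax (1991) criterion: smallest change unlikely to be measurement error.

    sd          -- standard deviation of the measure in a reference sample
    reliability -- reliability coefficient (e.g. Cronbach's alpha)
    """
    se_measurement = sd * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0) * se_measurement  # standard error of a difference score
    return z * s_diff


def reliably_improved(first_score, last_score, threshold):
    """True if the score fell by more than the reliable change threshold."""
    return (first_score - last_score) > threshold


# Hypothetical usage: sd=6.0 and reliability=0.89 are placeholder values.
threshold = reliable_change_threshold(sd=6.0, reliability=0.89)
print(recovered(16, 15, 8, 7), reliably_improved(16, 8, threshold))
```

Keeping the threshold as a function of the standard deviation and reliability, rather than a fixed number, makes explicit that the criterion depends on which validation sample the values are taken from.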

Statistical analysis
Statistical analysis was performed using SPSS version 20. A statistician was consulted and recommended the use of cross tabulations to measure proportion and frequency, two-sided chi square tests to show any association between categorical variables, and multiple logistic regression to test variables as predictors of outcomes. Variables were entered simultaneously. The significance level was set at p = 0.05 for all tests. The dataset preparation for multiple logistic regression removed outliers and certain variables.
Given extremely low numbers in both ends of the age categories, those were removed. The variables ethnicity and disorder type were also removed. The ethnicity variation other than White British was not of a number to be able to undertake further statistical analysis.
The disorder type variable was removed from the regression tests as there was not enough confidence that these data represented an accurate picture of patient presentation. In the years to which the data apply, the information was taken from the referral at the point of receipt, prior to assessment. It was believed that in a large number of cases the disorder category, i.e. the primary reason for treatment, may change after assessment and indeed perhaps at a later point after some therapy sessions. In particular, the nature of PTSD may mean an initial presentation of depression or anxiety, with a later disclosure revealing the actual disorder. It was felt that it would not be helpful to include this variable and draw conclusions, particularly on specific disorders, given the amount of potential variation described. It is acknowledged that, as a result, no comment can be made regarding the impact of the service delivery model on clinical outcomes by disorder type, other than acknowledging that disorder type may be a confounding variable that at this stage was difficult to control. Subsequent developments within IAPT services mean that it is now possible to update disorder type within a clinical database to reflect a more accurate picture. Also, this study uses routine practice data and is not an RCT. It is therefore perhaps more applicable to consider that there will be large elements of co-morbidity within the dataset, and that the study is concerned with measuring the broad impact of service model delivery on clinical outcomes for a routine clinical population.
The initial regression used the dependent variable (outcomes) arranged as binary, i.e. recovered versus non-recovered. Independent variables included the model delivery type, stratified or progression, initial severity scores, patient characteristics and discharge reasons.
The stratified model type was the reference. Initial severity scores were grouped using the severity categories defined in the IAPT data handbook [14], which allowed them to be treated as categorical variables and made the results more meaningful to compare. Demographic variable categories were used in line with national IAPT KPI reporting, as described earlier with regard to gender, disability and ethnicity. Age was organised into categorical variables using mainly 10 year age groupings aligned with population reports undertaken by the Office of National Statistics.
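Grouping initial scores into severity categories can be illustrated with a short sketch. The cut-points below are the commonly published severity bands for the two instruments and are an assumption here; the bands in the IAPT data handbook [14] that the study actually used may differ, and the names `PHQ9_BANDS`, `GAD7_BANDS` and `severity_band` are hypothetical.

```python
from bisect import bisect_right

# Assumed bands (lower bounds of each category above "none"):
# PHQ-9: 0-4 none, 5-9 mild, 10-14 moderate, 15-19 moderate/severe, 20-27 severe
# GAD-7: 0-4 none, 5-9 mild, 10-14 moderate, 15-21 severe
PHQ9_BANDS = ((5, 10, 15, 20), ("none", "mild", "moderate", "moderate/severe", "severe"))
GAD7_BANDS = ((5, 10, 15), ("none", "mild", "moderate", "severe"))


def severity_band(score, bands):
    """Map a total score to its severity category label."""
    cut_points, labels = bands
    # bisect_right counts how many cut-points the score has reached
    return labels[bisect_right(cut_points, score)]


# Hypothetical usage with illustrative scores
print(severity_band(12, PHQ9_BANDS), severity_band(18, GAD7_BANDS))
```

Converting scores to labelled categories in this way is what allows them to be entered into cross tabulations and as categorical predictors in a logistic regression.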
Discharge reasons were service defined categories; completed treatment, dropped out, not suitable, declined treatment and referred on. These are chosen by the clinician and entered into the clinical database at point of patient discharge. Completed treatment defined by clinician as a discharge reason is different from completed treatment defined by IAPT KPI definition as described earlier. Completed treatment IAPT KPI definition was used to select the whole dataset. Looking at therapist defined completed treatment as a discharge reason enables comparison with other discharge reasons i.e. dropped out.
To explore the question regarding the impact of service delivery model on presentation severity and outcome, regression tests were undertaken with the moderate to severe initial score groups.

Results
The descriptive analysis showed a normal distribution of the clinical population of this study site, regarding demographics and psychological measures, comparable to the national IAPT data [18]. Chi square tests showed a significant association between a number of independent variables, and between model type and recovery. The volume of patients being treated and completing treatment rose incrementally from 1893 in year one, to 4291 in year 2, and 5145 in year 4. As the progression model was introduced, the numbers being treated at step 2 in the first instance increased, with a proportional reduction of those treated at step 3 in the first instance in year 4 at 7.6%, compared to year 2 at 23%.

Baseline factors
There was no cohort difference in the spread of baseline factors. Descriptive analysis of the whole dataset (n = 16,723) in Table 1 showed a mean age of 42.3 years (SD = 13.9, Mdn = 42, IQR = 21). There was a larger proportion of younger women than men, with the difference in gender decreasing as the age bands rise. 28% of men were unemployed compared to 19% of women, and 13% of men and 10% of women were categorised as sick or disabled. Among those registered disabled, there were more men than women, with a significant association found between gender and disability, $\chi^2$(1, N = 16,718) = 31.7, p < 0.001. There were also more in the age bands 45-54 and 55-64, with a larger proportion being male.

Impact of model type on outcomes
In terms of the mean recovery rates, there is a distinct difference of 9 percentage points between the models, with 40% measured as recovered in the stratified model compared to 49% in the progression model. Chi square tests showed an association between recovered versus non-recovered outcome and model, $\chi^2$(1, N = 8578) = 78.6, p < 0.001. No association was found for reliable improvement or reliable deterioration.
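The form of this test can be illustrated with a Pearson chi square on a 2 x 2 table of model by recovery. The sketch below is an approximation only: the cell counts are reconstructed from the rounded recovery percentages (40% and 49%) and the cohort sizes reported in the Design section (n = 3932 and n = 4646), so the statistic will not match the reported value exactly, and the function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail p-value of a chi square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Approximate counts reconstructed from rounded percentages (an assumption).
strat_n, prog_n = 3932, 4646
strat_rec = round(0.40 * strat_n)   # recovered, stratified model
prog_rec = round(0.49 * prog_n)     # recovered, progression model
strat_not = strat_n - strat_rec
prog_not = prog_n - prog_rec

chi2 = chi_square_2x2(strat_rec, strat_not, prog_rec, prog_not)
# Unadjusted odds ratio for recovery, progression versus stratified
unadjusted_or = (prog_rec / prog_not) / (strat_rec / strat_not)
print(round(chi2, 1), p_value_df1(chi2) < 0.001, round(unadjusted_or, 2))
```

The unadjusted odds ratio from such a table is not expected to equal the adjusted odds ratio from a logistic regression, because the regression also accounts for the other variables in the model.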
Treatment results as demonstrated in Table 2 showed there was a small increase in treatment dosage within the progression model compared to the stratified.
Reasons for discharge were analysed and as shown in Table 3 there was nothing notable between model delivery type in terms of the various reasons for discharge such as not suitable, referred on, and declined, however there was an increase in therapist defined completed treatment in the progression model, and a small decrease in dropout rate within the progression model compared to the stratified model.
Logistic regression showed that patients were 1.53 times more likely to recover in the progression model, as shown in Table 4 (Wald statistic (1) = 57.075, p < .001, OR = 1.53, CI = 1.368 to 1.705). The Hosmer and Lemeshow test indicated that the model fitted the data (p > .05). The logistic regression model was statistically significant, $\chi^2$(25) = 3555.72, p < .001. The model explained 45.6% of the variance in the recovery outcome (using Nagelkerke's $R^2$) and correctly classified 76.5% of the cases. Sensitivity was 79.8% and specificity was 73.7%. The positive predictive value was 72.1% and the negative predictive value was 81%.
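The relationships between these reported quantities can be checked with a short sketch. It assumes the confidence interval is a 95% Wald interval (the text does not state the level) and that the classification figures all come from the same confusion table; the function names are hypothetical, and the calculation works backwards from the rounded figures rather than from the study data.

```python
import math


def wald_from_ci(odds_ratio, lower, upper, z=1.959964):
    """Recover the coefficient, its standard error and the Wald statistic
    from an odds ratio and an assumed 95% Wald confidence interval."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se, (beta / se) ** 2


def predictive_values(sensitivity, specificity, accuracy):
    """Derive outcome prevalence, PPV and NPV from sensitivity, specificity
    and overall accuracy, using accuracy = sens * prev + spec * (1 - prev)."""
    prevalence = (accuracy - specificity) / (sensitivity - specificity)
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return prevalence, ppv, npv


beta, se, wald = wald_from_ci(1.53, 1.368, 1.705)
prevalence, ppv, npv = predictive_values(0.798, 0.737, 0.765)
print(round(wald, 1), round(prevalence, 3), round(ppv, 3), round(npv, 3))
```

Because the inputs are rounded, the recovered values should be close to, but not identical with, the reported Wald statistic and predictive values.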

Impact of relationship between patient characteristics and model type on outcomes
Within the logistic regression model, in which the progression service design was shown to be more likely to attain recovery than the stratified design, gender and disability were not found to affect the recovery outcome. Certain ages were found to affect the recovery outcome, with patients aged 45-54 1.33 times more likely to recover than the 18-24 group (p = 0.016). The 65-74 group were 2.75 times more likely to recover than the 18-24 group (p < .001). Patients in employment were also more likely to recover.

Initial score severity impact
The median initial score was 16 for PHQ-9 and 15 for GAD-7 (Table 5), placing the median within the severe range for both measures. The median last scores were 8 for PHQ-9 and 7 for GAD-7, placing the median below caseness on both measures.
The stratified and progression model data were then also separated by score severity using the groups described earlier. Cross tabulations and chi square tests were undertaken on the higher PHQ-9 and GAD-7 groups and showed an association between score severity, outcome and model: a significantly larger proportion of participants with moderate (GAD-7), moderate/severe (PHQ-9), or severe (GAD-7 or PHQ-9) initial scores recovered in the progression model than in the stratified model. For the severe PHQ-9 group, $\chi^2$(3, N = 2776) = 27.2, p < 0.001: a larger proportion recovered in the progression model, a larger proportion did not recover in the stratified model, and there was no discernible difference in reliable improvement or reliable deterioration. For the moderate/severe PHQ-9 group, $\chi^2$(3, N = 2664) = 30.1, p < 0.001: a larger proportion recovered in year 4 (the progression model; a 10% proportional difference between years), and a larger proportion attained reliable improvement, showed reliable deterioration or did not recover in the stratified model.
Cross tabulations and chi square tests on the higher GAD-7 score groups likewise showed an association between severity, outcome and model. For the severe GAD-7 group, $\chi^2$(3, N = 4759) = 40.7, p < 0.001: a larger proportion recovered in the progression model than in the stratified model (over 8% proportional difference between years), and conversely a larger proportion showed reliable improvement, reliable deterioration and non-recovery in the stratified model. For the moderate GAD-7 group, $\chi^2$(3, N = 2639) = 24.8, p < 0.001: a larger proportion recovered in the progression model, with over 10% difference between years, while the stratified model had the larger proportions for reliable improvement, reliable deterioration and non-recovery. Logistic regression was undertaken with the score severity cohort (Table 6). The logistic regression models fitted the data well for the moderate to severe initial score groups of PHQ-9 and GAD-7, and in every group patients were significantly more likely to recover in the progression model (p < .001). The odds ratios for PHQ-9 were 1.28 for moderate, 1.79 for moderate/severe and 1.6 for severe. The odds ratios for GAD-7 were 1.49 for moderate and 1.55 for severe.
In terms of outcomes other than recovery, e.g. reliable improvement, no change or deterioration, there was either no significance with either model type, or where there was significance it was with the stratified model.

Discussion
In a large IAPT service collecting data on more than 16000 patients that had completed treatment according to IAPT KPI definitions, it was possible to use retrospective cohort analysis to compare outcomes in the two stepped care models due to the natural experiment opportunity of a changed service model during the period. Proportionally the progression model had more clinician defined treatment completers, as referral rates rose, than with an earlier year using a stratified model. A confounder could be natural improvement in efficiency within the two year period, increasing turnover, volume of completed treatment, which could have had an effect on the recovery outcome increase. Whilst possible, an equally unmeasured variable that counters this possibility is that during the same period, the service experienced around 15% average annual workforce capacity reduction through maternity leave, delays in recruitment, long term sickness, and attrition. This resulted in recruiting a larger than predicted number of trainees with naturally a lower caseload capacity and potential skill. Despite this, higher recovery rates have been achieved and we have shown that the progression model is associated with higher recovery rates compared to the stratified model.
It is possible that the specific standardisation and therefore less variation of certain step 2 interventions within this service has contributed to the increase of recovery rates, thus the factor being fidelity to treatment protocol. At a higher intensity level, it is possible that model drift may be more likely to occur, with a tendency of therapists to lean more towards a transdiagnostic model with perceived complex cases. Certainly, one large IAPT study [15] states that compliance with NICE treatments is a predictor for better recovery rates. This assertion appears to be derived from the dataset including a number of IAPT services, with data outlining numbers receiving NICE compliant treatments; however protocol fidelity in each service was not specifically measured. Similarly at the time of analysis in this study, treatment protocol fidelity was not a controlled variable.

Does complexity and severity need high intensity treatment?
The underpinning belief of a stratified model is that complex and severe cases should be treated by high intensity treatment delivered by a more qualified therapist. This belief is further perpetuated by the mixed model design of stepped care within NICE guidelines. It is also seen in studies that seek to demonstrate the effectiveness of stepped care using a progression model definition, yet build in a caveat or exception for complex cases to be treated in the first instance at a higher intensity level [19].
The growth in evidence of the effectiveness of low intensity interventions, even with complexity and severity [4], raises questions regarding that notion, and indeed presents a compelling and cost effective alternative for services wishing to maximise the volume of people treated in an efficient, yet clinically effective manner.
This study demonstrates that in cases with higher initial scores on PHQ9, recovery is associated with the progression model where most will receive step 2 interventions only, compared to a stratified model. This may indicate that those more severely depressed actually benefit greatly from a very simple, structured treatment. The growth in evidence regarding the effectiveness of Behaviour Activation supports this notion [20]. Whilst there was a statistical significance within the stratified model in terms of reliable improvement for patients scoring moderate to severe PHQ9, this result is likely explained by the effectiveness of treatment delivered in both cohorts. As there was no difference of patient type or presentation between the model cohorts, the statistical significance of the progression model more likely to attain recovery is a more welcome result. Essentially more people attaining reliable recovery rather than reliable improvement is the aim.
For the group with higher initial GAD 7 scores, the differences in recovery were less clear than for the PHQ9 group, and were not sufficient to draw conclusions. A possible confounding factor is that this particular service may have been more skilled at treating depression; competencies and experience were uncontrolled variables. Type of disorder treated was also not a controlled variable, and it is possible that the low intensity psycho-educative treatment was too generic and standardised to effect maximum gains for some specific anxiety presentations.
Initial scores on the psychological measures used give a self-reported indication of the frequency of symptoms, and they can be a major factor in deciding the intensity of treatment. It is fair to say that self-reported scores may not necessarily correspond with other clinical information, and clinical decision making is usually based on observed presentation and assessment as well as psychological measures. This study did not account for potential discrepancy between presentation and scores, and this may therefore be a confounding factor in the results. It may be useful for future analysis to differentiate cases where score and presentation are congruent from those where they are incongruent, and to see whether outcomes differ.
A further possible confounding factor is that we did not compare the clinical outcomes of those stepped up (receiving treatment at step 2 and then step 3) with those treated at step 2 alone, as the focus was on the service delivery model rather than the intensity or type of treatment. It is possible that certain diagnosed specific anxiety disorders (PTSD and social anxiety) would do better with a high intensity treatment in the first instance, as is currently recommended by the National IAPT manual [6]. There is currently a gap in the evidence regarding the effectiveness of specific step 2 treatments for these disorders, and there are also gaps in our understanding of which specific change components within therapy work for whom.
Accepting the potential impact of the confounding factors on the results, this study does raise the possibility that a progression model, with cases being treated at step 2 in the first instance, may achieve better recovery rates for those scoring more severely initially than a stratified model. This is consistent with other studies that found no difference in initial score severity between those treated at low or high intensity [11], or found that scores were not an influencing factor in whether a service achieved a low or high recovery rate [12]. One systematic review [10] found considerable variation in patient severity and symptom chronicity, and no clear trends relating chronicity or severity to clinical outcome. One concern about treating patients who score highly on the measures with a low intensity intervention in the first instance could be that more patients would drop out. This study did not find this. On the contrary, the dropout rate decreased in the progression model; however, it is acknowledged that length of wait time prior to treatment, a possible confounding factor, was not controlled.

Treatment dosage
The average dosage at each step within this study is generally consistent with the national IAPT picture [15]. It is notable, however, that for the group of patients who are stepped up, the average dosage increases by about 2 sessions. Nevertheless, compared with a stratified model, the service achieves a bigger throughput of patients with a progression model, with the larger volume receiving step 2 only. The gains are therefore made in terms of throughput, which in turn reduces waiting time and places less demand on step 3, which has the slower turnover of patients. Given that not enough is known regarding optimal dosage, this further supports the notion and definition of pure stepped care: to commence treatment with the least intrusive, smallest dosage first.

Limitations
Further to the potential confounding factors acknowledged earlier, the limitations of an observational study within routine practice compared to a gold standard RCT are acknowledged. Although the findings are arguably more applicable to IAPT given the real data from a 'real' service, it is accepted that there are a number of uncontrolled variables that may have affected outcomes. Not controlling for the effect of treatment type is a potential limitation of the results; however, this was not the focus of the research question, and there is general acceptance of the existing evidence for the efficacy of treatment as recommended by NICE. Measuring the variables of disorder and treatment type would nonetheless have been useful to see whether there was, in fact, any difference in outcomes for PTSD and social anxiety by service model type; as discussed previously, however, this was not seen as viable at this point to give a meaningful result. The methodological design controlled certain variables, e.g. the cohort chosen. Given the nature of staff turnover, and a continuing proportion of trainees affecting capacity and skill level, it is proposed that this most likely countered any effect of service improvement occurring naturally over time. The reference groups were not reversed and the regression tests repeated, nor were more advanced regression models used to test the stability of the data, and this is an accepted limitation of the statistical analysis. This is an observational study using data from one service, and it is therefore possible that the results are unique to that particular service. However, the descriptive analyses show that the demographics of participants in this study are consistent with the larger national IAPT dataset [18]. Similarly, the results support other studies' findings in terms of the efficacy of low intensity interventions, even with those scoring as severe.

Implications and challenges for practice
The study has several implications, both broad and specific. First, the questions raised by the differences between the research definition of the stepped care model and NICE guidelines, and between stepped care implementation within research trials and in routine practice, demonstrate the need for consistency, so that what is measured is the effectiveness of what is stated to be measured.
Second, where there is flexibility in a service delivery design, or within guidelines, it is suggested that clinicians will continue to make decisions based on the belief that complexity and severity require more intense treatment, despite a growth of evidence, including this study, that questions the need, unless policy, guidelines and systems persuade otherwise.
Although one study shows that initial severity predicted reliable recovery, i.e. the more severe the scores the less likely the patient was to recover [15], this study shows that even those with severe scores may do better in a progression model of stepped care, where the majority of patients will receive a step 2 intervention only. Therefore, whilst severity might predict recovery, it does not necessarily predict the need for high intensity treatment. The growth in the use of low intensity interventions over the past 10 years is particularly supported by the national IAPT programme. There is sound evidence of their effectiveness with depression and anxiety, and although NICE currently recommend high intensity treatment for PTSD and social anxiety, following these guidelines where there are large step 3 waiting lists may leave these patients disadvantaged and without treatment for a long period of time. Aside from perhaps the need for increased resources, this also highlights the need for improved low intensity or stabilisation treatments for these disorders that can be offered in the first instance. The rationale of the progression model is a pragmatic one: some psycho-education treatment in the first instance is better than no treatment at all. In particular, it enables services to manage risk rather than leaving patients on waiting lists, and may promote better engagement. Although this study found a reduction in dropout rates since the progression model was introduced, we do not know enough about why patients drop out of treatment, or how specifically an increase in low intensity treatments offered in the first instance to a larger volume of patients in other services may affect dropout rates. Pragmatically, the progression model supports the notion that a service which can increase throughput and the volume of patients treated, and thus treat more people faster, will increase engagement and completion of treatment and, conversely, reduce dropout.
The IAPT programme is tasked with improving access, increasing the volume of patients treated whilst delivering evidence based interventions and achieving good recovery rates. The challenge for IAPT programmes is how to achieve all three elements simultaneously.
The evidence base is changing, with the effectiveness of low intensity interventions becoming demonstrable. Building on a body of evidence regarding Behavioural Activation, one study reported that BA was found to be not inferior to CBT "in terms of reduction of depression symptoms" and more cost effective [20]. Unfortunately, there is little understanding in routine practice of how much clinician bias and preference may influence and skew demand on levels of treatment intensity. Perhaps the biggest implication and challenge for practice is to have the confidence to use a system design to control the variable of clinician bias. A progression model of stepped care removes this variable. In the current context of gaps in evidence regarding what specifically works for whom, a more attractive option could be one where the system controls certain variables. A system which controls delivery of evidence based low intensity interventions in the first instance, and which can increase the number of patients treated, reduce costs through a larger volume being treated with a smaller number of sessions, and achieve good recovery rates, may be preferable. The challenges for services are the acceptability of this system to clinicians, and the culture change process that has to occur for its success.

Author Contributions
Conceptualization: Lisa Boyd.