Do Patients’ Symptoms and Interpersonal Problems Improve in Psychotherapeutic Hospital Treatment in Germany? - A Systematic Review and Meta-Analysis

Background In Germany, inpatient psychotherapy plays a unique role in the treatment of patients with common mental disorders of higher severity. In addition to psychiatric inpatient services, psychotherapeutic hospital treatment and psychosomatic rehabilitation are offered as independent inpatient treatment options. This meta-analysis aims to provide systematic evidence for psychotherapeutic hospital treatment in Germany regarding its effects on symptomatic and interpersonal impairment. Methodology Relevant papers were identified by electronic database search and hand search. Randomized controlled trials as well as naturalistic prospective studies (including post-therapy and follow-up assessments) evaluating psychotherapeutic hospital treatment of mentally ill adults in Germany were included. Outcomes were required to be quantified by either the Symptom-Checklist (SCL-90-R or short versions) or the Inventory of Interpersonal Problems (IIP-64 or short versions). Effect sizes (Hedges’ g) were combined using random effect models. Principal Findings Sixty-seven papers representing 59 studies fulfilled inclusion criteria. Meta-analysis yielded a medium within-group effect size for symptom change at discharge (g = 0.72; 95% CI 0.68–0.76), with a small reduction to follow-up (g = 0.61; 95% CI 0.55–0.68). Regarding interpersonal problems, a small effect size was found at discharge (g = 0.35; 95% CI 0.29–0.41), which increased to follow-up (g = 0.48; 95% CI 0.36–0.60). While higher impairment at intake was associated with a larger effect size in both measures, longer treatment duration was related to lower effect sizes in SCL GSI and to larger effect sizes in IIP Total. Conclusions Psychotherapeutic hospital treatment may be considered an effective treatment. In accordance with Howard’s phase model of psychotherapy outcome, the present study demonstrated that symptom distress changes more quickly and strongly than interpersonal problems. 
Preliminary analyses show impairment at intake and treatment duration to be the strongest outcome predictors. Further analyses regarding this relationship are required.


Introduction
Inpatient psychotherapy, an intensive and multimodal treatment for patients with mental disorders, is especially common in Germany. There are more psychotherapeutic hospital beds and more facilities specialized solely in psychiatric disorders per capita than in any other country in the world [1]. This has also been noted in previous meta-analyses addressing special issues in inpatient psychotherapy: a large part of the included studies were conducted in Germany [2,3]. In inpatient psychotherapy, patients are primarily treated with individual and group psychotherapy, which is complemented by other therapeutic approaches such as psychoeducational groups, occupational therapy, creative therapy, relaxation training, exercise therapy and medical treatment. Almost all psychotherapeutic hospitals offer similar complementary treatments (such as creative therapy and exercise), but the type and amount of psychotherapeutic interventions applied may vary significantly: some hospitals offer psychodynamic treatment, others cognitive behavioral treatment or specialized concepts (e.g. interpersonal psychotherapy or dialectical behavior therapy). Some clinics focus on individual sessions complemented by group sessions, while others focus on group therapy. The amount of psychotherapy ranges mainly from one to four sessions a week [1,4-6].
In addition to the internationally more common inpatient psychiatric services [1,7], there are two other treatment modalities in Germany that provide inpatient psychotherapy, i.e. psychotherapeutic hospital treatment and psychosomatic rehabilitation [5,7]. Both treatment forms focus on psychotherapeutic rather than on medical or pharmacological approaches. In general, inpatient psychotherapy is prescribed when outpatient treatment is considered to be insufficient [5]. Accordingly, inpatient psychotherapy addresses patients at risk of self-harm, patients who have difficulty coping with everyday life or patients with serious conflicts in their social environment [5]. Hospital treatment addresses more acutely ill patients, while rehabilitation puts a special emphasis on improving patients' working ability. Both treatment forms pursue the goals of healing patients' disorders, preventing aggravation, or easing discomfort [8]. Schulz and Koch [8] summarize that there is a stronger indication for hospital treatment as opposed to rehabilitation in the case of curative goals, life-threatening risk, profound disruptions in everyday life, a need for diagnostic assessment and a high complexity concerning medical and nursing needs. However, considering the general complexity of mental disorders, a differential allocation of patients to the appropriate acute vs. rehabilitative setting is often complicated.
In 2007, Steffanowski et al. carried out the first meta-analysis on the effectiveness of inpatient rehabilitation. This meta-analysis revealed a medium effect at discharge (overall outcome: d pre-post = 0.57) and slightly decreased long-term effects (d pre-follow-up = 0.49; MESTA study [9]). Given that in Germany alone, more than one million patients are treated in psychotherapeutic hospitals per year (e.g., 1,127,971 cases in 2008, [7]), it is surprising that a meta-analysis on the effectiveness of these treatments was conducted only recently by Liebherz and Rabung [10]. This meta-analysis revealed medium to large short-term effects for psychotherapeutic hospital treatment (overall outcome: d pre-post = 0.71) with a slight increase to follow-up (d pre-follow-up = 0.80). Due to the heterogeneity of patients, interventions, outcome measures, and study quality, the aggregated effect sizes showed a great variance [10].
In assessing the effectiveness of psychotherapy, it is essential to consider different outcome areas and outcome measures, as both may affect the results. One possibility is to distinguish between monetary (e.g. sick leave, health care utilization) and nonmonetary criteria (e.g. patient satisfaction, subjective well-being) [11]. Steffanowski et al. [9] classified outcome instruments into five domains, namely physical complaints, mental complaints, social adjustment, functional adjustment and cost-effectiveness. Liebherz and Rabung [10] complemented these five domains by two other outcome areas, i.e. dysfunctional cognitive patterns and general well-being. The domain general well-being showed the highest effect sizes in the meta-analysis of Liebherz and Rabung [10], while social functioning showed the lowest. Cost-effectiveness was rarely addressed; only two studies reported results in this domain.
However, the approach ensuring the highest comparability between studies is to confine comparisons to single outcome measures. In addition, the latter solution bears the possibility of providing benchmarks to hospitals that routinely evaluate their outcomes by the use of these specific measures.
Two of the most commonly used outcome instruments in psychotherapy research are the Symptom-Checklist (SCL [12]) and the Inventory of Interpersonal Problems (IIP [13]). While the Symptom-Checklist focuses on various psychiatric symptoms (e.g. somatization, depression, anxiety), the Inventory of Interpersonal Problems deals with typical interaction problems occurring in different types of social relationships, such as being domineering/controlling or overly accommodating. Both instruments are self-rated measures.
In this context, the present meta-analysis aims to integrate all available evidence from original studies investigating the treatment effects of psychotherapeutic hospital treatment in Germany. Since it was our aim to offer specific results and benchmarks for psychotherapeutic hospitals, this paper aims to complement the first publication on this study [10], which provided data for different outcome areas over a wide range of outcome measurements. Therefore, in the present paper we confine our analyses to the two most common outcome measures used in inpatient psychotherapy outcome research, i.e. the SCL and the IIP. Additionally, we provide first results of moderator analyses.

Literature Search and Study Selection
We conducted an electronic database search using 'PSYNDEXplus Literature and Audiovisual Media 1977 to September 2009' with the following search terms (in English and German): (therapy* or treatment* or intervention*) AND (inpatient* or clinic* or hospital* or unit* or ward*) AND (result* or evaluation* or change* or effect* or efficac* or follow-up* or outcome* or course*) AND psych*. Additionally, we performed a hand search in relevant German journals. To identify unpublished papers, we searched web pages of psychotherapeutic hospitals in Germany [14,15]. Identified full texts were screened for eligibility by two independent raters (SL and SR). Papers investigating the same or overlapping subgroups were integrated into one study. Only disjoint (i.e. not overlapping) samples were considered for outcome calculation.

Inclusion Criteria
We included published as well as unpublished papers (in German and English) reporting outcomes of psychotherapeutic hospital treatment in Germany. Investigations from other countries had to be excluded due to the differences in health care systems and the unique position of inpatient psychotherapy in Germany. Inclusion criteria were based on those used in the meta-analysis on psychosomatic rehabilitation mentioned above (MESTA study [9]), but were modified according to the context of psychotherapeutic hospital treatment (see Table 1).

Data Abstraction and Data Details
Data abstraction was mainly carried out by one rater (SL) and supported by three trained student research assistants. The student assistants extracted the sample characteristics and the quality criteria, but not the outcome data. All extracted data were verified by the first rater (SL). In case of ambiguity, the results were discussed with a second rater (SR). To guarantee a high quality of data extraction, the second rater (SR) additionally carried out unsystematic double ratings and checked for inter-rater agreement. In case of variations, which occurred in less than 1% of all ratings (for the extracted quantitative outcome data, r ≥ 0.99), the two raters reached a consensus through discussion.
We extracted the following information from the identified studies: authors, title, year of publication, type of publication, country of study execution, study quality, measurement points, treatment characteristics (e.g. treatment duration), socio-demographic data (e.g. age, sex, education), socio-medical data (e.g. inability to work), clinical data (e.g. illness duration, diagnoses), sample size and outcome data (means and standard deviations). If relevant outcome information was missing, we contacted the authors of the study.

Risk of Bias in Individual Studies
There is a considerable lack of available checklists for the appropriate assessment of the study quality of psychotherapy outcome studies. Especially for non-randomized, i.e. observational studies, common scales (e.g. Cochrane Collaboration's risk-of-bias tool [16]) seem to be of limited applicability. For this reason, we scrutinized the issue of study quality in a separate study [17]. Based on a systematic review of relevant quality assessment tools, we selected the 19 different quality criteria most relevant to non-randomized psychotherapy outcome studies, which address various aspects of general methodological quality, internal validity, and external validity. To assure objectivity, we operationalized ratings as being "fulfilled" (2 points), "partially fulfilled" (1 point) or "not fulfilled" (0 points) for all items. We evaluated the quality of the included studies separately for each of these 19 criteria. Additionally, we calculated a composite score as the mean across all items, which ranged from zero to two, indicating low quality to high quality, respectively.

Data Analysis and Data Synthesis
We calculated standardized pre-post effect sizes as well as pre-follow-up effect sizes. In the case that more than one follow-up measurement point was reported, we selected the first one following the end of treatment to calculate the pre-follow-up effect size. We conducted pre-post analyses for all subscales (a total of nine SCL and eight IIP subscales) and for the total scores of both instruments (SCL: Global Severity Index (GSI); IIP: Total Score). However, due to the small number of studies providing follow-up data for all subscales, we had to limit the pre-follow-up analyses to the total scores only.
Hedges' g [18] was applied to correct for bias due to small sample sizes, as Cohen's d is known to be upwardly biased when based on small samples [19]. To ensure comparability across different samples (e.g. homogeneous samples of depressed patients vs. heterogeneous samples with mixed diagnoses), we used the mean standard deviation pooled across all samples to calculate the treatment effect for each single sample.
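As an illustration of the effect size described above, the following minimal Python sketch computes a bias-corrected pre-post effect. It is a sketch under stated assumptions, not the authors' formula: the function name `hedges_g`, the use of df = n - 1 for a single pre-post sample, and the common approximation 1 - 3/(4·df - 1) for the small-sample correction are all assumptions, and the numbers in the usage line are hypothetical.

```python
def hedges_g(mean_pre, mean_post, sd_pooled, n):
    """Bias-corrected standardized pre-post change for one sample.

    sd_pooled is the standard deviation pooled across all samples, as
    described in the text. The correction factor is the common
    approximation J = 1 - 3 / (4 * df - 1), with df = n - 1 assumed here.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if sd_pooled <= 0:
        raise ValueError("sd_pooled must be positive")
    d = (mean_pre - mean_post) / sd_pooled  # uncorrected standardized change
    df = n - 1
    correction = 1 - 3 / (4 * df - 1)       # shrinks d slightly toward zero
    return d * correction


# Hypothetical sample: the corrected value is slightly smaller than d = 0.80.
g_example = hedges_g(mean_pre=1.30, mean_post=0.78, sd_pooled=0.65, n=100)
```

The correction matters most for small samples; for n = 100 it reduces the effect by less than one percent.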
According to Cohen [20], an effect of d > 0.20/0.50/0.80 can be considered a small/medium/large effect. However, Cohen's classification refers to between-group effect sizes. To interpret a pre-post effect or a pre-follow-up effect as small, medium or large, this within-group effect must exceed the effect of an (untreated) control group by the reference value defined by Cohen. The mean effect size in untreated control groups in psychotherapy studies is about d = 0.10 [21] (p 708), [22]. Accordingly, to classify an effect size as small, medium or large, we first subtracted 0.10 from the achieved effect sizes before applying the thresholds proposed by Cohen [20] in this meta-analysis.
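The classification rule above can be sketched as follows. This is a minimal illustration, assuming that subtracting 0.10 before applying Cohen's thresholds is equivalent to using shifted cut-offs of 0.30, 0.60 and 0.90, and that an effect exactly on a cut-off falls into the higher category; the function name and the label for effects below the lowest cut-off are hypothetical.

```python
# Cohen's between-group thresholds (0.20 / 0.50 / 0.80) shifted upward by the
# 0.10 mean change reported for untreated control groups.
ADJUSTED_THRESHOLDS = ((0.90, "large"), (0.60, "medium"), (0.30, "small"))


def classify_within_group_effect(g):
    """Label a within-group effect size using the shifted thresholds."""
    for threshold, label in ADJUSTED_THRESHOLDS:
        if g >= threshold:  # boundary handling is an assumption
            return label
    return "no meaningful improvement"


# Effect sizes reported in the abstract of this paper.
labels = {g: classify_within_group_effect(g) for g in (0.72, 0.61, 0.35, 0.48)}
```

Applied to the pooled effects reported in the abstract, this yields "medium" for the SCL GSI (g = 0.72 and g = 0.61) and "small" for the IIP Total (g = 0.35 and g = 0.48).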
To address the concept of clinical significance, we calculated the percentage of remitted patients for each sample based on the GSI of the SCL (see Formula 2). To differentiate between mentally ill and healthy subjects, we used a cut-off of c = 0.57 as suggested by Schauenburg and Strack [23].

$$\frac{c - M_{\text{post}}}{SD_{\text{post}}} \qquad \text{(Formula 2)}$$
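A minimal sketch of how a remission percentage could be derived from the ratio in Formula 2 is given below. It assumes that post-treatment GSI scores are approximately normally distributed, so that the share of patients below the cut-off is the standard normal cumulative probability of that ratio; this distributional step and the function name `remission_rate` are assumptions, not taken from the text.

```python
from statistics import NormalDist


def remission_rate(mean_post, sd_post, cutoff=0.57):
    """Estimated proportion of patients scoring below the cut-off.

    Assumes normally distributed post-treatment scores (an assumption of
    this sketch); cutoff defaults to c = 0.57 as cited in the text.
    """
    if sd_post <= 0:
        raise ValueError("sd_post must be positive")
    z = (cutoff - mean_post) / sd_post  # the ratio shown in Formula 2
    return NormalDist().cdf(z)


# Post-treatment values reported in the Discussion (M = 0.79, SD = 0.60).
rate = remission_rate(mean_post=0.79, sd_post=0.60)
```

With the post-treatment mean and standard deviation reported in the Discussion, the estimate is roughly 36 percent, in line with the remission rate reported in the Results.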
Random-effects models rather than fixed-effect models were applied to aggregate effect sizes across studies (see DerSimonian & Laird [24]), as they do not assume that included studies are obtained from the same population [25,26]. By weighting each study effect with its inverse variance, smaller samples contributed less to the effect than larger samples. Results were tested for statistically significant differences from zero (two-tailed tests). To test for heterogeneity of effect sizes, we calculated Q statistics [18] as well as the I² index [27]. While the Q statistic tests for significant

Table 1. Inclusion Criteria (PICOS [16,47]).
Participants: Adults (18-65 years) with mental disorders (according to ICD-10, Chapter V)

Table 2. Studies included in the meta-analysis.
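The random-effects aggregation and the heterogeneity statistics described in this section can be sketched as follows. This is a minimal illustration of the DerSimonian and Laird moment estimator with Q and I², not a reproduction of the SPSS macro used by the authors; the function name, the dictionary keys and the input values are hypothetical.

```python
from math import sqrt


def random_effects_dl(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model.

    Returns the pooled effect, its standard error, Cochran's Q, the
    between-study variance tau^2 and the I^2 index (in percent).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effects with matching variances")
    w = [1 / v for v in variances]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # truncated at zero
    w_star = [1 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "q": q, "tau2": tau2, "i2": i2}


# Hypothetical input: two equally precise samples with different effects.
result = random_effects_dl([0.2, 0.8], [0.01, 0.01])
```

A 95% confidence interval follows as the pooled effect plus or minus 1.96 times the standard error. When all samples report the same effect, Q and tau² are zero and the result coincides with the fixed-effect estimate.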

Moderator analyses
To address differences between the included studies, we performed moderator analyses by calculating meta-regressions via restricted maximum likelihood, weighted by the inverse variance of the particular criterion. We calculated univariate correlations between potential moderators (sample, intervention and study characteristics as reported in Table 2) and treatment effect (Hedges' g) in SCL GSI and IIP Total, respectively, and reported the standardized beta-weights and the p-values.

Risk of Bias across Studies
To reduce the risk of bias, we included published as well as unpublished studies. Due to difficulties in identifying unpublished studies, we also calculated Egger's test [28] and provided the standardized beta-weights (B). For this test we considered results with a p-value of p < 0.10 (two-tailed) as significant in order to estimate publication bias conservatively, as recommended by Egger et al. [29]. Positive correlations between the standard error and the effect size indicate a "small study bias", while negative correlations indicate that small studies show lower outcome values.
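The direction of asymmetry described above can be illustrated with a weighted regression of effect sizes on their standard errors. The sketch below is a simplified, assumed variant of an Egger-type check: it uses inverse-variance weights and returns only the unstandardized slope and intercept, omitting the standardized beta-weight and the significance test reported in the paper. The function name and the data are hypothetical.

```python
def asymmetry_regression(effects, std_errors):
    """Weighted least-squares regression of effect size on standard error.

    A positive slope means that less precise (smaller) samples report larger
    effects; a negative slope means that they report smaller effects.
    """
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three effects with matching errors")
    w = [1 / se ** 2 for se in std_errors]              # inverse-variance weights
    sw = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, std_errors)) / sw
    mean_y = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, std_errors))
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxy = sum(wi * (x - mean_x) * (y - mean_y)
              for wi, x, y in zip(w, std_errors, effects))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical samples in which smaller studies report lower effects.
slope, intercept = asymmetry_regression([0.85, 0.80, 0.70], [0.05, 0.10, 0.20])
```

In this hypothetical data set the slope is negative, which corresponds to the pattern in which smaller studies show lower outcome values rather than a small study bias.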

Software
For all calculations we used SPSS 15.0 [30], supplemented by a macro for meta-analysis by David B. Wilson [31].

Study Selection
Based on the inclusion criteria, our search resulted in 59 studies, which were described in 67 different publications (see Figure 1 and Table 2). For 34 articles with incomplete results, we contacted the authors. Thirty authors answered, of whom 20 were able to provide the relevant information. Some studies ("i") were described in several publications ("j", cf. Figure 1). Since some publications describe more than one sample (for example different diagnostic groups or samples which received different treatments) and do not report data for the total sample, the total number of extracted samples is k = 96. All samples that received psychotherapeutic treatment and had available outcome data were included.

Study and Publication Characteristics
Except for one study conducted in Germany and Switzerland, all studies were conducted exclusively in Germany. The majority of papers (85.1%) were published after 1999 and all others were published from 1993 to 1999. Seventy-five percent of the studies were published in scientific journals, 15 percent in books or book chapters, nine percent were not formally published and one study was published as a scientific report.

Sample Characteristics
The majority of samples were recruited from psychodynamic treatment settings. The mean treatment duration ranged from 20 to 183 days (M = 80.33, SD = 33.07, Median = 72.70), follow-up duration ranged from three to 41 months (M = 13.22, SD = 8.09, Median = 12.00). Socio-demographic, socio-medical and clinical characteristics were typical for inpatient psychotherapeutic samples in Germany. Depressive disorders were the most common diagnoses (see Table 3). For 21.9 percent of all samples, some information about psychopharmacological treatment was available. In almost all of these samples, some, but not all, patients used medication if indicated. In one sample, patients had to be medication-free before inclusion and received a pharmacological placebo during their inpatient stay.

Quality Criteria
The mean quality score ranged from 0.50 to 1.78 (M = 1.24, SD = 0.27). Some criteria (e.g. definition of follow-up period) were fulfilled in almost all studies, while others (e.g. description of missing data handling) were rarely fulfilled (see Figure 2). Five percent of the studies used randomized controlled designs, 29 percent used quasi-experimental designs and 66 percent used observational designs.

Outcome: Symptom Severity
Treatment effects on global symptom severity (GSI of the SCL) had a medium size at discharge (see Table 4) as well as at follow-up, although there was a slight reduction in effect size to follow-up (see Table 5). Taking into account the defined critical values, four percent of all samples showed no meaningful improvement at discharge, 28 percent showed an improvement of a small effect size, 49 percent showed an improvement of a medium effect size and 20 percent showed an improvement of a large effect size. No sample showed an aggravation of symptoms. Mean effects on the SCL subscales ranged from g = 0.46 ('Anger/Hostility') to g = 0.84 ('Depression'). All mean effects differed significantly from zero.

Outcome: Interpersonal Problems
Regarding interpersonal problems (Total Score of the IIP), an improvement of a small effect size was found at discharge (see Table 4), which slightly increased but remained a small effect size at follow-up (see Table 5). Follow-up measurement points for these samples ranged from three to 24 months (M = 14.14, SD = 7.49, Median = 12.00). While 35 percent of all samples showed no substantial change at discharge (g < 0.30), 65 percent showed improvement, with 51 percent of these achieving a small and 14 percent a medium effect size. Improvement on all subscales differed significantly from zero and ranged from g = 0.06 ('Domineering/Controlling') to g = 0.36 ('Socially Inhibited').

Outcome: Remission rates
Considering the cut-off score for the GSI of the SCL as provided by Schauenburg and Strack [23], 36 percent of patients (range: 0-56%) had achieved remission at discharge. Referring to samples with follow-up data, 36 percent (range 0-51%) had achieved remission at discharge and 32 percent (range 0-41%) had achieved remission at follow-up.
Pre to follow-up effects showed no significant heterogeneity; however, the I² score for the IIP Total indicated a small amount of heterogeneity. The I² of the SCL GSI at follow-up was smaller than 25 percent, indicating low heterogeneity (see Table 5).

Moderator analyses
To explain the heterogeneity in treatment effects, we examined the percentage of females, mean age, diagnostic composition (homogeneous vs. heterogeneous diagnostic groups), impairment at intake, type of treatment (cognitive-behavioural, psychodynamic or mixed), treatment duration, publication year as well as the mean study quality as potential moderators. In both SCL GSI and IIP Total, only impairment at intake (SCL GSI: β standardized = 0.28;

Values refer to the available data. The majority of studies provide data concerning treatment characteristics as well as basic population characteristics (sex and age). About half of the studies provide data concerning marital status, education and diagnoses. However, there is a lack of data on partner status, employment situation, illness duration and comorbidity. Only one quarter or less of all studies provide these relevant data.

Risk of Publication Bias
We calculated the risk of publication bias with Egger's test. Results showed a significant asymmetry (p < 0.10) regarding the SCL's GSI, the SCL's subscales 'Obsessive Compulsiveness', 'Interpersonal Sensitivity', 'Depression', 'Anger/Hostility' and 'Paranoid Ideation' as well as for the IIP subscale 'Socially Inhibited' (see Table 4). In all of these scales, smaller studies showed lower effects (see Figure 3).

Discussion
This study represents the first meta-analysis on the effectiveness of psychotherapeutic hospital treatment in Germany. There is a substantial data base of 59 included studies that applied either the Symptom Checklist (SCL [12]) or the Inventory of Interpersonal Problems (IIP [13]) as an outcome measure.
It can be concluded that psychotherapeutic hospital treatment shows positive outcomes for both psychopathological symptoms and interpersonal problems. However, the effects in the two domains differ in their magnitude and pattern. Symptom reduction reaches a medium effect size at discharge but the effect slightly decreases between discharge and follow-up. On the other hand, interpersonal problems are reduced at a slower pace and are less substantial in the short term, yet they continue to decrease from discharge to follow-up. Similar results have been reported by Barkham et al. [32] who also found higher effect sizes in symptom improvement than in interpersonal problems. These findings correspond to Howard's phase model of psychotherapy outcome [33] that indicates three phases of outcome (i.e. remoralization, remediation and rehabilitation). According to this model, the first improvements are expected to occur in subjective well-being, which then allows for a symptom reduction. Symptom reduction on its own seems to be a necessary condition for improvement in life functioning, including interpersonal functioning.
An additional explanation for lower effect sizes concerning the IIP might be that we combined the IIP Total Score and individual subscales across all different kinds of samples. This resulted in a reduction of information, as different patient samples show different patterns of interpersonal problems. The simple mean value calculation, however, does not consider the circumplex structure of interpersonal problems [34].
Compared to the samples of inpatients treated in psychosomatic rehabilitation clinics in Germany which were included in the meta-analysis conducted by Steffanowski et al. [9], the present samples of inpatients treated in psychotherapeutic hospitals showed a higher symptom severity at intake (M = 1.31, SD = 0.67 vs. M = 1.14, SD = 0.69, d = 0.25). Correspondingly, symptom reduction at discharge (d = 0.76 vs. d = 0.66) and at follow-up (d = 0.62 vs. d = 0.46) was higher as well. However, when interpreting these differences in effect size, one needs to consider several possible reasons favoring hospital treatment. First, a higher initial symptom load may allow for a higher symptom reduction. Second, hospital treatment is characterized by a longer duration than rehabilitation treatment (twelve vs. eight weeks on average across the included studies). Third, as the present analysis was able to include more recent studies than the MESTA study, more advanced treatment concepts may in part account for the existing differences.
On the one hand, it can be concluded that the variation in the effectiveness of psychotherapeutic hospital treatment differs between the included scales. The heterogeneity index varied from 0 percent (SCL 'Psychoticism') to 71 percent (SCL 'Phobic Anxiety'). As we included almost no sample with psychotic patients, the high homogeneity between samples regarding 'Psychoticism' appears plausible. With regard to 'Phobic Anxiety', our results may reflect the fact that certain symptoms are more prevalent in some samples than in others. On the other hand, considering the diversity of patients and treatments in the included studies, the finding that no subscale showed a large heterogeneity (all I² < 75%) indicates the presence of basic similarities in the treatments under study. Surprisingly, the follow-up results show only a small (or smaller) heterogeneity even though there was a high variability in follow-up intervals (ranging from three to 41 months). The variety of heterogeneity between different subscales corresponds to typical characteristics of the investigated sample.
None of the investigated patient characteristics except the impairment at intake correlated significantly with the treatment effect. In accordance with Bohart and Greaves Wade [35], samples with higher impairment at intake show larger changes during treatment. Concerning treatment characteristics, only the treatment duration was associated with the effect sizes; interestingly, the results differed between the two investigated outcome measurements. With regard to symptom severity, longer treatment duration is associated with lower effect sizes, whereas regarding  interpersonal problems, samples with longer treatment durations showed larger effect sizes. Generally, the relation between outcome and treatment duration is not a simple one: While dose-response-models [36,37] postulate that treatment duration affects outcome (higher response rates in longer treatment), the good-enough-model [38] implies that symptom change predicts treatment duration (longer treatments in severely disturbed patients). In this context, our findings may reflect the reality that symptom change constitutes a primary outcome of inpatient psychotherapy while change in interpersonal problems constitutes a more secondary goal. However, our data do not allow for any more detailed interpretations. To clarify these relations, further studies are required.
The quality of included studies was not significantly associated with the treatment effect; accordingly, there is no evidence that low-quality studies overestimate the treatment effects in this meta-analysis.
One major limitation of this meta-analysis may be seen in the lack of randomized controlled trials (RCTs) addressing the efficacy of psychotherapeutic hospital treatment. This lack of RCTs may be attributed to the specialties of the German health care system and its indication standards for inpatient and outpatient psychotherapy: As inpatient psychotherapy is considered to be the indicated and available treatment option for seriously disturbed patients in Germany, an allocation to a treatment condition of lower intensity (i.e. outpatient treatment or waitlist) would be considered unethical. Therefore, any study aiming at evaluating the efficacy of psychotherapeutic hospital treatment by use of an RCT design would be disapproved by the local ethics committee.
Correspondingly, the only existing RCTs in this field compare different treatment conditions within inpatient psychotherapy [39,40] or, on rare occasions, inpatient to day clinic treatment [41].
As a consequence, our analysis had to focus on observational or quasi-experimental pre-post/pre-follow-up comparisons. Thus, this meta-analysis does not allow causal interpretations. Changes cannot exclusively be attributed to the psychotherapeutic treatment but may also be caused by spontaneous remission or other confounding influences. In addition, as psychotherapy is only one part of the multimodal inpatient treatment concept, the proportion of improvement caused by psychotherapeutic interventions in a narrower sense remains unclear. Since the application of psychopharmacological treatment is rarely described, analyses on the influence of medication were not feasible in this meta-analysis. In one of the included randomized controlled trials, the combination of behavior therapy and fluvoxamine was superior to behavior therapy and placebo in patients with obsessive-compulsive disorder regarding obsessions and depressive symptoms but not superior regarding compulsions [39]. In another randomized trial, the application of interpersonal psychotherapy additional to pharmacotherapy showed a higher reduction of depressive symptoms compared to pharmacotherapy plus clinical management, but was not superior regarding social and interpersonal functioning [40]. Cuijpers et al. [2] found a small statistically significant additional effect favoring psychological treatments compared to usual care and structured pharmacological treatment in depressed inpatients. Regarding these results, one can assume that the psychotherapeutic treatment itself is an effective factor in this setting, at least in some outcome areas. Schauenburg and Strack [23] reported data for the SCL's GSI in large German psychotherapy samples (outpatients, M = 1.12, SD = 0.57; inpatients M = 1.29, SD = 0.70). In accordance with the indication for inpatient psychotherapy, patients in our sample show higher impairment at intake than outpatients (SCL GSI: M pre = 1.31, SD pre = 0.67; d = 0.31).
At discharge, the patients we studied were less disturbed than typical outpatients (SCL GSI M post = 0.79, SD post = 0.60; d = −0.56) but still more than twice as impaired as the German norm population (M = 0.33, SD = 0.25, d = 1.08 [23]). Still, one third (36%) of the examined patients reached remission.
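The comparisons with the outpatient reference sample can be retraced with a short calculation. The sketch below assumes that the standardized difference uses the unweighted root mean square of the two standard deviations, since sample sizes are not given in the text; the function name `standardized_difference` is hypothetical.

```python
from math import sqrt


def standardized_difference(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d between two samples with an unweighted pooled SD.

    The unweighted pooling is an assumption of this sketch; a pooling
    weighted by sample size would give slightly different values.
    """
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Inpatients at intake vs. outpatients (values from the text).
d_intake = round(standardized_difference(1.31, 0.67, 1.12, 0.57), 2)     # 0.31
# Inpatients at discharge vs. outpatients (values from the text).
d_discharge = round(standardized_difference(0.79, 0.60, 1.12, 0.57), 2)  # -0.56
```

Under this assumption, the intake comparison yields about 0.31 and the discharge comparison about -0.56, with the negative sign indicating lower impairment than in typical outpatients.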
To date, established criteria to classify within-group (e.g. pre-post) effects are lacking. We addressed this problem by subtracting the effect sizes occurring in untreated control groups in (outpatient) psychotherapy studies [21,22] from our calculated effect sizes before applying the critical values which have been proposed by Cohen for the interpretation of between-group effects [20]. However, this provisional approach certainly requires further validation.
A possible imprecision of effect size calculation could as well have arisen from lacking information about pre-post-correlation in outcome measures, which did not allow the consideration of interdependence [25].
Methodological weakness of included studies is often criticized as one major source of bias in meta-analyses. To do justice to the complex relationship between study quality and outcome of psychotherapy, we carried out an extensive complementary project on this issue [17]. Based on a comprehensive review of the literature and an expert rating, we selected 19 relevant quality criteria to quantify the quality of the included studies. With a mean score of M = 1.24 (SD = 0.27) on a scale ranging from 0 = 'low quality' to 2 = 'high quality', the overall quality of the included studies may be considered as medium. However, study quality varies considerably over different studies and different criteria. Especially in terms of dealing with dropouts, more detailed information in original papers is required. In spite of these limitations, there is no evidence that low quality studies distort the results of this meta-analysis since no correlation was found between study quality and outcome. Although our approach allows for a sophisticated appraisal of relevant quality criteria, especially with regard to non-randomized studies, there are no benchmarks available until now, since this is the first application of our checklist.
For the majority of the outcome parameters, Egger's test yielded no significant results. As the few significant results indicated overall smaller effects in smaller studies, which is the opposite of the pattern a small-study bias would produce, there was no evidence of such a bias.
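For readers unfamiliar with Egger's test, its core can be sketched as an ordinary least-squares regression of the standardized effect (effect size divided by its standard error) on precision (the inverse of the standard error). An intercept close to zero suggests a symmetric funnel plot. The function name `egger_intercept` and the input data are assumptions made for this sketch, which is not the software used in the meta-analysis. It returns the t statistic and degrees of freedom but does not compute a p-value.

```python
import math


def egger_intercept(effects, std_errors):
    """Return (intercept, t_statistic, df) of Egger's regression.

    Regresses effect/SE on 1/SE by ordinary least squares.
    """
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three studies with matching inputs")

    x = [1.0 / se for se in std_errors]                  # precision
    y = [g / se for g, se in zip(effects, std_errors)]   # standardized effect
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Residual variance and standard error of the intercept.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se_intercept = math.sqrt((sse / df) * (1.0 / n + x_mean ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, t_stat, df


# HYPOTHETICAL data: effects shrink as the standard error grows,
# i.e. smaller studies show smaller effects.
ses = [0.05, 0.10, 0.15, 0.20, 0.30]
gs = [0.8 - 1.0 * se for se in ses]
intercept, t_stat, df = egger_intercept(gs, ses)
print(f"intercept = {intercept:.3f}, df = {df}")
```

In this parameterization, a negative intercept corresponds to smaller effects in less precise (typically smaller) studies, whereas the classical small-study bias would appear as a positive intercept.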
Some studies were reported in more than one publication, which complicated data abstraction and aggregation, since the different publications sometimes focused on partially overlapping subgroups. We took care to include all relevant information without integrating data from overlapping subgroups in our calculations.
Data regarding employment status, illness duration and comorbidity were incomplete in many cases, which limited the representativeness of the overall sample description. Heterogeneous classifications of socio-demographic variables complicated consistent data aggregation. Fortunately, data on the therapeutic approach, age, sex and the main diagnoses were nearly complete.
The SCL is a well-established instrument in psychotherapy research. It is able to differentiate between subjects with and without a psychiatric disorder and is suitable for measuring change in outcome studies [42]. The SCL GSI shows a high internal consistency [42,43], while the results on the subscales are inconsistent [42,44]. Previous studies show that most of its subscales measure one broad dimension of general symptom distress and are not suitable to differentiate between various diagnostic groups; the concept of multi-dimensionality is therefore doubtful [42][43][44][45][46]. The IIP scales are dominated by this general factor as well, but also showed high loadings on three factors of interpersonal behavior and interpersonal problems identified by Tran et al. [43]. On the one hand, it is therefore questionable whether the IIP provides relevant additional information. On the other hand, our results for the IIP differ from those for the SCL, which justifies the application of both measures. Even if the factorial validity of the subscales is doubtful, we reported these results to provide benchmarks for facilities applying these scales for evaluation purposes.
As the high number of included studies required an immense data extraction effort, the results of this extensive meta-analysis were not available until more than four years after the end of the literature search. Although we expect that some relevant studies were published during this period, the included studies may still be regarded as up to date. Given the large number of included studies, we do not expect that a small number of new studies would change the results significantly.
Because the electronic literature search was restricted to the primarily German database PSYNDEX, some publications available only in English may have been missed. However, this risk of bias can be assumed to be low, as PSYNDEX comprises more than 500 English journals and the electronic search was complemented by a comprehensive hand search.

Conclusion
In spite of all methodological limitations of this meta-analysis, there is evidence that psychotherapeutic hospital treatment shows positive outcomes regarding symptom severity as well as interpersonal problems in severely disturbed patients. To clarify the relations between symptom severity, interpersonal problems and treatment duration, further research is required.