Effects of telephone-based health coaching on patient-reported outcomes and health behavior change: A randomized controlled trial

Objective Telephone-based health coaching (TBHC) seems to be a promising approach to foster self-management in patients with chronic conditions. The aim of this study was to evaluate the effectiveness of TBHC on patient-reported outcomes and health behavior for people living with chronic conditions in Germany. Methods Patients insured with a statutory health insurance were randomized to an intervention group (IG; TBHC) or a control group (CG; usual care), using a stratified random allocation before giving informed consent (Zelen's single-consent design). The TBHC was based on motivational interviewing, goal setting, and shared decision-making and was carried out by trained nurses. All outcomes were assessed yearly for three years. We used mixed effects models utilizing all available data in a modified intention-to-treat sample for the main analysis. Participants and study centers were included as random effects. All models were adjusted for age, education and campaign affiliation. Results Of the 10,815 invited patients, 4,283 returned their questionnaires at baseline. The mean age was 67.23 years (SD = 9.3); 55.5% were female. According to the model, TBHC was statistically significantly superior to CG regarding 6 of 19 outcomes: physical activity in hours per week (p = .030) and in metabolic rate per week (p = .048), BMI (p = .009) (although mainly at baseline), measuring blood pressure (p < .001), patient activation (p < .001), and health literacy (p < .001). Regarding stages of change (p = .005), the IG also showed statistically different results than the CG; however, the interpretation of this finding remains inconclusive. Within-group contrasts indicating changes from baseline to follow-ups and significant between-group comparisons regarding these changes supported the findings. Standardized effect sizes were small. TBHC did not show any effect on mental QoL, health status, alcohol, smoking, adherence, measuring blood sugar, foot monitoring, anxiety, depression and distress.
Campaign-specific subgroup effects were detected for 'foot monitoring by a physician' and 'blood sugar measurement'. Conclusion TBHC interventions might have small effects on some patient-reported and behavioral outcomes. Practice implications Future research should focus on analyzing which intervention components are effective and who profits most from TBHC interventions. Registration German Clinical Trials Register (Deutsches Register Klinischer Studien; DRKS): DRKS00000584


Introduction
Due to better medical treatment and changes in demographics, an ageing population will result in increasing numbers of people living with chronic conditions. In 2010, 15% of Europe's population was older than 65 years; projections expect an increase to 25% in 2050 [1]. In Europe, chronic conditions account for 80% of the mortality; diseases of the circulatory system alone account for nearly 50% [1,2]. In addition to those patients affected by one chronic condition (24.3%), the proportion of multimorbid patients is very high: 13.8% had two and 11.7% had more than three chronic conditions [3], resulting in a reduction in life expectancy by about 1.8 years with each additional chronic condition for a 67-year-old individual [4]. Besides accounting for the major part of health care expenditures and lost work productivity [5,6], chronic conditions also have a large impact on the individuals living with them. Cardiovascular diseases, for example, are responsible for the most lost "disability-adjusted life years" (DALYs) in low- and middle-income countries in Europe, and the third most lost DALYs in high-income countries [1,7].
However, an existing chronic condition and its impact on a person's life may be modifiable in several ways, such as adopting better health behaviors and better self-management. Studies show that the consumption of alcohol and tobacco, as well as high blood pressure, are the three most important risk factors predicting a higher disease burden [1]. Therefore, current treatment guidelines, like the NICE guideline for managing diabetes [8], as well as the disease management program (DMP) guidelines for diabetes, breast cancer, asthma, and coronary heart disease in Germany [9], include self-management training and lifestyle change as a part of the medical treatment of many chronic conditions. In the US, too, the enhancement of self-management abilities and patient empowerment are major goals, as stated in the "Strategic Framework on Multiple Chronic Conditions" of the US Department of Health and Human Services [10].
Meta-analyses show that self-management interventions can improve quality of life (QoL) as well as disease-specific outcomes and decrease health care costs [11-13]. Besides self-management courses in group settings [14], one promising approach to improve self- and disease management in people living with chronic conditions is telephone-based health coaching (TBHC), which is more accessible to people living in rural areas or having limited mobility. However, evidence regarding the effectiveness of TBHC is still inconclusive: Reviews conclude that TBHC may have beneficial effects on some clinical, behavioral, and psychosocial outcomes [10,15,16], but the heterogeneity of the included TBHC interventions, for example in duration and delivery method (e.g. calls, video calls, short messages, automated messages), as well as the narrative review methods, make it difficult to draw clear conclusions.
Studies reporting the effect on patient-reported outcomes, like QoL, mental status, and distress, are rare and have not been thoroughly summarized. A systematic review about the effectiveness of TBHC for chronic conditions found that results regarding psychosocial outcomes were quite inconclusive [15]. Most studies reported no effect of TBHC on "overall QoL" [17-22], but there were effects on "physical QoL" [19,22-24]. Although not the focus of most of the studies, the effects of TBHC on anxiety of people living with chronic conditions have been quite positive [25,26]. In contrast, effects of TBHC on depression were mixed, with studies more often showing no effect in favor of TBHC compared to controls [17,19,25,27,28].
Most existing studies were conducted in the United States, Australia, and the United Kingdom, which have different health systems and quite diverse populations, especially in countries with huge rural areas. There is limited data on TBHC in Europe; in particular, there is no study on TBHC outcomes in Germany besides the present study [43,44] and its pilot study [45]. The evaluation of health economic outcomes of this study showed no effect of the TBHC on the time until and probability of rehospitalization, the number of defined daily doses of medication, or the frequency and duration of inability to work. Nevertheless, there was a reduction of hospitalization in participants with heart failure, and a reduction of mortality in participants with chronic somatic conditions [44].

Objectives
The aim of this study was to evaluate the effects of a TBHC for people living with chronic conditions on 1) QoL, 2) health behaviors (e.g. treatment or medication adherence, smoking, and alcohol consumption), and 3) psychosocial outcomes (e.g. depression, anxiety, health literacy, patient activation and stages of change) compared to a CG of patients receiving usual care.

Materials and methods
The methods, including design, randomization process and all measures, have been described elsewhere [43,44]. The primary outcome of this study was "time from enrolment until hospital readmission within two years", which was assessed using routine data of the statutory health insurance data set. Together with further health economic outcomes (e.g. health-related costs, inability to work, and mortality), the corresponding results are presented and discussed elsewhere [44]. Here we report findings on patient-reported secondary outcomes of the study.

Study design
In this 4-year (June 2010 to October 2014) prospective, pragmatic randomized controlled trial (RCT) we compared participants receiving a TBHC intervention with patients in usual care. Using Zelen's single-consent design, patients were randomized into IG or CG before giving informed consent [46] for ethical reasons within the statutory health insurance. If patients declined TBHC, they received usual care. In addition to the baseline measure (T0), there were three follow-up measures at 12 months (T1), 24 months (T2), and 36 months (T3). A study protocol reporting rationale, study design and statistical analysis procedures has been published a priori [43].
The study complied with the Helsinki Declaration 2008. The ethics approval was granted by the Hamburg Medical Chamber Ethics Committee on 12.05.2011 (process number: PV3567). The study was registered in the German Clinical Trials Register (DRKS00000584). The trial was registered nine months late due to unexpected delays in the course of contract negotiations with the funding institution (Kaufmännische Krankenkasse Hannover: KKH) and resulting problems with timely recruitment of scientific staff. Nevertheless, registration was completed before any data were analyzed. The authors confirm that all ongoing and related trials for this intervention are registered. The CONSORT checklist can be found in S1 Table.

Sample
The sample consisted of patients insured with the statutory health insurance KKH who met the inclusion criteria within the recruitment period.

Inclusion criteria and exclusion criteria.
Study participants were adults (≥ 18 years), insured with KKH, and diagnosed with at least one chronic condition. Based on the diagnosed condition, eligible persons were grouped into different campaigns: The "chronic campaign" was utilized for type 2 diabetes, hypertension, and coronary artery disease; the "heart failure campaign" for heart failure patients; and the "mental health campaign" for chronic depression and schizophrenia. For the "chronic campaign" there were two different ways of identification: For 'chronic campaign 1' patients were identified by a previous hospital stay, for 'chronic campaign 2' patients were identified by a risk score. If an insured person had more than one chronic condition diagnosis, they were assigned to the most specific campaign in the following order: "mental health campaign", "heart failure campaign", "chronic campaign". Insured persons were excluded if they were not able to understand German, had a hearing impairment, or were not able to use a phone [43].
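The priority rule for persons with several diagnoses can be illustrated with a minimal sketch. The diagnosis labels and the function name are hypothetical and only serve the illustration; the campaign names and their order are taken from the text above.

```python
# Priority order stated in the text: most specific campaign first.
CAMPAIGN_PRIORITY = [
    "mental health campaign",
    "heart failure campaign",
    "chronic campaign",
]

# Hypothetical mapping from diagnosis labels to campaigns (labels are illustrative).
DIAGNOSIS_TO_CAMPAIGN = {
    "chronic depression": "mental health campaign",
    "schizophrenia": "mental health campaign",
    "heart failure": "heart failure campaign",
    "type 2 diabetes": "chronic campaign",
    "hypertension": "chronic campaign",
    "coronary artery disease": "chronic campaign",
}


def assign_campaign(diagnoses):
    """Return the most specific campaign for a list of diagnoses, or None."""
    eligible = {
        DIAGNOSIS_TO_CAMPAIGN[d] for d in diagnoses if d in DIAGNOSIS_TO_CAMPAIGN
    }
    # Walk the priority list and return the first campaign the person qualifies for.
    for campaign in CAMPAIGN_PRIORITY:
        if campaign in eligible:
            return campaign
    return None
```

For example, a person with both hypertension and heart failure would be assigned to the "heart failure campaign" under this rule.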

Procedures.
Successive recruitment took place between June 2010 and October 2011, with the follow-ups exactly 1, 2 and 3 years later. The members of the randomized IG received an invitation to take part in the TBHC and an acquisition call by the health insurance nurses. After sending back the informed consent for taking part in the intervention to the health insurance, they were included in the study as "TBHC participants". If they did not send back the required confirmation, they were grouped as "TBHC decliners". The randomized CG did not receive an invitation. To avoid bias, decliners also received questionnaires; for economic reasons, however, the decliner group and the CG were randomly limited to 3,000 patients. All included patients received the consent form for the study with the baseline questionnaire, which had to be sent back to the research institute for taking part in the study. Patient-reported data were collected by questionnaires sent to the insured persons' homes by the statutory health insurance.

Randomization.
All eligible persons were blindly randomized by a computer algorithm at the statutory health insurance's headquarters to either IG or CG. The inclusion and randomization process took place gradually over a period of 14 months. We used a stratified random allocation design based on sociodemographic variables available in routine data. The randomization process was carried out by the statutory health insurance.
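As a rough illustration of what a stratified random allocation involves, the following sketch shuffles persons within each stratum and assigns a fixed share to the IG. It is not the insurer's algorithm: the stratification variables, the allocation share (`ig_share`) and the seed are assumptions made only for this example.

```python
import random
from collections import defaultdict


def stratified_allocation(persons, strata_keys, ig_share=0.5, seed=0):
    """Allocate persons to 'IG' or 'CG' separately within each stratum.

    persons: list of dicts with an 'id' plus the stratification variables.
    strata_keys: names of the variables that define a stratum.
    ig_share: ASSUMED proportion allocated to IG (not reported in the text).
    """
    rng = random.Random(seed)

    # Group person ids by their combination of stratification values.
    strata = defaultdict(list)
    for person in persons:
        strata[tuple(person[k] for k in strata_keys)].append(person["id"])

    # Shuffle within each stratum, then assign the first share to IG.
    allocation = {}
    for ids in strata.values():
        rng.shuffle(ids)
        n_ig = round(len(ids) * ig_share)
        for i, pid in enumerate(ids):
            allocation[pid] = "IG" if i < n_ig else "CG"
    return allocation
```

Allocating within strata keeps the groups balanced on the stratification variables, which simple randomization only achieves on average.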

Blinding.
Blinding of study participants and coaches was not possible, as the coaching is provided one-to-one. However, the coaches did not know which participants answered the questionnaires and which did not. The questionnaires were pseudonymized to enable an aggregation of data from different sources.

Sample size calculation.
The a priori power calculation showed that an overall minimum of 1,670 patients was needed at T2 to detect a small standardized mean difference (Cohen's d of 0.2) in group comparisons with a power of at least 95% at a type I error rate of α = 5% in a two-sided test, accounting for the unbalanced group allocation [47]. Based on experiences in the pilot study showing low response and high drop-out rates [45], we aimed to invite 12,000 patients to participate, but reached 10,815.
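How the required sample size depends on effect size, power, error rate and allocation ratio can be sketched with the standard normal approximation for a two-group comparison. This is a generic textbook formula, not the calculation from [47]; the allocation ratio used in the trial is not stated here, so the `ratio` argument is an assumption and the sketch is not expected to reproduce the figure of 1,670.

```python
from math import ceil
from statistics import NormalDist


def two_group_sample_size(d, alpha=0.05, power=0.95, ratio=1.0):
    """Normal-approximation sample size for a two-sided two-group comparison.

    d: standardized mean difference (Cohen's d) to be detected.
    ratio: n2 / n1, the allocation ratio (ASSUMED; not given in the text).
    Returns (n1, n2), each rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    n1 = (1 + 1 / ratio) * (z_alpha + z_beta) ** 2 / d ** 2
    return ceil(n1), ceil(ratio * n1)
```

With equal allocation, `two_group_sample_size(0.2)` yields 650 per group under this approximation; any unequal ratio increases the total, which is why an unbalanced allocation requires more patients overall.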

Intervention
The intervention is described following the TIDieR checklist [48] in S2 Table. The TBHC concept was originally developed by Health Dialog Inc. [49,50], adapted to the German health care system and, subsequently, widely implemented by the health insurance KKH. A pilot study indicated that the intervention was well accepted by the participants [45]. Important components and counselling strategies were motivational interviewing (MI) to increase willingness to change and confidence to implement changed behaviors in daily life, individual and collaborative goal setting, and shared decision-making (SDM) [49,50]. SDM focused on shared information on advantages and disadvantages of health behaviors and a joint decision. The set goals were recorded by the coach and followed up in the upcoming calls. The intervention was tailored to important chronic conditions that require similar self-management strategies in the three campaigns "chronic campaign", "heart failure campaign", and "mental health campaign". Although patients were identified for the 'chronic campaign' in two different ways, the intervention was the same.
The intervention was conducted by 20 nurses and one ecotrophologist located in two call centers (Munich and Halle/Saale). The coaches were trained in TBHC with MI and SDM components by experts directly trained by Health Dialog. They were supervised two to four times per year by two experienced supervisors from the project group (MH, IBB).
The minimum call frequency was defined as one telephone contact every six weeks with a maximum intervention duration of one year. Specific intervention manuals for the coaches regarding different situations (e.g. for smoking cessation), available topics, and accessible information materials provided support for the coaches. Also, the coaches were assisted by an online health platform (www.netdoktor.de) providing evidence-based and up-to-date health information. NetDoktor is a health portal written and edited by health professionals, certified by HONcode (www.hon.ch) and related to the afgis criteria (www.afgis.de), two quality certifications for reliable online health information. Data on the coaching process, individual goal setting, medication, and clinical parameters (e.g. HbA1c and blood pressure) were recorded by the coach in an electronic documentation system. Written patient information for specific conditions, medication plans, and weight-control tables could be sent to the TBHC participants. Additionally, participants in the heart failure campaign got a booster call, in which the coaches checked whether participants maintained health behaviors (e.g., weighing and medication adherence).
The CG received no coaching.

Measures
We assessed changes in QoL with the subscales "mental QoL" and "physical QoL" of the "Short Form 12 Health Survey" (SF-12) [51] and the health status with the visual analogue scale of the "EuroQol-5 Dimension" (EQ-5D) [52].
Health behaviors (alcohol consumption, medication adherence, exercise) were assessed with the "Alcohol Consumption Questions of the Alcohol Use Disorders Identification Test" (AUDIT-C) [53], the "Medication Adherence Report Scale" (MARS-D) [54], and the "Freiburg Questionnaire for Physical Activity" (FFKA) [55]. The FFKA calculates the activity in hours per week and metabolic rate per week. We used self-developed, ordinally scaled instruments for the assessment of the rate of measuring blood pressure (1 = not until now, 2 = not regularly, 3 = weekly, 4 = mostly once a day, 5 = twice a day or more), measuring blood sugar (1 = not until now, 2 = not regularly, 3 = mostly once a day in the morning, 4 = twice a day or more when eating), foot monitoring by themselves (1 = not until now, 2 = not regularly, 3 = once a week, 4 = daily), and foot monitoring by their physician (1 = unnecessary, 2 = once in the last year, 3 = twice or more in the last year).
Other psychosocial outcomes included patient activation with the German version of the "Patient Activation Measure" (PAM) [56], health literacy with the "Functional Communicative Critical Health Literacy" (FCCHL) [57], and the process of behavior change with an adaptation of the "Stages of Change across 10 Health Risk Behaviors for older Adults" (SOC) [58]. Changes in depression and anxiety were assessed with the "Hospital Anxiety and Depression Scale" (HADS) [59]. Additionally, we assessed socio-demographic factors like age, nationality, sex, marital status, number of children, net income, years of school, level of education and occupation, as well as clinical parameters with self-developed, ordinally scaled items. All outcomes and times of assessment can be found in Table 1.

Statistical methods
We applied three different analyses to minimize participation bias. We followed an intention-to-treat approach including available data from all patients randomized to IG (TBHC participants and TBHC decliners) in one analysis to avoid bias (intention-to-treat 1, ITT-1). However, as the majority of the study participants invited to the IG declined participation in the allocated intervention, we ran two additional analyses: one comparing the TBHC participants only (i.e. removing decliners) with the CG (intention-to-treat 2, ITT-2), and an as-treated (AT) analysis comparing TBHC participants with a minimum of 5 calls to the CG (S1 Fig). We defined the ITT-2 analysis as our main analysis. Decliners were not added to the CG to prevent a larger bias in ITT-2 and AT.
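The three analysis populations can be summarized in a short sketch. The function and argument names are hypothetical; the membership rules follow the description above (ITT-1: everyone randomized to IG versus CG; ITT-2: TBHC participants only versus CG; AT: TBHC participants with at least 5 calls versus CG).

```python
def analysis_sets(arm, consented_to_tbhc=False, n_calls=0, min_calls=5):
    """Return the analyses ('ITT-1', 'ITT-2', 'AT') a respondent enters.

    arm: 'IG' or 'CG' as randomized.
    consented_to_tbhc: True for TBHC participants, False for decliners (IG only).
    n_calls: number of coaching calls received (IG participants only).
    """
    if arm == "CG":
        # The CG serves as comparator in all three analyses.
        return {"ITT-1", "ITT-2", "AT"}

    sets = {"ITT-1"}  # everyone randomized to IG, including decliners
    if consented_to_tbhc:
        sets.add("ITT-2")  # decliners are removed here
        if n_calls >= min_calls:
            sets.add("AT")  # only participants with the minimum number of calls
    return sets
```

Under these rules a decliner contributes data only to ITT-1, whereas a participant with three calls contributes to ITT-1 and ITT-2 but not to the AT analysis.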
Chi-square tests (for categorical outcomes) and ANOVA tests (for dimensional outcomes) were used to compare groups at baseline.
Mixed models with maximum likelihood estimation were used to test the impact of health coaching on the course of outcomes from baseline across the three follow-ups compared to routine care. In all models, intervention group (IG and CG), time (t0, t1, t2, t3) and the interaction between group and time ('time x group') were set as fixed effects. Participants and study centers were included as random effects. Due to group differences, models were adjusted for the campaign they were in ("chronic campaign" ('chronic campaign 1', 'chronic campaign 2'), "heart failure campaign", "mental health campaign") and some sociodemographic variables (ITT-1: education; ITT-2: education, age; AT: education). In contrast to the health economic publication [44], we decided to subdivide the chronic campaign into its two smaller subgroups, which differed regarding their inclusion criteria, to avoid a possible bias and detect potential group differences.
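One way to write down a model of this kind is given below. The notation is ours and is only an assumed formalization of the verbal description: $y_{ijt}$ is the outcome of participant $i$ in study center $j$ at time $t$, $G_i$ indicates the IG, $T_{kt}$ are indicators for the follow-ups $t_1$ to $t_3$, and $\mathbf{x}_i$ collects the adjustment variables (campaign, education and, where applicable, age).

```latex
\begin{aligned}
y_{ijt} &= \beta_0 + \beta_1 G_i
          + \sum_{k=1}^{3} \gamma_k T_{kt}
          + \sum_{k=1}^{3} \delta_k \,(G_i \times T_{kt})
          + \mathbf{x}_i^{\top}\boldsymbol{\theta}
          + u_i + v_j + \varepsilon_{ijt}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
v_j \sim \mathcal{N}(0, \sigma_v^2), \qquad
\varepsilon_{ijt} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

In this formulation the coefficients $\delta_k$ carry the 'time x group' interaction, while $u_i$ and $v_j$ are the random intercepts for participants and study centers.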
The effect of health coaching was estimated based on the interaction between group and time. For group comparisons we calculated standardized between-group effect sizes (Cohen's d) by dividing the estimated marginal means difference (EMM difference) of the groups by the observed standard deviation of the CG. Post hoc interaction contrasts between group and time, i.e. the difference between the two groups in the amount of change from baseline (t0) to post-intervention measurements (t1, t2, t3), indicative of treatment effects, were determined.
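The standardization described above amounts to a single division, sketched below. The function name and the numbers in the usage note are hypothetical and are not taken from the study's tables.

```python
def standardized_effect(emm_ig, emm_cg, sd_cg):
    """Between-group Cohen's d: EMM difference divided by the observed CG SD."""
    if sd_cg <= 0:
        raise ValueError("sd_cg must be positive")
    return (emm_ig - emm_cg) / sd_cg
```

For instance, hypothetical estimated marginal means of 27.0 (IG) and 27.5 (CG) with an observed CG standard deviation of 5.0 would give d = -0.1, which falls below the conventional threshold of 0.2 for a small effect.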
Additionally, we tested whether campaigns moderate the effectiveness of the health coaching intervention, including the interaction between group, time and campaign (along with all lower level interactions) as a fixed effect in the analysis.
Across all tests, we considered results with p < 0.05 as statistically significant. Mixed model analyses were performed with R version 9.2 [60] using the lmer command from the lme4 package. Baseline group comparisons, imputation of missing values, and the calculation of observed values were conducted with IBM SPSS 23 [61].
Although stated otherwise in the study protocol [43], we preferred the mixed model approach over the ANCOVA, as it utilizes data across all measurement points simultaneously and is more robust against missing values. Also, we did not adjust for multiple comparisons given the explorative nature of this study. As the Bonferroni correction increases the type II error in favor of decreasing the type I error, we decided to follow Perneger (1998) and describe the results openly and discuss them carefully [62].

Missing values
In order to calculate sum scores, it was necessary to impute missing values on item level. First, to check whether the missing values were missing completely at random, we applied Little's MCAR test [63]. It showed that missing values could not be considered to be missing completely at random at all measurement points (t0, t1, t2, t3). Therefore, we decided to use the expectation-maximization algorithm for imputing missing values on single item level across each scale at each time point, as it is assumed to be unbiased and efficient even though missing mechanisms may be unclear [64]. Also, we decided to impute values only for those patients who provided more than 70% valid responses, in accordance with Wirtz (2004) [65]. If data were missing due to loss to follow-up, no imputation was done, since mixed model analyses provide unbiased estimates under the assumption that data are missing at random conditional on the variables in the model [66]. Therefore, we did not use last observation carried forward as initially planned [43]. As the analysis will include all patients that responded at baseline, the sample will be as follows:

Baseline characteristics
Of the 4,283 patients who returned their questionnaires at baseline, 41.3% were TBHC participants, 30.2% declined participation, and 28.5% belonged to the CG. The mean age was 67.3 years (SD = 9.3). More than half were female (55.5%), and most of them were married (66.3%). They had an average of two children and a median household net income of 1,501 to 2,000 €. Patients went to school for an average of 9.6 years (SD = 2.0) and most of them completed an apprenticeship. The majority was retired (Table 2).
There were statistically significant differences between the IG and the CG for the primary and the sensitivity analyses (ITT-1, ITT-2, and AT). The IG had significantly more children than the CG, although we considered this difference statistically but not clinically significant. In contrast, the difference between the groups regarding their level of education was both statistically and clinically significant (p < .001). Participants were more likely to have completed an apprenticeship than decliners, and patients in the CG more often had a university degree.
In addition to the sociodemographic characteristics, the IG and CG showed further statistically significant baseline differences. For the ITT-1 analysis, there were differences in the physical subscale of the SF-12 (p = .016) and the reported health status (EQ-5D VAS) (p = .028), each with the IG reporting a slightly higher quality of life. Also, there were significant differences in the ITT-1 analysis for medication adherence (p = .006), with the IG being slightly more adherent, BMI (p = .007), patient activation (p = .007), health literacy (p = .004) and distress (p = .019). For the ITT-2 analysis, the only difference concerned BMI (p < .001), with the IG reporting a higher BMI than the CG. For the AT analysis, there were differences in the baseline values for reported alcohol consumption (p = .030), with the IG drinking slightly less, medication adherence (p = .041), with the IG being slightly more adherent, and BMI (p = .001), with the IG reporting a higher BMI (Table 3). In summary, although these differences were statistically significant, their clinical significance is questionable.

Outcomes
In the main analysis, the ITT-2, a mixed model analysis was used to determine the group effect over all times of measurement ('time x group') controlling for level of education and age (Table 4). The estimated marginal means, their standard errors and the contrasts between baseline and t1, t2 and t3 for ITT-2 are presented in Table 5, accompanied by the differences in the contrasts between the groups, showing the differences in differences.

Quality of life.
Overall, the groups did not differ statistically significantly regarding the course of mental (p = .963) and physical quality of life (p = .441) from baseline to three years (SF-12; interaction effect 'time x group'). Also, there was no significant difference between the groups over time concerning the health status reported with the visual analogue scale of the EQ-5D (p = .147).
Health behaviors.
For health behaviors, there was no difference between the groups over time ('time x group') regarding alcohol consumption (p = .238), smoking (p = .531), medication adherence (p = .939), measuring blood sugar (p = .619), foot monitoring by themselves (p = .352), and foot monitoring by a physician (p = .720). Nevertheless, there was a significant 'time x group' interaction effect regarding physical activity in hours per week (p = .030), although the 'time x group' interaction contrasts (adjusted group contrasts) between the groups were all non-significant. There was also a significant 'time x group' interaction effect regarding physical activity measured in metabolic rate per week (p = .048), which did not result in any significant difference in contrasts between the groups at any follow-up. Furthermore, there was a significant 'time x group' interaction effect on the BMI (p = .009). The 'time x group' interaction contrasts (adjusted group contrasts) were significant at t1 between IG (adjusted EMM Diff = -0.15) and CG (adjusted EMM Diff = 0.15) by -0.30 BMI points (p = .02), at t2 between IG (adjusted EMM Diff = -0.16) and CG (adjusted EMM Diff = 0.26) by -0.41 BMI points (p = .002), and at t3 between IG (adjusted EMM Diff = -0.21) and CG (adjusted EMM Diff = 0.09) by -0.31 BMI points (p = .03). Also, there was a significant 'time x group' interaction effect on measuring blood pressure (p < .001). The between-group differences in differences ('time x group' interaction contrasts) were statistically significant in the changes from baseline at t1 (p < .001), with an adjusted group contrast of 0.19 (IG: adjusted EMM Diff = 0.18; CG: adjusted EMM Diff = -0.01), and at t2 (p = .03), with an adjusted group contrast of 0.11 (IG: adjusted EMM Diff = 0.07; CG: adjusted EMM Diff = -0.04).

Psychosocial outcomes.
For psychosocial outcomes, there were several significant 'time x group' interaction effects. There was an overall effect on patient activation (p < .001), resulting in significant 'time x group' interaction contrasts in the changes from baseline to t1 (p < .0001), with an adjusted group contrast of 0.95 (IG: adjusted EMM Diff = 0.71; CG: adjusted EMM Diff = -0.24), and in the changes from baseline to t2 (p = .01), with an adjusted group contrast of 0.69 (IG: adjusted EMM Diff = 0.59; CG: adjusted EMM Diff = -0.10). Also, there was a significant overall 'time x group' effect for health literacy (p < .001), with significant 'time x group' interaction contrasts (adjusted group contrasts) in the changes from baseline to t1 of 1.02 (p < .001; IG: adjusted EMM Diff = 0.86; CG: adjusted EMM Diff = -0.16), to t2 of 1.60 (p < .0001; IG: adjusted EMM Diff = 0.92; CG: adjusted EMM Diff = -0.67) and to t3 of 1.52 (p < .001; IG: adjusted EMM Diff = 0.89; CG: adjusted EMM Diff = -0.63). Regarding stages of change, there was a significant 'time x group' effect (p = .005). The 'time x group' interaction contrasts were significant for the change from t0 to t1, with an adjusted group contrast of -0.64 (p < .01; IG: adjusted EMM Diff = -0.58; CG: adjusted EMM Diff = 0.06), from t0 to t2, with an adjusted group contrast of -0.63 (p < .0; IG: adjusted EMM Diff = -0.63; CG: adjusted EMM Diff = 0.00), and from t0 to t3, with an adjusted group contrast of -0.70 (p < .01; IG: adjusted EMM Diff = -0.66; CG: adjusted EMM Diff = 0.04). Regarding anxiety (p = .646), depression (p = .758) and mental distress (p = .815), there were no significant differences between the groups over the three years.
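The adjusted group contrasts reported above are differences in differences, i.e. the IG change from baseline minus the CG change from baseline. The following sketch recomputes them from the rounded values given in the text; the data structure and function name are our own, and the tolerance is an assumption that allows for rounding of the published figures.

```python
# (outcome, follow-up): (IG change, CG change, reported contrast), copied from the text.
REPORTED = {
    ("BMI", "t1"): (-0.15, 0.15, -0.30),
    ("BMI", "t2"): (-0.16, 0.26, -0.41),
    ("BMI", "t3"): (-0.21, 0.09, -0.31),
    ("blood pressure measurement", "t1"): (0.18, -0.01, 0.19),
    ("blood pressure measurement", "t2"): (0.07, -0.04, 0.11),
    ("patient activation", "t1"): (0.71, -0.24, 0.95),
    ("patient activation", "t2"): (0.59, -0.10, 0.69),
    ("health literacy", "t1"): (0.86, -0.16, 1.02),
    ("health literacy", "t2"): (0.92, -0.67, 1.60),
    ("health literacy", "t3"): (0.89, -0.63, 1.52),
    ("stages of change", "t1"): (-0.58, 0.06, -0.64),
    ("stages of change", "t2"): (-0.63, 0.00, -0.63),
    ("stages of change", "t3"): (-0.66, 0.04, -0.70),
}


def recomputed_contrasts(reported, tolerance=0.015):
    """Recompute IG change minus CG change and compare with the reported contrast."""
    rows = []
    for key, (ig_change, cg_change, contrast) in reported.items():
        diff = round(ig_change - cg_change, 2)
        # Flag whether the recomputed value agrees with the report up to rounding.
        rows.append((key, diff, abs(diff - contrast) <= tolerance))
    return rows
```

Because the published changes are rounded to two decimals, a recomputed contrast can deviate from the reported one by 0.01, as for BMI at t2 (-0.42 recomputed versus -0.41 reported).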
A closer look at the changes from baseline (contrasts, i.e. adjusted EMM differences) for both groups may give a more detailed impression of the different courses over time for each outcome (Table 5). There are statistically significant within-group changes from baseline in both groups. Nevertheless, the effect sizes of all contrasts were below 0.2, indicating small effects.
The detailed results for ITT-2 including the estimated marginal means (EMM), the standard error (SE), the adjusted EMM differences between t0 and each follow-up (including 95% confidence interval (CI)), the significance (p) of the adjusted EMM difference, as well as the effect size (Cohen's d) for each time of measurement for each group, as well as the between-group differences in the within-group changes (adjusted group contrasts), their 95% CI and significance (p) can be found in Table 5. The observed means and standard deviations for all analyses (ITT-1, ITT-2, AT) across all time points can be found in S3 Table. Estimated marginal means, their standard errors and estimated marginal differences by time (t0, t1, t2, t3), adjusted for education, for ITT-1 and AT can be found in S4 and S5 Tables.
Moderator analyses.
We conducted moderator analyses to examine whether intervention effects vary among campaigns ("heart failure campaign" (N = 397), "chronic campaign 1" (N = 993), "chronic campaign 2" (N = 2698), and "mental health campaign" (N = 190)). We found campaign-specific intervention effects for foot monitoring by physician (p = .036) and blood sugar measurements (p = .001). Detailed results of the subgroup analyses can be found in Table 6. The estimated marginal means for "measuring blood sugar" and "foot monitoring by physician" can be found in S6 Table.

Principal findings
The aim of this randomized controlled trial was to evaluate the effectiveness of a TBHC intervention for people living with chronic conditions on a variety of patient-reported outcomes such as QoL, health behaviors, and psychosocial outcomes. We compared the effectiveness of a TBHC intervention for people living with chronic conditions, insured with a German statutory health insurance, to a usual care group.
In the modified intention-to-treat analysis comparing TBHC participants with the CG (ITT-2), seven of 19 outcomes showed significant overall intervention effects ('time x group') after controlling for education and age. Patients in the IG differed significantly from patients in the CG regarding their physical activity (hours per week and metabolic rate per week) and their BMI, which can be attributed to the central focus of the TBHC on promoting exercise, improving nutrition, and providing information about these topics (see S2 Table 'TIDIER Checklist'). With regard to blood pressure measurement, TBHC resulted in more frequent measurements, although the overall frequency of measuring blood pressure is significantly lower in the CG at all times. One explanation for this finding is that blood pressure monitoring is one of the main foci of the TBHC, which includes the use of a "blood pressure log book" for recording measurements. Furthermore, TBHC results in greater patient activation, such as taking charge of one's own health, and in better health literacy. This suggests that the MI technique used by the health coaches is an effective counselling method that enhances motivation and thus acts as a catalyst for health behavior change. In contrast to these findings, however, patients in the TBHC group became less motivated to change over time according to the stages of change (no willingness to change, considering change, preparation, taking action and maintenance), whereas the willingness of patients in the CG did not change. One explanation for this finding might be that participants receiving TBHC are more satisfied with their own health behavior and therefore see no reason to change.
These findings are supported by a closer look at the contrasts comparing each follow-up to baseline. The IG remains rather stable regarding physical activity, whereas the CG shows a significant decrease in active hours per week. Nevertheless, the groups do not differ significantly in their change from baseline. With regard to BMI, the IG shows a continuous drop over the years, while the CG shows an increase (especially at the 2-year follow-up). This change from baseline also differs significantly between the groups at all time points. Concerning the overall frequency of measuring blood pressure and patient activation, the IG shows a short-term (t1) and mid-term (t2) increase that is not maintained at t3. The change from baseline also differs significantly between the groups at t1 and t2. The significant improvement in health literacy in the IG remains stable over all follow-ups, whereas the CG shows a significant decline in health literacy at t2 and t3, which is also statistically significant in the post hoc interaction contrasts. The decrease in the stages of change in the IG remains stable over all follow-ups, while there is no statistically significant change in the CG. The difference-in-differences analysis also supports this finding, showing statistically significant group differences in change at all time points. These inconsistent findings warrant further investigation in future clinical trials.
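To make the logic of the between-group comparison of changes from baseline concrete, the following is a minimal sketch of a difference-in-differences calculation on raw group means. It is an illustration only: the contrasts reported here come from mixed effects models with covariate adjustment, not from raw means, and the function names and BMI values below are hypothetical and are not study data.

```python
from statistics import mean


def change_from_baseline(baseline, follow_up):
    """Within-group change: follow-up mean minus baseline mean."""
    return mean(follow_up) - mean(baseline)


def difference_in_differences(ig_base, ig_follow, cg_base, cg_follow):
    """Change in the intervention group minus change in the control group."""
    return (change_from_baseline(ig_base, ig_follow)
            - change_from_baseline(cg_base, cg_follow))


# Hypothetical BMI values for illustration (NOT data from this trial).
ig_t0 = [29.0, 31.0, 30.0]
ig_t2 = [28.5, 30.5, 29.5]   # IG drops by 0.5 on average
cg_t0 = [29.0, 31.0, 30.0]
cg_t2 = [29.5, 31.5, 30.5]   # CG rises by 0.5 on average

did = difference_in_differences(ig_t0, ig_t2, cg_t0, cg_t2)
print(f"Difference in differences: {did:+.2f} BMI points")
```

A negative value means that BMI developed more favorably in the intervention group than in the control group, regardless of where each group started at baseline.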
Finally, the absence of significant effects requires some comment. There were no effects on QoL (mental and physical QoL and health status), alcohol consumption, smoking, medication adherence, the frequency of blood sugar measurement and foot monitoring (by the participants or a physician), anxiety, depression and mental distress. One plausible explanation is that changes in QoL, health status, anxiety and depression depend on too many other complex and multifaceted factors to be influenced by TBHC and engagement in the desired health behaviors, such as regular exercise and a healthy diet, alone. Changing addictive behaviors like smoking or alcohol consumption might be too challenging for a broad telephone intervention that focuses on more than one health behavior. The absent effect on medication adherence might be due to the high ceiling effects of the MARS-D and to possible effects of social desirability, as the TBHC was conducted by the participants' health insurance.
There were just two moderator effects between the study group and the campaigns on 'blood sugar measurement' and 'foot monitoring by a physician', suggesting limited variation in the treatment effects across campaigns.
The results of the AT analysis were very similar, with slightly higher effect sizes. Therefore, we cannot infer a dose-response effect from these analyses. Future research is needed to provide valid information on this.
Our findings do not confirm previous research reporting TBHC effects on anxiety [25,26], which can be attributed to the different focus of our TBHC intervention (S2 Table). However, we found an effect of TBHC on the frequency of blood pressure measurements, physical activity and the BMI of participants, while most studies do not show effects on these parameters (blood pressure measurement [29,34,35,38], physical activity [23, 29-37], BMI [29-31, 38-40, 70-72]), although there are also studies with similar results (blood pressure measurement [35,38,70], physical activity [25,28,30,37,38,41,70,72,73], BMI [17,25,74]). As our findings are exploratory, we can only speculate about why we see these effects. One possible reason could be that our intervention is more tailored to the needs of the patients, be it their condition, their leisure time activities or their delegation of responsibility regarding their condition. As we also found that activation and health literacy are higher in the IG than in the CG, especially in the first year, it is possible that empowering patients to take responsibility for their health actions is a good way to moderate effects on exercise, diet and BMI.
Furthermore, we tested outcomes that have not been used so far. Health literacy has not previously been assessed in the evaluation of a TBHC; our findings may therefore be the first to indicate that TBHC could be an effective measure to promote participants' health literacy. For the Transtheoretical Model of Change and its stages of change, there have only been longitudinal, nonrandomized studies, which, contrary to our findings, found no effects of TBHC [27,32,75].
Overall, these inconsistent findings indicate that the effectiveness of TBHC remains inconclusive given the spectrum of heterogeneous studies, although the effects of the TBHC intervention on psychosocial outcomes are partly in line with the known literature. That said, the data presented in this paper add substantially to the existing body of literature regarding the effectiveness of TBHC. Despite some international and nationwide studies like Birmingham OwnHealth [76,77], TERVA [78], the DIAL study [24] and the Connection Program [79], there have not been many studies with such a large sample available for the assessment of patient-reported outcomes.

Strengths and weaknesses
As this study included 4,283 patients in the analyses, a main strength is the large sample size. Few studies in this field include as many patients in the evaluation of patient-reported data. The drop-out of 50.6% over this period was expected, as it is comparable to that of the pilot study [45]. We tried to handle missing values conservatively: on the one hand, we used EM imputation for item-level missing values to compute scores for participants who provided 70% of a score's data; on the other hand, we used mixed model analysis, which is quite robust against missing values.
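The completeness criterion for scale scores can be illustrated with a short sketch. It reads the 70% rule as "at least 70% of a scale's items answered", which is an assumption, and it replaces the EM imputation used in this study with a simple mean of the answered items, so it shows only the eligibility rule and not the imputation itself. The function name `scale_score` and the example responses are hypothetical.

```python
from typing import Optional, Sequence


def scale_score(items: Sequence[Optional[float]],
                min_fraction: float = 0.7) -> Optional[float]:
    """Return a scale score only if enough items were answered.

    Missing items are given as None. If fewer than `min_fraction` of the
    items are answered, no score is computed (None). Otherwise the mean
    of the answered items is returned; this is a simple stand-in for the
    EM imputation described in the text, not a reimplementation of it.
    """
    answered = [x for x in items if x is not None]
    if not items or len(answered) / len(items) < min_fraction:
        return None
    return sum(answered) / len(answered)


# Hypothetical responses to a 10-item scale (NOT study data).
mostly_complete = [1, 2, None, 3, 4, 5, None, 3, 2, 5]   # 8 of 10 answered
too_sparse = [1, None, None, 3]                          # 2 of 4 answered

print(scale_score(mostly_complete))  # score is computed
print(scale_score(too_sparse))       # no score, too many items missing
```

Requiring a minimum share of answered items keeps scores from resting on only a handful of responses, at the cost of excluding participants with sparse questionnaires from that particular outcome.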
Randomization before informed consent, based on Zelen's single-consent design, ensures that patients participating in the intervention are clearly willing to take part. Also, the participation rate is higher than in classic RCT designs, leading to larger sample sizes [80]. Nevertheless, an intention-to-treat analysis, in this case ITT-1, is necessary to avoid selection bias in Zelen's design. Therefore, the thorough statistical evaluation can be considered a strength of this study.
In line with good scientific practice, a study protocol was published [15] and the study was registered in the German Clinical Trials Register (GCTR). The inclusion of more than one of the most important chronic conditions, like diabetes mellitus type II and chronic heart diseases, increases generalizability. Also, this study provides high treatment fidelity, as it is manual-based; the coaches received regular supervision and the quality of the intervention was assessed regularly. A further strength is that this study was conducted in real routine health care. Therefore, generalizability and external validity are high. Also, the analysis of three yearly follow-ups enables readers to assess the long-term effectiveness of this intervention.
Nevertheless, this study has some limitations. Although we employed an intention-to-treat approach (ITT-1) to avoid participation bias, one could argue that this is not a flawless ITT, as we do not have complete data for all patients who were randomized, but only for those who replied at least at baseline [81]. Therefore, the results must be interpreted with caution. It is possible that there are differences between those taking part and those who dropped out on the one hand, and between those taking part in the intervention and those who declined the intervention on the other hand. As stated earlier, eligible insurants decided whether to participate after randomization; this could be a crucial cause of potential bias. Of the randomized potential participants, 57.4% declined participation. A comparison between those groups showed that participants were significantly less educated than decliners or the CG. Therefore, we tried to control for this bias by adjusting the model for education and age as fixed effects. Nevertheless, there is a possibility of other differences between the groups, as we did not assess all possible confounding variables. Additionally, we did not assess whether the patients have an immigrant background, which could also lead to some participation and group differences. Also, we lacked the insurants' clinical and physician data. Therefore, the severity of the patients' condition could not be considered. In addition, no data were available on the routine health care that took place alongside the intervention and during follow-up. Therefore, it was not possible to ascertain how much of a poorly controlled disease status is attributable to a lack of adequate treatment. Another limitation could be that the chronic campaign was heterogeneous, covering different diagnoses, like diabetes mellitus type II, coronary artery disease etc., that require different coaching targets. To prevent an even higher bias within the chronic campaign, we divided it into two smaller subgroups that differed regarding their inclusion criteria. Therefore, the campaigns differ slightly from those in the manuscript reporting the health economic outcome [44].
Also, the questionnaires were sent out by the health insurance. Therefore, it is possible that there is a bias due to social desirability, even though the pseudonymization was made clear. More generally, patient-reported outcomes are prone to social desirability bias and inaccurate reporting. Additionally, effect sizes of all statistically significant EMM differences (except for health literacy and measuring blood pressure) were very small. The clinical significance also needs to be questioned, as all statistically significant effects were practically very small and we did not adjust for multiple testing (Bonferroni).
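What an adjustment for multiple testing would imply can be sketched with a Bonferroni threshold. The following illustration uses the overall p-values reported in the abstract for the seven significant outcomes and assumes that all 19 outcomes count as one family of tests at a nominal alpha of .05, which is an assumption of this sketch and not an analysis performed in this study. Values reported as p < .001 are entered at their upper bound of .001.

```python
# Overall p-values as reported in the abstract. Values reported as
# "p < .001" are entered at their upper bound, 0.001.
reported_p = {
    "physical activity (hours per week)": 0.030,
    "physical activity (metabolic rate per week)": 0.048,
    "BMI": 0.009,
    "measuring blood pressure": 0.001,
    "patient activation": 0.001,
    "health literacy": 0.001,
    "stages of change": 0.005,
}

N_TESTS = 19   # assumption: all 19 outcomes form one family of tests
ALPHA = 0.05   # assumption: conventional nominal significance level


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def surviving_outcomes(p_values: dict, alpha: float, n_tests: int) -> list:
    """Names of outcomes whose p-value lies below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return sorted(name for name, p in p_values.items() if p < threshold)


print(f"Corrected threshold: {bonferroni_threshold(ALPHA, N_TESTS):.4f}")
for name in surviving_outcomes(reported_p, ALPHA, N_TESTS):
    print("still significant:", name)
```

Under these assumptions, the corrected threshold is about .0026, and only measuring blood pressure, patient activation and health literacy would remain statistically significant. Bonferroni is a conservative correction, so this is best read as a lower bound on the number of robust effects.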
Generalizability of the findings might also be limited, since, for historical reasons, those insured at this health insurance have a slightly higher socioeconomic status than those insured at other health insurances. Nevertheless, in Germany nearly all citizens are insured for health care as mandatory members of the public health insurance (86.2%) or private health insurances (10.6%) [82].

5. Conclusion and practice implications
Based on previous research and the results of our study, TBHC interventions might have small effects on some patient-reported outcomes. It would be interesting to find out which intervention components actually have an effect. A more disease-specific approach could make it easier to distinguish between effective and ineffective components without disease-specific variables diluting the results. Also, future research should focus on who exactly profits most from TBHC interventions and whether there are any differences regarding disease, multimorbidity or gender.
and Dr. Isaac Bermejo (IBB) for the good cooperation in this project, the team of the Kaufmännische Krankenkasse Hannover for their support and cooperation and our student researchers Stefan Mahr, Alexander Berndt and Hark Empen for supporting us in data management.