
Validation of the Mobile Application Rating Scale (MARS)

  • Yannik Terhorst,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    Affiliations Department of Research Methods, Institute of Psychology and Education, University Ulm, Ulm, Germany, Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, University Ulm, Ulm, Germany

  • Paula Philippi,

    Roles Data curation, Formal analysis, Methodology, Writing – review & editing

    Affiliation Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, University Ulm, Ulm, Germany

  • Lasse B. Sander,

    Roles Data curation, Writing – review & editing

    Affiliation Department of Rehabilitation Psychology and Psychotherapy, Institute of Psychology, Albert-Ludwigs-University Freiburg, Freiburg im Breisgau, Germany

  • Dana Schultchen,

    Roles Data curation, Writing – review & editing

    Affiliation Department of Clinical and Health Psychology, Institute of Psychology and Education, University Ulm, Ulm, Germany

  • Sarah Paganini,

    Roles Data curation, Writing – review & editing

    Affiliation Department of Sport Psychology, Institute of Sports and Sport Science, University of Freiburg, Freiburg, Germany

  • Marco Bardus,

    Roles Data curation, Writing – review & editing

    Affiliation Department of Health Promotion and Community Health, Faculty of Health Sciences, American University of Beirut, Beirut, Lebanon

  • Karla Santo,

    Roles Data curation, Writing – review & editing

    Affiliations Academic Research Organization, Hospital Israelita Albert Einstein, São Paulo, Brazil, Westmead Applied Research Centre, Westmead Clinical School, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia, Cardiovascular Division, The George Institute for Global Health, Sydney, Australia

  • Johannes Knitza,

    Roles Data curation, Writing – review & editing

    Affiliation Department of Internal Medicine 3 – Rheumatology and Immunology, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nuremberg, Erlangen, Germany

  • Gustavo C. Machado,

    Roles Data curation, Writing – review & editing

    Affiliations Institute for Musculoskeletal Health, Sydney, New South Wales, Australia, Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia

  • Stephanie Schoeppe,

    Roles Data curation, Writing – review & editing

    Affiliation School of Health, Medical and Applied Sciences, Appleton Institute, Physical Activity Research Group, Central Queensland University, Rockhampton, Queensland, Australia

  • Natalie Bauereiß,

    Roles Data curation, Writing – review & editing

    Affiliation Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, University Ulm, Ulm, Germany

  • Alexandra Portenhauser,

    Roles Data curation, Writing – review & editing

    Affiliation Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, University Ulm, Ulm, Germany

  • Matthias Domhardt,

    Roles Data curation, Writing – review & editing

    Affiliation Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, University Ulm, Ulm, Germany

  • Benjamin Walter,

    Roles Data curation, Writing – review & editing

    Affiliation Department of Internal Medicine I, Gastroenterology, University Hospital Ulm, Ulm, Germany

  • Martin Krusche,

    Roles Data curation, Writing – review & editing

    Affiliation Department of Rheumatology and Clinical Immunology, Charité – Universitätsmedizin Berlin, Berlin, Germany

  • Harald Baumeister,

    Roles Data curation, Supervision, Writing – review & editing

    Affiliation Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, University Ulm, Ulm, Germany

  • Eva-Maria Messner

    Roles Conceptualization, Data curation, Supervision, Writing – review & editing

    Affiliation Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, University Ulm, Ulm, Germany




Abstract

Background

Mobile health apps (MHA) have the potential to improve health care. The commercial MHA market is growing rapidly, but the content and quality of available MHA are largely unknown. Instruments for assessing the quality and content of MHA are therefore urgently needed. The Mobile Application Rating Scale (MARS) is one of the most widely used tools for evaluating the quality of MHA, yet only a few validation studies have investigated its metric quality, and no study has evaluated its construct validity and concurrent validity.


Objective

This study evaluates the construct validity, concurrent validity, reliability, and objectivity of the MARS.


Methods

Data were pooled from 15 international app quality reviews to evaluate the metric properties of the MARS. The MARS measures app quality across four dimensions: engagement, functionality, aesthetics, and information quality. Construct validity was evaluated by testing competing confirmatory models with confirmatory factor analysis (CFA). Non-centrality (RMSEA), incremental (CFI, TLI), and residual (SRMR) fit indices were used to evaluate goodness of fit. As a measure of concurrent validity, correlations with another quality assessment tool (ENLIGHT) were investigated. Reliability was determined using omega. Objectivity was assessed by intra-class correlation.


Results

In total, MARS ratings of 1,299 MHA covering 15 different health domains were included. CFA confirmed a bifactor model with a general factor and a factor for each dimension (RMSEA = 0.074, TLI = 0.922, CFI = 0.940, SRMR = 0.059). Reliability was good to excellent (omega = 0.79 to 0.93). Objectivity was high (ICC = 0.82). The MARS correlated with ENLIGHT (ps < .05).


Conclusions

The metric evaluation of the MARS demonstrated its suitability for quality assessment. As such, the MARS could be used to make the quality of MHA transparent to health care stakeholders and patients. Future studies could extend the present findings by investigating the re-test reliability and predictive validity of the MARS.


Introduction

The global burden of disease is high across the world [1]. Mobile health applications (MHA) have the potential to substantially improve health care by providing accessible, effective, cost-efficient, and scalable interventions, as well as health information that can improve the screening, diagnostics, prevention, and treatment of diseases [2–6].

Currently, there are over 300,000 MHA available in the app stores, and more than 200 MHA are added each day [7]. Several randomized controlled trials have shown that MHA can be effective intervention tools for the prevention and treatment of various health conditions [6]. A recent meta-analysis of randomized trials reported small to moderate pooled effects of MHA for improving depression, anxiety, stress levels, and quality of life [6, 8]. However, the number of evidence-based MHA on the market is surprisingly small [3, 4, 9, 10]. This lack of evidence-based MHA, in combination with the rapidly growing MHA market, highlights that patients and health care providers need better guidance to identify high-quality MHA that meet patients’ needs [11]. Reliable and valid measures to assess the quality of MHA are needed to provide such information to health care stakeholders and patients.

The Mobile Application Rating Scale (MARS) is the most widely used scale for evaluating the quality and content of MHA [3, 10, 12, 13–24]. The MARS is a multidimensional instrument for assessing MHA quality and was developed based on a semantic analysis and synthesis of the relevant literature [16]. Four separate dimensions were derived: engagement, functionality, aesthetics, and information quality [16]. The original validation study showed good reliability of the subscales (α = 0.80 to 0.89) and the overall scale (α = 0.90), and good objectivity (subscales: intra-class correlation (ICC) = 0.50 to 0.80; overall: ICC = 0.90) [16]. These results were replicated in several other studies investigating the metric properties of translated versions of the MARS [25–27]. However, the generalizability of previous findings is limited by small sample sizes and by MHA restricted to specific health conditions and geographic areas. Furthermore, crucial metric properties have not been extensively evaluated: 1) no study has evaluated the construct validity of the MARS (i.e., whether the proposed four dimensions are indeed independent); 2) the concurrent validity with other quality instruments, such as the ENLIGHT instrument [28], is unknown; and 3) findings regarding the concurrent validity with user ratings in the app stores are inconclusive to date [3, 14, 16]. Moreover, previous MARS evaluations have methodological limitations (e.g., using Cronbach’s alpha for reliability [29–31]).

In an effort to address the aforementioned research gaps, this study aimed to validate the MARS based on pooled MARS data from 15 international reviews assessing the quality and content of MHA in various health conditions. The following research questions were investigated:

  1. What is the validity of the MARS in terms of:
    1. Construct validity: What is the latent structure of the MARS and are the proposed four dimensions independent?
    2. Concurrent validity: What are the correlations between the MARS and another frequently used quality assessment tool called ENLIGHT [28]?
  2. Reliability: What is the internal consistency of the overall MARS and its subscales?
  3. Objectivity: What is the agreement between reviewers?


Methods

Study design

This is a validation study evaluating the metric quality of the MARS [16]. Similar to an individual patient data meta-analysis approach [32], research groups using the MARS were contacted and asked to provide their primary data (i.e., quality ratings of MHA). All provided data sets were then verified, homogenized, and merged into a single data set.

Inclusion criteria and search

To obtain a large data set, all reviews of MHA that used the MARS were eligible. Such reviews were identified through literature searches in Google Scholar and PubMed conducted in July 2019, using terms such as “MHA review”, “app quality”, or “MARS”. The literature searches were conducted by PP, YT, and EM. The corresponding authors of the identified reviews were contacted and asked to share their data. Data from ongoing reviews in which the authors were involved were also included. Data from the original validation study of the MARS [16] were excluded to obtain an independent sample for the present validation study.

Measurement: Mobile Application Rating Scale

The MARS is a multidimensional instrument assessing the quality of MHA [16]. The quality assessment consists of a total of 19 items covering four dimensions. The dimensions are: (A) engagement (5 items: fun, interest, individual adaptability, interactivity, target group), (B) functionality (4 items: performance, usability, navigation, gestural design), (C) aesthetics (3 items: layout, graphics, visual appeal), and (D) information quality (7 items: accuracy of app description, goals, quality of information, quantity of information, quality of visual information, credibility, evidence base). All items are assessed on a 5-point scale (1-inadequate, 2-poor, 3-acceptable, 4-good, and 5-excellent). Items assessing information quality can also be rated as not applicable (e.g., in case of missing evidence or missing visual information).
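To make the scoring concrete: each subscale score is the mean of its items, with not-applicable items excluded, and the overall MARS score is commonly reported as the mean of the four subscale means. A minimal Python sketch (the function name and example ratings are ours for illustration, not part of the MARS):

```python
def mars_subscale_mean(ratings):
    """Mean of a MARS subscale; items rated not applicable (None) are dropped."""
    valid = [r for r in ratings if r is not None]
    return sum(valid) / len(valid)

# Hypothetical ratings for one app (1 = inadequate ... 5 = excellent)
engagement    = [3, 4, 2, 3, 4]               # 5 items
functionality = [5, 4, 4, 5]                  # 4 items
aesthetics    = [4, 3, 4]                     # 3 items
information   = [4, 3, 4, 3, None, 4, None]   # 7 items, two rated not applicable

scores = [mars_subscale_mean(s)
          for s in (engagement, functionality, aesthetics, information)]
overall = sum(scores) / len(scores)  # overall score as mean of subscale means
```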

Statistical analysis


Construct validity: Confirmatory factor analysis. The MARS was designed to measure app quality. Based on the four subscales engagement, functionality, aesthetics, and information quality, we hypothesized four competing confirmatory models, which were examined using confirmatory factor analysis (CFA):

  1. Model 1 consisted of four latent factors accounting for the item covariance of the respective subscales; correlations between the four latent factors were allowed (see Fig 1);
  2. Model 2 assumed a latent factor for the items of each subscale and, in contrast to model 1, introduced a higher-order factor to account for the correlations between the factors (see Fig 2);
  3. Model 3 assumed one general latent factor (g-factor) accounting for the covariance of all items and four residual factors accounting for the remaining covariances of the respective subscale items (see Fig 3);
  4. Model 4 assumed only a general factor (see Fig 4).
Fig 1. Hypothesized CFA model 1 of the MARS.

Item-wise error variances are not represented in the models; correlations between errors were not allowed.

Fig 2. Hypothesized CFA model 2 of the MARS.

Item-wise error variances are not represented in the models; correlations between errors were not allowed.

Fig 3. Hypothesized CFA model 3 of the MARS.

Item-wise error variances are not represented in the models; correlations between errors were not allowed.

Fig 4. Hypothesized CFA model 4 of the MARS.

Item-wise error variances are not represented in the models; correlations between errors were not allowed.

Due to the high power of the χ2-test and its tendency to reject slightly mis-specified models [33–35], model fit was evaluated using several fit indices: the root mean square error of approximation (RMSEA) as a non-centrality index, the standardized root mean square residual (SRMR) as a residual index, and the comparative fit index (CFI) and the Tucker-Lewis index (TLI) as incremental indices. Cut-off values for an acceptable goodness of fit were based on standard modeling criteria: RMSEA < 0.06, SRMR < 0.08, CFI > 0.95, and TLI > 0.95 [36]. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) were used for model comparisons.
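For reference, these indices can be derived from the χ2 statistics of the target model and of the baseline (independence) model. The sketch below implements the standard textbook formulas in Python; the input values are purely illustrative and are not the fit statistics of this study:

```python
import math

def fit_indices(chisq_m, df_m, chisq_b, df_b, n):
    """RMSEA, CFI, and TLI from the chi-square of the target model (m)
    and of the baseline/independence model (b), with sample size n."""
    d_m = max(chisq_m - df_m, 0.0)  # non-centrality of the target model
    d_b = max(chisq_b - df_b, 0.0)  # non-centrality of the baseline model
    rmsea = math.sqrt(d_m / (df_m * (n - 1)))
    cfi = 1.0 - d_m / max(d_m, d_b)
    tli = ((chisq_b / df_b) - (chisq_m / df_m)) / ((chisq_b / df_b) - 1.0)
    return rmsea, cfi, tli

# Illustrative values only
rmsea, cfi, tli = fit_indices(chisq_m=450.0, df_m=131, chisq_b=9000.0, df_b=171, n=1299)
```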

Full information maximum likelihood was used as a robust estimator given its capability to handle missing data [37, 38]. Huber-White robust standard errors were obtained [38]. Modification indices were used to further investigate the structure of the MARS and potential sources of ill fit [39].

Concurrent validity. Since the MARS was designed to measure app quality, it should be closely related to other app quality metrics. Some of the included data sets provided ratings with both the ENLIGHT instrument and the MARS. Similar to the MARS, ENLIGHT is a quality assessment tool for MHA [28] that assesses app quality across seven dimensions: a. usability (3 items), b. visual design (3 items), c. user engagement (5 items), d. content (4 items), e. therapeutic persuasiveness (7 items), f. therapeutic alliance (3 items), and g. general subjective evaluation (3 items). Items are rated from 1 (= very poor) to 5 (= very good). The inter-rater reliability of ENLIGHT (ICC = 0.77 to 0.98) and its internal consistency (α = 0.83 to 0.90) are excellent [28].

Correlations were used to determine the concurrent validity between the MARS and ENLIGHT. All correlations reported in this study were calculated as the correlation coefficient r, which ranges from -1 (perfect negative relationship) through 0 (no relationship) to 1 (perfect positive relationship). For all correlation analyses, the alpha level was 5%. P-values were adjusted for multiple testing using the procedure proposed by Holm [40].
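The Holm step-down procedure can be sketched in a few lines. This is a generic Python implementation for illustration (the study's analyses were run in R):

```python
def holm_adjust(pvals):
    """Holm step-down adjustment: multiply the i-th smallest p-value by
    (m - i), cap at 1, and enforce monotonicity via a running maximum."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        p_adj = min((m - rank) * pvals[i], 1.0)
        running_max = max(running_max, p_adj)
        adjusted[i] = running_max
    return adjusted
```

With four p-values, the smallest is multiplied by 4, the next by 3, and so on, before the monotonicity correction; this controls the family-wise error rate at the nominal alpha level.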

Reliability: Internal consistency.

As a variant of reliability, internal consistency was determined. Omega was used as the reliability coefficient [41]. Compared to the widely used Cronbach’s alpha, omega provides a less biased estimate of reliability [29–31]. The procedures introduced by Zhang and Yuan [42] were used to obtain robust coefficients and bootstrapped bias-corrected confidence intervals. A reliability coefficient of < 0.50 was considered unacceptable, 0.51–0.59 poor, 0.60–0.69 questionable, 0.70–0.79 acceptable, 0.80–0.89 good, and > 0.90 excellent [43].
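As a sketch of the coefficient itself (the study used the robust procedures of Zhang and Yuan [42]), omega for a unidimensional scale is the squared sum of the factor loadings divided by the total variance. The loadings below are hypothetical:

```python
def mcdonald_omega(loadings, error_variances):
    """McDonald's omega: share of total scale variance explained by the
    common factor, from factor loadings and item error variances."""
    common = sum(loadings) ** 2
    return common / (common + sum(error_variances))

# Hypothetical standardized loadings for a 5-item subscale
lam = [0.7, 0.8, 0.6, 0.75, 0.65]
theta = [1 - l ** 2 for l in lam]  # error variances for standardized items
omega = mcdonald_omega(lam, theta)  # ~0.83, i.e. "good" by the cut-offs above
```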

Objectivity: Intra-class correlation.

The MARS comes with a standardized online training for reviewers [16]. Following the training, the MARS assessment is suggested to be conducted either by a single rater or by two raters whose ratings are pooled [16]. Consistency between raters was examined by calculating the intra-class correlation based on a two-way mixed-effects model [44]. An ICC above 0.75 (Fleiss, 1999) was used to define satisfactory inter-rater agreement. All data sets based on ratings of two reviewers were included in this analysis.
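The consistency-type ICC for a single rater under a two-way mixed-effects model (often labeled ICC(3,1)) can be computed from the mean squares of the rating table. A plain-Python sketch with hypothetical ratings (the study itself used SPSS):

```python
def icc_consistency(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single rater.
    `ratings` is a list of n targets, each with k raters' scores."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Two hypothetical reviewers rating six apps
icc = icc_consistency([[4, 5], [3, 3], [5, 4], [2, 2], [4, 4], [1, 2]])
```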

Analysis software.

The software R was used for all analyses [45], except for the intra-class correlation, which was calculated using SPSS 24 [46]. For the CFA, the R package “lavaan” (version 0.5–23.1097) was used [47]. Omega was assessed using “coefficientalpha” [42]. Correlations were calculated using “psych” (version 1.7.8) [48].


Results

Sample characteristics

The literature searches identified a total of 18 international reviews that assessed the quality of MHA using the MARS. All research groups that had published an eligible review were contacted. In total, 15 of the 18 contacted research groups responded and agreed to share their data [3, 10, 12, 14, 15, 18, 19, 22, 24, 49–54]. The present sample consists of N = 1,299 MHA. MHA targeting physical, mental, and behavioral health, as well as specific target groups, were included: anxiety (n = 104), low back pain (n = 58), cancer (n = 78), depression (n = 38), diet (n = 25), elderly (n = 84), gastrointestinal diseases (n = 140), medication adherence (n = 9), mindfulness (n = 103), pain (n = 147), physical activity (n = 312), post-traumatic stress disorder (n = 87), rheumatism (n = 32), weight management (n = 66), and internalizing disorders in children and youth (n = 16). For all included data sets, the MARS rating was conducted by researchers holding at least a B.Sc. degree.

The overall quality of these MHA based on the MARS assessment was moderate (mean MARS score [M] = 3.74, standard deviation [SD] = 0.59). Quality was highest on the functionality dimension (M = 4.03, SD = 0.67), followed by aesthetics (M = 3.40, SD = 0.87), information quality (M = 3.06, SD = 0.72), and engagement (M = 2.96, SD = 0.90) (see Fig 5).

The MARS assesses the evidence base of an app with the question “Has the app been trialled/tested; must be verified by evidence (in published scientific literature)?”. Overall, 1,230 (94.8%) of the included MHA were rated as not evidence-based.

Construct validity: Confirmatory factor analysis

None of the a priori defined confirmatory models was confirmed by CFA. The best-fitting model was model 3, which was further investigated using modification indices. Introducing a correlation between items 3 and 4 (= model 3a) yielded an acceptable model fit. Fit indices of all models are presented in Table 1. Model 3a is presented in Fig 6.

Fig 6. Model 3a.

Loadings are standardized; correlations between all latent variables were set to zero; item-wise error variances have been excluded; Model 3a differs from the a-priori defined model 3 in the correlation between item 3 (a03) and item 4 (a04).

Concurrent validity

A total of 120 MHA were rated using both the ENLIGHT instrument and the MARS. Correlations between MARS and ENLIGHT were calculated based on the respective subsample. Correlations are presented in Table 2.

Table 2. Correlations between the MARS and ENLIGHT using a subsample of apps.

Reliability: Internal consistency

The internal consistency of all sections was good to excellent (see Table 3).

Objectivity: Intra-class correlation

To calculate rater agreement, only data sets providing ratings from both reviewers were used. A total of 793 apps (= 15,067 rated items per reviewer) were included in the intra-class correlation analysis. Overall, the intra-class correlation was good: ICC = 0.816 (95% CI: 0.810 to 0.822). Section-wise ICCs are summarized in Table 4.


Discussion

To our knowledge, the present study is the first to evaluate the construct validity of the MARS. Furthermore, this study builds on previous metric evaluations of the MARS [16, 25–27] by investigating its validity, reliability, and objectivity using a large sample of MHA covering multiple health conditions. The CFA confirmed a bifactor model consisting of a general g-factor and uncorrelated factors for each dimension of the MARS. Given the theoretical background of the MARS, the latent g-factor could represent a general quality factor or a factor accounting for shared variance introduced by the assessment methodology. Either way, the four uncorrelated factors confirm the proposed dimensions of the MARS [16]. Thus, interpreting the sum score for each dimension appears legitimate. However, the present analysis highlights that not all items are equally good indicators of their dimensions. Hence, a weighted average of the respective items of each of the four dimensions a) engagement, b) functionality, c) aesthetics, and d) information quality would be more adequate.
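Such a weighted average would use the standardized loadings as item weights. A minimal Python sketch (the loadings shown are made up for illustration, not those estimated in this study):

```python
def weighted_subscale_score(ratings, loadings):
    """Loading-weighted subscale score, an alternative to the unit-weighted mean."""
    return sum(r * w for r, w in zip(ratings, loadings)) / sum(loadings)

# Three hypothetical aesthetics items with hypothetical standardized loadings
score = weighted_subscale_score(ratings=[4, 3, 5], loadings=[0.8, 0.5, 0.7])
```

Items that load strongly on the dimension thus contribute more to the score than weak indicators, while the result stays on the original 1 to 5 metric.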

Besides construct validity, concurrent validity was evaluated. High correlations with ENLIGHT indicated good concurrent validity. Furthermore, previous metric evaluations in terms of reliability and objectivity [16, 25–27] were replicated in the present MHA sample. Our findings showed that both the reliability and the objectivity of the MARS were good to excellent. Overall, considering the validity, reliability, and objectivity results, the MARS seems to be an app quality assessment tool of high metric quality.

The correlation between the MARS and the ENLIGHT instrument was high, at least in the analyzed sub-sample of apps. This indicates good concurrent validity between the two expert assessments. However, ENLIGHT contains a section assessing therapeutic alliance [28], which is only moderately covered by the MARS. Integrating therapeutic alliance into the MARS could further strengthen the quality of the MHA assessment. Especially in the context of conventional and digitalized health care, therapeutic alliance, guidance, and therapeutic persuasiveness are important aspects along with persuasive design [25, 28, 55, 56].

Pooling data from multiple international reviews of MHA quality using the MARS also provided insight into the quality of many commercial MHA. While most MHA show high quality in terms of functionality and aesthetics, the engagement and information quality of MHA show high heterogeneity and an overall moderate quality. Most striking, however, is the lack of evidence-based MHA: only 5% of the MHA were evaluated in studies (e.g., feasibility studies, uncontrolled longitudinal designs, RCTs). This lack of evidence is in line with previous research and a major constraint in the secondary health market [3, 4, 9]. Creating an evidence-based MHA market and addressing central issues such as 1) data safety and privacy, 2) user adherence, and 3) data integration are core challenges that have to be solved to realize the potential benefits of MHA in health care [57–59]. Using the MARS to make these issues transparent to health care stakeholders and patients, as well as establishing guidelines for the development of MHA, are necessary and promising steps toward this goal [16, 57].


Limitations

Some limitations of this study need to be noted. First, the main aim of this study was to evaluate the construct validity of the MARS. By including ratings from multiple review teams across the world and multiple health conditions, we regard the external validity of the results as high. Nonetheless, the results might only be valid in the present sample and not transferable to other conditions, target groups, or rating teams. Thus, the confirmed bifactor model should be challenged in other health conditions and also in non-health apps. Notably, the necessary modification to the a priori defined bifactor model should be closely investigated, since it was introduced based on modification indices and is of an exploratory nature. Second, the evaluation of the construct validity of the MARS might be biased by the format of the MARS, as all items are assessed on a 5-point scale. Since there is no variation in the item format, item-class-specific variance cannot be controlled in the present evaluation. As a result, item-class variance might be attributed to the quality factor. These issues could be addressed in future studies by using different item formats. A multi-method approach, for example integrating alternative assessments such as the user version of the MARS [60] or ENLIGHT [28], could also lead to a more comprehensive assessment of the quality of MHA. Third, although the reliability of the MARS was a focus of this study (i.e., internal consistency), some facets of reliability remain unexplored. For instance, the re-test reliability of the MARS has never been evaluated. To investigate re-test reliability, an adequate study design with time-shifted assessments of the same version of apps by the same reviewers is needed. This remains to be investigated in future studies. Finally, throughout the study, quality is discussed as a fundamental requirement for apps. However, whether quality is predictive of, for example, engagement, adherence, or effectiveness was not evaluated in this study. No study has yet investigated this using the MARS. Baumel and Yom-Tov [61] examined which design aspects are essential using the ENLIGHT instrument. For instance, engagement and therapeutic persuasiveness were identified as crucial quality aspects associated with user adherence [61]. Based on the high correlation between the MARS and ENLIGHT, one could assume that their findings also apply to the MARS. However, this has to be confirmed in future studies. The role of quality should also be investigated in a more holistic model containing MHA-specific features (e.g., persuasive design) [62, 63] and user features (e.g., personality), and incorporating existing models such as the unified theory of acceptance and use of technology (UTAUT) [64].


Conclusion

The MARS is a metrically well-suited instrument to assess MHA quality. Given the rapidly growing app market, scalable solutions to make the content and quality of MHA more transparent to users and health care stakeholders are urgently needed. The MARS may become a crucial part of such solutions. Future studies could extend the present findings by investigating the re-test reliability and predictive validity of the MARS.


Acknowledgments

The present study was only possible thanks to the previous work of the contributing research groups. The authors would like to thank all researchers involved in these projects: Abraham, C., Ahmed, O.H., Alley, S., Bachert, P., Balci, S., van Beurden, S.B., Bosch, P., Bray, N.A., Catic, S., Chalmers, J., Chow, C.K., Direito, A., Eder, A.-S., Gnam, J.-P., Haase, I., Hayman, M., Hendrick, P., Holderied, T., Kamper, S.J., Kittler, J., Kleyer, A., Küchler, A.-M. Lee, H., Lin, J., van Lippevelde, W., Meyer, M., Mucke, J., Pinheiro, M.B., Plaumann, K., Pryss, R., Pulla, A., Rebar, A.L., Redfern, J., Richtering, S.S., Schrondanner, J., Sewerin, P., Simon, D., Smith, J.R., Sophie, E., Spanhel, K., Sturmbauer, S., Tascilar, K., Thiagalingam, A., Vandelanotte, C., Vossen, D., Williams, C., Wurst, R.


  1. 1. James SL, Abate D, Abate KH, Abay SM, Abbafati C, Abbasi N, et al. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392: 1789–1858. pmid:30496104
  2. 2. Albrecht U. Chancen und Risiken von Gesundheits-Apps (CHARISMHA) [chances and risks of mobile health applications]. Albrecht U, editor. Medizinische Hochschule Hannover; 2016.
  3. 3. Terhorst Y, Rathner E-M, Baumeister H, Sander L. “Help from the app store?”: A systematic review of depression apps in the German app stores. Verhaltenstherapie. 2018;28.
  4. 4. Donker T, Petrie K, Proudfoot J, Clarke J, Birch M-RR, Christensen H. Smartphones for smarter delivery of mental health programs: A systematic review. Journal of Medical Internet Research Journal of Medical Internet Research; Nov 15, 2013 p. e247. pmid:24240579
  5. 5. Ebert DD, Van Daele T, Nordgreen T, Karekla M, Compare A, Zarbo C, et al. Internet- and Mobile-Based Psychological Interventions: Applications, Efficacy, and Potential for Improving Mental Health: A Report of the EFPA E-Health Taskforce. Eur Psychol. 2018;23: 167–187.
  6. 6. Linardon J, Cuijpers P, Carlbring P, Messer M, Fuller-Tyszkiewicz M. The efficacy of app-supported smartphone interventions for mental health problems: a meta-analysis of randomized controlled trials. World Psychiatry. 2019;18: 325–336. pmid:31496095
  7. 7. IQVIA. IQVIA Institute for Human Data Science Study: Impact of Digital Health Grows as Innovation, Evidence and Adoption of Mobile Health Apps Accelerate—IQVIA. 2017 [cited 17 Oct 2019].
  8. 8. Weisel KK, Fuhrmann LM, Berking M, Baumeister H, Cuijpers P, Ebert DD. Standalone smartphone apps for mental health—a systematic review and meta-analysis. npj Digit Med. 2019;2: 118. pmid:31815193
  9. 9. Sucala M, Cuijpers P, Muench F, Cardoș R, Soflau R, Dobrean A, et al. Anxiety: There is an app for that. A systematic review of anxiety apps. Depress Anxiety. 2017;34: 518–525. pmid:28504859
  10. 10. Sander L, Schrondanner J, Terhorst Y, Spanhel K, Pryss R, Baumeister H, et al. Help for trauma from the app stores?’ A systematic review and standardised rating of apps for Post-Traumatic Stress Disorder (PTSD). Eur J Psychotraumatol. 2019; accepted.
  11. 11. Mathews SC, McShea MJ, Hanley CL, Ravitz A, Labrique AB, Cohen AB. Digital health: a path to validation. npj Digit Med. 2019;2: 38. pmid:31304384
  12. 12. Knitza J, Tascilar K, Messner E-M, Meyer M, Vossen D, Pulla A, et al. German Mobile Apps in Rheumatology: Review and Analysis Using the Mobile Application Rating Scale (MARS). JMIR mHealth uHealth. 2019;7: e14991. pmid:31381501
  13. 13. Salazar A, de Sola H, Failde I, Moral-Munoz JA. Measuring the Quality of Mobile Apps for the Management of Pain: Systematic Search and Evaluation Using the Mobile App Rating Scale. JMIR mHealth uHealth. 2018;6: e10718. pmid:30361196
  14. 14. Bardus M, van Beurden SB, Smith JR, Abraham C. A review and content analysis of engagement, functionality, aesthetics, information quality, and change techniques in the most popular commercial apps for weight management. Int J Behav Nutr Phys Act. 2016;13: 35. pmid:26964880
  15. 15. Meßner E, Terhorst Y, Catic S, Balci S, Küchler A-M, Schultchen D, et al. “Move it!” Standardised expert quality ratings (MARS) of apps that foster physical activity for Android and iOS. 2019.
  16. Stoyanov SR, Hides L, Kavanagh DJ, Zelenko O, Tjondronegoro D, Mani M. Mobile App Rating Scale: A New Tool for Assessing the Quality of Health Mobile Apps. JMIR mHealth uHealth. 2015;3: e27. pmid:25760773
  17. Masterson Creber RM, Maurer MS, Reading M, Hiraldo G, Hickey KT, Iribarren S. Review and Analysis of Existing Mobile Phone Apps to Support Heart Failure Symptom Monitoring and Self-Care Management Using the Mobile Application Rating Scale (MARS). JMIR mHealth uHealth. 2016;4: e74. pmid:27302310
  18. Schoeppe S, Alley S, Rebar AL, Hayman M, Bray NA, Van Lippevelde W, et al. Apps to improve diet, physical activity and sedentary behaviour in children and adolescents: a review of quality, features and behaviour change techniques. Int J Behav Nutr Phys Act. 2017;14: 83. pmid:28646889
  19. Santo K, Richtering SS, Chalmers J, Thiagalingam A, Chow CK, Redfern J. Mobile Phone Apps to Improve Medication Adherence: A Systematic Stepwise Process to Identify High-Quality Apps. JMIR mHealth uHealth. 2016;4: e132. pmid:27913373
  20. Grainger R, Townsley H, White B, Langlotz T, Taylor WJ. Apps for People With Rheumatoid Arthritis to Monitor Their Disease Activity: A Review of Apps for Best Practice and Quality. JMIR mHealth uHealth. 2017;5: e7. pmid:28223263
  21. Mani M, Kavanagh DJ, Hides L, Stoyanov SR. Review and Evaluation of Mindfulness-Based iPhone Apps. JMIR mHealth uHealth. 2015;3: e82. pmid:26290327
  22. Machado GC, Pinheiro MB, Lee H, Ahmed OH, Hendrick P, Williams C, et al. Smartphone apps for the self-management of low back pain: A systematic review. Best Pract Res Clin Rheumatol. 2016;30: 1098–1109. pmid:29103552
  23. Thornton L, Quinn C, Birrell L, Guillaumier A, Shaw B, Forbes E, et al. Free smoking cessation mobile apps available in Australia: a quality review and content analysis. Aust N Z J Public Health. 2017;41: 625–630. pmid:28749591
  24. Messner E-M, Terhorst Y, Sander L, Schultchen D, Plaumann K, Sturmbauer S, et al. “When the fear kicks in”: Standardized expert quality ratings of apps that aim to reduce anxiety. 2019.
  25. Messner E-M, Terhorst Y, Barke A, Baumeister H, Stoyanov S, Hides L, et al. Development and Validation of the German Version of the Mobile Application Rating Scale (MARS-G). JMIR mHealth uHealth. 2019; accepted.
  26. Domnich A, Arata L, Amicizia D, Signori A, Patrick B, Stoyanov S, et al. Development and validation of the Italian version of the Mobile Application Rating Scale and its generalisability to apps targeting primary prevention. BMC Med Inform Decis Mak. 2016;16: 83. pmid:27387434
  27. Payo RM, Álvarez MMF, Díaz MB, Izquierdo MC, Stoyanov SR, Suárez EL. Spanish adaptation and validation of the Mobile Application Rating Scale questionnaire. Int J Med Inform. 2019;129: 95–99. pmid:31445295
  28. Baumel A, Faber K, Mathur N, Kane JM, Muench F. Enlight: A Comprehensive Quality and Therapeutic Potential Evaluation Tool for Mobile and Web-Based eHealth Interventions. J Med Internet Res. 2017;19: e82. pmid:28325712
  29. Dunn TJ, Baguley T, Brunsden V. From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. Br J Psychol. 2014;105: 399–412. pmid:24844115
  30. Revelle W, Zinbarg RE. Coefficients Alpha, Beta, Omega and the GLB: Comments on Sijtsma. Psychometrika. 2009;74: 145–154.
  31. McNeish D. Thanks coefficient alpha, we’ll take it from here. Psychol Methods. 2018;23: 412–433. pmid:28557467
  32. Stewart LA, Tierney JF. To IPD or not to IPD? Advantages and disadvantages of systematic reviews using individual patient data. Eval Health Prof. 2002;25: 76–97. pmid:11868447
  33. Browne MW, Cudeck R. Alternative Ways of Assessing Model Fit. Sociol Methods Res. 1992;21: 230–258.
  34. Moshagen M, Erdfelder E. A New Strategy for Testing Structural Equation Models. Struct Equ Model A Multidiscip J. 2016;23: 54–60.
  35. Moshagen M. The Model Size Effect in SEM: Inflated Goodness-of-Fit Statistics Are Due to the Size of the Covariance Matrix. Struct Equ Model A Multidiscip J. 2012;19: 86–98.
  36. Hu LT, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Struct Equ Model. 1999;6: 1–55.
  37. Enders CK. Applied Missing Data Analysis. New York: Guilford Press; 2010.
  38. Rosseel Y. The lavaan tutorial. 2019.
  39. MacCallum RC, Roznowski M, Necowitz LB. Model modifications in covariance structure analysis: The problem of capitalization on chance. Psychol Bull. 1992;111: 490–504. pmid:16250105
  40. Holm S. A Simple Sequentially Rejective Multiple Test Procedure. Scand J Stat. 1979;6: 65–70.
  41. McDonald RP. Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum; 1999.
  42. Zhang Z, Yuan K. Robust Coefficients Alpha and Omega and Confidence Intervals With Outlying Observations and Missing Data: Methods and Software. 2016.
  43. George D, Mallery P. SPSS for Windows step by step: A simple guide and reference. 4th ed. Boston: Allyn & Bacon; 2003.
  44. Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. 2016;15: 155–63. pmid:27330520
  45. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2018. ISBN 3-900051-07-0.
  46. IBM. IBM SPSS Advanced Statistics 24. IBM; 2016.
  47. Rosseel Y. lavaan: An R package for structural equation modeling. J Stat Softw. 2012;48: 1–36.
  48. Revelle W. psych: Procedures for Psychological, Psychometric, and Personality Research. 2018.
  49. Schultchen D, Terhorst Y, Holderied T, Sander L, Baumeister H, Messner E-M. Using apps to calm down: A systematic review of mindfulness apps in German App Stores. 2019; in preparation.
  50. Terhorst Y, Messner E-M, Paganini S, Portenhauser A, Eder A-S, Bauer M, et al. Mobile Health Apps for Pain? A systematic review of content and quality of pain apps in European App Stores. 2019; in preparation.
  51. Bauereiß N, Bodschwinna D, Wölflick S, Sander L, Baumeister H, Messner E-M, et al. mHealth in Cancer Care—Standardised Expert Quality Ratings (MARS) of Mobile Health Applications in German App Stores Supporting People Living with Cancer and their Caregivers. 2019; in preparation.
  52. Portenhauser A, Terhorst Y, Schultchen D, Sander L, Denkinger M, Waldherr N, et al. A systematic review and evaluation of mobile applications for the elderly. 2019; in preparation.
  53. Walter B, Terhorst Y, Sander L, Schultchen D, Schmidbaur S, Messner E-M. A systematic review and evaluation of apps for gastrointestinal diseases for iOS and Android. 2019; in preparation.
  54. Domhardt M, Messner E-M, Eder A-S, Sophie E, Sander L, Baumeister H, et al. Mobile-based Interventions for Depression, Anxiety and PTSD in Youth: A systematic review and evaluation of current pediatric health apps. 2019; in preparation.
  55. Baumeister H, Reichler L, Munzinger M, Lin J. The impact of guidance on Internet-based mental health interventions—A systematic review. Internet Interv. 2014;1: 205–215.
  56. Domhardt M, Geßlein H, von Rezori RE, Baumeister H. Internet- and mobile-based interventions for anxiety disorders: A meta-analytic review of intervention components. Depress Anxiety. 2019;36: 213–224. pmid:30450811
  57. Torous J, Andersson G, Bertagnoli A, Christensen H, Cuijpers P, Firth J, et al. Towards a consensus around standards for smartphone apps and digital mental health. World Psychiatry. 2019;18: 97–98. pmid:30600619
  58. Huckvale K, Torous J, Larsen ME. Assessment of the Data Sharing and Privacy Practices of Smartphone Apps for Depression and Smoking Cessation. JAMA Netw Open. 2019;2: e192542. pmid:31002321
  59. Grundy Q, Chiu K, Held F, Continella A, Bero L, Holz R. Data sharing practices of medicines related apps and the mobile ecosystem: traffic, content, and network analysis. BMJ. 2019;364: l920. pmid:30894349
  60. Stoyanov SR, Hides L, Kavanagh DJ, Wilson H. Development and Validation of the User Version of the Mobile Application Rating Scale (uMARS). JMIR mHealth uHealth. 2016;4: e72. pmid:27287964
  61. Baumel A, Yom-Tov E. Predicting user adherence to behavioral eHealth interventions in the real world: Examining which aspects of intervention design matter most. Transl Behav Med. 2018;8: 793–798. pmid:29471424
  62. Baumeister H, Kraft R, Baumel A, Pryss R, Messner E-M. Persuasive e-health design for behavior change. In: Baumeister H, Montag C, editors. Mobile sensing and digital phenotyping: new developments in psychoinformatics. Berlin: Springer; 2019.
  63. Baumel A, Birnbaum ML, Sucala M. A Systematic Review and Taxonomy of Published Quality Criteria Related to the Evaluation of User-Facing eHealth Programs. J Med Syst. 2017;41. pmid:28735372
  64. Venkatesh V, Morris MG, Davis GB, Davis FD. User Acceptance of Information Technology: Toward a Unified View. MIS Q. 2003;27: 425–478.