The association between self-reported stress and cardiovascular measures in daily life: A systematic review

Background Stress plays an important role in the development of mental illness, and an increasing number of studies is trying to detect moments of perceived stress in everyday life based on physiological data gathered using ambulatory devices. However, based on laboratory studies, there is only modest evidence for a relationship between self-reported stress and physiological ambulatory measures. This descriptive systematic review evaluates the evidence for studies investigating an association between self-reported stress and physiological measures under daily life conditions. Methods Three databases were searched for articles assessing an association between self-reported stress and cardiovascular and skin conductance measures simultaneously over the course of at least a day. Results We reviewed findings of 36 studies investigating an association between self-reported stress and cardiovascular measures with overall 135 analyses of associations between self-reported stress and cardiovascular measures. Overall, 35% of all analyses showed a significant or marginally significant association in the expected direction. The most consistent results were found for perceived stress, high-arousal negative affect scales, and event-related self-reported stress measures, and for frequency-domain heart rate variability physiological measures. There was much heterogeneity in measures and methods. Conclusion These findings confirm that daily-life stress-dynamics are complex and require a better understanding. Choices in design and measurement seem to play a role. We provide some guidance for future studies.


Introduction
Stress is one of the largest environmental risk factors for mental illness. According to diathesis-stress models, prolonged exposure to stressors can lead to severe mental illness in vulnerable individuals [1][2][3]. Simultaneously, prolonged exposure to stressors sensitizes the stress system, resulting in altered affective reactivity to relatively minor stressors, such as daily hassles [4]. An individual's affective reactivity to these minor stressors is therefore thought to reflect an underlying risk of developing mental illness. In line with this theory, diary studies have indicated that increased affective reactivity to daily hassles mediates the effect of childhood adversity on psychopathology [5][6][7]. Increased affective reactivity to daily stressors has been associated with a number of mental illnesses [8][9][10][11][12], making it an important indicator of mental health.
Daily stress can be measured using ambulatory assessment methods. To date, the majority of studies investigating everyday stress have done so using structured diary techniques (i.e., experience sampling methodology [ESM]; ecological momentary assessment [EMA]) to assess the subjective experience of being stressed and its effects on affective states. Typically, study participants are provided with a device that sends a signal each time a diary entry is required. Such diary entries consist of questions on momentary experiences, contexts, and appraisals, providing insight into the participants' daily lives while keeping recall bias low [13,14]. Using these diaries, self-reports on experienced stress levels (henceforth referred to as self-reported stress) have been studied in a variety of ways, in part depending on the underlying theory [15,16]. Stress as a concept has many definitions. The most prominent theories posit that stress is bodily strain in response to demand [17] or an allostatic reaction to a perceived threat [18] that occurs when a situation is appraised as more challenging, unpleasant, and important than the individual can cope with [19] or when perceived demands are greater than perceived control over the situation [20]. Although different, these definitions all seem to assume that stress depends on an individual's perception of a given situation. Circularly, this means an individual is under stress when they perceive a situation as stressful (i.e. demanding, threatening, etc.). As a result, self-reported stress has been operationalized as appraisals, perceived stress, or affective distress (or negative affect [NA]) in countless different ways [15,16]. Two reviews have investigated how these studies assessed stress in daily life and found much heterogeneity in measures [15,16]. This heterogeneity is most likely a reflection of the variety in theoretical definitions, terminologies, and approaches.
Due to advances in mobile technology, the last decade has seen a steep increase in the number of studies assessing the autonomic nervous system (ANS) response to daily stressors using wearable devices [15]. An advantage of these ambulatory remote monitoring methods is that they do not require immediate action from the participants in order to collect data; many wearable sensors can gather data passively throughout the study period. ANS measures collected using wearable sensors include blood pressure, heart rate, and skin conductance, which have all been positively associated with psychosocial stress [21,22]. Heart rate variability (HRV) is another physiological measure that has been linked to stress, typically in the form of a negative relationship. Considering the relevance of stress reactivity for mental health, being able to detect instances of stress reactivity that signal psychopathological vulnerability through passively monitored ANS markers could potentially have a large impact on early intervention strategies in mental healthcare.
However, none of these measures are stress-specific, which reveals the method's Achilles' heel: when an experiential perspective is lacking, there is no certainty that changes in physiology reflect instances of acute stress. Yet, even combining daily life remote monitoring of physiology and an ESM assessment of self-reported stress may not provide the answer. In fact, over the years, several studies have tried to predict self-reported stress based on wearable sensor data, with varying levels of success [23]. A systematic review on reactivity to a standardized psychosocial stress task under laboratory conditions reported that only 25% of the studies reviewed found an association between self-reported and cardiovascular measures of stress [24]. Moreover, suppressing the ANS [25], or both the ANS and the endocrine system [26], did not affect stress reports during a psychosocial stress task, raising the question of whether these systems are associated at all. Still, theoretical frameworks assume that this coherence between self-reported stress and physiological measures exists [27,28]. Over the years, several studies have combined ESM and daily life remote monitoring of physiology when investigating the stress response in daily life, and have looked at associations between both types of measures. No systematic review to date, however, has compared these studies and their findings to evaluate the evidence that they assess the same underlying process. Moreover, much like the heterogeneity in self-report measures, several different physiological variables have been used to capture the stress response [15] and it is unclear what their individual relationships to self-reported stress are. Similarly, little is known about how differences in stress reactivity observed in different study populations affect the relationship between daily-life self-reported and physiological measures of stress. The same goes for choices on study devices such as wearable sensors or diary equipment. Finally, the study protocol and participants' compliance with it can influence findings when stressful moments are not sampled frequently enough.
This systematic review has three main aims: First, we aim to identify how studies have investigated the broad concept of daily life stress using simultaneous ambulatory measures of self-reported stress and cardiovascular and skin conductance features. Second, we will review the evidence that these measures of self-reported stress and cardiovascular and skin conductance features are associated. Third, we will explore the influence of choices on self-reported stress measures, physiological measures, study population, study methods, and compliance, on these associations. To do so, we will systematically review all studies that have assessed daily stress using ambulatory methods and associated ratings of self-reported stress with cardiovascular and skin conductance measures associated with the stress response.

Search strategy
A systematic literature search was conducted using three databases (S1 File): Comprehensive Biomedical Literature Database (EMBASE), Archive of Biomedical and Life Sciences Journal Literature (PubMed), and Web of Science (WOS) Core Collection. The search was performed for studies published until 6th June 2019. An updated search was conducted in the same databases for studies published between 7th June 2019 and 25th October 2020. Fig 1 presents the combined flowchart of the study selection.
Inclusion criteria were designed by the research team and covered only studies investigating the association between self-reported stress and cardiovascular and skin conductance measures in daily life. Self-reported stress was defined as an item or feature of any form inquiring about the subjective experience of stress. As one of the aims of this review was to investigate how self-reported stress in daily life is assessed, we opted for a liberal approach in our literature search. Since some authors assess stress using NA scales (either including items on experience of stress or not), we also included search terms referring to affect. Studies that only assessed positive affect were excluded. Also, as some of the main stress theories may result in different operationalizations, we included several other terms. Specifically, appraisal theory states that stress may be a response to unpleasant or negative events (i.e., hassles) [29]. Other main theories use terms such as strain [20], demand [17], or threat [30].

PLOS ONE
Ambulatory cardiovascular and skin conductance measures were defined as any remote assessment measuring physiological data (e.g., ambulatory devices and/or wearable devices) in daily life. Self-reported stress and cardiovascular and/or skin conductance measures needed to be assessed simultaneously in a natural environment, without the presence of a healthcare professional or a researcher. For purposes of ecological validity and considering the dynamic process that is stress, we only included studies that reported more than one stress assessment per day (i.e., no end-of-day diary studies) and took into account the multilevel nature of the data in the statistical analyses. Also, our search strategy only included interventional and observational studies that were published in English. Studies publishing results from the same dataset were only included if they reported on different variables. Systematic reviews, discussion papers, study abstracts, qualitative studies, and study protocols were excluded.
A researcher (A.R.) performed the searches in the selected databases with the collaboration of the research team. Search terms included various self-reported and cardiovascular and skin conductance stress terms (e.g., "stress*", "distress", "threat", "cardiovascular", "skin conductance") and momentary or remote assessment protocol terms (e.g., "experience sampling", "momentary", "diary"). Search terms included either medical subject headings (MeSH) or keyword headings. The original search strategy is described in S1 File.

Data extraction
Two reviewers (A.R. and T.V.) independently screened the titles and abstracts of the studies in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines using the defined search strategy [31,32]. Next, relevant studies were independently evaluated for full-text assessment by the same two reviewers (A.R. and T.V.). The updated search was conducted with the same approach by two reviewers (A.R. and N.O.). A third reviewer (I.M.-G.) evaluated the studies in case of a disagreement. If needed, corresponding authors of the included studies were contacted for further information.
We extracted the following details from the included studies: publication year, study sample (study population, age, and sex), study methods (study length, frequency of the assessments, sampling design, and user devices), participant compliance to the protocol, self-reported stress (type of stress, number of stress items, description of the stress items, type of scales), cardiovascular and/or skin conductance measures (i.e., type of measure, variable used), and finally, the type of analysis used for associations of the stress assessments and covariates included, and its findings.

Methodological quality of the studies
Methodological quality of the included studies was evaluated using the Downs and Black checklist [33]. The checklist consists of 27 items and includes domains for study reporting (10 items), external validity (3 items), internal validity (bias and confounding) (13 items), and power (1 item) [33]. An item was scored 1 (Yes) if the criterion was fulfilled or 0 if inadequately reported, unable to determine, or not applicable. Overall quality rating per study was assessed using the corresponding quality levels as previously reported, with a total possible score of 28 for randomized and 25 for non-randomized studies [34]: excellent (26-28); good (20-25); fair (15-19); and poor (≤ 14). Study quality assessment was performed by one reviewer (A.R.), and in case of uncertainty, other members of the research team were consulted.
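The quality bands above amount to a simple threshold lookup. The following is a minimal sketch, not the procedure used in this review; the function name `quality_level` is hypothetical, and the cut-offs are read from the bands quoted from [34] (26-28, 20-25, 15-19, and 14 or below).

```python
def quality_level(total_score: int) -> str:
    """Map a Downs and Black total score to a quality band.

    Cut-offs follow the bands quoted in the text (assumed reading:
    excellent 26-28, good 20-25, fair 15-19, poor 14 or below).
    """
    if not 0 <= total_score <= 28:
        raise ValueError("total score must lie between 0 and 28")
    if total_score >= 26:
        return "excellent"
    if total_score >= 20:
        return "good"
    if total_score >= 15:
        return "fair"
    return "poor"


# Illustrative scores only; these are not scores of any included study.
example_bands = [quality_level(score) for score in (27, 22, 17, 9)]
```

Note that a non-randomized study can reach at most 25 points under this scheme, so it cannot fall into the "excellent" band.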

Statistical synthesis
Descriptive analyses were performed on all extracted variables. In case multiple studies reported on the same dataset, only the study with the largest sample size was considered for the descriptive analysis of the study sample (in case the sample sizes were identical, only the original publication was reported). Associations between self-reported stress and cardiovascular and/or skin conductance measures were descriptively reported and linked with the type of measure (i.e., heart rate, heart rate variability, blood pressure, or skin conductance). If applicable, descriptive analyses were performed for associations based on study length and devices.
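The rule for handling several publications on one dataset can be sketched as follows. This is an illustration under stated assumptions rather than the authors' actual procedure: the `Study` record, its field names, and the example entries are hypothetical, and "original publication" is interpreted here as the earliest publication year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    label: str    # hypothetical citation label
    dataset: str  # hypothetical dataset identifier
    n: int        # sample size
    year: int     # publication year


def representative_studies(studies):
    """Keep one study per dataset: largest sample size first,
    ties broken by the earliest publication year (assumption)."""
    best = {}
    for study in studies:
        current = best.get(study.dataset)
        if current is None or (study.n, -study.year) > (current.n, -current.year):
            best[study.dataset] = study
    return sorted(best.values(), key=lambda s: s.label)


# Invented entries for illustration; not taken from the reviewed studies.
example = [
    Study("A 2010", "d1", 50, 2010),
    Study("B 2012", "d1", 80, 2012),
    Study("C 2015", "d2", 40, 2015),
    Study("D 2013", "d2", 40, 2013),
]
chosen = [s.label for s in representative_studies(example)]
```

In this invented example, "B 2012" represents dataset d1 because it has the larger sample, and "D 2013" represents d2 because the sample sizes tie and it was published first.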

Results
The search identified a total of 1,466 studies after the removal of duplicates. Screening of 297 full-text studies resulted in a total of 38 studies that fulfilled the inclusion criteria. A flow chart of the screening process is presented in Fig 1 and the extracted data are presented in Tables 1-4. Our search identified only two studies reporting associations between self-reported stress and the level of skin conductance as a physiological variable [35,36]. Since we considered this too few to draw any meaningful conclusions, this systematic review focuses only on studies investigating the association between self-reported stress and cardiovascular measures (36 studies).
Self-reported stress measures.
As expected, the literature search revealed that self-reported stress was assessed in a large variety of ways (see Table 1 for an overview of all self-report measures used). In an attempt to categorize the approaches, we identified three main ways in which researchers quantified self-reported stress:
1. The most common way of measuring self-reported stress was NA, which was assessed in 18 different studies, either as an average or sum score over several NA items or as a single item. There was much heterogeneity in measures of NA. Most notably, some of the NA scales used in the studies reviewed here included low-arousal items such as "unhappy" [41,50,62,69], "ashamed" or "embarrassed" [46], "sad" [41,45,62], or "lonely" [62], which arguably do not adequately reflect the subjective experience of stress. We therefore also looked specifically at analyses on high-arousal-only measures of NA (i.e., scales that consist solely of high-arousal NA items).
2. Thirteen studies assessed self-reported stress directly through perceived stress, typically using only one item asking about current perceived feelings of stress. One study used a combination of perceived stress and NA measures to compute self-reported stress [63]. As this mixture might confound our results, we included this study only in the overall analyses and not in any of the sub-analyses on specific stress types.
3. All other types of self-reported stress were situational, inquiring about experienced stress related to current or recent situations or events. Specifically, three studies assessed associations with measures that involved recent stressful events, which we categorized as event-related stress. In these studies, participants were asked about the occurrence of a minor stressful event in either the past 60 or the past 30 minutes. Six studies reported findings on stress or strain related to a current or a recent task or activity, which we grouped under the term activity-related stress, although each of the included studies opted for a slightly different approach (i.e., task strain, work-related stress, activity-related stress, and cognitive appraisal). Finally, seven studies included a measure of stress in social company or situations, which we clustered and viewed as social stress measures. However, each of these studies operationalized social stress differently (i.e., social conflict, feeling of annoyance, pleasantness of social interaction, and social-evaluative threat).
Scales that were used to measure self-reported stress were very heterogeneous; the most commonly used was the Likert scale, but the range of the anchor points varied from 5 to 11 points. Other less frequently used scales were the visual analog scale (VAS 0-10 or 0-100) and a binary (Yes/No) response option.

Study sample.
The selected studies with different datasets included a total of 4,393 participants, of whom 3,678 (84%) were healthy participants and 715 (16%) belonged to clinical or at-risk populations. Mean age was 38.6 (SD = 12.0) years and 58% were female. Clinical or at-risk study populations were included in only 9 different datasets, consisting of individuals with cardiovascular diseases (n = 135) [67], individuals at risk for cardiovascular disease (e.g., pre- or mild hypertension) (n = 194) [38,41], individuals diagnosed with post-traumatic stress disorder [37,56], individuals diagnosed with borderline personality disorder (n = 50) [59], individuals diagnosed with psychosis (n = 20) [62], individuals diagnosed with psychosis and individuals at risk for psychosis (n = 67) [63], individuals at risk for psychopathology (n = 91) [48], and individuals diagnosed with substance use disorder (n = 40) [61]. More information on sample characteristics is presented in the S2 File.

Table 1 (excerpt)
Hawkley (2003). Cognitive appraisal ratings: ratio of how demanding participants found the main activity to the degree to which they felt capable of meeting the demands of the activity. Scale: Likert (5-point). Items: 1.
Kamarck (1998). Task strain: interaction between Demand (average over "hard work", "fast work", and "juggling tasks") and Control (average over "can change activity" and "chose activity"). Scale: Likert (4-point). Items: 3 + 2.

Study methods and compliance.
For self-reported daily life stress, the most frequently used diary designs were a random sampling scheme (50% of studies) and a fixed sampling scheme (47% of studies). One study (3%) used a mix of random and fixed sampling schemes [43]. The random sampling schemes varied from 2 times per hour to 6 times per day and the fixed sampling schemes varied from every 20 minutes to three times per day. Studies measuring blood pressure (N = 19) mostly used 1-day study protocols (N = 11; 56%), with an average of 2.5 (SD = 2.4; range = 1-10) study days. The frequency of blood pressure assessments varied from every 20 minutes to 5 times per day. For heart rate studies (N = 21), the study length was on average 3.0 (SD = 5.0; range = 1-23) days, with the most common protocol again being a 1-day study protocol (N = 10; 48%). Study length in heart rate variability studies (N = 12) was also heterogeneous, ranging from 1 to 6 days with an average of 2.1 (SD = 1.1, range = 1-4) study days, with the most common protocol being again a 1-day (N = 4; 33%) or a 2-day (N = 4; 33%) study protocol. The frequency of HR/HRV assessments varied from 15-second intervals to only 5 times per day. The data collection method for self-reported daily life stress was reported in 27 (75%) studies, of which the most commonly used was a dedicated device (i.e., built-in software on a personal digital assistant, mobile phone software, or handheld computer) in 17 (47%) studies. Six (6%) studies used a smartphone application and four (10%) studies used a traditional paper-and-pencil diary. Ambulatory devices to detect daily life stress varied across the studies (S1 Table). Ten different devices were used to measure blood pressure, with the most common device being the Spacelabs model 90207 (N = 5; 31% for systolic blood pressure [SBP] and 22% for diastolic blood pressure [DBP]) [39,40,42,50,53]. Fifteen different devices were used to measure heart rate, with the most common devices being a Holter device in four (19%) studies [56,58,62,74], Spacelabs model 90207 in three studies [39,42,53], and VU-AMS in two (10%) studies [64,65]. For heart rate variability, seven different devices were identified and the most used devices were either a Holter monitor [56,58,62] or a Movisens EcgMove [70][71][72] device in three studies each (25%). The overall compliance rate for self-reported stress assessments was reported in only 15 (42%) studies, ranging from 58% to 99% with an average compliance rate of 81% (SD = 12.1). Handling of outliers, artifacts, or other thresholds used to minimize noise in the cardiovascular data was reported in 29 (81%) studies. Only 11 of these studies (31% of all included studies) reported the amount of data missing due to outliers, artifacts, or other thresholds in cardiovascular measures.

Methodological quality of the included studies.
The overall methodological quality of the studies was fair (S2 Table). Most of the studies were characterized by good study reporting (domains 1 to 10). The major issue in the quality of the studies was a lack of external validity, related to insufficient reporting of the proportion of the source population from which the participants were derived. Also, the quality assessment revealed an increased risk of selection bias in recruitment over a period of time, which was not clearly reported in most of the studies.

Descriptive results on the associations between self-reported stress and cardiovascular measures
The 36 studies yielded a total of 135 separate analyses associating self-reported stress measures with ambulatory cardiovascular measures. Overall, 38 out of 135 analyses (28%) revealed statistically significant associations in the expected direction. Another nine (7%) analyses were only marginally significant (i.e., p < .10) in the expected direction. The included studies used different statistical approaches to test associations between self-reported stress and cardiovascular measures: 32 studies used multilevel modelling (MLM), three studies used generalized estimating equations (GEE), and one study used correlation analysis at the within-subject level.
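The headline percentages follow directly from the counts reported in this paragraph. As a small arithmetic check (a sketch only; the helper name `share` is hypothetical and the counts are those stated above):

```python
def share(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


TOTAL_ANALYSES = 135  # analyses across the 36 included studies
SIGNIFICANT = 38      # significant, expected direction
MARGINAL = 9          # marginally significant (p < .10), expected direction

significant_pct = share(SIGNIFICANT, TOTAL_ANALYSES)
marginal_pct = share(MARGINAL, TOTAL_ANALYSES)
combined_pct = share(SIGNIFICANT + MARGINAL, TOTAL_ANALYSES)

# Statistical approaches reported per study; these should sum to 36 studies.
approaches = {"MLM": 32, "GEE": 3, "within-subject correlation": 1}
studies_covered = sum(approaches.values())
```

The combined figure corresponds to the 35% of analyses reported as significant or marginally significant in the expected direction.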

Cardiovascular measures.
The 18 studies reporting on associations with blood pressure (BP) made use of 16 different datasets and provided a total of 76 analyses. Of those analyses, 22 (29%) indicated a significant positive association with self-reported stress and three more (4%) showed a marginally significant positive association. For SBP, 12 out of 36 analyses (33%) showed a significant association in the expected direction, with another three (8%) reaching only marginal significance; for DBP, 10 out of 36 (28%) analyses were significant in the expected direction. Pulse pressure (PP) was positively associated with self-reported stress in one of two analyses (50%) and unrelated in the other; for mean arterial pressure (MAP), no significant or marginally significant associations were reported in either of the two analyses.
As with BP, analyses of associations with self-reported stress for HR provided significant associations only in a minority of analyses. Results in the expected direction were obtained in 9 (26%) out of 35 analyses. Marginally significant positive associations were found in four (11%) analyses.
Seven (29%) out of 24 analyses on associations with HRV showed significant associations in the expected direction; two (8%) more only reached marginal significance. Looking at only the 13 analyses on frequency-domain measures (i.e., HF/LF-HRV), six (46%) showed a significant association in the expected direction and one (8%) a marginally significant association in the expected direction. Of the 11 analyses on time-domain measures (i.e., mean R-R interval, MSSD, RMSSD), one (9%) showed a significant and one (9%) a marginally significant association in the expected direction.
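For readers less familiar with the time-domain measures named above, the following minimal sketch shows how they are commonly computed from a series of R-R intervals. It is illustrative only and does not reproduce the processing pipeline of any reviewed study; the function names and example intervals are hypothetical, and MSSD is taken here to be the mean of the squared successive differences, with RMSSD its square root.

```python
import math


def successive_differences(rr_ms):
    """Differences between consecutive R-R intervals (milliseconds)."""
    return [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]


def mean_rr(rr_ms):
    """Mean R-R interval in milliseconds."""
    if not rr_ms:
        raise ValueError("at least one R-R interval is required")
    return sum(rr_ms) / len(rr_ms)


def mssd(rr_ms):
    """Mean of squared successive differences (assumed definition of MSSD)."""
    diffs = successive_differences(rr_ms)
    if not diffs:
        raise ValueError("at least two R-R intervals are required")
    return sum(d * d for d in diffs) / len(diffs)


def rmssd(rr_ms):
    """Root mean square of successive differences."""
    return math.sqrt(mssd(rr_ms))


# Invented R-R intervals in milliseconds, for illustration only.
example_rr = [800, 810, 790, 800]
example_summary = (mean_rr(example_rr), mssd(example_rr), rmssd(example_rr))
```

In practice, artifact and ectopic-beat correction would precede these calculations; that step is omitted here.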

Study populations.
In healthy participants, 24 (24%) out of 100 analyses were significant in the expected direction and 8 (8%) were marginally significant. In contrast, of the 22 analyses done in patient samples (or a combined sample including patients), nine (41%) analyses showed a significant and one (5%) analysis a marginally significant association in the expected direction. At-risk samples showed a significant association in the expected direction in five (71%) out of seven analyses.

Study methods and compliance.
Associations per study characteristic (i.e., length of the study and sampling [fixed vs. random] technique) and per device used are reported in the S1 Table. Because compliance was not reported in 21 (58%) of the studies and the reporting of assessment frequency was highly heterogeneous (Tables 2-4), no descriptive analyses were conducted of their possible influences on the associations. Based on our descriptive analyses of the extracted data available, studies that found an association with HR included more study days (average of 6.0 study days) than studies that did not find associations with HR (average of 2.2 study days). Also, there was a marginally significant association between self-reported stress and cardiovascular analyses in study protocols using a fixed compared to a random sampling scheme (S1 Table).
Regarding the devices used, within the analyses using Spacelabs 90207 equipment, significant associations were reported in 15% of the analyses for SBP (2/13), 18% for DBP (3/17), and 17% for HR (1/6). For the Spacelabs 90217 equipment, significant associations were found in 25% of the analyses for both SBP (2/8) and DBP (2/8). For HRV, the most commonly used device type (i.e., Holter monitoring) was used in six analyses, of which only two (33%) showed a significant association.

Discussion
The purpose of this review was to investigate how self-reported stress and ambulatory cardiovascular measures are operationalized in daily-life studies, and what the evidence is for an association between self-reported stress and cardiovascular responses indicative of ANS activity in these studies. Based on our descriptive synthesis, much heterogeneity was evident between studies in terms of self-reported stress assessment, methodology, devices, and study population. Overall, the studies reviewed here showed an association in the expected direction between self-reported stress and cardiovascular parameters in 28% of analyses (35% when including marginally significant associations). This percentage is slightly higher than the 25% found in a previous systematic review of 12 laboratory studies investigating associations between self-reported and cardiovascular measures of stress [24]. Results did, however, vary across self-reported stress and cardiovascular measures. Significant and marginally significant associations were most likely in analyses on perceived stress measures (55%) and least likely in analyses on activity-related stress measures (18%). With significant or marginally significant associations in 54% of the analyses, frequency-domain measures of HRV yielded the most positive results, whereas MAP and skin conductance level (SCL) were not associated with self-reported stress.

Descriptive findings of the association between self-reported stress and cardiovascular measures in daily life
The fact that self-reported stress and cardiovascular measures were significantly associated in less than a third of all analyses is not completely unexpected. For instance, the experienced intensity of an emotion is considerably more strongly associated with behavior than with physiology, emphasizing the social nature of this inter-system coherence [28]. Interestingly, Mauss et al. [28] found that the association between perceived intensity and both behavior and physiology was weaker for negative emotions than for positive emotions. According to them, this may have to do with social norms according to which individuals are expected to control negative emotions, more so than positive emotions, creating incoherence. This may explain why stress, while not technically an emotion, often does not present itself as a very cohesive construct across different systems. Still, it must be emphasized that neither prior research nor the current review allows us to conclude that there is no association between these systems. What we can conclude is that the likelihood of finding an association between a single self-report stress measure and a single physiological variable is moderate at best. On the other hand, machine-learning models, such as support-vector machines, random forest models, and Bayesian networks, have been able to predict self-reported stress with high accuracy based on a set of physiological features [75][76][77][78]. Such findings encourage the idea that there is cohesion among these systems, even though "simpler" models may not be able to detect its complex patterns. Moreover, there is much inter-individual variation in the physiological stress response, calling for personalized models that can capture the individual's physiological signature. In the end, a single stress measure provides incomplete information about an individual's stress level, and in order to provide a better picture, stress should be assessed on all three levels: experientially, behaviorally, and physiologically, while keeping in mind its highly personalized character.

Self-reported stress assessment
The large heterogeneity in self-reported stress measures used in the studies reviewed here complicates any inference to be made. The most homogeneous self-reported stress measure types, perceived stress and event-related stress, showed the most consistent results. Taking a liberal stance, of all analyses reported here on perceived stress in daily life, more than half showed a significant or a marginally significant association with cardiovascular variables in the expected direction. Given that NA scales that only included high-arousal items more often showed associations with cardiovascular measures than scales including low-arousal items, it can be argued that they better capture the subjective experience of physiological arousal, similar to the perceived stress measure. Including low-arousal items may obscure a potential association between the high-arousal items and physiology and should therefore be avoided.
Results for event-, activity-, and social-related stress showed associations in only 26% of the analyses. A factor of influence here could be the variability between self-reported stress items of the same type. Social stress in particular varied widely in its operationalization, with only two studies having a similar approach. Although this heterogeneity may explain part of the mixed findings, it hampers the comparability of the studies. For activity-related stress, although operationalizations differed between and within studies, most measures were variations to the demand and control model of Karasek [20], meaning they are rooted in theory. However, the 12% success rate should, at the least, urge researchers to re-evaluate self-reported activity-related stress measures in association with cardiovascular measures. Event-related stress, on the other hand, is a retrospective measure, meaning that associations with physiology assessed up to an hour later, or averaged over the past hour, may be generally weaker. Indeed, the time-frame in which the stressful events were allowed to occur seems to play a role, with better results for shorter (i.e. 30 minutes) than longer (i.e. 45-75 minutes) time-windows. However, based on only a handful of studies, these conclusions need to be taken with caution. Still, out of the three types of situational stressors, event-related stress performed best, with (marginally) significant associations in two-thirds of all analyses.
Taken together, we observed high heterogeneity in between and within studies, which also indicates that there seems to be no consensus on how to assess ambulatory self-reported stress in the current research field of detecting stress in daily life. Factors that are likely contributing to this heterogeneity are decisions made on study sample, measures, sampling methods, and statistical analyses, in addition to the variability within the measures themselves. Future studies should opt for more evidence-based measures. Based on this review, perceived stress, higharousal NA, and event-related stress measures with short time intervals were most convincing, with associations in the expected direction in about 40-60% of all analyses. Needless to say that this is far from convincing evidence in favor of an association between self-reported stress and cardiovascular measures in daily life.

Study populations
This descriptive review's findings suggest that associations between self-reported stress and cardiovascular measures in daily life are reported relatively more often in patient samples than in samples of healthy volunteers. In most cases, patient samples consisted of individuals diagnosed with a psychiatric disorder (i.e. PTSD, psychosis, BPD, substance abuse); in one study, they were diagnosed with cardiovascular disease. Although these conditions are very different, stress has been shown to play a role in all of them. Moreover, analyses on individuals at risk for stress-related disorders showed significant associations in the expected direction in five out of seven analyses. In psychosis, even compared to patients, at-risk individuals show an increased affective reactivity to daily-life stress [11,79]. Possibly, increased reactivity denotes a stronger coupling of subjective experience and physiology, which results in stronger associations between the stress systems.
One possible explanation for sample differences is that patients and individuals at risk may experience more fluctuations in their stress levels during the day, providing more variability in the data and hence increasing the likelihood of finding an association. An alternative explanation comes from the fact that, for all studies reviewed here, the onset of stress most likely occurred at some point in between diary entries. Consequently, all results reported here reflect associations in the recovery phase of acute stress or even chronic stress levels. The recovery phase is an interesting, yet often overlooked phase of the stress response that is delayed in individuals at risk for and at early stages of mental illness [79,80]. Interindividual differences in recovery may therefore contribute to a weakening of the association over time. This could explain the finding that associations between self-reported stress and cardiovascular measures were found in the majority of studies on individuals with a clinical diagnosis or individuals at risk, as recovery may be delayed in these populations. For healthy participants, an association between the two measures may be present during the acute stress response but diminished by the time of the first assessment moment. This possibility is, however, not supported by the findings of Campbell and Ehlert [24], who reported associations with acute stress in only about 25% of all studies.

Study methods and compliance
Our descriptive analyses identified vast heterogeneity in the devices used. This heterogeneity may be explained by technological developments that enable more ambulatory devices to be used in real-world settings. However, it also indicates that no single device has been shown to be more suitable for observing associations between self-reported stress and cardiovascular measures, although our findings do not allow us to test this statistically. We also note that the validation of these devices was not always well reported, and some devices have shown controversial results regarding their validity. For example, the most commonly used ABP monitor (Spacelabs model 90207) has raised concerns about possible limitations, as its readings were altered by venous blood redistribution [81], and a direct effect of cuff inflation led to the underestimation of ongoing HR during cuff-based ABP measurement [82]. Despite these limitations, the model has been shown to be a valid tool for monitoring ABP [82,83].
For study protocols, our findings did not indicate differences in associations based on study length, except for studies that used HR devices. Based on the data available within HR analyses, associations were more often found in studies with longer study periods (i.e., an average of 6.0 study days) than in studies with fewer study days (an average of 2.3 study days). This finding may suggest that more study days are advisable for studies combining self-reported stress and HR measures. However, this needs to be interpreted with caution, because only nine (26%) of all HR analyses found a significant association. Moreover, the length of a study is also driven by its research questions and hypotheses, and it is therefore challenging to recommend one type of protocol over another. Furthermore, our findings showed differences in the sampling techniques used. As can be expected, studies using blood pressure measures more often opted for a fixed sampling scheme. Compared to studies with random sampling schemes, those using a fixed sampling scheme tended to show better associations between self-reported stress and cardiovascular measures (S1 Table). However, these findings need to be interpreted carefully as the distribution of sampling techniques varied between and within studies.
From a study quality perspective, most studies described the study procedure in detail, but compliance rates for the diary protocol were reported in only half of the studies. Also, the thresholds used to minimize noise in physiological data were fairly consistently reported, but the actual amount of data excluded as a result was reported in only a few studies. These methodological findings call for a more precise description of missing data to increase study quality. Lastly, one noteworthy methodological finding was that the scales and the chosen items to measure self-reported stress were heterogeneous across the studies. This confirms that, although numbers are increasing, studies on daily life stress are still scarce, and the definitions of stress are driven by different approaches of individual studies.

Strengths and limitations
One strength of this systematic review is that it provides the first literature overview of associations between self-reported stress and cardiovascular measures assessed simultaneously in a real-world setting. In doing so, it provides insight into how these two types of measures relate to each other in daily life. At the same time, we need to consider some limitations. Firstly, the studies included in this review were very heterogeneous in terms of their statistical analyses. Although most studies used MLM, the large diversity between models prevented us from conducting meta-analyses based on the reported values of the associations per stress measure. Therefore, an accurate estimate of the strength of the association could not be provided. Future investigations should make an effort to mirror the statistical models and methods previously used to estimate the association between self-reported and cardiovascular measures of stress so that meta-analyses can be conducted. Secondly, all our conclusions are based on group analyses. There are large interindividual differences in stress responses, and these findings have to be interpreted bearing that in mind. Thirdly, the overall level of study quality was fair, with the main gaps in reporting relating to external validity and selection bias (S2 Table). The most concerning issues in study quality were in reporting the proportion of the population source and the time period of the recruitment. Therefore, more attention should be given to the reporting of the population source and the time period of the recruitment in future studies. Finally, we chose the somewhat arbitrary cut-off of 1-day monitoring (i.e., 24h) as an exclusion criterion. ESM is a method that aims to capture multiple snapshots of an individual's everyday life, and study periods of less than a day may not be long enough to do so.
Despite these limitations, we believe that our review gives important insights into the use of self-reported stress and cardiovascular measures in a real-world setting, which will hopefully raise awareness and encourage further investigation of this topic.
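To illustrate the point about group analyses and interindividual differences, the following minimal sketch computes a separate within-person correlation between self-reported stress and heart rate for each participant. It is not a multilevel model and not the method of any reviewed study; the function names, the data layout (a dictionary of participant IDs mapping to lists of stress and HR pairs), and the minimum of three observations are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def person_center(values):
    """Subtract the person's own mean so only within-person variation remains."""
    m = mean(values)
    return [v - m for v in values]


def pearson(x, y):
    """Pearson correlation; returns None if either series has no variance."""
    xc, yc = person_center(x), person_center(y)
    sxx = sum(a * a for a in xc)
    syy = sum(b * b for b in yc)
    if sxx == 0 or syy == 0:
        return None
    return sum(a * b for a, b in zip(xc, yc)) / sqrt(sxx * syy)


def within_person_correlations(records, min_obs=3):
    """records: hypothetical dict of participant_id -> list of (stress, hr) pairs.

    Returns one correlation per participant, or None when there are too few
    observations or no variance in one of the measures.
    """
    result = {}
    for pid, obs in records.items():
        if len(obs) < min_obs:
            result[pid] = None
            continue
        stress = [s for s, _ in obs]
        hr = [h for _, h in obs]
        result[pid] = pearson(stress, hr)
    return result
```

Inspecting the spread of these per-person coefficients can show whether a weak group-level association hides strong but opposing individual patterns; formal inference would still call for a multilevel model.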

Conclusions and recommendations
Overall, this systematic review shows that daily-life self-reports of stress and cardiovascular measures were associated in 28% of analyses (35% when including marginally significant findings). Analyses on perceived stress, high-arousal NA, or event-related stress measures, frequency-domain HRV measures, or in patients or at-risk populations had a larger proportion of analyses that were statistically significant; analyses on activity-related stress or low-arousal NA measures, time-domain HRV measures, or studies using Spacelabs ambulatory blood pressure equipment had lower rates. Therefore, based on this review, we recommend that researchers use the following when investigating the association between self-reported stress and cardiovascular measures:
• Perceived stress or high-arousal NA self-report measures
• High-quality wearable sensors that have been validated in ambulatory settings
• Time windows of max 30 minutes prior to self-report for the calculation of continuously assessed cardiovascular measures
• Study periods of at least 6 days with multiple assessments per day to capture enough variability in both measures
• Multilevel modelling for the statistical analyses
Although the results reviewed here are far from convincing, it would be a strong claim, even when accounting for possible publication bias, that all analyses showing a significant association in the expected direction rely solely on type-I errors, as would have to be the case if the experiential stress response were unrelated to its physiological counterpart. As stress is marked by an increase in perceived stress and activation of the ANS, it is difficult to imagine that these two systems are in no way correlated. However, how they interact and are related over time is still largely unknown, and this review provides a first step in disentangling their relationship.
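The recommended window of at most 30 minutes prior to self-report can be sketched as follows. This is a minimal illustration under stated assumptions, not a procedure taken from the reviewed studies: the function name, the input format (timestamped heart rate samples in beats per minute), and the choice to include samples at both window boundaries are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def window_mean_hr(hr_samples, prompt_time, window_minutes=30):
    """Mean heart rate over the window_minutes preceding a self-report prompt.

    hr_samples: iterable of (timestamp, bpm) pairs (assumed format).
    Returns None when no sample falls inside the window.
    """
    start = prompt_time - timedelta(minutes=window_minutes)
    # Keep only samples recorded within the window before the prompt.
    in_window = [bpm for ts, bpm in hr_samples if start <= ts <= prompt_time]
    return mean(in_window) if in_window else None


# Usage with made-up values: only the 09:40 and 09:55 samples fall in the
# 30 minutes before the 10:00 prompt.
samples = [
    (datetime(2024, 1, 1, 9, 0), 60),
    (datetime(2024, 1, 1, 9, 40), 70),
    (datetime(2024, 1, 1, 9, 55), 80),
    (datetime(2024, 1, 1, 10, 5), 90),
]
prompt = datetime(2024, 1, 1, 10, 0)
example_mean = window_mean_hr(samples, prompt)  # 75
```

Keeping the window length as a parameter makes it straightforward to compare shorter and longer windows within the same data set, which the mixed findings on time-frames suggest may be informative.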