Trends in voice characteristics in patients with heart failure (VENTURE) in Switzerland: Protocol for a longitudinal observational pilot study

Introduction
Heart Failure (HF) is a major health and economic issue worldwide. HF-related expenses are largely driven by hospital admissions and re-admissions, many of which are potentially preventable. Current self-management programs, however, have failed to reduce hospital admissions. This may be explained by their low predictive power for decompensation and high adherence requirements. Slight alterations in the voice profile may allow decompensation in HF patients to be detected at an earlier stage and thereby reduce hospitalizations. This pilot study investigates the potential of voice as a digital biomarker to predict health status deterioration in HF patients.

Methods and analysis
In a two-month longitudinal observational study, we collect voice samples and HF-related quality-of-life questionnaires from 35 stable HF patients. Patients use a study application that we developed, installed on a tablet at home, during the study period. From the collected data, we use signal processing to extract voice characteristics from the audio samples and associate them with the questionnaire answers. The primary outcome will be the correlation between voice characteristics and HF-related quality-of-life health status.

Ethics and dissemination
The study was reviewed and approved by the Cantonal Ethics Committee Zurich (BASEC ID:2022-00912). Results will be published in medical and technical peer-reviewed journals.

Introduction
Heart Failure (HF) is characterized by the heart's incapacity to pump sufficient blood to meet the body's metabolic needs [1]. It afflicts over 64.3 million people worldwide and is increasing in prevalence [2]. According to data from 2015 to 2018, an estimated 6 million American adults suffered from HF [3]. High hospital readmission rates not only pose a tremendous burden on a patient's health status, morbidity, and mortality but also significantly increase national healthcare costs [4]. The average cost per patient per hospitalization in Europe is approximately €10,000 [5]. The total cost of care for heart failure in 2020 was estimated at $43.6 billion in the US [6]. While HF hospitalizations constitute 60% of the total expenditures associated with HF [1], many hospital admissions are considered preventable [7]. This implies that these costs, which are mainly caused by repeated and lengthy hospitalizations, can be significantly reduced.
To handle this number of patients and to reduce costs, self-management programs empower patients to contribute actively to managing their disease and to participate in continuous education, self-care promotion, and therapy adherence [8]. Current self-management programs advise patients to monitor their daily weight and call their physician if they experience rapid weight gain or clinical signs of congestion such as peripheral edema [9]. Self-management measures (e.g., weight and blood pressure monitoring), however, have limited predictive power for decompensation and require high adherence and constant commitment [10,11]. Not surprisingly, clinical trials involving a broader population and using traditional self-management programs have not yet reduced hospitalizations [12].
On this basis, we argue that passive, non-invasive, and user-friendly tools for HF patients are required to facilitate self-management. Remarkably, the monitoring of voice has the potential to identify physiological changes and health conditions [13]. Speech sounds are created by pressure in the lungs and then propagated through the vocal tract [14]. As decompensation approaches, i.e., when the heart can no longer maintain efficient circulation, HF patients develop fluid retention throughout the body, including the lungs. In particular, the vocal folds consist of thin tissue layers that might be particularly sensitive to HF-related edema [15]. We hypothesize that the vocal folds respond to fluid accumulation more sensitively than body weight does. Thus, voice monitoring may allow timely detection of fluid volume imbalance (e.g., volume overload) and congestion and help prevent acute decompensated heart failure events that require hospitalization.
Mobile devices such as smartphones and tablet computers have become ubiquitous in our lives, e.g., 85% of US adults owned a smartphone in 2021 [16]. Additionally, voice-based conversational agents (VCAs), such as Apple's Siri, are now used on more than 2.5 billion smartphones, tablets, smart speakers, and wearable devices. Voice samples of HF patients can therefore be easily collected at home from their voice commands to VCAs. Thanks to the widespread use of information and communication technologies, voice analysis can be used as a scalable, tailored, and cost-effective health monitoring system. Further, a voice monitoring system may ease the interaction with technology for the elderly, as studies have shown that voice interfaces appear to be easier to use and more acceptable for older and frail people [17][18][19]. Moreover, a voice monitoring system can especially support individuals with low literacy or intellectual disabilities [20], and those with motor or cognitive disabilities [21].
This research aims to contribute to developing a digital vocal monitoring tool as a new biomarker for HF. We use voice features collected from mobile devices and integrate voice analysis into patient self-care. Voice can be readily captured as a health parameter to identify the likelihood of progression of HF. Passively collecting voice samples of clinically useful quality can address two major issues of remote care for HF patients: the lack of adherence and the need for robust and clinically relevant measurements in patients' homes.

Hypothesis and objectives
We hypothesize that fluctuations in voice features extracted from recorded voice samples from HF patients using smart devices can discriminate HF-related health status changes. The primary objective is to investigate how the variations in the overall health status of HF patients correlate with the fluctuations in voice characteristics.
The secondary objectives of this study are to:
• compare voice characteristics with the current standard of care (i.e., weight and blood pressure).
• investigate how the health status correlates with the combination of nocturnal cough frequencies and physiological data (e.g., steps).
• investigate how voice characteristics fluctuate with mental health conditions (e.g., depression and anxiety).
• evaluate the technology acceptance of the developed application in HF patients.

Study design
In a 2-month prospective, longitudinal observational study, we collect voice samples daily and calculate fluctuations in voice characteristics from them. The primary endpoint is the potential association between voice characteristics and health status variations. The variations in health status are assessed using the 23-item Kansas City Cardiomyopathy Questionnaire (KCCQ) [22]. The KCCQ quantifies physical limitations, symptoms, self-efficacy, social interference, and quality of life of HF patients. While the KCCQ covers a two-week recall period, this study also includes the HF Symptom Tracker (HFaST) questionnaire to assess daily symptoms in HF patients [23]. The study duration per patient is 57 days (56 nights), which enables five repeated measurements of the KCCQ score.
To compare the new vocal biomarker to current surrogate biomarkers (i.e., weight and blood pressure), HF patients measure weight and blood pressure daily following Swiss Heart Foundation self-management guidelines [24]. In addition, we collect nocturnal cough frequency, physiological data, mental health, and technology acceptance as secondary research endpoints. Nocturnal cough monitoring data are collected every night with an application developed in our previous work [25-27]. The application uses a cough detection model developed on 26,166 cough samples from 94 adults with asthma. The model yielded an accuracy of 99.8% on 15 new patients. Physiological data (e.g., heart rate, steps, stress) are measured daily with a smartwatch, the Garmin Vivoactive 4s. Mental health conditions (e.g., depression and anxiety) are assessed biweekly with the 9-item Patient Health Questionnaire (PHQ) [28] and the 7-item Generalized Anxiety Disorder (GAD) questionnaire [29]. Both questionnaires are brief, well-validated measures for detecting and monitoring depression and anxiety [30]. All measurements are recorded at home using a Lenovo K10 tablet. Finally, we apply the technology acceptance model [31] to examine whether our technology is acceptable and user-friendly.

Eligibility criteria
A selected group of HF patients will be studied to maximize the likelihood of observing health fluctuations in the biweekly KCCQ. Table 1 lists the applied inclusion and exclusion criteria.

Sample size
The study is powered on the primary objective: the association between the patient's health status changes and daily collected voice feature fluctuations. We simulate the number of patients required to reach a power (1-β) of 95% with a two-tailed type I error probability (α) of 5% for rejecting a zero bivariate correlation. The R simulations (n = 1000) relied on the MASS::mvrnorm function to generate two correlated random samples, to which a linear mixed-effects model was fitted using the lmerTest::lmer function [32,33]. Based on previous studies [15,34,35], it is reasonable to assume: (i) a correlation of 0.3 between the voice characteristics and the KCCQ score, (ii) a mean KCCQ value in stable HF NYHA class II of 73 ± 19, and (iii) a mean voice biomarker value of 141 ± 16 with five repeated KCCQ measures. The resulting sample size is 35 patients, which gives a power of 95%. Assuming a dropout rate of 15%, we will recruit 41 patients.
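The simulation logic can be illustrated with a simplified sketch. It is not the authors' R code: instead of a linear mixed-effects model it applies a Fisher z-test to pooled observations that are treated as independent, so it will not reproduce the reported power figure exactly. The function names, the seed, and the pooling of 35 patients with five measurements each into 175 observations are assumptions made for illustration only.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_power(n_obs, rho=0.3, n_sim=1000, z_crit=1.96, seed=0):
    """Share of simulated datasets in which a zero correlation is rejected.

    KCCQ ~ N(73, 19^2) and voice biomarker ~ N(141, 16^2) with correlation
    rho, as assumed in the protocol. Simplification: observations are
    independent and tested with a two-sided Fisher z-test (alpha = 5%).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        kccq, voice = [], []
        for _ in range(n_obs):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            kccq.append(73 + 19 * z1)
            # Mixing z1 and z2 yields the target correlation rho.
            voice.append(141 + 16 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2))
        r = pearson_r(kccq, voice)
        z = math.atanh(r) * math.sqrt(n_obs - 3)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    # Hypothetical pooling: 35 patients x 5 repeated KCCQ measurements.
    print(simulate_power(n_obs=35 * 5))
```

A mixed-effects analysis additionally accounts for the dependence between measurements of the same patient, which this sketch deliberately ignores.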

Study setting & initial meeting
Patients are recruited at the University Hospital of Zurich (USZ) based on their clinical HF diagnosis. Health professionals perform medical assessments during baseline and follow-up visits. Table 2 shows the patient assessments and measurements. During the baseline visit, the study team will recruit patients with high concentrations of NT-proBNP. The study team will ask eligible patients whether they are willing to participate in the study and obtain their informed consent.
The first participant was included in September 2022. At the time of manuscript submission, two participants have completed the study. Data collection is expected to conclude in the third quarter of 2023. During the data collection phase, each patient is required to complete measurements on at least 80% of the study days (i.e., more than 45 of the 57 days). The KCCQ questionnaire appears five times during the study period, and the patient needs to complete it at least four times.

PLOS ONE

Measures
During the study period, all measurements are taken using the tablet on which the study applications are installed. The study consists of active and passive measurements. The duration and frequency of the different measurements are summarized in Table 3.
Active measurements. The patients are required to measure their weight, perform voice exercises, fill out questionnaires, and measure their blood pressure, in that order. Completing all measurements takes around 10 minutes per day. Audio is recorded at a sampling rate of 44.1 kHz with 16 bits per sample, using pulse-code modulation (PCM). Besides the routine daily measurements, the KCCQ, GAD, and PHQ questionnaires are displayed in the study app in alternating weeks. The KCCQ questionnaire appears on the 1st, 15th, 29th, 43rd, and 57th day and covers the health status over the preceding two weeks, while the questionnaires about depression and anxiety (PHQ and GAD) appear on the 8th, 22nd, 36th, and 50th day. The answers to all the questionnaires are consistently and uniformly assessed.
Passive measurements. Nocturnal cough frequency is measured automatically at night during patients' sleeping time. Patients are instructed to place the tablet on the nightstand or near their bed. Optionally, patients are equipped with a smartwatch that passively collects continuous physiological data. The data from the smartwatch are automatically synchronized when the smartwatch is close enough to the study tablet.
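The questionnaire schedule described under active measurements follows a regular 14-day rhythm, which the following minimal sketch makes explicit. The function name and the dictionary layout are illustrative assumptions and do not describe the actual study application.

```python
STUDY_DAYS = 57  # study duration per patient, in days


def questionnaire_schedule(study_days=STUDY_DAYS):
    """Map each study day (1-based) to the extra questionnaires shown on it.

    KCCQ starts on day 1 and repeats every 14 days; GAD and PHQ start on
    day 8 and repeat every 14 days. Days without extra questionnaires are
    omitted from the result.
    """
    schedule = {}
    for day in range(1, study_days + 1):
        if (day - 1) % 14 == 0:
            schedule[day] = ["KCCQ"]
        elif (day - 8) % 14 == 0:
            schedule[day] = ["GAD", "PHQ"]
    return schedule


if __name__ == "__main__":
    for day, names in sorted(questionnaire_schedule().items()):
        print(day, names)
```

With the default 57 study days, this yields the KCCQ on days 1, 15, 29, 43, and 57 and the GAD and PHQ on days 8, 22, 36, and 50, matching the schedule in the text.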

Outcomes
The primary outcome is the association between voice characteristics and health status fluctuations in HF patients. We hypothesize that the voice contains sufficient correlates of health-related information and could serve as an effective monitoring biomarker. The secondary outcomes are:
• comparison of health status prediction using voice characteristics and current standard of care (i.e., weight and blood pressure).
• the correlation between health status fluctuations and the combination of nocturnal cough frequencies and physiological data.
• the correlation between voice characteristics and mental health status.
• the technology acceptance of the voice capture application.

Data management
We have designed a study application to collect all active measurements (e.g., audio data, blood pressure, weight, and questionnaire answers). First, we store all measurements in a local SQLite database on Android. Then we transmit the measurements together with the tablet's metadata over a secure protocol (i.e., HTTPS) and save them remotely on a password-protected MySQL server hosted by ETH Zurich. Nocturnal cough audio data are saved on the tablet, and only cough frequencies are uploaded to the server. The physiological data are measured with a smartwatch and uploaded to Labfront, a platform that has been used in psychological studies and clinical trials [37]. Participants' smartwatches connect directly to the Labfront app via Bluetooth. The study team will manually transfer data from the Labfront cloud to the database. Fig 1 shows the data pipeline from the study app to the web server. Once a patient has completed the study, data are temporarily saved on an encrypted local hard drive and then uploaded to ETH Zurich's Leonhard Med Secure Scientific Platform [38] for data storage and analysis.
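The "store locally first, then upload" pattern described above can be sketched as follows. The study application itself runs on Android; this Python sketch only illustrates the pattern. The table layout, column names, and function names are hypothetical, and the HTTPS transfer is reduced to building a JSON payload so that no network access is needed.

```python
import json
import sqlite3


def open_local_store(path=":memory:"):
    """Open (or create) a local SQLite store with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS measurement (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               patient_code TEXT NOT NULL,      -- pseudonymous code, not a name
               study_day INTEGER NOT NULL,
               kind TEXT NOT NULL,              -- e.g. 'weight_kg'
               value TEXT NOT NULL,             -- JSON-encoded value
               uploaded INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def record(conn, patient_code, study_day, kind, value):
    """Persist one measurement locally before any upload is attempted."""
    conn.execute(
        "INSERT INTO measurement (patient_code, study_day, kind, value) "
        "VALUES (?, ?, ?, ?)",
        (patient_code, study_day, kind, json.dumps(value)),
    )
    conn.commit()


def pending_payload(conn):
    """JSON body containing all measurements that were not uploaded yet."""
    rows = conn.execute(
        "SELECT id, patient_code, study_day, kind, value "
        "FROM measurement WHERE uploaded = 0 ORDER BY id"
    ).fetchall()
    return json.dumps([
        {"id": r[0], "patient_code": r[1], "study_day": r[2],
         "kind": r[3], "value": json.loads(r[4])}
        for r in rows
    ])


def mark_uploaded(conn, ids):
    """Flag rows as uploaded once the server has confirmed receipt."""
    conn.executemany(
        "UPDATE measurement SET uploaded = 1 WHERE id = ?", [(i,) for i in ids]
    )
    conn.commit()
```

Keeping an `uploaded` flag in the local store means that measurements taken without a network connection remain queued and can be sent later, which fits the reminder and retry behavior described under Retention.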
We have developed a dashboard for monitoring data and the status of patients. Physicians have access to the measured data of each patient (pseudonymized by a unique hash code) through a website. Fig 2 depicts the dashboard that visualizes measurements, tablet's metadata, and study progress (e.g., 50% means the patient has conducted the study for one month).
All clinical data are taken from the electronic patient file and transferred to paper case report forms. Paper case report forms are stored in a locked room in the University Heart Center Zurich study center.

Retention
We promote adherence to the data collection phase using several methods. First, participants receive an artistic picture as a small reward after completing their daily measurements. The app changes pictures daily and depicts one scenic location worldwide with a description. Fig 3 shows four screenshots of the study application.
Second, the study application includes a reminder system for measurements. If an unexpected issue occurs, e.g., the participant fails to upload the data, the tablet has a low battery (≤40%), or there is no network connection, the participant will receive an SMS notification on their private phone or an email. In the event of non-adherence for two consecutive days, the study team will call the participant. If the tablet encounters an issue (e.g., it is turned off unintentionally by the patient), the study team will access the tablet by remote control and fix the issue.
Fig 2 caption (partial): The "Status" dashboard shows the tablet's metadata, e.g., storage, network connection, and the study progress. https://doi.org/10.1371/journal.pone.0283052.g002
Third, if the patient does not complete the biweekly KCCQ questionnaire on the given day, the app will display the questionnaire again over the next three days until the patient completes it. This maximizes the chance that the patient will complete the KCCQ questionnaire. If the patient has not completed the questionnaire after three days, it is considered a failed attempt, and the patient will be excluded from the study.
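The adherence rules stated in this protocol (measurements on at least 80% of the 57 study days, at least four of five completed KCCQ questionnaires, and a three-day window for a missed KCCQ) can be written down as a small sketch. All names are hypothetical, and how the "four of five" criterion interacts with exclusion after a missed window is not specified here, so the sketch only checks the stated counts and windows.

```python
import math

STUDY_DAYS = 57
MIN_DAY_FRACTION = 0.80
KCCQ_DAYS = (1, 15, 29, 43, 57)
KCCQ_GRACE_DAYS = 3       # the questionnaire is re-displayed for three more days
MIN_KCCQ_COMPLETED = 4


def min_measurement_days(study_days=STUDY_DAYS, fraction=MIN_DAY_FRACTION):
    """Smallest whole number of days that reaches the required fraction."""
    # 57 * 0.8 = 45.6, so "more than 45 days" means at least 46 days.
    return math.ceil(study_days * fraction)


def kccq_is_displayed(day, completed_scheduled_days):
    """True if the KCCQ should be shown on `day`.

    `completed_scheduled_days` holds the scheduled days (e.g. {1, 15}) whose
    questionnaire has already been completed.
    """
    for scheduled in KCCQ_DAYS:
        if scheduled <= day <= scheduled + KCCQ_GRACE_DAYS:
            return scheduled not in completed_scheduled_days
    return False


def meets_adherence(days_with_measurements, kccq_completed):
    """Check the two count-based adherence criteria."""
    return (days_with_measurements >= min_measurement_days()
            and kccq_completed >= MIN_KCCQ_COMPLETED)
```

For example, a patient with measurements on 46 days and four completed KCCQ questionnaires satisfies both count-based criteria, whereas 45 days do not suffice even with all five questionnaires completed.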

Feature extraction
First, we use signal processing to calculate parameters that describe changes in energy and amplitude over time and frequency in the collected audio samples. Based on previous studies [39][40][41] and established libraries (i.e., Praat, librosa, and openSMILE), we will extract voice features including fundamental frequency, formants, jitter, shimmer [42], creak, and Mel-Frequency Cepstral Coefficients [43].
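As an illustration of one of the listed features, the fundamental frequency of a short voiced frame can be estimated from the autocorrelation of the signal. The sketch below is a textbook-style estimator written with the standard library only; it is not the method implemented in Praat, librosa, or openSMILE, and the pitch range, window length, and 0.9 peak threshold are assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 44_100  # Hz, the recording rate stated in the protocol


def estimate_f0(frame, sample_rate=SAMPLE_RATE, f_min=75.0, f_max=400.0,
                window=1024):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation.

    The autocorrelation is evaluated over a fixed window for every lag in the
    plausible pitch range. The first local peak that reaches 90% of the global
    maximum is taken as the pitch period, which guards against picking a
    multiple of the true period.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    if len(frame) < window + max_lag:
        raise ValueError("frame too short for the requested pitch range")

    scores = {
        lag: sum(frame[n] * frame[n + lag] for n in range(window))
        for lag in range(min_lag, max_lag + 1)
    }
    peak = max(scores.values())
    for lag in range(min_lag + 1, max_lag):
        s = scores[lag]
        if s >= 0.9 * peak and s >= scores[lag - 1] and s >= scores[lag + 1]:
            return sample_rate / lag
    # Fall back to the global maximum if no interior peak qualifies.
    return sample_rate / max(scores, key=scores.get)


def synthetic_tone(freq_hz, n_samples=2048, sample_rate=SAMPLE_RATE):
    """Pure sine wave used as a stand-in for a voiced frame."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


if __name__ == "__main__":
    print(estimate_f0(synthetic_tone(150.0)))
```

Real speech is far less regular than a synthetic tone, so dedicated tools add voicing decisions, windowing, and smoothing across frames. Measures such as jitter and shimmer build on cycle-to-cycle period and amplitude estimates of this kind.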

Correlation analysis
We will investigate the correlation between voice characteristics and health status, represented by KCCQ scores, at the between- and within-subject levels. We will use the repeated measures correlation (rmcorr) method, which is available in R (rmcorr package) and Python (pingouin.rm_corr) [44].
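To make the within-subject idea concrete, the repeated measures correlation coefficient can be computed by removing each subject's own mean from both variables and correlating the centered values, which is equivalent to fitting a common slope with a subject-specific intercept. The sketch below implements only this coefficient and its degrees of freedom with the standard library; it is not the rmcorr or pingouin implementation, the function names are hypothetical, and the toy data are invented for illustration.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def repeated_measures_corr(records):
    """Within-subject correlation from (subject, x, y) records.

    Returns (r_rm, dof), where dof = observations - subjects - 1.
    """
    by_subject = defaultdict(list)
    for subject, x, y in records:
        by_subject[subject].append((x, y))

    x_centered, y_centered = [], []
    for pairs in by_subject.values():
        mean_x = sum(p[0] for p in pairs) / len(pairs)
        mean_y = sum(p[1] for p in pairs) / len(pairs)
        x_centered.extend(p[0] - mean_x for p in pairs)
        y_centered.extend(p[1] - mean_y for p in pairs)

    r_rm = pearson_r(x_centered, y_centered)
    dof = len(x_centered) - len(by_subject) - 1
    return r_rm, dof


# Invented toy data: within each subject y falls as x rises, although
# subject B has higher values of both x and y than subject A.
TOY = [("A", 1, 10), ("A", 2, 9), ("A", 3, 8),
       ("B", 11, 30), ("B", 12, 29), ("B", 13, 28)]
```

On the toy data, the pooled Pearson correlation is positive because of the between-subject difference, whereas the within-subject coefficient is negative. This is why the between- and within-subject levels are analyzed separately.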

Predictive modeling
KCCQ scores constitute the ground truth, and the selected voice features serve as model inputs. We will predict the health status using typical classification methods, e.g., random forests, support vector machines, and neural networks [45]. Afterwards, to further analyze and predict longitudinal time series data, we will explore recurrent neural networks and long short-term memory networks [46,47].

Missing data
Various techniques for imputing missing data points will be considered (e.g., imputing mean, median, most frequent, or constant values) [48]. Depending on the type of variables, we will consider more advanced techniques (e.g., regression, interpolation, extrapolation, k-NN, multivariate imputation, and datawig) [49,50].
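Two of the simpler options named above, mean imputation and linear interpolation, can be sketched as follows for a daily series in which missing days are marked with `None`. The function names and the example values are illustrative assumptions; the more advanced methods mentioned in the text (e.g., k-NN or multivariate imputation) are not covered.

```python
def impute_mean(values):
    """Replace every missing entry (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a series without observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def impute_linear(values):
    """Fill interior gaps by linear interpolation between known neighbors.

    Leading and trailing gaps are left as None, because filling them would
    require extrapolation rather than interpolation.
    """
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


if __name__ == "__main__":
    daily_weight_kg = [80.0, None, None, 83.0, None]  # invented example
    print(impute_linear(daily_weight_kg))
    print(impute_mean(daily_weight_kg))
```

Mean imputation ignores the time order of the measurements, while interpolation assumes a smooth change between two observed days; which assumption is more plausible depends on the variable.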

Ethics considerations
The study was approved by Swissethics (BASEC-ID: 2022-00912). This research project will be conducted following the Declaration of Helsinki [51], the principles of Good Clinical Practice, the Human Research Act (HRA), the Human Research Ordinance (HRO) [52] as well as other locally relevant regulations.

Novelty
Current studies have shown research potential for voice analysis in HF patients. Murton et al. analyzed the voice of HF patients treated for decompensated HF and showed changes in patient voice between baseline and study end measurements [15]. Sara et al. have shown that in HF patients, there is an association between invasively measured indices of pulmonary hypertension and a linear combination of the 223 acoustic features (e.g., Mel cepstrum representation and formant measures) extracted with "Vocalis Health" application [53]. Using the same application to extract voice features, Maor et al. have shown that vocal biomarkers calculated from patients' recorded speech correlate with the likelihood of hospitalization and death [54]. Amir et al. compared voice recordings of 40 patients with acute decompensated HF to baseline recordings and successfully identified them as significantly different in 87.5% of cases [55].
While preliminary evidence suggests an association between voice signals and adverse outcomes in HF patients, previous studies have focused on the extreme states (i.e., decompensation and non-decompensation) but not on the continuous voice changes over time. Moreover, there is no evidence that voice characteristics would add value to the standard of care. It is crucial not only to assess whether the features of the voice are informative but also whether this measurement is complementary to the established measures. Very few researchers focused on the prediction power of voice, and there was no remote monitoring system based on a vocal biomarker.
Our approach differs from previous research in the following aspects. First, we look into how longitudinal changes in voice can be associated with changes in health status over time. Second, we assess the added value of voice monitoring compared to the current standard of care (i.e., weight and blood pressure measurements). Third, we use advanced machine learning methods on real-world audio data to predict health deterioration.

Limitations
The study has certain limitations. First of all, voice analysis can be subject to various biases. Voice depends on different factors, among others, age, sex, and ethnicity. Since 70-80% of HF patients are male, we do not enforce a 1:1 gender ratio, which facilitates recruitment. Selection bias due to the technology adoption rate among seniors and the required German-speaking ability might also undermine generalizability. In addition, the voice changes when the patient has symptoms of an upper respiratory infection during the study period. We therefore include a control question to account for such confounding effects on the voice.
Secondly, scales and blood pressure monitors are not uniformly provided. Patients use their own devices instead (unless they do not have them).
Thirdly, the applied cough detection model was not developed on data from HF patients and has not previously been used on tablets. To verify the functionality of the cough detection model, the tablet also records sound during the night. In this way, we can correct wrongly counted coughs and acoustically verify, post hoc, the cough frequencies reported by the model in order to evaluate its performance.

Conclusion
The proposed longitudinal study focuses on the association between continuous voice fluctuations and the health status of HF patients over time. We evaluate the potential of using voice characteristics to predict health deterioration. Furthermore, the comparison of voice analysis to the standard of care and patient feedback will reveal to what extent voice is accepted as a new monitoring tool. The study aims to lay the groundwork for using the voice to recognize clinical deterioration and initiate early intervention and management.