COVID RADAR app: Description and validation of population surveillance of symptoms and behavior in relation to COVID-19

Background Monitoring of symptoms and behavior may enable prediction of emerging COVID-19 hotspots. The COVID Radar smartphone app, active in the Netherlands, allows users to self-report symptoms, social distancing behaviors, and COVID-19 status daily. The objective of this study is to describe the validation of the COVID Radar.
Methods COVID Radar users are asked to complete a daily questionnaire consisting of 20 questions assessing their symptoms, social distancing behavior, and COVID-19 status. We describe the internal and external validation of symptoms, behavior, and both user-reported COVID-19 status and state-reported COVID-19 case numbers.
Results Since April 2nd, 2020, over 6 million observations from over 250,000 users have been collected using the COVID Radar app. Almost 2,000 users reported having tested positive for SARS-CoV-2. Amongst users testing positive for SARS-CoV-2, the proportion of observations reporting symptoms was higher than that of the cohort as a whole in the week prior to a positive SARS-CoV-2 test. Likewise, users who tested positive for SARS-CoV-2 showed above-average risk in their social-distancing behavior. Per-capita user-reported SARS-CoV-2 positive tests closely matched government-reported per-capita case counts in provinces with high user engagement.
Discussion The COVID Radar app allows voluntary self-reporting of COVID-19 related symptoms and social distancing behaviors. Symptoms and risk behavior increase prior to a positive SARS-CoV-2 test, and user-reported case counts match closely with nationally reported case counts in regions with high user engagement. These results suggest the COVID Radar may be a valid instrument for future surveillance and potential predictive analytics to identify emerging hotspots.


Introduction
The world is in the throes of the coronavirus-disease-2019 (COVID-19) pandemic with more than 100 million cases and over 2 million confirmed deaths worldwide as of December 2020 [1]. In the Netherlands, the first case of COVID-19 was diagnosed in February 2020 and since then over one million cases and 17,500 deaths have been confirmed [2]. To date more than 60,000 COVID-19 patients have been admitted to Dutch hospitals, with over 12,000 of these eventually admitted to intensive care [2], this in a country with just over 1,000 intensive care beds [3]. The strategies of Test, Trace and Isolate (TTI), and of measures intended to reduce social contact, have been widely adopted to "flatten the curve" [4,5]. An important limitation of the TTI strategy is transmission of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) by COVID-19 carriers without symptoms. Given their lack of symptoms, they may not be tested and remain unidentified by the TTI process despite being a possible source of viral transmission [6]. Recent studies show that this subpopulation may account for as much as half of COVID-19 transmissions [6,7]. An instrument to continuously monitor social-distancing behavior and symptoms in the population at a local level may support and improve the TTI process by decreasing the delay in identification of risk areas and populations. Research using voluntary symptom self-reporting apps performed in the United Kingdom, the United States of America, and Israel shows promising results in the local prediction of COVID-19 using symptom-based tracking [8][9][10]. However, we find no apps using voluntary social-distancing behavior-reporting to track local COVID-19 hotspots.
During the first COVID-19 wave in the Netherlands, the Leiden University Medical Center (LUMC) and the tech company ORTEC developed and introduced the COVID Radar app. This questionnaire-based app allows individuals to anonymously report COVID-related symptoms and social-distancing behaviors on a regional and population level. The app provides users with direct feedback on, and peer comparison with, their reported social-distancing behavior and symptoms. Our theory is that tracking of symptom and social-distancing behavior data at a population level can be used to identify regions where more COVID-19 cases will subsequently occur, allowing (regional) policy makers and healthcare professionals to effect changes to regulations earlier, and thus more effectively.
In this first descriptive study, our aim is to observe the associations between self-reported symptoms, social-distancing behavior, and self-reported COVID-19 infection by the app's users (i.e. criterion validity), and the associations between these variables and state-reported COVID-19 infections by the National Institute for Public Health and the Environment (i.e. external validation).

COVID radar app
The COVID radar app was released on the 2nd of April 2020 following a short publicity campaign in the local and national media [11]. The app is free to download and allows for multiple user accounts from the same household on one smartphone. The app is not age-limited, meaning children are allowed to download and use the app. Over 85% of the households in COVID radar's user population with minors under 18 years of age are linked to an adult smartphone. Upon first use of the app, users are asked to provide informed consent to share the following information with the research institution as stipulated by the conditions of the European General Data Protection Regulation. Users may opt out by either removing the app or by requesting the data manager to remove all data collected from that individual. Users are asked to register by entering the four digits of their postal code, gender (Male/Female/Other/Not Specified), age category (0-5, 6-11, 12-18, 19-29, ten-year increments from 30-80 and a category for 80+), and occupation (healthcare, education, catering industry, or other occupation with high risk of close contact). Following the initial setup, users are asked to report their symptoms and behavior daily via a questionnaire. A push reminder is sent every other day to users reminding them to do so. Fig 1 shows screenshots of the app and S1 Table shows a list of the questions users are asked.

PLOS ONE
Each observation comprises questions assessing symptoms, social-distancing behavior, whether or not the user has been exposed to an individual with COVID-19 in the past 2 weeks, and a user's COVID-19 test history. The questions asked were periodically updated, with the addition/removal dates of each question detailed in the online supplement. Via maps displayed within the app, users are presented with regional incidences of symptoms and personal feedback on their social-distancing behavior compared to regional and national means (Fig 1).
Data are transferred daily to a safe data environment within the Information Technology system of the LUMC (S2 Fig). Following importation of the daily data, we exclude observations from users who had requested to opt out, observations listing nonexistent postcodes, and double measurements within one user. Given that users are asked if they have tested positive for SARS-CoV-2 within the past two weeks, we considered users SARS-CoV-2 positive/negative if they indicated a SARS-CoV-2 test result at least twice in the app, with the date of the first report used as day zero. More details on the development of the app, selection of the chosen questions, (external) data sources, and data cleaning are available in the supplement. Ethical approval was provided by the Medical Ethical Board of the LUMC (dossier number N20.067), which gave permission to refrain from obtaining consent from parents or guardians as data collection was anonymous. Only data on age category, profession, and four-digit postal code were collected, rendering the data untraceable to an individual.
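The exclusion and classification rules described above can be illustrated with a minimal Python sketch. This is not the STATA syntax from the supplement. The record layout (the keys `user`, `day`, `postcode`, `test`), the function names, and the reading of "double measurements" as more than one entry per user per day are all assumptions made for illustration.

```python
from datetime import date

def clean_observations(observations, opted_out, valid_postcodes):
    """Drop opt-out users, unknown postcodes, and repeat entries per user and day."""
    seen, kept = set(), []
    for obs in observations:
        if obs["user"] in opted_out or obs["postcode"] not in valid_postcodes:
            continue
        key = (obs["user"], obs["day"])
        if key in seen:  # assumed meaning of "double measurement": same user, same day
            continue
        seen.add(key)
        kept.append(obs)
    return kept

def day_zero(observations, user, result):
    """Date of the first report if `user` reported `result` at least twice, else None."""
    days = sorted(o["day"] for o in observations
                  if o["user"] == user and o["test"] == result)
    return days[0] if len(days) >= 2 else None

# Hypothetical records, invented for this example only.
observations = [
    {"user": "a", "day": date(2020, 11, 1), "postcode": "2333", "test": "positive"},
    {"user": "a", "day": date(2020, 11, 1), "postcode": "2333", "test": "positive"},
    {"user": "a", "day": date(2020, 11, 3), "postcode": "2333", "test": "positive"},
    {"user": "b", "day": date(2020, 11, 2), "postcode": "0000", "test": None},
    {"user": "c", "day": date(2020, 11, 2), "postcode": "2333", "test": "positive"},
]
cleaned = clean_observations(observations, opted_out={"d"}, valid_postcodes={"2333"})
```

In this toy data, user "a" keeps two distinct reporting days and so receives a day zero, whereas user "c" reported a positive result only once and is left unclassified.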

Comparison of included/excluded observations
Following the data cleaning process detailed in the online supplement, we compared the available data in the excluded cohort with that of the included cohort. For each of the binary (symptom) variables collected by the app, we compared the proportion of excluded and included observations reporting this symptom. For each of the continuous social-distancing behavior variables, we compared the mean values for the included/excluded cohorts.
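As a rough illustration of this comparison, the sketch below computes, for each variable, the value in the included and excluded cohorts. It assumes binary symptoms are coded 0/1, so that the mean equals the proportion reporting the symptom; the variable names `cough` and `contacts` and the function name are hypothetical and do not come from the app's questionnaire.

```python
from statistics import mean

def compare_cohorts(included, excluded, variables):
    """Return {variable: (value in included, value in excluded)}.

    For 0/1 symptom flags the mean is the proportion reporting the symptom;
    for continuous behavior variables it is the ordinary mean.
    """
    return {
        var: (mean(o[var] for o in included), mean(o[var] for o in excluded))
        for var in variables
    }

# Hypothetical observations, invented for this example only.
included = [
    {"cough": 1, "contacts": 2.0},
    {"cough": 0, "contacts": 4.0},
    {"cough": 0, "contacts": 3.0},
    {"cough": 1, "contacts": 3.0},
]
excluded = [
    {"cough": 1, "contacts": 5.0},
    {"cough": 0, "contacts": 2.0},
]
summary = compare_cohorts(included, excluded, ["cough", "contacts"])
```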

Descriptive statistics
To describe participant characteristics, we used histograms to explore age distributions of the app users, the number of times the app was used each day, and the number of times individual users used the app. We further compared age, gender, and profession for users ever having tested positive with those never having tested positive for SARS-CoV-2.

Validation testing
Given the eventual goal of the COVID Radar app is to predict emerging hotspots, we tested the expected associations between symptoms/behavior and SARS-CoV-2 test outcome. We used user-reported test results as our outcome measure for criterion validity testing and cases reported by the National Institute for Public Health and the Environment (RIVM) as our outcome measure for external validation [2].

Criterion validity
As a measure of criterion validity, we explored associations between the binary symptom variables (e.g., cough, sore throat, loss of smell/taste) and the continuous social-distancing behavior variables (e.g., number of hours outside the house, number of people within 1.5m) within the cohort of users ever reporting a SARS-CoV-2 test. For users within this ever-tested cohort, we used the date of the test as day 0 and observed the 21 days before and after the test. We calculated the daily mean or proportion for each variable for the entire user-cohort. We then calculated the difference between ever-positive or ever-negative users' reported values and the mean values for the entire user-cohort on that day. By comparing data from the same days, we eliminated bias introduced by variations in time due to the various lock-down measures implemented during the observation window, as well as seasonal effects on symptoms. The mean values and 95% confidence intervals for these differences were then plotted to show how the ever-positive and ever-negative cohorts compared to the cohort as a whole with regard to these variables in the days surrounding a test. Given the formulation of the question ("Have you tested positive/negative for SARS-CoV-2 in the past two weeks"), the date of the test cannot be determined for those answering this question in the 14 days following the implementation of the question about testing in the app. Given this and the fact that this analysis involved looking at the 14 days prior to a test, users reporting a SARS-CoV-2 test in the 14 days following implementation of the question about testing were not included in this analysis.
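The same-day comparison described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' STATA code: days are represented as integer indices, the record keys and function names are hypothetical, and the 95% confidence interval uses a normal approximation (mean ± 1.96 standard errors), which the text does not specify.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def daily_cohort_means(observations, var):
    """Mean (or proportion, for 0/1 variables) of `var` per calendar day, all users."""
    by_day = defaultdict(list)
    for o in observations:
        by_day[o["day"]].append(o[var])
    return {day: mean(values) for day, values in by_day.items()}

def differences_around_test(observations, var, day_zero, window=21):
    """Map each day relative to the test to tested users' deviations from the same-day cohort mean.

    `day_zero` maps a tested user to the day index of their test.
    """
    cohort = daily_cohort_means(observations, var)
    diffs = defaultdict(list)
    for o in observations:
        if o["user"] in day_zero:
            rel = o["day"] - day_zero[o["user"]]
            if -window <= rel <= window:
                diffs[rel].append(o[var] - cohort[o["day"]])
    return diffs

def mean_ci(values, z=1.96):
    """Mean with a normal-approximation 95% confidence interval (an assumed method)."""
    m = mean(values)
    half = z * stdev(values) / sqrt(len(values)) if len(values) > 1 else float("nan")
    return m, m - half, m + half

# Hypothetical data: user "p" tested on day 12; "x" and "y" never tested.
observations = [
    {"user": "p", "day": 10, "contacts": 6}, {"user": "x", "day": 10, "contacts": 2},
    {"user": "y", "day": 10, "contacts": 4}, {"user": "p", "day": 12, "contacts": 1},
    {"user": "x", "day": 12, "contacts": 3}, {"user": "y", "day": 12, "contacts": 2},
]
diffs = differences_around_test(observations, "contacts", day_zero={"p": 12})
```

Because each value is compared with the cohort mean of the same calendar day, a lockdown or holiday that shifts everyone's behavior on that day cancels out of the difference.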

External validation
As a measure of external validation, we compared per-capita user-reported COVID-19 status among the 12 Dutch provinces with per-capita rates as reported by RIVM over the course of the pandemic [2]. Within each province, we plotted 7-day backward-looking moving averages of the daily proportion of users reporting each symptom variable alongside the daily nationally reported COVID-19 case counts and the weekly proportions of users reporting each symptom variable alongside the number of Rhinovirus cultures reported by Dutch laboratories [12]. We further plotted daily means and 7-day backward-looking moving averages of each social-distancing behavior variable and qualitatively observed how well they reflect nationally applied lockdown-measures and holidays.
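A 7-day backward-looking moving average and a per-capita rate can be sketched as below. The function names are hypothetical, and two choices are assumptions not stated in the text: the first six days are averaged over however many days are available, and rates are expressed per 100,000 with the denominator (app users or provincial population) supplied by the caller.

```python
def backward_moving_average(values, window=7):
    """Average of each day and the preceding `window - 1` days.

    Early days use only the days available so far (an assumed convention).
    """
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged

def per_100k(count, population):
    """Rate per 100,000; `population` is whichever denominator is appropriate."""
    return 100_000 * count / population

# Hypothetical daily counts, invented for this example only.
daily_positive_reports = [7, 0, 0, 0, 0, 0, 0, 7]
smoothed = backward_moving_average(daily_positive_reports)
```

A backward-looking window uses only past data for each day, so the smoothed series lags the raw series but never draws on observations from the future.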

Sensitivity analyses
We repeated the above-described analyses for (a) the cohort of users using the app an above-median number of times during the observation period, (b) the cohort excluding healthcare professionals, and (c) the cohort excluding inhabitants of the province 'Zuid-Holland', the home province of the LUMC where the app was created and users were most exposed to COVID Radar app-related media and advertisements. All statistical analyses were performed in STATA 16.1 (StataCorp, College Station, USA). STATA syntaxes for all analyses are provided in the online supplement.
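The three sensitivity cohorts can be derived from the cleaned observations roughly as follows. This is a sketch in Python rather than the STATA syntax from the supplement; the record keys (`user`, `occupation`, `province`) and function names are hypothetical, and "above-median" is read as strictly greater than the median number of observations per user.

```python
from collections import Counter
from statistics import median

def above_median_users(observations):
    """Users whose number of observations is strictly above the per-user median."""
    counts = Counter(o["user"] for o in observations)
    cutoff = median(counts.values())
    return {user for user, n in counts.items() if n > cutoff}

def exclude(observations, key, value):
    """Observations whose `key` differs from `value` (e.g. a province or occupation)."""
    return [o for o in observations if o[key] != value]

# Hypothetical observations: user "a" reports 5 times, "b" once, "c" twice.
observations = (
    [{"user": "a", "occupation": "healthcare", "province": "Zuid-Holland"}] * 5
    + [{"user": "b", "occupation": "other", "province": "Utrecht"}]
    + [{"user": "c", "occupation": "education", "province": "Zuid-Holland"}] * 2
)
faithful = above_median_users(observations)
cohort_a = [o for o in observations if o["user"] in faithful]
cohort_b = exclude(observations, "occupation", "healthcare")
cohort_c = exclude(observations, "province", "Zuid-Holland")
```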

Results
In the period 2 April, 2020 to 31 January, 2021 (305 days), the COVID Radar app was down-

Comparison of included/excluded observations
The data for the 102,445 (1.65%) excluded observations were fairly representative of the included observations' data in terms of symptoms and behavior. However, excluded observations were less often from a health professional and showed a slightly different age distribution (i.e. older age groups are over-represented in the excluded cohort) (see S2 Table).

Descriptive statistics
The age distribution of the app's users showed a fairly consistent distribution of users 18-69 years old, and an under-representation of young (<18) and old (>70) users. Female users were overrepresented compared to national figures (See S5 Fig). The number of observations (questionnaires answered) per day dropped from over 100,000 in the first week of the app to a steady-state of around 10,000 observations per day during the course of the observation window (2 April, 2020 to 31 January, 2021) (See Fig 2).
The effect of the push reminder sent every other day to all users is seen in the periodicity in the number of observations between even and odd days. The number of daily observations was highest in the province Zuid-Holland, the home province of the LUMC where the app was conceived and advertised (see Fig 3).

Criterion validation
From a total of 278,523 unique users, 1,981 (0.71%) reported ever testing positive and 1,214 (0.44%) negative for SARS-CoV-2. Ever-positive users were more likely to be women, older than 40 years of age, and healthcare professionals (Table 1).
The proportion of users reporting the eight symptom variables increased beginning approximately 7 days prior to a positive test. This increase was smaller in the cohort of users testing negative (Fig 5a and 5b).
The continuous social-distancing behavior-based variables likewise showed above-mean values in this ever-positive cohort until approximately 7 days prior to a positive test, at which point they sharply decreased to remain below-mean in the week before and after a positive test. These fluctuations were not seen in users testing negative for SARS-CoV-2 (see Fig 6a and 6b).

External validation
As of early January 2021, almost one million cases of COVID-19 had been reported in the Netherlands by the National Institute for Public Health and the Environment (RIVM). The RIVM-reported daily case counts varied from 0 to over 13,000 cases per day. Positive SARS-CoV-2 tests reported in the COVID Radar app alongside the case count as reported by the RIVM for each province show that the association between these two is highest in provinces with a higher number of users, especially Zuid-Holland (Fig 7).
Symptoms and social-distancing behavior varied over time, with both showing a clear temporal association with RIVM-reported case counts over time (Figs 8 and 9).
Plotting the RIVM-reported number of positive Rhinovirus cultures alongside our symptom data suggests that the variables 'fever', 'pain in the chest' and 'loss of smell' are associated with COVID-19 case count, while the variables 'coughing' and 'sore throat' correlate more closely with Rhinovirus cultures (Fig 10).
The daily mean number of people within 1.5 meters declined sharply around the middle of September, reflecting the national lockdown measures introduced, and showed peaks during national holidays (Fig 11).
The variable 'number of visitors' likewise showed peaks in the period around Christmas and New Year's Eve (Fig 12).

Sensitivity analyses
These analyses were repeated using (a) only users reporting an above median number of observations (referred to as 'faithful' users), (b) only users outside the province Zuid-Holland, and (c) only non-healthcare professionals. Differences in the results for these three sensitivity analyses were minimal and none of the trends seen here were reversed (data shown in supplements).

Discussion
Since April 2020, the COVID Radar app has collected over 6 million user-provided questionnaires detailing COVID-related symptoms and social-distancing behaviors from over 275,000 unique users within the Netherlands. Symptom and behavior data were temporally associated with user-reported SARS-CoV-2 tests. A correlation between in-app reported case counts and nationally reported case counts was likewise seen, especially in provinces with high user-engagement. Social-distancing behavior variables showed the expected pattern in relation to nationally applied lockdown measures and holidays.

Criterion validity
Our qualitative (visual) association testing showed clear associations between both user-reported symptoms and user-reported social-distancing behavior, and user-reported SARS-CoV-2 test results. While not quantified here, some variables (e.g. 'fever', 'pain in the chest' and 'loss of smell') were more closely associated with case-count than others (e.g. 'coughing' and 'sore throat'), which seemed as associated with Rhinovirus as with SARS-CoV-2. These associations are supported by prior research [13][14][15][16]. The pattern of social-distancing behaviors within the cohort of users who eventually report a positive SARS-CoV-2 test was particularly interesting. This cohort showed above-mean risk social-distancing behavior (e.g. more people within 1.5m, more visitors at home) between 20 and 10 days prior to a positive test (i.e. the period during which transmission likely occurred), at which point their social-distancing behavior quickly dropped to a below-mean value as they became symptomatic and decided to be tested. The extent of above-mean risk behavior was lower in users eventually testing negative.

External validity
Comparing COVID radar data to external data sources showed logical (temporal) associations in symptoms, social-distancing behavior, and test results. The strongest associations were observed in regions with high user-engagement. Given the symptoms tracked by the app are common both to SARS-CoV-2 and other respiratory tract infections, future efforts directed at prediction will need to correct for Rhinovirus and other viruses using viral surveillance data from Dutch laboratories. The extent and types of restrictions imposed on the Dutch population varied during the observation period and their effects were clearly visible in the social-distancing variables reported by users.
Comparison of excluded and included observations showed slight differences in age distribution but relative consistency in other variables. The small size of the excluded cohort minimized the risk of bias being introduced via this exclusion step. There was a large variance in the number of observations per user, with some users answering questionnaires daily while others filled in the app only once during the observation period. While it is reasonable to assume more faithful users may provide more accurate data, sensitivity analyses performed using data from users with an above-median number of app entries show no significant differences as compared to our primary analyses. The lack of a clear difference in the results when analyzing users of different engagement-levels suggests any bias introduced by differences in the reporting habits of these users was small.
There was an overrepresentation of users from the province Zuid-Holland in our data, due to Zuid-Holland being the home province of the LUMC, the hospital in charge of app design/analysis. This also likely explains the over-representation of health care professionals, to whom the app was thoroughly advertised within the environment of the LUMC. Despite this overrepresentation, our sensitivity analyses excluding Zuid-Holland users and healthcare professionals showed similar results, suggesting any bias introduced by their overrepresentation is minimal. COVID radar users were more often female and middle-aged. This was due to the overrepresentation of healthcare workers (who were more often female and middle-aged). However, the sensitivity analysis excluding healthcare workers resulted in no different conclusions.
Noteworthy too is the fact that fully 30% of those users reporting a positive SARS-CoV-2 test reported no symptoms on the day of the positive test (data shown in S8 Fig). This is in line with the estimated number of COVID-19 carriers without symptoms, as reported by other studies [6,7]. Our analysis likewise showed loss of smell and cough may continue for weeks following the positive SARS-CoV-2 test, as also confirmed in previous studies [17].

Limitations
All data in the app were self-reported and thus subject to differences in personal interpretation of the questions. However, we do not expect differential misclassification as we see logical trends in symptoms and behavior on both individual and national levels. State-reported case counts were those reported by RIVM, whose data should include tests performed in private practices as they are required to be forwarded to RIVM. However, as there is no oversight for this process, the RIVM-reported case-counts likely represent under-estimates of the number of confirmed cases [18].
COVID radar additionally provided direct feedback to users on how their symptoms and behavior compared to their peers, which likely has an effect on user behavior. This may limit the generalizability of COVID radar data, especially behavioral data. The effect of this feedback loop on users' behavior would be expected to lead to an overly conservative estimate of the behavior of the population. Despite this, expected changes in reported behavior in the periods following national holidays and changes to social distancing policies are observed in COVID radar data. Additionally, altered behavior due to app-feedback would be expected to be observed in more loyal users of the app. Our sensitivity analysis on loyal users showed no significant difference in reported behavior. Given these realities, while we accept that app feedback altering user behavior has the potential to bias our results, we feel any bias introduced has been shown here to be small.
Testing capacity in the Netherlands was low during the developmental stage of the app and has increased during the study period. In the final months of 2020, testing was expanded to include those without symptoms. As a result, the prevalence of COVID-19 in the Netherlands could be underestimated. Because of this change in testing policy, the question regarding negative tests was implemented at a later date resulting in less data about negative tests compared to data on positive tests during a shorter period of time. Nonetheless, we were able to show that the association between symptoms and a negative test is less apparent than their associations with a positive test, suggesting our conclusions remain valid. Also, testing of the underaged (<12 years) was rare during the study period, resulting in a relatively old SARS-CoV-2 positive cohort in this study.

Future implications
Having validated the expected associations between symptoms, social-distancing behavior, and COVID case-count, our next steps will involve attempted prediction of emerging hotspots by combining symptom and social-distancing behavior data to quantify risk of COVID-19 cases. Such predictions could be used to help guide COVID-19 policy. Our study indicated the quality of the submitted data is best where user-engagement is high. Prediction-based goals will thus be aided by increasing user count. Regional predictions may additionally be improved through incorporation of data from general practitioners, more detailed demographic data, and mobility data using a machine learning based approach. Another possibility for further research is testing of associations between regional SARS-CoV-2 cases, symptoms, behavior and other regional data related, for example, to the physical environment.

Conclusion
The COVID Radar app successfully collects anonymous, user-reported data on COVID-19-related symptoms and social-distancing behavior. Initial validation showed symptoms and behavior reported within the app are correlated with in-app reporting of a SARS-CoV-2 test. The predictive potential of the COVID Radar is demonstrated as external validation showed in-app reported positive SARS-CoV-2 tests track well with state-reported case counts. Future research will focus on regional predictions using these data.