Predictors of incident viral symptoms ascertained in the era of COVID-19

Background In the absence of universal testing, effective therapies, or vaccines, identifying risk factors for viral infection, particularly readily modifiable exposures and behaviors, is required to identify effective strategies against viral infection and transmission. Methods We conducted a world-wide mobile application-based prospective cohort study available to English speaking adults with a smartphone. We collected self-reported characteristics, exposures, and behaviors, as well as smartphone-based geolocation data. Our main outcome was incident symptoms of viral infection, defined as fevers and chills plus one other symptom previously shown to occur with SARS-CoV-2 infection, determined by daily surveys. Findings Among 14, 335 participants residing in all 50 US states and 93 different countries followed for a median 21 days (IQR 10–26 days), 424 (3%) developed incident viral symptoms. In pooled multivariable logistic regression models, female biological sex (odds ratio [OR] 1.75, 95% CI 1.39–2.20, p<0.001), anemia (OR 1.45, 95% CI 1.16–1.81, p = 0.001), hypertension (OR 1.35, 95% CI 1.08–1.68, p = 0.007), cigarette smoking in the last 30 days (OR 1.86, 95% CI 1.35–2.55, p<0.001), any viral symptoms among household members 6–12 days prior (OR 2.06, 95% CI 1.67–2.55, p<0.001), and the maximum number of individuals the participant interacted with within 6 feet in the past 6–12 days (OR 1.15, 95% CI 1.06–1.25, p<0.001) were each associated with a higher risk of developing viral symptoms. Conversely, a higher subjective social status (OR 0.87, 95% CI 0.83–0.93, p<0.001), at least weekly exercise (OR 0.57, 95% CI 0.47–0.70, p<0.001), and sanitizing one’s phone (OR 0.79, 95% CI 0.63–0.99, p = 0.037) were each associated with a lower risk of developing viral symptoms. Interpretation While several immutable characteristics were associated with the risk of developing viral symptoms, multiple immediately modifiable exposures and habits that influence risk were also observed, potentially identifying readily accessible strategies to mitigate risk in the COVID-19 era.


Introduction
The global SARS-CoV-2 pandemic has affected communities in every habitable continent and every state in the US. Given what is generally known about respiratory viruses, strategies to mitigate transmission have included government orders to practice regular hand hygiene, physical distancing, including the closures of public locations commonly associated with community gatherings, and more recently to wear masks [1][2][3] Studies thus far have largely focused on comparisons among those seeking medical care for the disease [4][5][6][7] or evaluations of large administrative datasets [8]. It can be difficult to track individual-level characteristics and behaviors, particularly as they are dynamic and changing over time, as they relate to incident disease. Members of the public may benefit by understanding strategies under their direct control that may influence their own risk of infection and viral transmission.
Tracking viral infection is hindered by the absence of universal and repeated testing. In the absence of such testing, recent evidence suggests that symptoms themselves may be useful markers of SARS-CoV-2 infection [9]. While the virus may be asymptomatic, a variety of symptom clusters associated with the disease have been identified, often including fever, but ranging from typical respiratory symptoms to gastrointestinal afflictions to somewhat idiosyncratic findings such as anosmia/ageusia and conjunctivitis [10][11][12][13][14][15][16][17][18]. In the past, ascertainment of viral symptoms has relied on assessments of those seeking medical care or retrospective surveys that may be prone to recall bias. Given the current near-ubiquity of smartphones and use of related mobile apps, technology is now available to regularly and repeatedly query large numbers of individuals over time, providing access to symptom development as it arises. Although monitoring for viral symptoms may be neither sufficiently sensitive nor specific for SARS-CoV-2 infection, these outcomes are, by their nature, inherently experienced by the individual, potentially providing valuable information that may be best leveraged by modern mobile technology.
We sought to use prospectively collected information about exposures and modifiable behaviors, along with daily symptom reporting, to identify risk factors for incident viral symptoms using a globally-available, smartphone mobile application-based study, the COVID-19 Citizen Science Study. maps of symptom clusters, and location of participants around the world can be found at https://covid19.eurekaplatform.org/. All participation is remote, without geographic restriction. Verification of cell phone numbers via text was required before proceeding from study registration to remote-based study consent and subsequent study participation. Retention strategies include daily notifications, data visualizations, and intermittent study update blog posts. Our Citizen Scientist participants contribute study question ideas that are then included into the study and reported to participants as participant-generated.
Participants complete surveys written in lay language meeting Flesch-Kincaid criteria for an 8 th grade reading level (https://readabilityformulas.com). At baseline, surveys collected information about demographics, education, occupation, SARS-CoV-2 status (referred to in surveys as "the novel coronavirus, the virus that causes COVID-19"), behaviors, living conditions, attitudes regarding the SARS-CoV-2 pandemic, local government restrictions related to the disease, medical conditions, and medications (the surveys are included in the S1 File). Questionnaires included pre-tested survey instruments employed in multiple Health eHeart Study and Eureka Digital Research Platform studies for demographics and past medical conditions; however, as the study was launched during the beginning of the SARS-CoV-2 pandemic, questionnaires specific to related risk factors and viral symptoms were new. Perceived socioeconomic status was assessed using the MacArthur subjective social status ladder [19,20]. All participants received an optional invitation to share their smartphone-based geolocation data.
Participants receive a daily survey, timed to occur synchronously to their local same time of day when they engaged with the first baseline survey, via mobile application-based push notification. The daily survey includes queries about current viral symptoms, updated according to new information, using "check all that apply" including: "A scratchy throat"; "A cough (worse than usual if you have a baseline cough)"; "A painful sore throat"; "A temperature greater than 100.4˚F or 38.0˚C"; "A runny nose"; "Symptoms of fever or chills"; "Muscle aches (worse than usual if you have baseline muscle aches)"; "Shortness of breath"; "Nausea, vomiting or diarrhea" (added March 30, 2020); "Unable to taste or smell" (added March 31, 2020); "Red or painful eyes" (added April 13, 2020); or "none of the above." The daily survey then includes questions regarding current symptoms among household contacts and the number of individuals outside the household the participant interacted with within six feet (about 1.83 meters) in the previous 24 hours.
Participants received weekly surveys to update information regarding sleep, exercise, hand hygiene, social and physical distancing behaviors, habits such alcohol consumption, and SARS-CoV-2 infection status. All surveys remained open for 24 hours.
For those that consented to geolocation tracking, smartphone-based geolocation using a combination of the Global Positioning System and cell phone tower triangulation was collected every 5 minutes for Android phones and whenever the phone accelerometer exhibited movement (in order to minimize battery drain) for iOS smartphones. Geolocation latitude and longitude coordinates were clustered within an individual for every day of the study using the HDBScan clustering algorithm [21,22]. The most prevalent cluster of geolocation coordinates for a user was defined as "home." Time spent at a cluster was calculated as the time difference between the current location cluster and any future cluster change. Daily time spent at home was calculated as the time spent at the cluster identified as "home" divided by the time between the first and last coordinate collected daily. In addition, distance travelled was calculated as the sum of the successive distances between all consecutive coordinates collected within a user on a daily basis. Long-distance travel was defined as movement of at least 1,000 kilometers within 24 hours.
Occupation was dichotomized into healthcare workers versus not; sleep was determined as the average number of hours per day over each week; exercise, defined as physical activity for at least 20 minutes that resulted in breathing heavily or to "break a sweat," was dichotomized into more or less than once weekly; alcohol was assessed as average daily standard drinks; cigarette, e-cigarette, and marijuana use were dichotomized into any use in the last 30 days versus not; household symptoms were dichotomized into any versus none in the previous 6-12 days; and the maximum number of contacts within six feet (about 1.83 meters) reported in the previous 6-12 days were derived from the daily survey responses. The lag time of 6-12 days was employed to allow for the incubation period of SARS-CoV-2 [23] and other common respiratory viruses [24] and in order to accommodate days without responses and the expectation that viral symptoms would last several days.
For the current analyses, all participants reporting a previous positive test for SARS-CoV-2 and those with any symptoms upon entry to the study were excluded. Those with baseline medical conditions that might themselves contribute to the symptoms of interest, including atrial fibrillation, coronary artery disease, congestive heart failure, chronic obstructive pulmonary disease, and asthma, were excluded. Two sensitivity analyses were conducted: one also excluded all participants reporting anemia; a second included participants with atrial fibrillation, coronary artery disease, congestive heart failure, chronic obstructive pulmonary disease, and asthma. Based on the daily surveys, incident viral symptoms were defined as the first report of a combination of fever or chills plus at least one other symptom on the same day. The absence of completion of a daily survey was assumed to represent an absence of symptoms in statistical analyses. Given the protean manifestations of COVID-19 [5,[10][11][12][13][14][15][16][17][18]25], we allowed for any of these viral symptoms, requiring fever given that each of those symptoms could possibly occur in the absence of a viral infection. Follow-up for the current study ended May 3, 2020. The study was approved by the University of California, San Francisco Institutional Review Board. All participants provided informed electronic consent.

Statistical analyses
Normally distributed continuous variables are presented as means ± SD and compared using t-tests, where continuous variables with skewed distributions are presented as medians with interquartile ranges (IQR) and compared using Wilcoxon rank sum tests. Categorical variables were compared using chi-squared tests. Pooled logistic regression models were used to identify factors associated with incident symptoms, potentially including baseline characteristics (demographics, medical conditions, habits, and behaviors related to viral infection risk such as hand hygiene), and time-updated information from daily and weekly surveys. Exposures that expected to influence the risk of viral infection that would then manifest as future symptoms several days later were evaluated using survey data for 6-12 days earlier. Consequently, only participants with at least one daily survey at least 6 days after the first were included in the pooled logistic regression models. Beginning with the subset of variables associated with incident symptoms at p < 0.1 in pooled logistic regression models adjusting only for age, sex, race, and calendar date (with linear and non-linear components), backward deletion was used to select multivariable models retaining covariates with p < 0.05. Statistical analyses were performed using Stata, version 16 (College Station, TX). Two-tailed p-values < 0.05 were considered statistically significant.

Results
After exclusions were applied, 14,335 participants were available and contributed to the incident analyses. Differences between these participants and those that entered the study reporting at least one viral symptom are shown in Table 1. Participants resided in all 50 states and in 93 countries outside the US. While a mean 42% ± 12% of all participants completed the daily survey each day, 95%-100% of all participants completed at least one daily survey per week throughout the study period, and weekly surveys were completed 66 ± 26% of the time (S1 Table).

PLOS ONE
In minimally adjusted logistic models adjusting only for age, sex, race, ethnicity, and date, a higher level of education and subjective social scale, exercising at least once weekly, a longer average sleep duration, and sanitizing one's phone were each associated with a lower risk while a history of anemia, hypertension or some immunodeficiency, cigarette smoking, e-cigarette use, marijuana use, having pets at home, having household members with viral symptoms, and the number of individuals with which the participant interacted with within six feet (about 1.83 meters) each predicted a higher risk of incident viral symptoms (Table 2). Pertinent characteristics that failed to exhibit statistically significant relationships included HIV status, hand washing practices, reported government restrictions, and, per geolocation measurements, amount of time at home and daily distance traveled. In the backwards stepwise logistic model, the following were retained: a higher level of subjective social status, exercise, and sanitizing one's phone were each associated with a significantly lower risk of developing viral symptoms, whereas female sex, a history of anemia, hypertension, recent cigarette smoking, recent household contacts with viral symptoms, and the maximum number of individuals recently in contact with the participant within six feet (about 1.83 meters) were each associated with a significantly heightened risk of developing viral symptoms (Table 3).
In sensitivity analyses, excluding all participants with anemia did not meaningful change the results (S3 Table). After including participants with atrial fibrillation, coronary artery disease, congestive heart failure, chronic obstructive pulmonary disease, and asthma: those with congestive heart failure and anemia exhibited a higher risk for incident symptoms, and all of the previously observed statistically significant relationships remained (S4 Table).

Discussion
Among an international cohort involving collection of prospective and time-updated data, female sex, anemia, hypertension, recent cigarette smoking, living with someone with viral symptoms, and the maximum number of recent contacts within six feet (about 1.83 meters) outside the home each predicted a higher risk of developing viral symptoms. Conversely, a higher self-perceived social status, regular exercise, and sanitizing one's phone were each associated with a lower risk of subsequently reporting viral symptoms.
As of July 25, 2020, there were more than 15 million confirmed cases of SARS-CoV-2 infections and more than 640,000 COVID-19-related deaths around the world [26]. In response, tremendous investment and efforts are being dedicated to enhance the availability of testing [27], identify effective therapies [28], and ultimately to develop a vaccine [29]. Although these remedies are being pursued at an unprecedented pace, the number of infections and deaths continues to grow, and, even after these new technologies, drugs, and vaccines are developed, additional time will be required to disseminate them. While studies of hospitalized patients are valuable, ultimately the characteristics, behaviors, and exposures of individuals in the general community associated with the development of viral symptoms can be helpful in several ways: to identify those at highest risk of developing these symptoms, which may help prioritize protecting the most vulnerable; to provide novel insights regarding the biology of current viral disease; and ideally to identify low-risk and modifiable behaviors that individuals might practice or avoid to reduce their own individual-level risk.
Clearly, self-reported symptoms of a viral infection do not equal SARS-CoV-2 infection. However, the specific viral symptoms repeatedly queried in our population were developed based on available evidence regarding the nature of SARS-CoV-2 infection [10][11][12][13][14][15][16][17][18], and similar viral diseases, including the common cold and influenza for example, very likely share common properties related to an individual's susceptibility to infection [30]. In addition, independent of commonalities across easily transmissible viral diseases, there are shared Somewhat less n/a n/a, n/a n/a Much less n/a n/a, n/a n/a  phenomena related to the human immune system's general vulnerability to viral infection [31]. Finally, due to the lack of universal testing, there is some evidence that surveillance for viral symptoms may itself have several advantages, often reflecting, either directly or indirectly, underlying COVID-19 disease [9,32,33]. The higher incidence of viral symptoms among women in our cohort runs counter to the prevailing evidence that men are at higher risk of both SARS-CoV-2 infection and related morbidity and mortality [34,35]. Indeed, population-based studies examining seropositivity for antibodies against SARS-CoV-2 have generally failed to demonstrate any differences by biological sex [36,37]. While this may suggest the viral symptoms detected by the current study fail to capture patterns most relevant to SARS-CoV-2, these differences may have occurred because our models adjusted for related mediators (such as smoking) [38] or because women are either more likely to experience or report more mild symptoms. Previous communitybased studies often adjust or weight for sex distributions based on the population, which can often hinder a direct assessment of biological sex as a predictor itself [33,39]. Anemia and hypertension as risk factors are consistent with the general notion that other systemic comorbidities enhance the susceptibility to viral infection. While anemia may be a marker of general, non-specific disease, hypertension, not generally considered a risk factor for infectious diseases, has emerged as a consistent predictor of SARS-CoV-2 infection and associated complications [4][5][6][7]. The reasons for this are unclear, although angiotensin-converting-enzyme-2 (ACE2)-dependent cellular entry of the virus has been posited as a biologically plausible mechanism, assuming some connection with ACE2-dysregulation that is also associated with hypertension [40,41]. While optimizing blood pressure control reduces overall morbidity and mortality [42], the consequences on incident viral infection have not yet been fully elucidated.
Three at least theoretically readily modifiable exposures, each bolstered by biological plausibility and previous evidence, arose as risk factors for incident viral infection: smoking, household contacts, and the maximum number of recent interactions with other individuals within six feet (about 1.83 meters). The impact of smoking on the risk of SARS-CoV-2 has been difficult to study, as reliance on hospitalization data fails to provide a foundational study base to make comparisons. Conflicting data on the subject exists, with some evidence that smokers may have a lower risk of SARS-CoV-2 infection and other evidence they may experience a higher risk [43]-the great majority of these studies rely on data among those already infected rather than providing information from cohort studies that include a geographically heterogenous cohort such as ours. Smoking may reduce the effectiveness of the immune response and may also upregulate ACE2, rendering individuals more prone to infection [44]. Or this observation may be related to an overall greater propensity to the symptoms of interest in the current study, rather than SARS-CoV-2 infection per se. The observation that sick household contacts predicted incident symptoms may provide evidence that these symptoms were in fact often due to a transmissible disease. For example, although the current analyses excluded those with prevalent symptoms (which reduces the chance symptoms arose from some chronic, ever-present, problem), shared symptoms within a household may have represented some common exposure or predisposition, such as an allergy-however, household symptoms preceding participant symptoms arose as a statistically significant predictor of incident symptoms, supporting viral infections as a culprit. Although physical distancing as a method to mitigate spread of infection is supported by the general understanding of the nature of infectious diseases, particularly respiratory viruses, our observation from prospective, repeatedly updated, individual-level data that the number of human to human physical interactions predicted viral symptoms may provide useful evidence in support of physical distancing.
Protective factors included a higher subjective social status, at least weekly exercise, and sanitizing one's phone. We utilized the MacArthur subjective social status ladder as a validated single-item question to capture socioeconomic status [19,20]. A higher self-perceived social status may influence viral infection risk in several ways: more education may translate into a better understanding of disease risks and healthy behaviors, and employment among those of a higher socioeconomic status may be more flexible and less often involve high-risk environments. Conversely, stress and the allostatic load related to social determinants of health among those with a lower subjective social status may adversely affect the immune response to infection [45]. Regular exercise is an established means to improve immune function and the response to viral infection [46], now with evidence for beneficial effects specifically in the COVID-19 era. We recognize that sanitizing one's phone as a protective factor may simply serve as a marker of more fastidious behaviors to minimize risk in general, but the ubiquity and frequent use of the smartphone, likely while shopping, while at work, and while interacting with others, would seem to make it a potentially potent fomite that could result in repeated exposure throughout the day and into the home.
Although we examined multiple predictors of incident viral symptoms, we do not believe that adjustment for multiple hypothesis testing would be appropriate for several reasons. First, all of our predictors had biological plausibility. Second, all covariates were adjusted for oneanother in our multivariable models. Third, beyond such mutual adjustment, we employed a backward stepwise elimination of covariates to only retain those achieving our prespecified statistical significance. Of note, the majority of statistically significant findings did exhibit particularly small p values that would have withstood even the most conservative multiple hypothesis testing adjustment (such as Bonferroni), but, for the reasons described above, we do not believe this would be appropriate as it would risk reclassifying true positives as false positives. These are also standard and well-accepted approaches to this sort of analysis [24].
Our study has several important limitations. The outcome of interest was viral symptoms, which relied on self-report. These findings therefore do not directly reflect any particular disease, including infection with SARS-CoV-2. Indeed, while we selected fever as a required symptom in hopes of capturing infection and at least one other symptom known to be associated with SARS-CoV-2 to help with specificity, the symptoms described are more applicable to respiratory viruses in general and could also occur outside the realm of infection. Importantly, we were able to leverage the prospective nature of our study with repeated assessments and exclude those with prevalent symptoms at baseline, which should help mitigate against contamination by those with chronic and non-infectious conditions. Some investigators have proposed and validated the creation of symptom-based scores for the accurate inference of COVID-19 [47], and future similar efforts may be able to harness the relative accessibility of patient or research participant self-report without reliance on biological assays. Multiple studies have sought to identify particular symptoms or combinations of symptoms most indicative of SARS-CoV-2 infection [14,25,32,33,47]-while the symptoms we employed generally match those associated with the disease in these studies, variability in populations and study design currently preclude the identification of an optimal approach to strike the intended balance regarding sensitivity versus specificity in such longitudinal cohort studies. Indeed, there is evidence the evolving nature of COVID-19 may produce variable manifestations over time, potentially hindering attempts to accurately characterize a true diagnosis by specific symptoms alone [48]. It is important to emphasize that we incorporated information into the study as it arose from the medical literature, such as (as described in the methods) gastrointestinal symptoms, loss of taste and smell, and eye symptoms, soon after they were reported in reputable sources-therefore, not every symptoms was assessed at baseline, which could have led to under-ascertainment of SARS-CoV-2-related symptoms in some. However, those updates were incorporated into both baseline and daily surveys for all participants as they arose. As the study required smartphone use, it is possible our population represents a more technically savvy and perhaps more highly educated and affluent group than the general population. However, this would primarily limit generalizability and should not serve as a threat to internal validity. We assumed that the absence of a completed daily survey represented an absence of symptoms, which may have resulted in a loss of power to detect some relationships but addressed missing data in a fashion that did not risk creating spurious false positive associations. In addition, the participants were fairly geographically diverse, representing every state in the US and multiple countries. Although less than 80% of the study participants were non-Hispanic white, African American representation was relatively poor. Finally, although the data were collected prospectively and in a time-updated fashion, the study was observational, prone to residual and unmeasured confounding that should temper assumptions of causal effects.
In conclusion, female sex, anemia, hypertension, recent cigarette smoking, living with someone with recent viral symptoms, and the maximum number of recent contacts within six feet (about 1.83 meters) outside the home each predicted a higher risk of developing viral symptoms during the current COVID-19 pandemic. At the same time, a higher subjective social status, regular exercise, and sanitizing one's phone each predicted a lower risk of developing viral symptoms.  Table. Sensitivity analysis of independent predictors of incident symptoms including participants with atrial fibrillation, coronary artery disease, congestive heart failure, chronic obstructive pulmonary disease, and asthma. Derived backwards stepwise elimination of covariates (see Methods). � overall heterogeneity. † heterogeneity of non-reference levels. # linear trend.