Abstract
Background
Some of those infected with SARS-CoV-2 suffer from post-COVID syndrome (PCS). However, a uniform definition of PCS is lacking, causing uncertainty about the prevalence and nature of this syndrome. We aimed to improve understanding of PCS by operationalizing different classifications and to explore clinical subtypes.
Methods
We used data from the Nivel Primary Care Database from 2019–2020, which consists of electronic health records (EHR) from general practices (GPs) combined with sociodemographic data, for n = 10,313 individuals infected with SARS-CoV-2. In addition, data from n = 276 individuals who had been infected with SARS-CoV-2 in 2021, collected via a longitudinal survey, was used. In the GP-EHR data, we operationalized two classifications of PCS (based on symptoms and diagnoses recorded in GP-EHR data and healthcare utilization 3–12 months after acute infection) to calculate frequency and characteristics, and compared these to the survey results. In a subgroup of the EHR data we conducted community detection analyses to explore clinical subtypes of PCS.
Results
The frequency of PCS was 15% with on average 4.6 symptoms for which the GP was consulted using the narrow definition and 32% with on average 6.8 symptoms for the broad definition. Across all methods and classifications, the mean age of individuals with PCS was around 53 years and they were more often female. There were small sex differences in the type of symptoms and overall symptoms were persistent for 6 months. The community detection analysis revealed three possible clinical subtypes.
Discussion
We showed that frequency rates of PCS differ between methods and data sources, but characteristics of the affected individuals are relatively stable. Overall, PCS is a heterogeneous syndrome affecting a substantial group of individuals who need adequate care. Future studies should focus on care trajectories and qualitative measures such as quality of life of individuals living with PCS.
Citation: Bos I, Bosman L, van den Hoek R, van Waarden W, Berends MS, Homburg MS, et al. (2025) Comparison of observational methods to identify and characterize post-COVID syndrome in the Netherlands using electronic health records and questionnaires. PLoS ONE 20(1): e0318272. https://doi.org/10.1371/journal.pone.0318272
Editor: Dong Keon Yon, Kyung Hee University School of Medicine, REPUBLIC OF KOREA
Received: July 18, 2024; Accepted: January 13, 2025; Published: January 29, 2025
Copyright: © 2025 Bos et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data requests for usage of data from Nivel Primary Care Database or the Corona Survey Cohort may be submitted to the applicable governance bodies via gegevensaanvragen@nivel.nl. In this manuscript data from Nivel Primary Care Database was linked to data available at Statistics Netherlands and we used the Microdata Platform of Statistics Netherlands as a secure research environment. The linked database is only available at the Microdataplatform of Statistics Netherlands and can be made accessible if certain conditions are fulfilled.
Funding: This study was performed as part of the Long COVID MM project which is financed by ZonMw (https://www.zonmw.nl/en) under project number 10430302110004. The following authors received funding under this grant: IB, TOH, JM, LP and KH. The funder did not play a role in the study design, data collection, data analysis, decision to publish or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Already in the first year of the global corona pandemic it became clear that a substantial part of the individuals infected with SARS-CoV-2 suffer from persistent symptoms, also called ‘Post-COVID syndrome’ (PCS) [1]. Although the literature about PCS (or related terminology such as ‘Long COVID’ and ‘Post-acute sequelae of COVID-19’) is now growing rapidly [2–6], there is still much debate about the prevalence and characteristics of people affected by PCS, which is partly due to the lack of a uniform definition [7]. The most commonly used definitions for PCS are those published between 2020 and 2021 by the World Health Organization (WHO) [8], the UK National Institute for Health and Care Excellence (NICE) [9] and the US Centers for Disease Control and Prevention (CDC) [10]. These three definitions already differ from each other regarding the included symptoms and regarding the starting point of PCS: 4 weeks or 3 months after the infection. In addition, Chaichana and colleagues recently showed that, of 295 studies conducted on PCS, more than 65% used a definition other than the three listed above or no definition at all [7]. It is unclear what the direct impact of this heterogeneity in definitions among studies is on outcomes such as the prevalence and characteristics of individuals with PCS.
Prevalence estimates of PCS reported so far vary widely between studies, ranging from 13–80% [5, 6, 11, 12]. Besides the heterogeneity in the definitions used, the wide variety in reported prevalence estimates is at least partly due to variations in the investigated population (e.g. only individuals hospitalized due to the coronavirus vs. nationwide samples, or vaccinated vs. non-vaccinated individuals) [11, 13, 14]. Moreover, various studies are based solely on self-reported symptoms via questionnaires [6, 15, 16], and many also lack the ability to compare to control groups without persistent symptoms or without a SARS-CoV-2 infection, leaving them unable to identify specific PCS symptoms and associations with, for instance, demographic factors [17, 18]. Altogether, this makes it difficult to disentangle what PCS really is and what the impact of the syndrome is on societies and, most importantly, it hampers the development and optimization of treatment strategies. In addition to clear diagnostic guidelines, it would also help to have more clarity on which symptoms often occur simultaneously in which type of individuals, also called clinical subtypes. Insight into clinical subtypes of PCS is useful for personalizing clinical management strategies. A few studies have tried to identify clinical subtypes of PCS [19, 20], but these point in different directions, so further research and validation are needed.
In this study, we therefore aimed to examine the direct impact of using different classifications and data sources on the estimated frequency of PCS and the characteristics of individuals suffering from it. By comparing patient-reported outcome measures (PROMs) to real-world data from electronic health records (EHRs), we are able to demonstrate the differences in outcomes, as every method and data source has its own challenges, biases and advantages which should be taken into account [21–23]. With this study we aim to provide novel insights into these differences, which are needed for future research into PCS but also for epidemiological research in general. Operationalization of different classifications is also useful for clinical practice, as it could help to further specify diagnostic guidelines. In addition, we aimed to explore whether we can identify clinical subtypes of PCS in routine healthcare data from EHRs using network analysis. To that end we formulated the following research questions: 1) What is the impact of using different classifications and data sources on the frequency and characteristics of PCS?; and 2) Which clinical subtypes of PCS can be identified using routine healthcare data?
Methods
The current study is part of the Long COVID MM (Long COVID Mixed Methods) project in which various methods are combined to provide insight into post-COVID syndrome. Meta-data regarding this project can be found in the Health-RI COVID-19 data portal (https://covid19initiatives.health-ri.nl).
Data sources
GP-EHR database.
This database consists of electronic health records (EHR) from general practitioners (GP) and GP out-of-hours services (OOH-services) combined at the individual level with demographic and socio-economic data. The EHR data from GPs and GP out-of-hours services was obtained in November 2021 via the Nivel Primary Care Database (Nivel-PCD; approved under number NZR-00321.052), which uses an opt-out system permitted under the Dutch Medical Treatment Contracts Act (WGBO). Nivel-PCD covers about 10% of the Dutch population and the patients included form a representative sample of the population regarding age and sex [24]. General practices included in Nivel-PCD are located in all provinces of the Netherlands and the variation in urbanization level (scale 1–5) among practices is fairly similar to that of all Dutch general practices [24]. The GP-EHR data includes information on age, sex, prescriptions (coded via the Anatomical Therapeutic Chemical classification, ATC), contacts, referrals, lab results and diagnoses or symptoms (coded via the International Classification of Primary Care-1, ICPC-1). The OOH-services data include contacts with diagnoses or symptoms (ICPC-1 coded), prescriptions (ATC-coded) and triage registration (ICPC-1 coded). All EHR data was pseudonymized and linked at the individual level by a trusted third party (ZorgTTP). The GP-EHR data was uploaded to the data platform of Statistics Netherlands and combined with demographic and socio-economic data collected at Statistics Netherlands, including age, sex, migration background, education level, household income and mortality data (date and cause of death). For the current study we used data from Nivel-PCD and Statistics Netherlands from 2019 and 2020 for n = 958,739 individuals in total. A flowchart of the included study population can be found in Supplementary Material (S1 Fig).
Corona Survey Cohort.
As described elsewhere [25], this population-based cohort was initiated in May 2020 to study the long-term effects of SARS-CoV-2 infection and is based within Nivel-PCD. As part of the Long COVID MM study the initial cohort was extended with more participants and an additional follow-up survey. In short, n = 1851 individuals who had been flagged in their electronic patient file by their GP as having a SARS-CoV-2 infection (ICPC code R83.03) were invited by their GP to participate in this study between January and September 2021. Individuals were sent four surveys: directly after inclusion and after 3, 6 and 12 months. The surveys contained questions on symptoms, healthcare use and care experiences, quality of life, ability to work, vaccination status and self-care. All participants signed informed consent forms allowing researchers to link the survey data to their GP EHR. This provides the unique opportunity to combine the survey data with EHR data on morbidities and prescriptions for this specific group. More details about the Corona Survey Cohort are described in a previous publication [25] and S1 Fig shows a flowchart of the included population.
Ethical approval.
This project was conducted according to the Declaration of Helsinki and ethical approval was obtained from the medical ethics committee (METc) of the VU University Medical center Amsterdam for the longitudinal questionnaire component (METc protocol number 2020.0709) and from the METc of the University Medical Center Groningen for the electronic health records component (METc protocol number 2021/473). The conditions under which the use of electronic health records for research purposes is allowed in the Netherlands were fulfilled. Under these conditions, neither informed consent from study subjects nor approval by a medical ethics committee is obligatory for this type of observational study, which contains no directly identifiable data (art. 24 GDPR Implementation Act jo art. 9.2 sub j GDPR). All participants of the Corona Survey Cohort were adults (older than 18 years of age) who gave written informed consent before starting the survey and could additionally provide written informed consent for linkage of EHR data to questionnaire data.
Classifications of post-COVID syndrome
GP-EHR database.
In the GP-EHR database we conducted the following analyses to explore ways to operationalize different PCS classifications using routine healthcare data. First, we selected individuals with SARS-CoV-2 infection (n = 10,313) based on the EHR data from Nivel-PCD: individuals for whom the infection was registered directly by their GP (ICPC code R83.03), or who had been identified via an algorithm selecting patients based on symptoms and episode titles between April and June 2020 [26].
Each individual with SARS-CoV-2 infection was matched to four control individuals without SARS-CoV-2 infection. Matching to control patients was only used for the operationalization of the classifications. Controls were similar to individuals with SARS-CoV-2 infection in age and sex and were followed over the same period in the data as the individuals with SARS-CoV-2 infection to adjust for seasonal or circumstantial effects like lockdowns due to the COVID-19 pandemic. Characteristics of the matched control group can be found in S1 Table. To validate the matching, we compared the matched control group to the total group with SARS-CoV-2 infection using t-tests for continuous outcome variables and Chi-square tests for dichotomous variables (all p-values were > 0.05). To create a list of symptoms related to PCS, which is needed for the definition of PCS, we compared the ICPC codes recorded in the SARS-CoV-2 infection group 3–12 months after infection to the ICPC codes recorded in the same individuals a year before infection and to the ICPC codes recorded in the control group. Similar to the WHO definition, we chose 3 months after SARS-CoV-2 infection as the cut-off point between acute SARS-CoV-2 symptoms and PCS [8].
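The 1:4 matching described above can be illustrated with a minimal sketch. It assumes exact matching on age and sex and random selection without replacement; the text only states that controls were similar in age and sex, so the exact procedure, the function name `match_controls` and the record fields are hypothetical.

```python
import random
from collections import defaultdict


def match_controls(cases, candidates, ratio=4, seed=0):
    """Match each case to `ratio` unused controls with the same age and sex.

    Assumption: exact matching on (age, sex); the study's actual matching
    procedure may have differed (e.g. age bands).
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for person in candidates:
        pool[(person["age"], person["sex"])].append(person["id"])
    for ids in pool.values():
        rng.shuffle(ids)  # random order so selection within a stratum is random

    matches = {}
    for case in cases:
        available = pool[(case["age"], case["sex"])]
        if len(available) < ratio:
            matches[case["id"]] = []  # too few controls left in this stratum
            continue
        # pop() removes the control from the pool, so nobody is used twice
        matches[case["id"]] = [available.pop() for _ in range(ratio)]
    return matches
```

Because controls are removed from the pool once selected, each control is matched to at most one infected individual in this sketch.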
We created symptom lists following several steps. First, we ranked the ICPC codes by prevalence in the SARS-CoV-2 infection group nine months before and 3–12 months after infection. Thereafter, we calculated the difference in prevalence before and after infection and ranked these differences. We took the 30 ICPC codes whose prevalence had increased most after infection. We then excluded ICPC codes that had also increased in the matched control group. This list was reviewed by four GP-researchers to exclude symptoms that were unlikely to be related to PCS. A total of n = 25 symptoms were included in the ‘data-derived list’. In addition, we compiled a list of symptoms published by the WHO [11] and expanded this with symptoms reported by participants of the Corona Survey Cohort and by a panel of 8 patients (age range: 32–75 years, 4 males and 4 females) who provide advice and feedback during the project. This ‘patient reported list’ included a total of n = 37 symptoms. Furthermore, a GP (MH) and a medical microbiologist (MB) independently reviewed the entire list of ICPC codes for symptoms that could be related to acute SARS-CoV-2 infection and possibly also to PCS. This ‘clinicians (acute) SARS-CoV-2 infection list’ included n = 30 possible symptoms. We compared the symptoms on these three lists (i.e. ‘data-derived list’, ‘patient reported list’ and ‘clinicians (acute) SARS-CoV-2 infection list’); symptoms that were included on at least two lists were considered ‘core symptoms’, while the remaining symptoms were considered ‘additional symptoms’ (S2 Table). We used the core and additional symptoms listed in S2 Table to classify the individuals with SARS-CoV-2 infection as having PCS, based on their EHR data, according to a broad and a narrow classification. According to the broad classification, patients should have consulted the GP for at least one core symptom or at least two different additional symptoms 3–12 months after SARS-CoV-2 infection. 
According to the narrow classification, patients should have consulted the GP for at least two symptoms, of which at least one was a core symptom, and should have had at least two GP consultations for these symptoms. We created these two classifications with current literature and definitions of PCS in mind (e.g. but not limited to [8, 14, 17]), together with input from GPs (JM, TOH, BK) and the involved researchers. We used these two classifications because there is currently no uniform definition and we aimed to investigate the influence of using different classifications. The broad classification was created to be inclusive of possible heterogeneity of the syndrome and the narrow classification to depend more on care usage and the core symptoms.
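The rules for core and additional symptoms and for the broad and narrow classifications can be written down as a short sketch. It assumes that each GP consultation in the 3–12 month window is represented as a set of recorded symptoms; the function names and the symptom labels in the usage note are hypothetical placeholders, not the ICPC codes of S2 Table.

```python
from collections import Counter


def split_core_additional(symptom_lists):
    """Core symptoms appear on at least two lists; the rest are additional."""
    counts = Counter(s for lst in symptom_lists for s in set(lst))
    core = {s for s, n in counts.items() if n >= 2}
    additional = set(counts) - core
    return core, additional


def classify_pcs(consultations, core, additional):
    """Apply the broad and narrow rules to one individual.

    consultations: list of sets, one set of recorded symptoms per GP
    consultation 3-12 months after infection (assumed representation).
    """
    listed = core | additional
    relevant = [c & listed for c in consultations if c & listed]
    distinct = set().union(*relevant) if relevant else set()
    n_core = len(distinct & core)
    n_additional = len(distinct & additional)

    # Broad: at least one core symptom OR at least two different additional ones
    broad = n_core >= 1 or n_additional >= 2
    # Narrow: at least two symptoms, at least one of them core,
    # and at least two consultations for these symptoms
    narrow = len(distinct) >= 2 and n_core >= 1 and len(relevant) >= 2
    return {"broad": broad, "narrow": narrow}
```

For example, with three hypothetical lists `["fatigue", "dyspnea", "cough"]`, `["fatigue", "dyspnea", "headache"]` and `["dyspnea", "palpitations"]`, fatigue and dyspnea become core symptoms. An individual with a single consultation for fatigue then meets only the broad classification, whereas an individual with one consultation for fatigue and another for cough meets both.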
Corona Survey Cohort.
Of the total number of participants in the Corona Survey Cohort (n = 442), n = 276 (62%) participants were selected for the current analysis as they completed the first questionnaire within 3 months after SARS-CoV-2 infection (between January–September 2021) and could answer the questions regarding acute symptoms more accurately. Individuals were classified as having PCS when they reported at least one symptom, from a selected list of symptoms, three months after the SARS-CoV-2 infection and experienced discomfort in their daily living (first survey) or reported not to be recovered after the initial SARS-CoV-2 infection (second survey). Individuals were classified as non-PCS (n = 93) when they reported not to experience discomfort or reported to be recovered. Individuals were classified as ‘unknown’ (n = 92) when relevant data to classify individuals as PCS or non-PCS was missing or when there was a discrepancy in the answers to the questionnaire (i.e. reporting no symptoms, but also reporting not to be recovered).
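One possible reading of this survey-based classification is sketched below. The way the criteria are combined, the handling of missing answers and the function name `classify_survey` are assumptions made for illustration; the questionnaire logic actually applied may have differed.

```python
def classify_survey(n_symptoms, discomfort, recovered):
    """Classify one survey participant as 'PCS', 'non-PCS' or 'unknown'.

    n_symptoms: number of listed symptoms reported at 3 months (None = missing)
    discomfort: discomfort in daily living, first survey (True/False/None)
    recovered:  self-reported recovery, second survey (True/False/None)
    Assumption: None encodes a missing answer.
    """
    if n_symptoms is None or (discomfort is None and recovered is None):
        return "unknown"  # relevant data missing
    impaired = discomfort is True or recovered is False
    if impaired and n_symptoms >= 1:
        return "PCS"
    if impaired and n_symptoms == 0:
        return "unknown"  # discrepancy: no symptoms, yet discomfort or not recovered
    return "non-PCS"
```

Under this reading, a participant reporting three symptoms and discomfort in daily living is classified as PCS, while a participant reporting no symptoms but also reporting not to be recovered is classified as ‘unknown’.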
Outcomes and covariates.
Age categories were defined as: children and adolescents (0–23 years of age), adults (24–70 years of age) and elderly (≥ 70 years of age). Migration background was dichotomized as: both parents were born in the Netherlands (0) or at least one parent was not born in the Netherlands (1). Education level was divided into low (primary school or pre-vocational education), medium (secondary or vocational education) and high (professional higher education or university) education level. Income level was only available in the GP-EHR database and was divided according to standardized household income in the Netherlands into low (0–40 percentile), medium (40–80 percentile) and high (>80 percentile). GP consultations were defined as long, medium and short consultations, including consultations by phone and email and long and short visits. Long and short consultations with the nurse practitioner were also included.
Statistical analysis
We used descriptive statistics to describe the sample characteristics of the PCS patients in the combined EHR database and the Corona Survey Cohort. For continuous outcome variables t-tests were used to compare groups and for dichotomous outcome variables Chi-square tests were used. In a subgroup of the EHR database we performed a network analysis, Louvain Community Detection [27], to identify symptoms that often co-occur in individuals with PCS. For these analyses we only included individuals who consulted the GP for at least two different symptoms (n = 1,503) and we used the R-package ‘igraph’ for visualization. A community was included in the network when at least 1% of individuals had this combination of symptoms, and we used a cut-off of >0.3 on the modularity score to ensure the quality of the communities and network [28]. We then classified the 1,503 individuals with at least two symptoms according to these communities and described the demographic characteristics of these individuals. For all analyses, p-values below 0.001 were deemed statistically significant. Statistical analyses were performed in STATA (version 16.1) and R (version 4.1.3).
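To make the modularity cut-off concrete, the sketch below builds a symptom co-occurrence network and scores a given partition with the standard modularity formula Q = Σ_c [L_c/m − (d_c/2m)²], where m is the total edge weight, L_c the edge weight inside community c and d_c the summed degree of its nodes. This is not the Louvain algorithm itself, which searches for the partition that maximizes Q, and it is written in plain Python rather than the R tooling used in the study; the function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(patients):
    """Weighted edges: number of patients in whom two symptoms co-occur."""
    edges = Counter()
    for symptoms in patients:
        for a, b in combinations(sorted(set(symptoms)), 2):
            edges[(a, b)] += 1
    return edges


def modularity(edges, communities):
    """Modularity Q of a partition of a weighted, undirected network.

    communities: list of sets that together contain every node in `edges`.
    """
    membership = {node: i for i, comm in enumerate(communities) for node in comm}
    total = sum(edges.values())          # m: total edge weight
    degree_sum = Counter()               # d_c: summed degree per community
    internal = Counter()                 # L_c: edge weight inside a community
    for (a, b), weight in edges.items():
        degree_sum[membership[a]] += weight
        degree_sum[membership[b]] += weight
        if membership[a] == membership[b]:
            internal[membership[a]] += weight
    return sum(
        internal[c] / total - (degree_sum[c] / (2 * total)) ** 2
        for c in range(len(communities))
    )
```

For a toy network of two symptom triangles joined by a single co-occurrence, the partition into the two triangles gives Q = 5/14 (about 0.36), which would pass a cut-off of >0.3, whereas putting all symptoms into one community gives Q = 0.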
Results
In the GP-EHR database we selected n = 10,313 individuals who were all infected with the SARS-CoV-2 virus in 2020. Of these individuals, n = 452 (4.3%) were hospitalized due to SARS-CoV-2 infection during the acute phase (0–3 months after infection). Table 1 describes the characteristics of these individuals, classified according to the broad and narrow classifications of PCS. The selection of the Corona Survey Cohort we used for the current analysis included n = 276 individuals who had been infected by the SARS-CoV-2 virus. Of these individuals, n = 18 (6.6%) had been hospitalized during the acute phase (0–3 months after infection). Table 2 describes the characteristics of the individuals from the Corona Survey Cohort. The percentages of individuals classified as having PCS ranged from 15–33% depending on the classification and data source (Tables 1 and 2).
GP-EHR data: Demographics and other characteristics using broad and narrow PCS classifications
In the GP-EHR data comparisons were made between individuals in the PCS group and the non-PCS group according to the broad and narrow classifications (Table 1). Results of the comparisons were similar for both classifications and therefore we only mention the results using the narrow definition in the text. Individuals with PCS were more often female (69% vs. 57%, p≤0.001) and were older (53.4 vs. 51.1 years, p≤0.001) compared to the non-PCS group. There were significantly fewer children in the PCS group compared to the non-PCS group and more adults and elderly in the PCS groups (Table 1). There was no difference between the PCS group and the non-PCS group in education level, household income or migration background. The average number of symptoms for which the GP was consulted by individuals with PCS was 6.8 (SD 5.4) symptoms per patient versus 0.9 (SD 1.8) in the non-PCS group (p≤0.001). The average number of GP consultations was, by definition, higher in the PCS group compared to the non-PCS group (5.5 vs. 0.8 consultations, p≤0.001).
Corona Survey Cohort: Demographics and other characteristics in PCS and non-PCS group
In the Corona Survey Cohort (Table 2) we compared the PCS group to the non-PCS group. The characteristics of the unknown group (n = 92) can be found in S3 Table. Unlike in the GP-EHR data, there was no significant difference in age and sex between the PCS group and the non-PCS group. Individuals in the PCS group more often had a lower education level (p≤0.001) compared to the non-PCS group. The average number of self-reported symptoms in the PCS group was significantly higher compared to the non-PCS group at 3 months after infection (9.2 vs. 2.8; p≤0.001) and also at 6 months after infection (7.2 vs. 2.1; p≤0.001). Twenty-one (23%) individuals with PCS reported working less or having stopped working due to PCS symptoms after 3 months and 11 (16%) after 6 months (Table 2).
GP-EHR data: Frequency of symptoms stratified by sex
Fig 1 shows the frequencies of patients who visited the GP for a particular symptom 3–12 months after the SARS-CoV-2 infection, stratified by PCS classification and sex, for the top 10 most prevalent core symptoms based on the narrow definition. For all symptoms we found that males consulted their GP less often for these symptoms compared to females. The most prevalent symptoms in females were psychological symptoms (22–25%) including anxiety and depression, while respiratory symptoms (15–19%) like coughing or dyspnea were most prevalent in males.
Barplot showing the frequency of occurrence of categories of symptoms in the GP-EHR data for the broad (dark grey) and narrow (light grey) classifications, stratified for females (left) and males (right). The top 10 symptoms were based on the narrow definition.
Corona Survey Cohort: Frequency of symptoms over time in males and females
Fig 2 shows the frequency of symptoms for the PCS patients in the Corona Survey Cohort at 3 and 6 months, stratified by gender. The top 10 most prevalent symptoms at 3 months are shown. Overall, symptom frequencies are considerably higher in the Corona Survey Cohort compared to the GP-EHR data and different symptoms are reported (Fig 1). In the Corona Survey Cohort, the most prevalent and persistent symptom was fatigue in both males (3 months: 89%, 6 months: 78%) and females (3 months: 89%, 6 months: 86%). Similar to the GP-EHR data, males reported fewer symptoms than females. Males more often reported respiratory symptoms than females at 3 months (58% vs. 45%), while the opposite was found at 6 months (44% vs. 51%). Overall, in females the reported frequencies decreased for four symptoms and increased for six symptoms. In males the frequency decreased or stayed the same for seven symptoms and increased for three symptoms (Fig 2).
Barplot showing the frequency of self-reported symptoms in the Corona Survey Cohort at 3 months (dark grey) and 6 months (light grey) after infection, stratified for females (left) and males (right). The top 10 most prevalent symptoms at 3 months are shown in the graph.
GP-EHR data: Community detection analyses to explore clinical subtypes
To explore possible clinical subtypes of PCS, community detection analyses were performed in a subgroup of individuals in the GP-EHR data (n = 1,503) who visited the GP for at least two different symptoms. Logically, this is roughly the same group of individuals as the PCS group defined by the narrow classification (n = 1,533), except that we did not include data from individuals who visited the GP OOH-services (n = 30) for these analyses. Fig 3 shows the results of the community detection of the combination of symptoms that often occur together. We identified a network with a modularity score (possible range: -0.5 to 1.0) of 0.302, indicating a network with average strength in which three communities with symptoms were identified (Fig 3). Community A includes psychological and generalized symptoms and was statistically significant (p = 0.045), Community B includes cardiorespiratory symptoms (p = 0.494) and Community C includes gastrointestinal symptoms (p = 0.617). The communities are solely based on symptoms that co-occur and not on how many individuals have only these combinations of symptoms. Therefore, we subsequently analyzed how many individuals of this subgroup (n = 1,503) could be classified as experiencing this combination of symptoms (i.e. having at least two symptoms within one community). When classifying the group into individuals with at least two ‘community symptoms’ we found that n = 248 (17%) had symptoms across communities and n = 458 (30%) had symptoms that were not included in the network. In addition, there were individuals with only a single ‘community symptom’: n = 360 (24%) in community A, n = 126 (8%) in community B and n = 150 (10%) in community C. Table 3 shows the characteristics of the individuals who could be classified as experiencing distinct community symptoms. The group with neuro-respiratory symptoms (Community A) was the largest group (n = 109, 7%) and consisted mostly of females, with an average age of 54.2 years. 
Thirty-two (2%) individuals experienced only symptoms from community B, which are gastrointestinal symptoms. These individuals were younger (mean age 51.2 years) and this group included the lowest percentage of females (68%). The group with cardiopulmonary symptoms was the smallest, including n = 18 (1%) individuals who had an average age of 55.3 years. The percentage of individuals with a migration background was similar across the community groups (22–24%; Table 3).
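The grouping of individuals by community symptoms can be sketched as follows. The rule set is one plausible reading of the description above (no symptom in the network, symptoms in more than one community, a single symptom in one community, or at least two symptoms within one community); the function name, community names and symptom labels are hypothetical.

```python
def community_group(symptoms, communities):
    """Assign one individual to a community-based group.

    symptoms:    iterable of symptoms the individual consulted the GP for
    communities: dict mapping a community name to its set of symptoms
    """
    hits = {}
    for name, members in communities.items():
        found = set(symptoms) & members
        if found:
            hits[name] = found
    if not hits:
        return "not in network"
    if len(hits) > 1:
        return "across communities"
    (name, found), = hits.items()  # exactly one community was hit
    # Two or more symptoms within one community count as 'distinct'
    return f"distinct {name}" if len(found) >= 2 else f"single {name}"
```

With hypothetical communities `{"A": {"anxiety", "fatigue"}, "B": {"nausea", "diarrhea"}}`, an individual with anxiety and fatigue is assigned to "distinct A", one with anxiety and a symptom outside the network to "single A", and one with fatigue and nausea to "across communities".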
Communities of co-occurring symptoms that were detected in a subgroup (n = 1,503) of individuals in the GP-EHR database. Three communities were detected, which are displayed with different colors: Community A including psychological-generalized symptoms (dark grey), Community B including gastrointestinal symptoms (light grey) and Community C including cardiorespiratory symptoms (white). The size of the circles shows how often symptoms occur in the data, with bigger circles indicating more frequent symptoms. The numbers indicate how often symptoms co-occur.
Discussion
This study describes the extent to which classifications and data collection methods are associated with the frequency of post-COVID syndrome as well as its constituting symptoms. Combining the results of the analyses of GP-EHR data and the longitudinal questionnaire, the main findings are: 1) the frequency of PCS among individuals infected with SARS-CoV-2 between April–July 2020 in the Netherlands ranged from 15–33% depending on the classification and data source used; 2) individuals with PCS were on average 53 years old and more often female; 3) individuals with PCS consulted the GP most often with psychological problems, while fatigue was the most often self-reported symptom; 4) three communities of possibly related PCS symptoms were identified, but these require further examination and validation to define clinical subtypes of PCS.
Thus far, worldwide prevalence rates of PCS vary widely between studies depending on populations, methods and definitions. Two recent meta-analyses regarding the prevalence of PCS, for instance, showed that prevalence rates were higher among individuals who were hospitalized during the acute phase compared to nonhospitalized populations [14, 17]. Our study includes both nonhospitalized and hospitalized individuals in the GP-EHR data (4% hospitalized) as well as in the Corona Survey Cohort (7% hospitalized), although our proportion of hospitalized individuals is small in comparison to hospitalization rates due to acute SARS-CoV-2 infection in the Netherlands at that time [29, 30]. This might cause a slight underestimation of the PCS frequency in our study. Another important factor influencing the variation among reported prevalence rates is whether a control group and individual-level comparisons with the pre-COVID situation regarding symptoms and comorbidities have been included [14]. The few studies that, like our study, have included this crucial correction for a control group generally report lower prevalence rates, similar to our results [6, 31]. Other obvious but central differences between studies on prevalence rates are whether PCS is defined based on self-report and whether patients are included based on only one symptom or on multiple symptoms. Our results underline and clarify the influence of these factors and the impact they have on the characterization of the group of individuals suffering from PCS, as we compared a broad (minimum of 1 symptom) and a narrow (minimum of 2 symptoms and multiple consultations) classification in the EHR data and the self-report data from the Corona Survey Cohort. Besides the obvious influence the narrow and broad definitions have on the size of the PCS group, the choice of definition did not influence the characteristics (i.e. age, gender, migration background) of the individuals included. 
On the other hand, when comparing the PCS group in self-report survey data (Corona Survey Cohort) we did find a noteworthy difference in the level of education between the PCS and the non-PCS group which we did not find in the EHR data. In the survey data we found a higher percentage of individuals with a low education in the PCS group compared to the non-PCS group and the total group. This finding is in line with a German study which also showed that higher level of education was associated with a lower risk of PCS [13]. The lack of association in the GP-EHR data could be due to the large number of individuals for whom the level of education was unknown (38%), although the distribution among the education categories in the group of individuals in the PCS group for whom this is known (62%) is similar to the distribution in the total COVID group. Future studies should further investigate the association between PCS and education level to validate our findings.
In general, findings thus far published regarding sex and PCS are quite consistent and also in line with our results as most studies report a higher occurrence of PCS in females compared to males [3, 17, 32, 33]. In addition, we also found that females report or seek help for different PCS symptoms than males. Females with PCS more often consult the GP for mental health symptoms, while males consult the GP most often for respiratory symptoms. A previous study also reported sex differences in relation to PCS symptoms but only included somatic symptoms and no psychological or mental symptoms [6]. Nevertheless, our results are in line with a large body of literature showing that males are less likely to seek medical help, in particular for mental health problems [34]. In general the most prevalent PCS symptoms for which the GP is consulted are psychological symptoms including anxiety and depressed mood, digestive symptoms including diarrhea and obstipation and respiratory symptoms including dyspnea and trouble breathing. Surprisingly, fatigue is not the most often reported symptom at the GP while a meta-analysis reported that this is the most common symptom of PCS [35]. Yet when focusing on self-reported symptoms in our survey data, fatigue is found to be the most common symptom. This emphasizes the differences between using routine healthcare registry data and self-report data which has been reported before [22, 36, 37], but requires further examination in relation to PCS.
We found that in the Corona Survey Cohort 23% of the individuals with PCS stopped working or worked less after three months, compared to 0% in the non-PCS group. After six months, 11% of individuals with PCS stopped working or worked less. Other studies have also reported that work ability can be severely affected by PCS, which may have large consequences at the individual but also at the societal level [38–40]. Tailored interventions for PCS in relation to work, focusing on management of symptoms, impact on work ability, possible workplace adjustments and job modifications, should be considered. On the other hand, we found that there is also a substantial group of individuals with PCS (77%) who were able to maintain work ability, which may be related to their type of work or to the type of healthcare provided to cope with or recover from PCS. Large-scale analyses focusing on healthcare utilization patterns in PCS combined with outcome measures such as work (dis)ability are needed to further assess the relationship between PCS and work. In addition, further research that follows a patient group as well as a reference group, in this case the non-PCS group, over a longer period of time and also after the COVID-19 pandemic is important. In the current study we only focused on a relatively short period and found very few changes regarding work in the non-PCS group (0% at 3 months and 1% at 6 months), which could be the result of an inclusion bias or related to the fact that ‘experiencing discomfort in daily living’ was part of the definition of PCS.
To examine whether we could use machine learning to identify specific clusters of symptoms that often co-occur, we performed a community detection analysis. We identified three communities which only partly overlap with clusters identified in a previous study across different SARS-CoV-2 variants [20]. Similar to our findings, Canas and colleagues (2023) also identified a cluster with mainly cardiorespiratory symptoms which was associated with the wild-type variant of the virus (i.e. the first stages of the pandemic). The other clusters and communities associated with the early variant of the virus were, however, different from our findings [20]. Another study using electronic health records and a data-driven approach to identify symptom patterns in PCS identified two groups based on the prevalence of symptoms rather than the combination of co-occurring symptoms [41]. In addition, studies on symptoms that often co-occur frequently report a cardiorespiratory cluster [42–44], a generalized-mood or neuropsychiatric cluster [45, 46], and a gastrointestinal cluster [18, 47]. Overall, it has become clear that not all literature points in the same direction, as other clusters have also been mentioned depending on population, virus variant and clustering method [48]. Our results are not conclusive on the clinical subtypes and should be interpreted with caution, as groups were small and there were many individuals with symptoms in multiple clusters or with other combinations of symptoms not identified by the analyses. It is also important to mention that the community detection analyses were conducted in a subgroup of individuals who consulted the GP for various symptoms, which could indicate a more severe phenotype of PCS that might not occur in all individuals with PCS.
Future studies, perhaps using data with biological and continuous parameters, should validate and further examine possible phenotypes of PCS, as EHR data might not be well suited for these types of analysis due to the categorical coding and registration limitations [21].
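To make the idea of community detection on co-occurring symptoms concrete, the following is a minimal sketch under stated assumptions, not the pipeline used in this study. It builds a weighted symptom co-occurrence network from invented toy records and applies a simplified, deterministic label propagation. The function names (`cooccurrence_graph`, `label_propagation`), the tie-breaking rule and the example records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(records):
    """Weighted graph: edge weight = number of records containing both symptoms."""
    graph = defaultdict(lambda: defaultdict(int))
    for symptoms in records:
        for a, b in combinations(sorted(set(symptoms)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_iter=100):
    """Simplified label propagation (illustrative assumption, not the study's code).

    Each node repeatedly adopts the label that carries the largest total edge
    weight among its neighbours; ties go to the alphabetically smallest label.
    """
    labels = {node: node for node in graph}  # every symptom starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):  # fixed order keeps the result deterministic
            weight_per_label = defaultdict(int)
            for neighbour, weight in graph[node].items():
                weight_per_label[labels[neighbour]] += weight
            best = min(weight_per_label, key=lambda lab: (-weight_per_label[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:  # stop once no label changes in a full pass
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return sorted(sorted(members) for members in communities.values())


# Invented toy records: one set of recorded symptoms per individual.
records = [
    {"dyspnea", "cough", "chest pain"},
    {"dyspnea", "cough"},
    {"cough", "chest pain"},
    {"dyspnea", "chest pain"},
    {"anxiety", "depressed mood", "sleep problems"},
    {"anxiety", "depressed mood"},
    {"depressed mood", "sleep problems"},
    {"cough", "anxiety"},  # a record bridging both groups
]

communities = label_propagation(cooccurrence_graph(records))
for members in communities:
    print(members)
```

On these toy records the sketch separates a respiratory group from a mood-related group despite the bridging record. With real EHR data, many individuals would have symptoms in several communities, which is one reason the subtypes above should be interpreted with caution.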
The major strengths of this study lie in the generalizability of the findings and the combination of methods, which serves as an internal validation of the findings. Our findings from the GP-EHR data are generalizable to the Dutch population as this database includes a representative sample of the Dutch population. In addition, we used data from before the COVID-19 pandemic regarding healthcare usage and matched controls to define our PCS groups, and compared these to two reference groups (non-PCS and non-COVID). Furthermore, the combination of methods (using EHR data and surveys) allows for internal validation and interpretation of the results and provides a unique opportunity to compare frequency rates and symptoms reported in self-report data and routine healthcare data. Also, using different sources of input (data-driven, list of WHO and experts) to create a list with core and additional symptoms added rigor to our study. However, some limitations of this study should also be acknowledged. First, we identified patients as having had a SARS-CoV-2 infection when they visited their GP with COVID-related symptoms rather than by including all patients who tested positive for the virus at the national testing authorities, as public testing was not yet available during this time period. This limitation also applies to the control cases, of whom we assumed that they were not infected with SARS-CoV-2 at the time of the study. Second, in the survey data (Corona Survey Cohort) there may be a selection bias, as individuals who are experiencing persistent symptoms may be more likely to complete questionnaires compared to individuals not experiencing symptoms. In addition, there was a subgroup (the unknown group, 33%) of the Corona Survey Cohort which we were not able to classify as having PCS or not due to missing data or discrepancies in the data, and which was therefore excluded from the analysis.
These biases are unavoidable when using survey data and should be taken into account when interpreting the results. Lastly, in this paper we only focused on individuals who were infected with SARS-CoV-2 in the first period of the COVID-19 pandemic, and therefore not all variants of the virus are included. In future studies it would be possible to examine the relationship between virus variants and PCS.
Conclusions
In conclusion, our results indicate how classifications and the choice of data sources may affect the frequency of PCS, the characteristics of the individuals affected by it, and the symptoms that are regarded as part of it. Frequency rates differ between methods and data sources (15–33%). In the EHR cohort, characteristics of the PCS population were stable across methods, as we found that it mostly affects middle-aged females. In the survey cohort, however, the PCS group did not differ from the non-PCS group regarding age and sex. The insights from this study form a solid basis for subsequent analyses on quality of life, care trajectories and risk factors for developing PCS. These analyses have been conducted in parallel studies to improve understanding and care for individuals with PCS, which is desperately needed.
Supporting information
S1 Fig. Flowchart of study populations.
Schematic overview of the included study populations of the GP-EHR cohort and the Corona Survey Cohort.
https://doi.org/10.1371/journal.pone.0318272.s001
(PNG)
S1 Table. Characteristics of matched control group.
Demographic and socioeconomic characteristics of the matched reference group that was used for classification.
https://doi.org/10.1371/journal.pone.0318272.s002
(DOCX)
S2 Table. List of core and additional symptoms and diagnosis used in GP-EHR cohort to classify PCS.
Symptoms and diagnoses are based on ICPC coding and are classified as core or additional symptoms. Symptoms and diagnoses are also classified into categories, which are used in Fig 1.
https://doi.org/10.1371/journal.pone.0318272.s003
(DOCX)
S3 Table. Characteristics of the unknown group in Corona Survey Cohort.
Demographic and socioeconomic characteristics of individuals in the Corona Survey Cohort who could not be classified and were therefore excluded from the analyses.
https://doi.org/10.1371/journal.pone.0318272.s004
(DOCX)
References
- 1. Carfì A, Bernabei R, Landi F, Group GAC-P-ACS. Persistent Symptoms in Patients After Acute COVID-19. JAMA. 2020;324(6):603–5. pmid:32644129
- 2. Peter RS, Nieters A, Kräusslich H-G, Brockmann SO, Göpel S, Kindle G, et al. Prevalence, determinants, and impact on general health and working capacity of post-acute sequelae of COVID-19 six to 12 months after infection: a population-based retrospective cohort study from southern Germany. BMJ. 2022:2022.03.14.
- 3. Taquet M, Dercon Q, Luciano S, Geddes JR, Husain M, Harrison PJ. Incidence, co-occurrence, and evolution of long-COVID features: A 6-month retrospective cohort study of 273,618 survivors of COVID-19. PLoS Medicine. 2021;18(9):e1003773. pmid:34582441
- 4. Meza-Torres B, Delanerolle G, Okusi C, Mayor N, Anand S, Macartney J, et al. Differences in Clinical Presentation With Long COVID After Community and Hospital Infection and Associations With All-Cause Mortality: English Sentinel Network Database Study. JMIR Public Health and Surveillance. 2022;8(8):e37668. pmid:35605170
- 5. Lopez-Leon S, Wegman-Ostrosky T, Perelman C, Sepulveda R, Rebolledo PA, Cuapio A, et al. More than 50 Long-term effects of COVID-19: a systematic review and meta-analysis. Sci Rep. 2021;11. pmid:34373540
- 6. Ballering AV, van Zon SKR, olde Hartman TC, Rosmalen JGM. Persistence of somatic symptoms after COVID-19 in the Netherlands: an observational cohort study. The Lancet. 2022;400(10350):452–61. pmid:35934007
- 7. Chaichana U, Man KKC, Chen A, Wong ICK, George J, Wilson P, et al. Definition of Post–COVID-19 Condition Among Published Research Studies. JAMA Network Open. 2023;6(4):e235856. pmid:37017970
- 8. WHO. A clinical case definition of post COVID-19 condition by a Delphi consensus, 6 October 2021. 2021. Available from: https://www.who.int/publications/i/item/WHO-2019-nCoV-Post_COVID-19_condition-Clinical_case_definition-2021.1.
- 9. National Institute for Health and Care Excellence. COVID-19 rapid guideline: managing the long-term effects of COVID-19. 2021. Available from: https://www.nice.org.uk/guidance/ng188.
- 10. Centers for Disease Control and Prevention. Long COVID or Post-COVID Conditions. 2022. Available from: https://www.cdc.gov/coronavirus/2019-ncov/hcp/clinical-care/post-covid-conditions.html.
- 11. Fernández-de-las-Peñas C, Palacios-Ceña D, Gómez-Mayordomo V, Florencio LL, Cuadrado ML, Plaza-Manzano G, et al. Prevalence of post-COVID-19 symptoms in hospitalized and non-hospitalized COVID-19 survivors: A systematic review and meta-analysis. European Journal of Internal Medicine. 2021;92:55–70. pmid:34167876
- 12. Sudre CH, Murray B, Varsavsky T, Graham MS, Penfold RS, Bowyer RC, et al. Attributes and predictors of long COVID. Nature Medicine. 2021;27(4):626–31. pmid:33692530
- 13. Bahmer T, Borzikowsky C, Lieb W, Horn A, Krist L, Fricke J, et al. Severity, predictors and clinical correlates of Post-COVID syndrome (PCS) in Germany: A prospective, multi-centre, population-based cohort study. eClinicalMedicine. 2022;51:101549. pmid:35875815
- 14. O’Mahoney LL, Routen A, Gillies C, Ekezie W, Welford A, Zhang A, et al. The prevalence and long-term health effects of Long Covid among hospitalised and non-hospitalised populations: A systematic review and meta-analysis. eClinicalMedicine. 2023;55:101762. pmid:36474804
- 15. Kayaaslan B, Eser F, Kalem AK, Kaya G, Kaplan B, Kacar D, et al. Post‐COVID syndrome: A single‐center questionnaire study on 1007 participants recovered from COVID‐19. Journal of medical virology. 2021;93(12):6566–74. pmid:34255355
- 16. Förster C, Colombo MG, Wetzel A-J, Martus P, Joos S. Persisting symptoms after COVID-19: prevalence and risk factors in a population-based cohort. Deutsches Ärzteblatt international. 2022;119(10):167.
- 17. Chen C, Haupert SR, Zimmermann L, Shi X, Fritsche LG, Mukherjee B. Global Prevalence of Post COVID-19 Condition or Long COVID: A Meta-Analysis and Systematic Review. The Journal of Infectious Diseases. 2022:jiac136.
- 18. Blackett JW, Li J, Jodorkovsky D, Freedberg DE. Prevalence and risk factors for gastrointestinal symptoms after recovery from COVID-19. Neurogastroenterology & Motility. 2022;34(3):e14251. pmid:34468069
- 19. Yong SJ, Liu S. Proposed subtypes of post-COVID-19 syndrome (or long-COVID) and their respective potential therapies. Reviews in Medical Virology. 2022;32(4):e2315. pmid:34888989
- 20. Canas LS, Molteni E, Deng J, Sudre CH, Murray B, Kerfoot E, et al. Profiling post-COVID syndrome across different variants of SARS-CoV-2. Lancet Digit Health. 2023. pmid:37202336
- 21. Verheij RA, Curcin V, Delaney BC, McGilchrist MM. Possible sources of bias in primary care electronic health record data use and reuse. Journal of medical Internet research. 2018;20(5):e185. pmid:29844010
- 22. Voss M, Stark S, Alfredsson L, Vingård E, Josephson M. Comparisons of self-reported and register data on sickness absence among public employees in Sweden. Occupational and Environmental Medicine. 2008;65(1):61–7. pmid:17704196
- 23. Krosnick JA. Survey research. Annual review of psychology. 1999;50(1):537–67. pmid:15012463
- 24. Heins M, Bes J, Weesie Y, Davids R, Winckers M, Korteweg L, Hellwich M, van Dijk L, Knottnerus B, Overbeek L, Hasselaar J, Hek K, Vanhommerig J. Zorg door de huisarts. Nivel Zorgregistraties Eerste Lijn: jaarcijfers 2022 en trendcijfers 2018–2022. Utrecht: 2023. Report No.
- 25. Veldkamp R, Hek K, van den Hoek R, Schackmann L, van Puijenbroek E, van Dijk L. Nivel Corona Cohort: A description of the cohort and methodology used for combining general practice electronic records with patient reported outcomes to study impact of a COVID-19 infection. PLoS One. 2023;18(8):e0288715. Epub 20230822. pmid:37607170; PubMed Central PMCID: PMC10443834.
- 26. Hooiveld M, Hek K, Hendriksen J, Bolt E, Weesie Y, Spreeuwenberg P, et al. Weekcijfers COVID-19-patiënten in de huisartsenpraktijk. Week 10–27, 2 maart–5 juli 2020. Nivel; 2020 [updated 09-07-2020].
- 27. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment. 2008;(10):P10008.
- 28. Xie J, Szymanski BK. Community Detection Using A Neighborhood Strength Driven Label Propagation Algorithm. 2011.
- 29. Coyer L, Wynberg E, Buster M, Wijffels C, Prins M, Schreijer A, et al. Hospitalisation rates differed by city district and ethnicity during the first wave of COVID-19 in Amsterdam, The Netherlands. BMC public health. 2021;21(1):1721. pmid:34551752
- 30. Mathieu E, Ritchie H, Rodés-Guirao L, Appel C, Giattino C, Hasell J, et al. Coronavirus Pandemic (COVID-19). Our World in Data. 2020.
- 31. Cazé ABC, Cerqueira-Silva T, Bomfim AP, Souza GLd, Azevedo ACA, Brasil MQA, et al. Prevalence and risk factors for long COVID after mild disease: a longitudinal study with a symptomatic control group. J Glob Health. 2022. pmid:37166260
- 32. Montenegro P, Moral I, Puy A, Cordero E, Chantada N, Cuixart L, et al. Prevalence of Post COVID-19 Condition in Primary Care: A Cross Sectional Study. International Journal of Environmental Research and Public Health. 2022;19(3):1836. pmid:35162857
- 33. Bai F, Tomasoni D, Falcinella C, Barbanotti D, Castoldi R, Mulè G, et al. Female gender is associated with long COVID syndrome: a prospective cohort study. Clinical Microbiology and Infection. 2022;28(4):611.e9–e16. pmid:34763058
- 34. Galdas PM, Cheater F, Marshall P. Men and health help-seeking behaviour: literature review. Journal of Advanced Nursing. 2005;49(6):616–23. pmid:15737222
- 35. Pavli A, Theodoridou M, Maltezou HC. Post-COVID Syndrome: Incidence, Clinical Spectrum, and Challenges for Primary Healthcare Professionals. Archives of Medical Research. 2021;52(6):575–81. pmid:33962805
- 36. Reijneveld SA, Stronks K. The validity of self-reported use of health care across socioeconomic strata: a comparison of survey and registration data. International Journal of Epidemiology. 2001;30(6):1407–14. pmid:11821355
- 37. Kroneman M, Verheij R, Tacken M, van der Zee J. Urban–rural health differences: primary care data and self reported data render different results. Health & Place. 2010;16(5):893–902. pmid:20493756
- 38. Tan KWA, Koh D. Long COVID-Challenges in diagnosis and managing return-to-work. J Occup Health. 2023;65(1):e12401. pmid:37098838; PubMed Central PMCID: PMC10132176.
- 39. Gualano MR, Rossi MF, Borrelli I, Santoro PE, Amantea C, Daniele A, et al. Returning to work and the impact of post COVID-19 condition: A systematic review. Work. 2022;73:405–13. pmid:35938280
- 40. Kerksieck P, Ballouz T, Haile SR, Schumacher C, Lacy J, Domenghino A, et al. Post COVID-19 condition, work ability and occupational changes in a population-based cohort. The Lancet Regional Health—Europe. 2023;31:100671. pmid:37366496
- 41. Bowyer RC, Huggins C, Toms R, Shaw RJ, Hou B, Thompson EJ, et al. Characterising patterns of COVID-19 and long COVID symptoms: evidence from nine UK longitudinal studies. European journal of epidemiology. 2023;38(2):199–210. pmid:36680646
- 42. Caspersen IH, Magnus P, Trogstad L. Excess risk and clusters of symptoms after COVID-19 in a large Norwegian cohort. European Journal of Epidemiology. 2022;37(5):539–48. pmid:35211871
- 43. Xie Y, Xu E, Bowe B, Al-Aly Z. Long-term cardiovascular outcomes of COVID-19. Nature medicine. 2022;28(3):583–90. pmid:35132265
- 44. Choi Y, Kim HJ, Park J, Lee M, Kim S, Koyanagi A, et al. Acute and post-acute respiratory complications of SARS-CoV-2 infection: population-based cohort study in South Korea and Japan. Nature Communications. 2024;15(1):4499. pmid:38802352
- 45. Wong-Chew RM, Rodríguez Cabrera EX, Rodríguez Valdez CA, Lomelin-Gascon J, Morales-Juárez L, de la Cerda MLR, et al. Symptom cluster analysis of long COVID-19 in patients discharged from the Temporary COVID-19 Hospital in Mexico City. Therapeutic Advances in Infectious Disease. 2022;9:20499361211069264. pmid:35059196
- 46. Kim S, Lee H, Lee J, Lee SW, Kwon R, Kim MS, et al. Short-and long-term neuropsychiatric outcomes in long COVID in South Korea and Japan. Nature human behaviour. 2024;8(8):1530–44. pmid:38918517
- 47. Xu E, Xie Y, Al-Aly Z. Long-term gastrointestinal outcomes of COVID-19. Nature communications. 2023;14(1):983. pmid:36882400
- 48. Basharat S, Chao Y-S, McGill SC. Subtypes of post–COVID-19 condition: a review of the emerging evidence. Canadian Journal of Health Technologies. 2022;2(12).