Chikungunya outbreak (2015) in the Colombian Caribbean: Latent classes and gender differences in virus infection

Chikungunya virus (CHIKV), a mosquito-borne alphavirus of the Togaviridae family, belongs to a group of emerging arboviral diseases that are an increasing public health problem in tropical areas worldwide. CHIKV causes a severe and debilitating disease with high morbidity. The first autochthonous Colombian case was reported in the Colombian Caribbean region in September 2014, and the outbreak reached its peak within the following two to three months. The pattern of CHIKV clinical symptomatology has been documented in several epidemiological studies. However, the relationship between clinical symptomatology and variation in the phenotypic response to CHIKV infection in humans remains poorly understood. We performed a cross-sectional study of 1160 individuals clinically diagnosed with CHIKV at the peak of the Chikungunya outbreak in the Colombian Caribbean region, and we examined the relationship between symptomatology and diverse phenotypic responses. Latent Class Cluster Analysis (LCCA) models were used to characterize patients' symptomatology and to identify subgroups of individuals with differential phenotypic responses. Most individuals presented fever (94.4%), headache (73.28%) and general discomfort (59.4%), which are common clinical symptoms of viral infections. Furthermore, 11/26 (42.3%) of the categorized symptoms were more frequent in women than in men. LCCA disclosed seven distinctive phenotypic response profiles in this population of CHIKV-infected individuals. Notably, 282 (24.3%) individuals exhibited a low-symptom "extreme" phenotype, and 74 (6.4%) patients fell within the severe, complex "extreme" phenotype. Clinical symptomatology may be diverse, but distinct symptoms or groups of symptoms can be correlated with differential phenotypic response, and perhaps susceptibility, to CHIKV infection, especially in the female population.
This suggests that, compared with men, women are a CHIKV at-risk population. Further studies are needed to validate these results. They should also determine whether the distinct LCCA profiles result from the immune response or from a mixture of genetic, lifestyle and environmental factors. Our findings could contribute to the development of machine learning approaches for characterizing CHIKV infection in other populations. Preliminary results show prediction models achieving up to 92% overall accuracy, with substantial sensitivity, specificity and accuracy values for each LCCA-derived cluster.

Epidemiological reports describe typical and atypical symptoms that fluctuate from one epidemic to another. Possible reasons include i) the specific environment, ii) the genetic and immunological makeup of the infected population, iii) the viral strain and iv) the epidemiological models applied. This clinical heterogeneity across epidemiological and clinical studies of CHIKV outbreaks highlights the need for a rigorous clinical profile of CHIKV infection in humans. [8,23,24] In this study, we comprehensively analyzed clinical data from 1160 individuals diagnosed with CHIKV infection in the metropolitan area of Barranquilla, Colombia, on the northern Caribbean coast. We evaluated three things: the presence of groups of individuals sharing similar and unique phenotypes (latent classes), predictors of infection susceptibility, and differential phenotypic response, i.e., the set of specific symptoms that an individual develops in response to CHIKV infection. We found strong evidence for distinct, mutually exclusive profiles of phenotypic response in this population. We also found that females infected with CHIKV exhibited significant and heterogeneous differential symptomatology patterns compared with men. For the first time, these results offer information about the characteristics of at-risk populations affected by CHIKV infection and about the presence of different subpopulations of phenotypic response. Future studies are needed to better understand the contribution of demographic, immunological and genetic factors to this differential phenotypic response, especially in this understudied population. Even so, our findings could serve as a starting point for developing machine learning approaches to characterize CHIKV infection in other populations, in order to provide more accurate and differential diagnosis and treatment.

Study design, target population and data collection
A cross-sectional analysis of patients clinically diagnosed with CHIKV, ascertained through the "Programa de Vigilancia Epidemiológica de la Secretaría de Salud" (Health Secretary Program of Epidemiological Surveillance) in Barranquilla, Colombia, was performed to evaluate the presence of distinct subgroups of patients according to their clinical profiles. This program is responsible for surveying and reporting outbreaks of infectious diseases occurring in this geographical area of the Colombian Caribbean coast. World Health Organization (WHO) recommendations were followed to recruit CHIKV-infected patients and samples. [25] The city of Barranquilla, considered the main urban area in northern Colombia, is the capital of the Atlántico region and the fourth most populous city in the country, with an estimated population of ~1.2 million inhabitants (Fig 1A). [26] Barranquilla has distinctive tropical weather conditions (average relative humidity of 80% and average temperature of 27 °C). Strategically located next to the delta of the Magdalena river, the city serves as a port for river and maritime transportation within Colombia and is the main industrial, shopping, educational and cultural center of the Caribbean region of Colombia.
Patients were clinically diagnosed through face-to-face interviews by the epidemiological surveillance team, following the WHO and CDC clinical evaluation recommendations. [25] The epidemiological surveillance team was composed of medical practitioners, nurses and health technicians. They gathered information about the CHIKV infectious disease emergency and carried out case and contact investigations to determine the epidemiological aspects of the CHIKV outbreak. The team collected information about suspected cases, possible contacts, disease characteristics, clinical characteristics and possible disease exposure, in order to obtain, prioritize and submit specimens for laboratory testing. Recruited CHIKV patients were in the acute phase of the infection.
We actively collaborated with the Health Secretary Division of Barranquilla and its epidemiological surveillance team during the first and only CHIKV outbreak to date. The team received medical information about clinically diagnosed CHIKV patients from the main health providers (i.e., clinics and hospitals) in the city. The practitioners at these health providers were trained to follow the CHIKV diagnostic protocol recommended by the Colombian National Institute of Health (CNIH), [27] and they constituted the first line of response against the infection. The epidemiological surveillance team subsequently visited and surveyed the patients reported by health providers.

Ethics statement
The authors assert that all procedures contributing to this work have been performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments.

Subjects and case definition
Only clinically confirmed cases of CHIKV infection were included in this study. Diagnosis was assessed according to the criteria described in the CNIH guidelines. [27] Cases were reported during weeks 36 to 52 of 2014 (September to December; Fig 1B), the peak period of the Chikungunya outbreak. [6] Following the CNIH CHIKV protocol, patients were clinically diagnosed based on signs and symptoms, together with the epidemiological history of the population. All individuals lived in a geographical area where CHIKV presence had previously been confirmed by laboratory testing (i.e., autochthonous cases). Signs and symptoms included fever >38 °C, severe joint pain or initial acute arthritis symptoms, and rash. These could not be attributable to other medical conditions, and Dengue virus (DV) infection had to be ruled out first (S1 Table). Suspected cases were defined as residents of a geographical area where CHIKV presence had not been detected by laboratory testing, who had the same signs and symptoms, again not attributable to other medical conditions and after ruling out DV infection. Suspected cases were confirmed by serum-based RT-PCR to detect viral RNA, with serum sampled within the first eight days after symptom onset. Patients presenting signs and symptoms characteristic of DV infection (S1 Table) were tested for IgM by ELISA, and those with positive DV tests were excluded from this study.
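The case-definition logic described above can be summarized as a small decision function. This is a hypothetical sketch, not the CNIH protocol itself. The field and function names are invented for illustration. The sketch also assumes the clinical criteria read as fever >38 °C AND (severe joint pain OR initial acute arthritis) AND rash, with no alternative explanation. The guideline text [27] remains the authoritative source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record fields, named for illustration only
    resides_in_lab_confirmed_area: bool
    fever_c: float
    severe_joint_pain_or_acute_arthritis: bool
    rash: bool
    explained_by_other_condition: bool
    dengue_igm_positive: Optional[bool] = None    # tested only when dengue-like signs are present
    chikv_rt_pcr_positive: Optional[bool] = None  # serum sampled within 8 days of onset


def classify(p: Patient) -> str:
    """Assumed reading of the case definition; only 'clinically confirmed' cases entered the study."""
    if p.dengue_igm_positive:
        return "excluded (dengue)"
    compatible = (p.fever_c > 38.0
                  and p.severe_joint_pain_or_acute_arthritis
                  and p.rash
                  and not p.explained_by_other_condition)
    if not compatible:
        return "not a case"
    if p.resides_in_lab_confirmed_area:
        return "clinically confirmed"  # autochthonous area: diagnosis from signs and symptoms
    if p.chikv_rt_pcr_positive:
        return "laboratory confirmed"  # suspected case confirmed by RT-PCR
    return "suspected"
```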

Clinical assessment
A total of 1322 individuals were interviewed, and their clinical symptomatology was recorded. CHIKV infection is a public health concern, and the Epidemiological Surveillance program is sanctioned by the Barranquilla Health Secretary. For these reasons, Colombian law allows the use of this clinical material for research purposes without informed consent, including anonymous disclosure of results.

Statistical analysis
Demographic characterization. Measures of central tendency and dispersion were estimated for continuous variables, and frequencies and proportions for categorical variables. Continuous variables were compared between groups (gender) using a two-sample t test when the normality assumption was met, and the Wilcoxon-Mann-Whitney non-parametric test otherwise. Normality was assessed using the Shapiro-Wilk test. Frequencies and frequency distributions of categorical variables (i.e., gender and age group) were compared between groups (i.e., gender) using a χ² test. A continuity correction was applied when the expected frequency of any cell in the 2×2 contingency table was less than five. Logistic regression was used to adjust for age when comparing these frequency distributions. Unless otherwise stated, statistical analyses and plotting were performed in R version 3.6.2 (R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org). The False Discovery Rate (FDR) [28,29] was used to correct for multiple testing.
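The two categorical-data steps above can be sketched in plain Python: the 2×2 χ² test with a continuity correction triggered by small expected counts, and the FDR adjustment. This is an illustration under stated assumptions, not the study's R code. It assumes the Benjamini-Hochberg procedure as the FDR method. It applies Yates' correction only when an expected count is below five, as the text states, whereas R's `chisq.test` applies it to every 2×2 table by default. The Shapiro-Wilk, t and Wilcoxon-Mann-Whitney steps are not reproduced. The function names are hypothetical.

```python
import math


def chi2_2x2(table, min_expected=5.0):
    """Pearson chi-square for a 2x2 table, with Yates' continuity correction
    applied only when some expected count is below `min_expected`.
    Returns (statistic, p_value, corrected)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    corrected = min(min(r) for r in expected) < min_expected
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            diff = abs(obs - expected[i][j])
            if corrected:
                diff = max(diff - 0.5, 0.0)  # Yates: shrink |O - E| by 0.5
            stat += diff ** 2 / expected[i][j]
    p = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p, corrected


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative (made-up) inputs
stat, p, corrected = chi2_2x2([[10, 20], [30, 40]])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
```

In practice, the per-symptom p-values from the gender comparisons would be passed together to the FDR step, so that the adjustment accounts for all tests at once.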
Characterization of CHIKV patients based on symptomatology. We derived clinical profiles from the clinical symptoms using Latent Class Cluster Analysis (LCCA) [30] as implemented in Latent GOLD 4.0 (Statistical Innovations, Belmont, MA, USA). The presence or absence of each of 26 clinical symptoms was recorded as a binary variable for every patient in our cohort (0: absence; 1: presence). These variables were used as indicators in all LCCA models tested. Models with 1 to 10 clusters of individuals were explored, with demographic information such as sex and age included as covariates. To assess the certainty (goodness of fit) of our cluster solutions, we estimated P-values for the L² statistic by parametric bootstrap (500 replicates) rather than relying on asymptotic P-values. Latent GOLD combines the expectation-maximization (EM) and Newton-Raphson algorithms to obtain maximum-likelihood estimates of each model's parameters. A local solution arises when the algorithm converges to a maximum of the likelihood that is not the global maximum. To avoid such solutions, we used a procedure automatically implemented in Latent GOLD.
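The core of a latent class model with binary symptom indicators can be illustrated with a minimal sketch. It is not Latent GOLD's implementation, and several assumptions apply:

- **Model:** a plain latent class model with conditionally independent binary indicators.
- **Covariates:** the sex and age covariates are omitted.
- **Fitting:** it uses EM only, with no Newton-Raphson phase.
- **Local maxima:** multiple random starts stand in for Latent GOLD's start-value procedure.
- **Model comparison:** BIC is used as one common criterion for comparing models with different numbers of classes. The text does not state which criterion the study used.

The sketch also computes the L² statistic over observed response patterns. The bootstrap P-value would then be the fraction of L² values, refitted on data simulated from the fitted model, that reach the observed one. That bootstrap loop is not shown. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def _e_step(X, pi, theta):
    """Return posterior class probabilities and per-row log-likelihoods."""
    # log P(x_i, k) = log pi_k + sum_j [x_ij log theta_kj + (1 - x_ij) log(1 - theta_kj)]
    log_joint = (np.log(pi)[None, :] + X @ np.log(theta).T
                 + (1.0 - X) @ np.log(1.0 - theta).T)
    m = log_joint.max(axis=1, keepdims=True)
    log_marg = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))  # log-sum-exp
    return np.exp(log_joint - log_marg[:, None]), log_marg


def fit_lca(X, n_classes, n_starts=10, max_iter=1000, tol=1e-8, seed=0):
    """EM for a binary-indicator latent class model; keeps the best of several
    random starts to reduce the risk of a local maximum.
    Returns (loglik, class_sizes, item_probs, posteriors)."""
    X = np.asarray(X, dtype=float)
    n, n_items = X.shape
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        pi = np.full(n_classes, 1.0 / n_classes)
        theta = rng.uniform(0.2, 0.8, size=(n_classes, n_items))
        prev_ll = -np.inf
        for _ in range(max_iter):
            post, log_marg = _e_step(X, pi, theta)
            ll = log_marg.sum()
            if ll - prev_ll < tol:
                break
            prev_ll = ll
            # M-step: class sizes and per-class symptom probabilities
            nk = post.sum(axis=0)
            pi = np.clip(nk / n, 1e-12, None)
            pi /= pi.sum()
            theta = np.clip((post.T @ X) / np.maximum(nk, 1e-12)[:, None], 1e-6, 1 - 1e-6)
        post, log_marg = _e_step(X, pi, theta)  # posteriors for the final parameters
        ll = log_marg.sum()
        if best is None or ll > best[0]:
            best = (ll, pi, theta, post)
    return best


def bic(loglik, n_classes, n_items, n_obs):
    # Free parameters: (K - 1) class sizes + K * J item probabilities
    n_params = (n_classes - 1) + n_classes * n_items
    return -2.0 * loglik + n_params * np.log(n_obs)


def l_squared(X, pi, theta):
    """L^2 = 2 * sum_r n_r * ln(n_r / m_r) over the observed response patterns r."""
    X = np.asarray(X, dtype=float)
    patterns, counts = np.unique(X, axis=0, return_counts=True)
    _, log_p = _e_step(patterns, pi, theta)
    expected = X.shape[0] * np.exp(log_p)
    return 2.0 * np.sum(counts * np.log(counts / expected))


# Synthetic data: 600 hypothetical patients, 10 symptoms, 2 true profiles
rng = np.random.default_rng(1)
true_theta = np.array([[0.9] * 10, [0.1] * 10])
z = rng.integers(0, 2, size=600)
X = (rng.random((600, 10)) < true_theta[z]).astype(float)
fits = {k: fit_lca(X, k) for k in range(1, 5)}
bics = {k: bic(f[0], k, X.shape[1], X.shape[0]) for k, f in fits.items()}
best_k = min(bics, key=bics.get)
```

Modal assignment (each patient to the class with the highest posterior) is a common way to turn the posteriors into cluster labels such as those reported in the Results.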

Participants
This study included 1322 subjects who attended consultation at health providers and were clinically diagnosed with CHIKV infection during the main CHIKV outbreak in the Caribbean region. Because of incomplete or missing data, 162 cases (12%) were excluded, leaving 1160 individuals for analysis.
https://doi.org/10.1371/journal.pntd.0008281.g002
In contrast, individuals in Cluster 6 (n = 97, 8.4%) are more likely to experience fever, arthralgia, difficulty grasping, general discomfort, headache, myalgias and feverish chill. Finally, Cluster 7 (n = 74, 6.4%) gathers individuals with complex symptomatology (fever, dizziness, adynamia, difficulty grasping, retroauricular and submandibular ganglia, general discomfort, nausea, cough, headache, pruritus, myalgias, skin sensitivity, feverish chill, lack of appetite, and diverse sorts of pain) (Fig 3 and Table 3). Individuals in Clusters 1 and 7 represent extreme profiles of phenotypic response, as they deviate from the natural history of the disease (Figs 3 and 4), confirming the divergence of the clinical symptomatology spectrum. Notably, all LCCA-derived clusters consist mostly of females. As previously discussed, females show a differential phenotypic response pattern compared with men (Fig 2 and Table 2). The χ²-based tests of association with cluster were significant for gender (χ² = 34.41, degrees of freedom [df] = 6, P<0.0001), age group (χ² = 42.03, df = 18, P = 0.001), and gender and age group combined (χ² = 84.16, df = 39, P<0.0001). These results show that gender and age group are both independently and jointly associated with a differential phenotypic response in individuals with CHIKV infection. Closer inspection of the LCCA-derived clusters revealed that the Female:Male ratio by age group is higher in clusters 4 and 7, particularly in the AYA and adult groups. In cluster 7, however, the extreme pattern of phenotypic response occurs in the adult group.
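The reported P-values follow from each χ² statistic and its degrees of freedom. For an even number of degrees of freedom, the chi-square upper tail has a closed form: exp(-x/2) times the sum of (x/2)^k / k! for k = 0, ..., df/2 - 1. The two single-factor tests can therefore be checked without a statistics library. This is an illustrative check, not the software used in the study, and it does not cover the odd-df (df = 39) test. The function name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of
    freedom, via exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("closed form only valid for positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


p_gender = chi2_sf_even_df(34.41, 6)  # reported: P < 0.0001
p_age = chi2_sf_even_df(42.03, 18)    # reported: P = 0.001
print(f"gender: {p_gender:.2e}, age group: {p_age:.4f}")
```

Both values fall within the bounds reported above: the gender test lies well below 0.0001, and the age-group test rounds to 0.001.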

Discussion
The overarching hypothesis tested in this study was not rejected. That hypothesis was the existence of significantly different latent classes of symptoms, clustering individuals infected with CHIKV during the peak of the CHIKV outbreak in Barranquilla. Indeed, we found seven different symptom profiles that cluster individuals with distinguishable phenotypic responses. These profiles could potentially be used to define extreme and intermediate phenotypes of CHIKV infection. Three further observations stand out: i) females show a distinct phenotypic response pattern compared with men (Fig 2 and Table 2); ii) all LCCA-derived clusters are mostly composed of females (Figs 3 and 4, and Table 3); and iii) the number of symptoms is related to the cluster of CHIKV-infected individuals (S1 Fig). Together, they highlight the importance of gender-specific and phenotypic-response-specific treatments, instead of a one-size-fits-all approach. Such treatments would focus on the at-risk female CHIKV population and on subpopulations with a more extreme phenotypic spectrum (i.e., clusters 1 and 7; Fig 3). [31,32] A more refined characterization of CHIKV patient subpopulations (LCCA-derived clusters) may help in implementing specific diagnostic protocols and treatments. Health providers could then deliver more personalized and cost-effective care to CHIKV patients, depending on their phenotypic response to the infection. Sexually dimorphic immune responses to viral infection are commonly described in the literature. [33] For example, it is well known that women produce higher levels of interferon alpha (IFN-α) in response to viral pathogens. [33,34] Furthermore, in sexually dimorphic immune responses to viral infections, males have shown higher intensity (i.e., viral load within an individual) and prevalence (i.e., number of infected individuals within a population), while females can have a more favorable disease outcome. [35]
Females have a stronger immune response than males, which can result in faster viral clearance. In vitro experiments have shown that cells from females can exhibit a 10-fold greater level of expression than cells from males. [35] However, this may also contribute to the development of autoimmune disorders in females, such as systemic lupus erythematosus, Graves' disease, Hashimoto's thyroiditis, multiple sclerosis, rheumatoid arthritis and scleroderma. [36] In our study, a stronger female immune response to CHIKV infection may be associated with the more complex symptomatology seen in women compared with men (fever, dizziness, adynamia, difficulty grasping, retroauricular and submandibular ganglia, general discomfort, nausea, cough, headache, pruritus, myalgias, skin sensitivity, feverish chill, lack of appetite, and diverse sorts of pain). Epidemiological studies have shown that males are more vulnerable to viral infections, with higher mortality rates than women. [37] Women, in turn, tend to mount more efficient humoral and cellular immune responses against viruses. [35,38] A stronger immune response is clearly an advantage in certain viral infections. In CHIKV infection, however, the potential dissemination of viral particles to lymph nodes, joints and other tissues may lead to an aberrant antigenic response that could worsen symptomatology in women compared with men. [39-41] Kam et al. [42] investigated the CHIKV route of infection. Infection starts in epithelial and endothelial cells, primary fibroblasts, monocytes and monocyte-derived macrophages, and then disseminates to lymph nodes, joints and other tissues. [43] Spread of the infection to the joints results in arthralgia, which mirrors rheumatoid arthritis. Rheumatoid arthritis is characterized by joint pain resulting from tissue inflammation and destruction by inflammatory cytokines such as IL-1β, IL-6 and TNF-α. [44]
We identified different clinical signatures in individuals with CHIKV (Fig 2 and Table 3). For instance, only 11/26 (42.3%) symptoms in cluster 2 had a probability above 50%, compared with 24/26 (92.3%) in cluster 7. Furthermore, sexual dimorphism is less pronounced in clusters 2, 5 and 6, whereas clusters 3, 4 and 7 show a distinctive dimorphism (Fig 2 and Table 3). Although we demonstrate distinct CHIKV phenotypic response subpopulations, several factors have been shown to modulate the human response to arbovirus infection. These factors could therefore contribute to the differences in clinical symptomatology presented here. They include:
(i) vector competence, that is, the ability of the vector to acquire the virus and successfully transmit it to a susceptible host; [45]
(ii) the viral strain, as genetic studies have shown that CHIKV has evolved into three distinct genotypes: West African, East/Central/South African, and Asian; [46]
(iii) environmental conditions such as ecological factors, global population growth, urbanization, lack of mosquito control measures and decay in public health; [23] and
(iv) population genetics, that is, individual differences such as genetic makeup and single nucleotide polymorphisms (SNPs), mostly in immunological pathways. [23]
Indeed, the genetic markers rs179010, rs5741880 and rs3853839 in the TLR-7 gene and rs3764879 in the TLR-8 gene have been associated with increased susceptibility to CHIKV infection. [47] These SNPs were also associated with enhanced susceptibility of CHIKV-infected patients to developing fever, joint pain and rashes. [47] Because our study population has a strong African admixture, [48-51] future studies should determine the contribution of these SNPs in this understudied population.
Furthermore, because CHIKV originated on the African continent, studying genetic ancestry in our population could help elucidate the influence of ancestry on the host response. This will be crucial for developing monitoring and diagnostic tools, for more accurate diagnosis and treatment options, and for outlining public health policies.
According to the epidemiological surveillance office, the east and northeast areas of Barranquilla and its metropolitan area (Fig 1A) are the most severely affected during arthropod-borne virus (arbovirus) outbreaks, such as those of dengue, a disease endemic in the city. The present study included only clinically confirmed cases of CHIKV infection. However, all cases were defined according to the criteria described in the Guidelines of the Colombian National Institute of Health, [27] and no outbreak of any other arbovirus was reported in Barranquilla during the study period. In this regard, the CNIH reported 104,389 clinically confirmed cases and 1410 laboratory-confirmed cases in 2014. [52,53] Estimates of asymptomatic CHIKV infection range from about 3% to 25%. [54] Nevertheless, unlike for other arboviruses, the proportion of asymptomatic CHIKV infections is low. In addition, arbovirus surveillance in Colombia is well established and has generated data of adequate reliability.
Co-occurrence of two or more symptoms may influence symptom recognition, harm individuals' quality of life and health, and hasten the need for clinical treatment. Two or more symptoms may co-occur, or occur in particular combinations, in individuals infected with CHIKV during the major outbreak in the Colombian Caribbean region. Identifying such combinations may help reveal specific symptom patterns of CHIKV infection more clearly and facilitate patient management. Future studies could assess the contribution of demographic, immunological and genetic factors to symptom co-occurrence, and correlate this symptomatology with the viral strain and/or CHIKV genetic variation. Such studies could shed light on the severity of clinical symptomatology and, ultimately, lead to more accurate, efficient and differential diagnosis. It will be fascinating to revisit this cohort as new diagnostic and next-generation sequencing techniques become more accessible.
Despite our new findings, some limitations of our study must be acknowledged. First, our maximally expanded sample comprised 1160 individuals diagnosed with CHIKV infection based on clinical symptomatology. All of them were recruited during the acute phase of infection. The diagnosis followed a clinical protocol developed by the Colombian National Institute of Health, based on a similar protocol previously applied to the diagnosis of Dengue virus infection (S1 Table). The chikungunya outbreak occurred in the Colombian Caribbean in 2014/15 and also affected the Central/Andean region of the country. The epidemiological team of the Health Secretary of Barranquilla strictly followed this protocol to survey all suspected CHIKV cases referred to the surveillance program by health practitioners in Barranquilla. To confirm the feasibility of the clinical protocol and to detect the presence of CHIKV, we randomly selected 300 (25.9%) of our CHIKV cases for laboratory testing by RT-PCR. The clinical-symptomatology-based diagnosis was concordant with the laboratory test results. Not all individuals in our sample were laboratory tested. Even so, these results suggest that clinical diagnosis, when applied responsibly, is a reliable and easy-to-use method for diagnosing CHIKV infection. This applies in developing and Latin American countries, where budget and time constraints are increasing, and during outbreaks such as the 2014 CHIKV outbreak in the Colombian Caribbean region.
A second limitation of our study is the lack of outcome data, such as vital signs, duration of symptoms and disease evolution. Because of this, we were unable to establish correlations between such variables and the profiles of phenotypic response exhibited by individuals with CHIKV infection (Fig 3 and Fig 4). We found that the number of clinical symptoms is associated with the cluster to which CHIKV-infected individuals belong (S1 Fig). However, more research, especially longitudinal studies, is needed to understand long-term outcomes in individuals with CHIKV infection who exhibit these patterns of phenotypic response to the virus. We argue that such studies will greatly benefit from our findings.
In fact, such studies could eventually use the differences between males and females in clinical symptomatology (Fig 2 and Table 2), as well as the subpopulations we identified (Figs 3 and 4, and Table 3), to identify at-risk populations. They could also use multi-omics approaches to disentangle the genetics, epigenetics and proteomics underlying the distinct complex (extreme) phenotypic responses in these individuals (i.e., clusters 1 and 7; Figs 3 and 4).
In summary, our study provides, for the first time, strong evidence for distinct subpopulations among individuals infected with CHIKV. These include important differences in patterns of phenotypic response between males and females, which highlights the importance of developing gender-specific approaches to treating this infection. In low- and middle-income and Latin American countries, these characterizations could support alternative ways of diagnosing CHIKV infection, from development and validation through to implementation. One such approach is machine learning (ML) algorithms based on clinical symptomatology, which could identify these patterns accurately and in a timely fashion in the clinical setting, [55] in order to provide personalized treatment. [56] ML algorithms have recently proven effective, accurate and easy to use in CHIKV [57] and other infectious diseases. [58] Preliminary results of applying ML algorithms to our set of patients yielded a maximum correct classification rate (i.e., accuracy) of 91.9% (S2 Table).
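The ML results are summarized as overall accuracy and, per LCCA-derived cluster, sensitivity and specificity. Both can be computed one-vs-rest from true and predicted cluster labels. The helper below is a hypothetical sketch, not the study's R code, and the labels in the usage example are made up.

```python
def classification_metrics(y_true, y_pred, labels):
    """Overall accuracy and one-vs-rest (sensitivity, specificity) per label."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    accuracy = sum(t == p for t, p in pairs) / n
    per_label = {}
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in pairs)  # cluster correctly recovered
        fn = sum(t == lab and p != lab for t, p in pairs)  # cluster member missed
        fp = sum(t != lab and p == lab for t, p in pairs)  # non-member assigned to cluster
        tn = n - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        per_label[lab] = (sensitivity, specificity)
    return accuracy, per_label


# Hypothetical cluster labels, for illustration only
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
acc, metrics = classification_metrics(y_true, y_pred, labels=[1, 2, 3])
print(acc, metrics)
```

For reference, an accuracy of 91.9% on the 344-individual test set corresponds to about 316 correctly classified individuals (316/344 ≈ 0.919).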

S1 Fig. Beanplots for the number of symptoms in individuals diagnosed with CHIKV in Barranquilla, Colombia, by LCCA cluster. ANOVA shows that the number of symptoms differs by cluster (F(6, 1151) = 744.5, P<0.00001). In particular, individuals in clusters 1 and 7 differ substantially. As mentioned in the main text, these clusters are of special interest because they represent extreme clinical profiles (i.e., phenotypic expression/symptomatology). (TIF)
S2 Table. Accuracy of different Machine Learning algorithms to predict differential phenotypic response (i.e., cluster) in individuals with CHIKV infection using clinical symptomatology. Algorithms were implemented in R. Here, accuracy is the percentage of individuals correctly classified in the testing data set (n = 344). (XLSX)
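The one-way ANOVA behind S1 Fig compares the mean number of symptoms across the seven clusters. With k = 7 groups, the numerator degrees of freedom are k - 1 = 6. A minimal sketch of the F statistic follows. It is a plain-Python illustration with a hypothetical function name and made-up data, not the R code used for the figure.

```python
def one_way_anova(groups):
    """One-way ANOVA: return (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of cluster means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of individuals around their cluster mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical symptom counts for two small groups
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [4, 5, 6]])
```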