Cluster analysis to estimate the risk of preeclampsia in the high-risk Prediction and Prevention of Preeclampsia and Intrauterine Growth Restriction (PREDO) study

Objectives Preeclampsia is divided into early-onset (delivery before 34 weeks of gestation) and late-onset (delivery at or after 34 weeks) subtypes, which may arise from different etiopathogenic backgrounds. Early-onset disease is associated with placental dysfunction. Late-onset disease develops predominantly due to metabolic disturbances, obesity, diabetes, lipid dysfunction, and inflammation, which affect endothelial function. Our aim was to use cluster analysis to investigate clinical factors predicting the onset and severity of preeclampsia in a cohort of women with known clinical risk factors.
Methods We recruited 903 pregnant women with risk factors for preeclampsia at gestational weeks 12+0–13+6. Each individual outcome diagnosis was independently verified from medical records. We applied a Bayesian clustering algorithm to classify the study participants into clusters based on their particular risk factor combination. For each cluster, we computed the risk ratio of each disease outcome, relative to the risk in the general population.
Results The risk of preeclampsia increased exponentially with respect to the number of risk factors. Our analysis revealed 25 clusters. Preeclampsia in a previous pregnancy (n = 138) increased the risk of preeclampsia 8.1-fold (95% confidence interval (CI) 5.7–11.2) compared to a general population of pregnant women. Having a small for gestational age infant (n = 57) in a previous pregnancy increased the risk of early-onset preeclampsia 17.5-fold (95%CI 2.1–60.5). The cluster combining these two risk factors (n = 21) increased the risk of severe preeclampsia 23.8-fold (95%CI 5.1–60.6), of intermediate-onset preeclampsia (delivery between 34+0 and 36+6 weeks of gestation) 25.1-fold (95%CI 3.1–79.9) and of preterm preeclampsia (delivery before 37+0 weeks of gestation) 16.4-fold (95%CI 2.0–52.4). Body mass index over 30 kg/m2 (n = 228) as a sole risk factor increased the risk of preeclampsia 2.1-fold (95%CI 1.1–3.6). Together with preeclampsia in an earlier pregnancy, the risk increased 11.4-fold (95%CI 4.5–20.9). Chronic hypertension (n = 60) increased the risk of preeclampsia 5.3-fold (95%CI 2.4–9.8), of severe preeclampsia 22.2-fold (95%CI 9.9–41.0), and of early-onset preeclampsia 16.7-fold (95%CI 2.0–57.6). If a woman had chronic hypertension combined with obesity, gestational diabetes and earlier preeclampsia, the risk of term preeclampsia increased 4.8-fold (95%CI 0.1–21.7). Women with type 1 diabetes mellitus had a high risk of all subgroups of preeclampsia.
Conclusion The risk of preeclampsia increases exponentially with respect to the number of risk factors. Early-onset preeclampsia and severe preeclampsia have a different risk profile from term preeclampsia.


Introduction
Preeclampsia affects 3% [1] of all pregnancies. It is a systemic disease with a multifactorial background. Preeclampsia is diagnosed when a pregnant woman develops hypertension and proteinuria after 20 weeks of gestation. A recent recommendation by the American College of Obstetricians and Gynecologists [2] further proposes that, in the absence of proteinuria, preeclampsia could be diagnosed when newly diagnosed hypertension occurs in association with thrombocytopenia, impaired liver function, new development of renal insufficiency, pulmonary edema, or new-onset cerebral or visual disturbances.
Preeclampsia can be life-threatening to both the mother and the unborn child. Due to its multifactorial etiology, the outcome and progression of preeclampsia are challenging to predict. The placenta plays an important role in the pathogenesis, and the clinical findings are a consequence of endothelial dysfunction [3]. The disease can be roughly divided into early-onset and late-onset subtypes, which evidently arise from different etiopathogenic backgrounds [4]. Early-onset disease is associated with placental dysfunction. It is often accompanied by intrauterine growth restriction, the risk runs in families, and women with a history of early-onset preeclampsia have an increased risk of cardiovascular disease later in life [5]. Late-onset preeclampsia develops due to metabolic disturbances, obesity, diabetes, lipid dysfunction, and inflammation [6], all of which affect endothelial function. Placental hypoplasia and vascular lesions, frequently present in early-onset disease, are often absent in late-onset disease [7].
Using vasoactive markers, placental growth factor (PlGF) and soluble vascular endothelial growth factor receptor-1 (s-Flt1), preeclampsia can be divided into angiogenic and non-angiogenic subgroups [8]. Early-onset and severe preeclampsia seem to form predominantly the angiogenic subtype, whereas the non-angiogenic subtype has a milder and later course [9].
Recently, encouraging progress has been achieved in the field of preeclampsia prevention [10]. Meta-analyses show that low-dose aspirin initiated in early pregnancy in high-risk women may prevent early and severe preeclampsia or at least delay the onset of the disease [11,12]. However, preeclampsia is a heterogeneous disorder. It remains very difficult to predict who will eventually develop severe or early-onset disease and, accordingly, benefit most from the use of aspirin.
The present study explores the features of a high-risk cohort. Our aim was to explore factors predicting the onset and severity of preeclampsia. We used cluster analysis in a prospectively collected cohort of women with known clinical risk factors for preeclampsia. We discuss the findings in the context of current recommendations for initiating aspirin to prevent preeclampsia in high-risk women.

The PREDO Project
The study cohort consists of women recruited in the multidisciplinary Prediction and Prevention of Preeclampsia and Intrauterine Growth Restriction (PREDO) Project between September 2005 and December 2009 [13,14]. The project has three arms: obstetric [15] (including an aspirin trial [13]), genetic [16], and psychological [17]. We recruited 972 pregnant women with risk factors for preeclampsia and 110 randomly selected pregnant women without known risk factors as a comparison group at 12+0 to 13+6 weeks+days of gestation. The recruitment took place when the women attended the first ultrasound screening in one of the ten hospital maternity clinics participating in the PREDO Project: Women's Hospital, Kätilöopisto Maternity Hospital and Jorvi Hospital at Helsinki University Central Hospital, Hyvinkää Hospital, Kanta-Häme Central Hospital, Päijät-Häme Central Hospital, Tampere University Hospital, Kuopio University Hospital, Northern Karelia Central Hospital, and Iisalmi Hospital. The study protocol was approved by the Ethics Committee of the Helsinki and Uusimaa Hospital District and by the participating hospitals. Written informed consent was obtained from all participants.
Inclusion criteria and definitions. The inclusion and exclusion criteria for the PREDO Project are presented in Table 1. Women with one or more of the risk factors for preeclampsia were invited to participate in the order of arrival unless any of the exclusion criteria were present. We performed uterine artery blood flow measurements by Doppler ultrasound for all participants at 12+0 to 13+6 weeks of gestation. Women who had a bilateral second-degree notch were allocated to the medication group. These women (n = 152) were randomized to receive low-dose aspirin (100 mg/day) or placebo until 35+0 weeks of gestation or delivery. Women with risk factors but without this ultrasound finding were allocated to the follow-up groups. We also recruited 110 women without known risk factors as a control group. All study participants filled in a questionnaire concerning their health. Pregnancy data were collected from the medical records of maternity clinics and hospitals.

Participants in the cluster analysis
The 903 participants with risk factors for preeclampsia were included in this cluster analysis. Women who were randomized to receive low-dose aspirin (n = 69) [13] were excluded.
The clinical characteristics of the study groups are presented in Table 2.
Each individual outcome diagnosis was set by a jury, which consisted of two physicians and a study nurse. They met face-to-face and reviewed the hospital and maternity clinic records of each participant.

Methods of the cluster analysis
We applied a Bayesian clustering algorithm based on mixtures of binary variables (see, e.g., [20]), using an implementation available in the Bayesian Analysis of Population Structure (BAPS) software [21], to classify the study participants on the basis of their risk factors. The algorithm detected 25 clusters, corresponding to different risk factor combinations. For each cluster detected, we computed the risk ratio of each disease outcome, relative to the risk in the general population. The significance and confidence intervals of the risks in the different clusters were computed using the exact binomial test (function binom.test in the R software). The false discovery rates (FDR) were computed using the function p.adjust in R.
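The risk-ratio computation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes that the general-population risk is treated as a fixed constant, so that the risk ratio and its interval are the observed proportion and its exact (Clopper-Pearson) binomial interval divided by that baseline. The function names are hypothetical, and the example uses the cohort-level counts (86 of 903) with the 2.5% general-population risk, a ratio the paper does not itself report.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided CI for a binomial proportion (the interval type R's binom.test reports)."""

    def solve(cdf_at, target):
        # The binomial CDF decreases as p grows, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


def risk_ratio_vs_baseline(cases, n, baseline, alpha=0.05):
    """Risk ratio and CI relative to a fixed baseline risk (assumption: baseline has no sampling error)."""
    lower, upper = clopper_pearson(cases, n, alpha)
    return (cases / n) / baseline, lower / baseline, upper / baseline


# Cohort-level illustration: 86 of 903 women developed preeclampsia; baseline risk 2.5%.
rr, rr_lo, rr_hi = risk_ratio_vs_baseline(86, 903, 0.025)
print(f"risk ratio {rr:.1f} (95% CI {rr_lo:.1f} to {rr_hi:.1f})")
```

Dividing the interval endpoints by a constant keeps the interval exact for the proportion but ignores any uncertainty in the register-based baseline, which is why the baseline choice matters for every reported ratio.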
The risks of preeclampsia and its subtypes in the general Finnish population were estimated from the 2013 data of the National Institute for Health and Welfare registers (Medical Birth Register and Care Register for Health Care): preeclampsia 2.5% (of these, 24% severe, 8% early-onset, 15% intermediate, 77% term, 23% preterm), with frequencies obtained by request from the register authorities; gestational hypertension 4.4%; small for gestational age (SGA) 2.3%; and gestational diabetes 9%.
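As a worked example of how these register figures translate into subtype baselines, the sketch below multiplies the overall 2.5% risk by each subtype share. The multiplication is our reading of "of these", not a formula stated in the paper, and the variable names are hypothetical. The final lines use an assumed count of 2 early-onset cases among the 57 women with a previous SGA infant; that count is not given in the text and was chosen only because it reproduces the reported 17.5-fold ratio.

```python
# Overall general-population risk of preeclampsia (register data, 2013).
PE_RISK = 0.025

# Share of preeclampsia cases falling into each subtype, as listed in the text.
SUBTYPE_SHARE = {
    "severe": 0.24,
    "early-onset": 0.08,
    "intermediate": 0.15,
    "term": 0.77,
    "preterm": 0.23,
}

# Baseline risk of each subtype in the general population.
baseline = {name: PE_RISK * share for name, share in SUBTYPE_SHARE.items()}

for name, risk in baseline.items():
    print(f"{name}: {100 * risk:.3f}%")

# Assumed count (not from the paper): 2 early-onset cases among 57 women.
rr_sga_early = (2 / 57) / baseline["early-onset"]
print(f"risk ratio: {rr_sga_early:.1f}")
```

Because the early-onset baseline is only 0.2%, a single additional case in a small cluster shifts the risk ratio by several fold, which is consistent with the wide confidence intervals reported for the rare outcomes.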

Results
Of the 903 women, 86 (9.5%) developed preeclampsia. Of those with preeclampsia, 10 (11.6%) had early-onset disease and 36 (41.9%) had severe disease. A total of 465 women (51.5%) did not meet any of the primary or secondary outcome criteria, whereas 438 (48.5%) had one or more of these pregnancy complications (Table 3).
In the control group of 110 women without risk factors two developed preeclampsia; one of them had severe and the other non-severe late-onset preeclampsia. Eighty-seven percent of women in the healthy control group did not meet any of our primary or secondary outcome criteria.
Systemic lupus erythematosus was the inclusion criterion for four women and Sjögren's syndrome for 13 women. None of them developed preeclampsia. Women with preeclampsia significantly more often had induced labour or caesarean sections, and their newborns had lower Apgar scores (Table 4).

Results of the cluster analyses
Heat map of risk relative to general population. The heat map (Fig 1) shows the risk of developing preeclampsia in the high-risk group, including severe, early-onset, term, preterm and intermediate preeclampsia, as compared to the risk of developing the same outcome in a general population of pregnant women. The risk ratios of the 19 most frequent clusters are presented in Table 5. The risk ratio table including all clusters is presented in the supplementary material in S1 Table. The coloured cells in the heatmap mark outcomes that are significant at the nominal p<0.05 level. After accounting for the number of tests, the results correspond to a false discovery rate of FDR = 0.14, which we considered appropriate in this exploratory analysis. A similar figure, where significance was determined using the Bonferroni correction over the 25 clusters times 10 outcomes (250 conditions), is shown as Supplementary S1 Fig. Women with preeclampsia in a previous pregnancy, chronic hypertension, a small for gestational age newborn, or type 1 diabetes mellitus were at high risk of early-onset, severe, preterm and intermediate preeclampsia.
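The two multiple-testing corrections mentioned above can be illustrated with a short sketch. The Bonferroni threshold follows directly from the 250 conditions named in the text. For the false discovery rate, the paper only says that R's p.adjust was used, so the Benjamini-Hochberg procedure shown here is an assumption about the method, and the p-values are made up for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# 25 clusters times 10 outcomes = 250 conditions, as stated in the text.
threshold = bonferroni_threshold(0.05, 25 * 10)
print(f"Bonferroni per-test threshold: {threshold:.4f}")

# Hypothetical p-values, NOT study data.
example_p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(example_p))
```

The Bonferroni threshold controls the chance of any false positive and is therefore much stricter than an FDR criterion, which tolerates a stated proportion of false discoveries among the flagged cells.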
Preeclampsia in a previous pregnancy and obesity (body mass index (BMI) over 30 kg/m2) were the most important single risk factors for term preeclampsia. Obese women had increased risk of any (OR 2.1, 95%CI 1.1-3.6), term (OR 2.3, 95%CI 1.1-4.2) and severe (OR 5.2, 95%CI 2.1-10.5) preeclampsia. No association with preterm or early-onset preeclampsia was observed. Obesity combined with other risk factors, such as chronic hypertension and type 1 diabetes mellitus, further increased the risk of preeclampsia. Type 1 diabetes mellitus without other risk factors increased the risk of preterm, but not term, preeclampsia. Age over 40 years or under 20 years in a healthy woman without other risk factors did not increase the risk of preeclampsia in our cohort.
The risk of preeclampsia increased exponentially (linearly on the logarithmic scale) with the number of risk factors (Fig 2). Table 6 presents some clusters as examples.

Fig 1. The size (i.e. area) of the black box illustrates the proportion of women in the particular cluster with the risk factor in question. The right side of the heatmap presents the risk ratios of the outcomes. The colour of the cell represents the estimated risk ratio of the corresponding outcome in the corresponding cluster, and the colour encoding is shown on the right side of the heatmap. Only cells that are significant at the nominal 5% level are coloured (see text for discussion). The exact risk ratios are presented in Table 5. https://doi.org/10.1371/journal.pone.0174399.g001
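The statement that an exponential increase is linear on the logarithmic scale can be made concrete with a small sketch. It fits log(risk) = a + b × (number of risk factors) by ordinary least squares, so that exp(b) is the fold change in risk per additional risk factor. The input risks are hypothetical values that double with each factor; they are not the study's data, and the function name is illustrative.

```python
from math import exp, log


def fit_log_linear(counts, risks):
    """Least-squares fit of log(risk) = a + b * count; returns (a, b)."""
    ys = [log(r) for r in risks]
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, ys))
    sxx = sum((x - mean_x) ** 2 for x in counts)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical risks that double with each extra risk factor (NOT study data).
counts = [1, 2, 3, 4]
risks = [0.05, 0.10, 0.20, 0.40]

a, b = fit_log_linear(counts, risks)
fold_per_factor = exp(b)
print(f"fold change per additional risk factor: {fold_per_factor:.2f}")
```

A constant slope on the log scale means each added risk factor multiplies the risk by the same factor, rather than adding a fixed increment.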

Discussion
In this study the risk of preeclampsia increased exponentially as the number of risk factors increased. Women who developed preterm or severe preeclampsia had a different risk profile than those who developed term preeclampsia. Previous preeclampsia, chronic hypertension, and type 1 diabetes mellitus were strong risk factors for severe and preterm preeclampsia in our cohort. An SGA newborn in an earlier pregnancy was a strong risk factor for early-onset preeclampsia. Obesity increased the risk of term preeclampsia and severe preeclampsia. Age below 20 years, age over 40 years, gestational diabetes, and fetal demise as sole risk factors did not predict increased risk.
Within this research frame the incidence of preeclampsia in a general population is of particular importance. The incidence of preeclampsia in the literature is estimated at 3% [1]. For this work we wanted to determine an incidence that would be accurate for the Finnish population. According to combined data from the National Medical Birth Registry and Care Register (National Institute for Health and Welfare), 2.5% of babies were born from preeclamptic pregnancies in 2013. Similar rates have been reported in other Nordic countries: according to the Danish hospital discharge registry, 2.72% of women who gave birth in Northern Sealand in 1998-2000 had preeclampsia [22]. In the medical birth registry data of Norway, covering 1 million births in 1986-2005, the preeclampsia rate was 3.7% [23]. While no validation studies exist for preeclampsia diagnoses in the Finnish healthcare registers, the Norwegian [24] and Danish [22] registries have been validated against medical records according to criteria similar to those used in the present study. The prevalence of preeclampsia in the study populations was similar whether obtained from the registry or based on medical records: 4.4% and 3.8%, respectively, in the Norwegian study and 2.7% and 2.9% in the Danish study. While some individuals are misclassified according to registry diagnoses, as indicated by the 99% specificity and 69% [22] and 43% [22] sensitivity, the similar prevalences indicate that the registry estimate of prevalence used here as reference is adequately accurate.

Table 5. Risk ratios and 95% confidence intervals (CIs) of the cluster analysis. The marking C1 etc. refers to a certain row in the heatmap. If the risk factor is inside brackets, only a portion of women in the cluster had that risk factor.

The strength of our study is the prospectively recruited, well-characterised cohort of women with increased risk of preeclampsia. However, we had limited information on the onset and severity of preeclampsia in the earlier pregnancies of our participants, and on whether preeclampsia had occurred in one or more previous pregnancies. Therefore the effect of those risk factors is impossible to evaluate. According to earlier studies, women with a history of early-onset preeclampsia have a higher risk of recurrent preeclampsia than women with late-onset disease [25,26]. Moreover, women with previous preterm preeclampsia have an increased risk of adverse pregnancy outcome in their second pregnancy even in the absence of preeclampsia [27].
In this study a history of preeclampsia was a strong risk factor for recurrent preeclampsia. Duckitt and Harrington reported a seven-fold increase in the incidence of preeclampsia in women who had preeclampsia in an earlier pregnancy compared to women without such a history [25]. In our cohort, a history of giving birth to an SGA newborn in a previous pregnancy seemed to increase the risk of early-onset, preterm and severe preeclampsia in subsequent pregnancies. Conversely, preterm preeclampsia in an earlier pregnancy has been shown to be associated with a risk of giving birth to an SGA newborn in a later non-preeclamptic pregnancy [28]. This reflects the common placenta-derived pathogenesis of these pregnancy outcomes. Preeclampsia and SGA are different entities of the same placental disease. Ness and Sibai [29] hypothesized that shallow placentation is preceded by endothelial dysfunction in both preeclamptic and SGA pregnancies. If shallow placentation interacts with maternal metabolic disturbances, preeclampsia develops; if metabolic disturbances are absent, SGA develops.

Table 6. Important risk factors and clusters and number of women who developed preeclampsia in each group.
Risk factors | Number of women with the risk factor(s) | Number of women who developed preeclampsia (%)
Preeclampsia in a previous pregnancy | 138 | 27 (

Consistent with earlier studies, we found that chronic hypertension is a strong predictor of early-onset and severe preeclampsia [1]. Chronic hypertension may predispose to placental vascular insufficiency associated with early-onset preeclampsia.
Obesity increases the risk of preeclampsia. For example, in a population-based retrospective study of over 854 000 singleton live births, the risk ratio for preeclampsia was 2.9 for all women with BMI over 30 kg/m2, and the risk increased progressively with increasing BMI [1]. In our cohort, obese women had an increased risk of any, term, and severe preeclampsia. No association with preterm preeclampsia was observed. If a woman had other risk factors together with obesity, the risk increased further. Obesity together with a history of preeclampsia in an earlier pregnancy increased the risk of intermediate and term preeclampsia, but not early-onset preeclampsia.
Women at the extremes of fertile age are thought to be at increased risk of developing preeclampsia [1]. This was not seen in our cohort. Women below 20 years and over 40 years of age did not have an increased risk of developing preeclampsia unless other risk factors were also present. Women of extremely advanced maternal age, 45 years or over, may be at greater risk of preeclampsia spectrum complications [30].
Women with connective tissue diseases, such as systemic lupus erythematosus and Sjögren's syndrome, are commonly considered to be at high risk of developing preeclampsia [31]. However, in our cohort none of the women included with these risk factors developed preeclampsia. This may be due to the small sample size. The heterogeneity of these complex diseases may also play a role. It may be that only some subgroups of women, for example those with lupus anticoagulant or renal insufficiency, have an increased risk.
Recent advances in the prevention of preeclampsia in high-risk women have emphasized the identification of these women and the role of individual risk assessment. There is no consensus on the level of risk that would warrant prevention with low-dose aspirin, and there are only a few recommendations concerning prevention. These recommendations are based on the assumption that low-dose aspirin should prevent early-onset and severe preeclampsia more efficiently than late-onset, non-severe disease [12,32]. The Society of Obstetric Medicine of Australia and New Zealand [33] recommends low-dose aspirin for women at increased risk. The Society of Obstetricians and Gynecologists of Canada [34] recommends low-dose aspirin for women at highest risk, women with a history of early-onset preeclampsia and women with recurrent preeclampsia. The recommendations by the World Health Organization (WHO) [35], the National Institute for Health and Clinical Excellence (NICE) [36], the American College of Obstetricians and Gynecologists (ACOG) [2] and the U.S. Preventive Services Task Force (USPSTF) [35] provide more detailed guidelines. In July 2016 ACOG released an updated version of its recommendations, stating that it supports the broader list of risk factors associated with high risk by USPSTF [37]. The WHO, NICE and USPSTF recommendations each identify previous preeclampsia, diabetes, chronic hypertension, renal disease, and autoimmune disease as factors associated with risk of preeclampsia. USPSTF also includes women who have had preeclampsia accompanied by an adverse outcome. In the NICE guidelines all hypertensive diseases, including pregnancy hypertension in a previous pregnancy, are classified as high-risk factors. In the WHO and USPSTF recommendations multiple pregnancy is included as a high-risk factor.
The NICE guidelines and USPSTF also define moderate risk factors and recommend considering low-dose aspirin for women with more than one of them: first pregnancy, age 35 (in USPSTF) or 40 (in NICE) years or older, pregnancy interval of more than ten years, BMI of 35 kg/m2 or more, family history of preeclampsia, or multiple pregnancy. Our cluster analysis corroborates that the accumulation of risk factors increases the risk of preeclampsia. For example, age over 40 as a sole factor in a healthy participant did not increase the risk of preeclampsia in our cohort, but obese women over forty years of age had an increased risk of preterm preeclampsia. The USPSTF recommendation also includes sociodemographic characteristics among the moderate risk factors. The USPSTF recommendation, unlike the others, includes a history of an SGA infant as a moderate risk factor. In our cohort, a history of giving birth to an SGA baby in a previous pregnancy seems to increase the risk of early, preterm and severe preeclampsia significantly in subsequent pregnancies.
Gestational diabetes in an earlier pregnancy, obesity, or advanced age as sole risk factors do not seem to increase the risk of early-onset or severe preeclampsia. Commercial prediction programs utilizing placental growth factor, other biomarkers and biophysical measurements are under development. Currently, clinical risk factors are important in estimating the risk of preeclampsia in early pregnancy and, consequently, the need for preventive low-dose aspirin. Based on this cluster analysis we would recommend low-dose aspirin for women who have had preeclampsia or an SGA infant in an earlier pregnancy, women with chronic hypertension, and women with type 1 diabetes mellitus, whether present as a sole risk factor or together with other factors. Clustering of risk factors increases the risk significantly.
We conclude that early-onset and severe preeclampsia have a different risk profile from late-onset preeclampsia. Our analysis indicates an exponential increase in the risk of preeclampsia as the number of risk factors increases.
Supporting information
S1 Table. The risk ratios and 95%CI of the least frequent clusters. The marking C20 etc. refers to a certain row in the heatmap. If the risk factor is inside brackets, only a portion of women in the cluster had that risk factor. (PDF)
S1 Fig. The size (i.e. area) of the black box illustrates the proportion of women in the particular cluster with the risk factor in question. The right side of the heatmap presents the risk ratios of the outcomes; in this heat map significance is determined using Bonferroni correction over the 25 clusters times 10 outcomes (250 conditions). The colour of the cell represents the estimated risk ratio of the corresponding outcome in the corresponding cluster, and the colour encoding is shown on the right side of the heatmap. Those cells are coloured which are significant at the nominal 5% level. (TIFF)