Raman spectroscopy on blood serum samples of patients with end-stage liver disease

Raman spectroscopy has been shown to be a promising method for the examination of biomedical samples. However, its efficacy has not yet been established in clinical diagnostics. In this study, the potential application of Raman spectroscopy in medical laboratories is evaluated for a large number (38) of biomarkers. We performed Raman spectroscopy at an excitation wavelength of 780 nm on 234 serum samples from a cohort of patients with different stages of liver disease. The Raman spectra were analyzed in combination with the results of routine diagnostics using specifically developed mathematical algorithms, including fluorescence filtering, frequency subset selection, and several strategies to circumvent overfitting, such as independent validation. With the results of this cohort, which were validated in 328 independent samples, a significant proof-of-concept study was completed. This study highlights the need to prevent overfitting and to use independent data for validation. The results reveal that Raman spectroscopy has high potential for use in medical laboratory diagnostics to simultaneously quantify multiple biomarkers.

To diagnose severe diseases, choose adequate therapy, and monitor their course, most patients require laboratory diagnostics. Individual laboratory tests are ordered specifically for the medical indication (targeted diagnostics). However, many different biomarkers are available, and selecting the appropriate diagnostic parameter can be challenging. A non-targeted diagnostic approach that could provide information on various relevant clinical conditions simultaneously could reduce the risk of missed diagnoses and delayed medical consequences.

Raman spectroscopy (RS) is a method capable of analyzing complex molecular compositions. It was first predicted by Adolf Smekal in 1923 [1] and was first […] outside the clinical context. Airport security controls, for example, use RS, as it reveals even small traces of hazardous substances [5]. Another common application is evaluating the originality of paintings [6]. Several diagnostic applications have been reported [7], [8], including in the context of cancer [9], [10], [11], [12] and obesity-related diseases [13], [14], [15]. Other studies deal with specific applications using blood [16], [17], [18], [19] or molecular imaging [20], [21]. Theoretically, owing to the unique spectral signatures of substances, countless biomarkers and metabolites could be analyzed in parallel within about one minute, enabling a fast medical response. Currently, laboratory diagnostics are based on individual chemical detection reactions, each requiring sample volume, reagents, and complex infrastructure. In contrast, a multi-target, reagent-free method could provide a complex clinical picture of the patient in a single step.

The aim of our study is to evaluate the applicability of RS as a diagnostic tool for clinical samples in comparison with the results of routine medical laboratory diagnostics.
Due to its highly complex composition, blood analysis by means of RS is rather […] analysis of whole blood with RS may be extremely challenging. To reduce the analytical complexity, we used blood serum, which contains only the cell-free components of the blood after the consumption of coagulation factors.
The clinical parameters of the patients are displayed in Table 1. The glomerular filtration rate (GFR) is a measure of renal function; we estimated it from serum creatinine, sex, and age using the CKD-EPI creatinine equation [23].

In our cohort, we focused especially on laboratory parameters for liver disease. Consequently, we included bilirubin, a routinely used detoxification parameter, and albumin, prealbumin, and cholinesterase as liver synthesis parameters. Furthermore, we included transaminases (ALAT, ASAT), alkaline phosphatase (AP), and […] scores were calculated from the serum creatinine, serum bilirubin, and prothrombin time values according to the applicable specifications in Europe [24]. The MELD-Na score […] detectable in serum: glycated hemoglobin (hemoglobin A1c, HbA1c), blood cells (platelets, white blood cells), and international normalized ratio (INR). Although serum samples should provide a promising starting point for determining the correlations between RS spectra and laboratory results, they posed some challenges.
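
The eGFR and MELD scores mentioned above are closed-form formulas of routine laboratory values. The Python sketch below assumes that [23] refers to the 2009 CKD-EPI creatinine equation and uses the widely published original MELD and UNOS-style MELD-Na formulas, with the prothrombin time expressed as INR. The European specifications of [24] (rounding, dialysis handling, unit conventions) are not reproduced, and all function names are hypothetical.

```python
import math


def ckd_epi_2009(creatinine_umol_l: float, age: float, female: bool, black: bool = False) -> float:
    """eGFR in ml/min/1.73 m^2, 2009 CKD-EPI creatinine equation (assumed version)."""
    scr = creatinine_umol_l / 88.4  # umol/l -> mg/dl
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    egfr = (141.0
            * min(scr / kappa, 1.0) ** alpha
            * max(scr / kappa, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def meld(bilirubin_mg_dl: float, inr: float, creatinine_mg_dl: float) -> float:
    """Original MELD; inputs below 1.0 are set to 1.0, creatinine is capped at 4.0."""
    bili = max(bilirubin_mg_dl, 1.0)
    inr_ = max(inr, 1.0)
    crea = min(max(creatinine_mg_dl, 1.0), 4.0)
    return (3.78 * math.log(bili) + 11.2 * math.log(inr_)
            + 9.57 * math.log(crea) + 6.43)


def meld_na(meld_score: float, sodium_mmol_l: float) -> float:
    """UNOS-style MELD-Na; sodium is bounded to 125-137 mmol/l (assumption)."""
    na = min(max(sodium_mmol_l, 125.0), 137.0)
    return meld_score + 1.32 * (137.0 - na) - 0.033 * meld_score * (137.0 - na)


print(round(ckd_epi_2009(88.4, age=50, female=False), 1))  # 87.4
print(round(meld(2.0, 1.0, 1.0), 2))                        # 9.05
```

The MELD inputs here are in mg/dl; values reported in µmol/l would first have to be converted (bilirubin: divide by 17.1, creatinine: divide by 88.4).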

Specifically, the fact that biomarkers are naturally in solution poses a major inherent technical challenge: most features found in the spectra of the dry counterparts of these substances simply flatten out (see Figure 1).

Information is still contained in the spectra, although it is not as obvious as in the spectra of dry substances. The characteristic Raman peaks are not apparent in the raw spectra, so a crucial step in data analysis is background filtering. Due to the challenging signal-to-noise ratio, many standard methods, such as principal component analysis (PCA), are not immediately applicable.

In the first part of our analysis, we investigate the general information content of the spectra. The main part of the analysis explores the quality of the data in the context of medical diagnostics, namely the correlation of the spectra with the laboratory results collected for the same patients.

We collected 234 spectra and reached correlations between 0.7 and 0.91 for 19 of the 38 available markers (see Table 2). The correlation values could easily be increased by increasing the number of considered frequencies and by a simple weight optimization, but our investigations revealed a strong tendency toward overfitting. The reported correlations are therefore based on several mechanisms we introduced to counterbalance overfitting.

Experimental setup

Raman spectroscopy. We used the Thermo Fisher Scientific DXR SmartRaman Spectrometer (Waltham, Massachusetts, USA) with an excitation wavelength of 780 nm for data collection. Outgoing Raman radiation was detected with a cooled CCD sensor (at −50 °C) in the range from 50 to 3500 cm⁻¹. After a period of thorough testing and comparison of various settings, we chose the settings listed in Table 1 of the supplementary material.

A crucial parameter is the duration and number of repetitions of the measurements taken by the device. A higher number of repetitions is preferable for statistics and, most importantly, for noise cancellation. However, substantial photobleaching was observed, especially in the range between 1500 and 3000 cm⁻¹.

The device's laser can be focused at different heights. We chose a height at which the background created by the carrier was minimal; the focus was slightly above the level of the coverslips.

Throughout the entire analysis, the dominant background was of obvious concern. […] (dm-drogerie markt GmbH + Co. KG, Karlsruhe, Germany). Unless otherwise stated, dilutions were carried out with deionized water.

All methods were carried out in accordance with relevant guidelines and regulations.

Evaluation methods

The spectra were evaluated in two stages. The first consisted of a number of general tests evaluating the application of RS to serum samples. In the second, the correlation of RS with known parameters was analyzed. The former gives a general idea of […]

Background filtering. Tests were performed on several datasets: first on the raw data, then on the background, and finally on the background-filtered spectra. Background filtering is an essential step to reveal the information content of the spectra. The background is clearly dominated by fluorescence, which may itself contain valuable information and will be the subject of future investigations. We applied a common asymmetric least squares smoothing algorithm [26], [27] that fits a curve to the bottom of fluctuating data. The algorithm can be tuned in the degree to which it follows local fluctuations.
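
The asymmetric least squares idea can be sketched in a few lines. This is a minimal Python version of the common Eilers-Boelens formulation, not necessarily the exact variant used in [26], [27]; `lam` (smoothness, i.e. stiffness) and `p` (asymmetry) are the two tuning parameters, and their default values are illustrative assumptions. Points above the current curve receive the small weight `p`, so the curve settles underneath the peaks.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Estimate a smooth baseline under a spectrum by asymmetric least squares.

    lam: weight of the second-difference penalty (larger -> stiffer curve).
    p:   weight of points above the baseline (small -> curve stays below peaks).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator D with shape (n - 2, n).
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    z = y
    for _ in range(n_iter):
        W = sparse.diags(w, 0)
        # Minimise sum(w * (y - z)**2) + lam * sum((D z)**2).
        z = spsolve((W + penalty).tocsc(), w * y)
        # Asymmetric reweighting: points above the fit count little.
        w = np.where(y > z, p, 1.0 - p)
    return z


x = np.arange(500)
spectrum = 1.0 + 0.002 * x + 10.0 * np.exp(-((x - 250) ** 2) / (2 * 5.0 ** 2))
baseline = als_baseline(spectrum)
filtered = spectrum - baseline  # background-filtered spectrum
```

A purely linear signal has a vanishing second difference, so it is returned unchanged as its own baseline, whereas a narrow peak is largely left in `filtered`.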

These parameters can be regarded as the stiffness or flexibility of the smoothing curve. Examples of two different settings of the regulating parameters are shown in Figure 2.

Correlation and significance. Correlation is the central measure of our analysis.

The Pearson correlation coefficient is a widely used measure of the linear association between two data arrays [28].

Correlations are calculated for spectral vectors. A spectral vector $S_f$ represents a certain frequency $f$ and has as many entries as there are patients for whom both marker and spectrum are available. The spectra may be considered an $n \times k$ matrix $S$, where $n$ is the number of spectra and $k$ the number of frequencies contained in each spectrum. The vector $S_f$ is then the column of $S$ that represents frequency $f$ (depicted in red in Figure 1 of the supplementary material).
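
Computing the pairwise correlations of all spectral vectors $S_f$ with a marker vector $L_m$ amounts to a column-wise Pearson correlation. The Python sketch below (function name hypothetical) drops patients without a marker value, so that every $S_f$ has the length described above, and vectorizes the calculation over all $k$ frequencies instead of looping.

```python
import numpy as np


def pairwise_correlations(S, L):
    """Pearson correlation of every column S[:, f] with the marker vector L.

    S: (n, k) matrix of spectra (rows: patients, columns: frequencies).
    L: length-n marker vector; NaN marks patients without a lab value.
    Returns a length-k array of correlations.
    """
    S = np.asarray(S, dtype=float)
    L = np.asarray(L, dtype=float)
    keep = ~np.isnan(L)              # patients with both spectrum and marker
    S, L = S[keep], L[keep]
    Sc = S - S.mean(axis=0)          # centre every spectral vector S_f
    Lc = L - L.mean()
    numerator = Sc.T @ Lc
    denominator = np.sqrt((Sc ** 2).sum(axis=0) * (Lc ** 2).sum())
    return numerator / denominator


rng = np.random.default_rng(0)
S = rng.normal(size=(30, 5))
L = 2.0 * S[:, 3] + rng.normal(scale=0.1, size=30)
print(np.argmax(np.abs(pairwise_correlations(S, L))))  # 3
```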

The starting point of our analysis is the pairwise correlation of a spectral vector $S_f$ with a marker vector $L_m$, both of dimension $n$. Marker $m$ can be any value measured in the medical laboratory for a given patient (e.g. the creatinine level or mortality rate).

medRxiv preprint doi: https://doi.org/10.1101/2021.06.06.21258433; this version posted June 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

[…] spectra and a weight vector $\omega$. Thus, it can be written as follows:

$$c_m = \operatorname{corr}\Big(\sum_f \omega_f S_f,\; L_m\Big) = \operatorname{corr}(S\omega,\, L_m) \qquad (1)$$

Keeping the number of non-zero weights low was found to be necessary to avoid overfitting. Additionally, we calculate a z-value as an estimator of the significance of the correlation $c_m$:

$$z = \frac{c_m - \mu}{\sigma} \qquad (2)$$

where $\mu$ is the mean and $\sigma$ the standard deviation of a reference random distribution.
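
Formula (1) turns the weights into a single predictor $S\omega$ whose correlation with $L_m$ is $c_m$. The sketch below is one possible reading, assuming (as described in the overfitting paragraph and in the caption of Table 2) that only a few frequencies with a sufficiently large pairwise correlation receive a non-zero weight, and that this weight equals the pairwise correlation. The parameters `k` and `min_abs_corr` and all function names are illustrative, not the authors' code.

```python
import numpy as np


def pearson_columns(S, L):
    """Pearson correlation of each column of S with L (no missing values)."""
    Sc = S - S.mean(axis=0)
    Lc = L - L.mean()
    return (Sc.T @ Lc) / np.sqrt((Sc ** 2).sum(axis=0) * (Lc ** 2).sum())


def select_weights(S, L, k=9, min_abs_corr=0.2):
    """Weight vector omega: the k frequencies with the largest |pairwise correlation|
    above min_abs_corr get their pairwise correlation as weight, all others 0."""
    r = pearson_columns(S, L)
    top = np.argsort(-np.abs(r))[:k]
    chosen = top[np.abs(r[top]) >= min_abs_corr]
    omega = np.zeros(S.shape[1])
    omega[chosen] = r[chosen]
    return omega


def combined_correlation(S, L, omega):
    """c_m = corr(S @ omega, L_m), Formula (1)."""
    if not np.any(omega):
        return np.nan  # no frequency selected, predictor would be constant
    return np.corrcoef(S @ omega, L)[0, 1]


rng = np.random.default_rng(0)
S = rng.normal(size=(60, 300))                        # 60 spectra, 300 frequencies
L = S[:, 10] + S[:, 20] + 0.3 * rng.normal(size=60)   # synthetic marker
omega = select_weights(S, L)
print(np.count_nonzero(omega), combined_correlation(S, L, omega))
```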

The larger z is, the lower the probability of obtaining a given correlation by chance. For every combination of the number of non-zero weights $n_{\omega_f \neq 0}$ and the number of spectra $n_{\text{spectra}}$, a reference distribution was calculated: for each such pair, we created 100 random marker vectors $L_m$ and calculated the corresponding correlations according to the same protocol as for the markers.
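
The reference distribution behind Equation (2) can be sketched by running the same selection protocol on random marker vectors. In this minimal Python version, the random markers are drawn from a standard normal distribution and the reference is built for one fixed combination of $n_{\omega_f \neq 0}$ (here `k`) and $n_{\text{spectra}}$ (the rows of `S`); both the distribution and all names are assumptions for illustration.

```python
import numpy as np


def pearson_columns(S, L):
    Sc = S - S.mean(axis=0)
    Lc = L - L.mean()
    return (Sc.T @ Lc) / np.sqrt((Sc ** 2).sum(axis=0) * (Lc ** 2).sum())


def protocol_correlation(S, L, k):
    """Hypothetical protocol: weight the k strongest frequencies by their pairwise
    correlation and return the correlation of the weighted sum with L."""
    r = pearson_columns(S, L)
    top = np.argsort(-np.abs(r))[:k]
    omega = np.zeros(S.shape[1])
    omega[top] = r[top]
    return np.corrcoef(S @ omega, L)[0, 1]


def z_value(S, c_m, k, n_random=100, seed=0):
    """z = (c_m - mu) / sigma, with mu and sigma from random marker vectors
    processed with exactly the same protocol (same k, same spectra)."""
    rng = np.random.default_rng(seed)
    reference = np.array([
        protocol_correlation(S, rng.normal(size=S.shape[0]), k)
        for _ in range(n_random)
    ])
    return (c_m - reference.mean()) / reference.std(ddof=1)


rng = np.random.default_rng(1)
S = rng.normal(size=(40, 200))
L = S[:, 0] + 0.1 * rng.normal(size=40)   # marker strongly tied to one frequency
c_m = protocol_correlation(S, L, k=1)
print(c_m, z_value(S, c_m, k=1))
```

Because the reference is rebuilt for each `k`, the same $c_m$ yields a smaller z when more non-zero weights are allowed, which is how the z-value penalizes overfitting.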

General analysis of spectra

As a first test, we investigated the impact of measuring substances in solution rather than dry. We collected spectra of both dry samples and solutions of ascorbic acid and creatinine (Figure 1). These revealed a systematic and dramatic issue: all Raman peaks of the dissolved substances vanished completely.

Thus, the extraction of relevant information would face a poor signal-to-noise ratio.

Noise can generally be counterbalanced by increasing the sampling, in this case the number and duration of measurements. However, we observed rather strong frequency-dependent photobleaching, which increased with increasing frequency.

This is reflected in most spectra; see also Figure 2 of the supplementary material.
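
The trade-off between more repetitions and photobleaching can be made explicit with a toy model. Averaging $N$ independent acquisitions reduces the noise standard deviation by a factor of $\sqrt{N}$; if, as a purely hypothetical assumption, the Raman signal decays exponentially with each repetition (time constant `tau`, counted in repetitions), the signal-to-noise ratio of the average eventually drops again. The parameters below are illustrative, not measured values.

```python
import numpy as np


def averaged_snr(n_repeats, s0=1.0, sigma=1.0, tau=None):
    """SNR of the mean of n_repeats acquisitions.

    s0:    initial signal amplitude; sigma: noise std of a single acquisition.
    tau:   hypothetical bleaching constant (signal_i = s0 * exp(-i / tau));
           None means no bleaching.
    """
    i = np.arange(n_repeats)
    decay = np.ones(n_repeats) if tau is None else np.exp(-i / tau)
    mean_signal = s0 * decay.mean()
    noise_of_mean = sigma / np.sqrt(n_repeats)   # noise shrinks with sqrt(N)
    return mean_signal / noise_of_mean


# Without bleaching, the SNR keeps growing as sqrt(N) ...
print(averaged_snr(4), averaged_snr(16))          # 2.0 4.0
# ... with bleaching, there is an optimal number of repetitions.
snr = [averaged_snr(n, tau=5.0) for n in range(1, 101)]
best_n = int(np.argmax(snr)) + 1
print(best_n)                                      # 6
```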

As the next qualitative test, we compared the spectra of several pure substances in solution. Figure 3 shows spectra of different single-substance solutions. In some cases, differences in the form of specific peaks are visible to some degree. However, such specific peaks are not visible in the spectra of the serum presented in Figure […]

The following tests were then performed on the actual serum samples. For each patient, we collected spectra six times from the same serum sample and two additional times with 12.5 µl of creatinine and fibrinogen spiked in. Figure 2 in the supplementary material shows these spectra for two patients. The spectra show a rather large offset in the […]

[…] creatinine. Ideally, each of these values could lead to a predictor, meaning that one could, for instance, predict the creatinine concentration of a patient from a given serum spectrum. It should be noted that in this study, we investigate the potential of RS to be used as a diagnostic tool; the actual optimization of a predictor is not the goal. The distinct correlations between predicted and laboratory-measured values found in this study clearly show that it is possible to derive such predictors. The actual fitting of predictors should, however, be grounded in larger cohorts, mainly because of the danger of overfitting, which diminishes with larger cohorts.

Pairwise correlations. We use the correlation between a marker and a single spectral vector for a given frequency as a first quality estimate. The higher the correlation, the more closely the frequency reflects the pattern of that marker. In this way, one should be able to identify regions in the spectrum that carry information concerning the marker. Ideally, these frequencies belong to the hidden Raman peaks of the substances, revealing themselves despite the massive noise. Figure 5 shows the marker with the highest pairwise correlations, bilirubin S. Even in this case, the maximum correlation is only around 0.6, which is far from sufficient. Pairwise correlations alone are therefore an insufficient measure to reflect the markers, as expected from the vanishing of peaks when the substances were dissolved.

Overfitting. The most crucial step toward meaningful correlations was to use a linear combination of spectral vectors of different frequencies, according to Formula (1). In our first attempt, we developed an optimizer that switched individual weights from zero to their pairwise correlations, starting from the highest pairwise correlation and accepting any weight that would increase the correlation. Subsequently, we used a Monte Carlo optimizer to fine-tune the weights toward maximum correlation, leading to seemingly stellar results: for most markers, we obtained correlations of up to 0.95.

As a negative control, we used random vectors instead of markers and performed the same procedure. Even a list of purely random numbers could easily be correlated above 0.8 using this procedure, rendering the correlations with the markers meaningless. […] This implied that a significance value was needed (see Equation (2)). At the same time, the number of non-zero weights ($\omega_f \neq 0$) had to be limited to avoid overfitting. Furthermore, we introduced a minimum value for the pairwise correlations; frequencies with a pairwise correlation below that minimum were not considered.
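
The negative control described above can be reproduced in miniature. The sketch below follows the greedy first stage of the described optimizer (weights switched from zero to their pairwise correlation, in order of decreasing absolute pairwise correlation, and kept only if the combined correlation increases) but omits the Monte Carlo fine-tuning. The data are purely random, and the sizes (40 spectra, 1000 frequencies) are illustrative assumptions.

```python
import numpy as np


def pearson_columns(S, L):
    Sc = S - S.mean(axis=0)
    Lc = L - L.mean()
    return (Sc.T @ Lc) / np.sqrt((Sc ** 2).sum(axis=0) * (Lc ** 2).sum())


def greedy_weights(S, L):
    """Greedy forward selection: try frequencies in order of decreasing |r| and keep a
    weight (set to its pairwise correlation) only if corr(S @ w, L) increases."""
    r = pearson_columns(S, L)
    w = np.zeros(S.shape[1])
    best = -np.inf
    for f in np.argsort(-np.abs(r)):
        w[f] = r[f]
        c = np.corrcoef(S @ w, L)[0, 1]
        if c > best:
            best = c
        else:
            w[f] = 0.0          # reject: switch the weight back off
    return w, best


rng = np.random.default_rng(0)
S = rng.normal(size=(40, 1000))   # "spectra" of pure noise
L = rng.normal(size=40)           # random "marker": there is nothing to find
w, c = greedy_weights(S, L)
print(f"best single frequency: {np.abs(pearson_columns(S, L)).max():.2f}")
print(f"greedy combination:    {c:.2f} using {np.count_nonzero(w)} frequencies")
```

On random data of this kind, the greedy combination typically climbs well above the best single frequency, which is exactly the behavior that motivates a significance estimate and a cap on the number of non-zero weights.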

Final results. Table 2 […] The table lists the Pearson correlation, the significance z, and the number of spectra (markers) that were available for each marker.

As an additional consistency check, we selected a random subset 1000 times and then […]

Figure 5 shows the pairwise correlation of bilirubin with the spectra, together with the selected frequencies used as spectral vectors, which lead to a correlation of 0.91.
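
The consistency estimate reported in Table 2 (mean and standard deviation of the Pearson score over random subsets of 40% size) can be sketched as follows. Whether the weights were re-derived on each subset is not stated in the surviving text; this minimal Python version assumes a fixed prediction vector (for example $S\omega$) and only resamples which patients enter the correlation. The function name and defaults are hypothetical.

```python
import numpy as np


def subset_consistency(prediction, marker, frac=0.4, n_draws=1000, seed=0):
    """Mean and standard deviation of Pearson's r over random patient subsets."""
    prediction = np.asarray(prediction, dtype=float)
    marker = np.asarray(marker, dtype=float)
    rng = np.random.default_rng(seed)
    n = marker.size
    m = max(3, int(round(frac * n)))        # subset size (40% by default)
    scores = np.empty(n_draws)
    for i in range(n_draws):
        idx = rng.choice(n, size=m, replace=False)   # draw without replacement
        scores[i] = np.corrcoef(prediction[idx], marker[idx])[0, 1]
    return scores.mean(), scores.std(ddof=1)


rng = np.random.default_rng(0)
marker = rng.normal(size=234)
prediction = marker + 0.5 * rng.normal(size=234)   # synthetic, r around 0.9
print(subset_consistency(prediction, marker))
```

A small standard deviation across subsets indicates that the correlation is not driven by a handful of patients.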

Weights were not further optimized at this point, so their values reflect the pairwise correlations. Given the obtained z-values, overfitting should have been sufficiently avoided. For comparison, when using nine spectral vectors, bilirubin reaches a z-value of 12.94, whereas when using 100 spectral vectors, it would reach a z-value of only ∼1.38 (assuming this would lead to the same correlation as with nine spectral vectors).

Reducing the number of spectral vectors ($\omega_f \neq 0$) to just nine is likely stricter than necessary: the z-values obtained using 11 spectral vectors do not differ substantially from those obtained using nine. We also performed no weight optimization, to avoid overfitting. Larger cohorts would allow for cross-validation, rendering such measures unnecessary. Our goal was to obtain reliable, conservative correlation values.

They can be considered a lower bound of the expected range; the actual correlations are likely to be larger.

In our first independent test, we collected spectra of different dilutions of an albumin infusion solution (see Figure 4). These spectra were not included when deriving the weights of the spectral vectors. We used the weights derived from serum spectra to predict the pure albumin […]

Considering that this is a pilot study with a limited number of patients and that substances in solution show only very weak Raman signals, the agreement between defined and predicted concentrations is already convincing. In addition, the complex composition of serum is fundamentally different from that of a pure substance. The accuracy of the concentration prediction is therefore a convincing confirmation of our approach.

It is noteworthy that the albumin concentrations used for validation are well defined and thus less error-prone than the laboratory markers used to derive the weights. This agreement supports the correctness of our derivation and, thus, the applicability of RS as a diagnostic tool.

For the second test, we analyzed the sera of 11 patients who were not included in the optimization of our method. Figure 6 shows the laboratory values against the calculated values of bilirubin for this data set. A reasonable correlation was observed for most highly correlating markers. However, one spectrum can be considered an outlier for most investigated markers. Interestingly, this spectrum had been identified as a problem case from the beginning: the serum sample contained a visible residue, probably from a blood clot. These results further support the reliability of our strategy.

Finally, we completed a large independent validation using spectra of 328 specimens taken from an evaluation study for liver transplantation in which retention samples were collected for quality assurance. Markers were calculated from the spectra using the predictive models derived in the main study. The 'Validation' column of Table 2 lists the correlations between predicted markers and laboratory values, together with the number of specimens for which laboratory values were available. Several markers reached correlations comparable to those obtained in training (the 'Pearson' column), which demonstrates the diagnostic potential of RS. For other markers, however, the predictions did not reach similar quality. The separation of well-validated markers from those that could not be validated may in turn help in interpreting the z-values reported in Table 2. The z-values are a statistical significance estimate for the correlations, which should be reflected by the validation in the sense that high z-values should be associated with high correlations in both training and validation. For HBA1C, calcium, L-lactate dehydrogenase, alpha fetoprotein, HDL cholesterol, and creatinine, the drop in validation correlations is indeed reflected by low z-values. However, it would be premature at the current stage to define a z-value threshold that separates robust correlations from those that are not.
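
Applying the models from the main study to the independent validation cohort means freezing the weights $\omega$ and evaluating them on new spectra. Because a Pearson correlation is unaffected by a positive linear rescaling, the validation correlation does not depend on any calibration; predicting actual concentrations (as in the albumin test) additionally requires mapping $S\omega$ onto the marker scale. The sketch below assumes a simple least-squares line fitted on the training cohort for that mapping, uses synthetic data, and is an illustration rather than the authors' procedure; all names are hypothetical.

```python
import numpy as np


def fit_calibration(S_train, L_train, omega):
    """Least-squares line mapping the score S @ omega onto the marker scale."""
    slope, intercept = np.polyfit(S_train @ omega, L_train, 1)
    return slope, intercept


def predict(S_new, omega, slope, intercept):
    """Apply frozen weights and calibration to new spectra."""
    return slope * (S_new @ omega) + intercept


def validation_correlation(S_val, L_val, omega, slope, intercept):
    """Pearson r between predictions and lab values, skipping missing lab values."""
    keep = ~np.isnan(L_val)
    pred = predict(S_val[keep], omega, slope, intercept)
    return np.corrcoef(pred, L_val[keep])[0, 1], int(keep.sum())


rng = np.random.default_rng(0)
omega = np.zeros(50)
omega[[3, 7]] = [0.6, 0.4]                     # frozen weights from "training"
S_train = rng.normal(size=(234, 50))
L_train = 5.0 * (S_train @ omega) + 30.0 + rng.normal(size=234)
S_val = rng.normal(size=(328, 50))
L_val = 5.0 * (S_val @ omega) + 30.0 + rng.normal(size=328)
L_val[::10] = np.nan                           # some lab values missing
slope, intercept = fit_calibration(S_train, L_train, omega)
print(validation_correlation(S_val, L_val, omega, slope, intercept))
```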

The results of the presented proof-of-concept study reveal a high potential for RS to be used in medical laboratory diagnostics, although some settings have not yet been systematically optimized. For example, the free parameters of background filtering are currently only roughly approximated, although their choice is supported by a significant improvement of the correlations.

The development of novel methods to analyze the spectra has great potential for further improvement. However, a detailed theoretical analysis is outside the scope of this paper. The application of standard spectroscopic methods, such as PCA and spectral deconvolution, would be premature at the current stage. Given the high level of noise, the frequency-selective linear model presented here is more reliable and thus preferable. Furthermore, it is far from clear that the key assumption of PCA, that a large variation in the data corresponds to relevant information, is valid here. Spectral deconvolution remains a long-term goal of this project, which will require both larger cohorts and an exhaustive library of reference spectra of relevant markers in solution at various concentrations.
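
The doubt about PCA's key assumption can be illustrated with synthetic data. In the hypothetical example below, a broad fluorescence-like background varies strongly between samples while a weak, narrow peak tracks the "marker"; the first principal component (computed via a singular value decomposition of the centred data) then simply follows the background. The spectral shapes, amplitudes, and noise levels are assumptions chosen for illustration only.

```python
import numpy as np


def first_pc_correlations(seed=1, n=200, k=100):
    """Correlate the PC1 scores with a dominant background and with a weak marker."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, k)
    background_shape = np.exp(-x)                                # broad profile
    peak_shape = np.exp(-((x - 0.5) ** 2) / (2 * 0.01 ** 2))      # narrow peak
    background = rng.normal(0.0, 10.0, n)   # large, irrelevant variation
    marker = rng.normal(0.0, 1.0, n)        # small, relevant variation
    X = (np.outer(background, background_shape)
         + 0.05 * np.outer(marker, peak_shape)
         + rng.normal(0.0, 0.01, (n, k)))
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    pc1_scores = U[:, 0] * s[0]
    r_background = np.corrcoef(pc1_scores, background)[0, 1]
    r_marker = np.corrcoef(pc1_scores, marker)[0, 1]
    return r_background, r_marker


r_bg, r_m = first_pc_correlations()
print(f"|corr(PC1, background)| = {abs(r_bg):.3f}, |corr(PC1, marker)| = {abs(r_m):.3f}")
```

Which direction PCA picks up first is determined by variance alone, not by diagnostic relevance.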

Independent tests. The predicted albumin concentrations of the spike experiment are surprisingly accurate, especially considering that a pure substance is significantly different from the complex molecular mixture that constitutes serum.

In patients with end-stage liver disease, various metabolites and drugs that cannot be detoxified are bound to albumin. The method routinely used in the clinical laboratory is based on the reaction of albumin with bromocresol green to form a blue-green dye, which is measured photometrically [29]. We considered the results of this method the gold standard, while being aware that relevant influence factors limit the precision of the routine methods, especially at low concentrations [30].

The independent validation study, based on 328 serum samples, clearly reveals the potential of RS to be used as a diagnostic tool in the future. It is noteworthy that the spectra were recorded months later and that the specimens came from a different source. The device used for the current study not only showed differences in the background from measurement to measurement; there also seems to be a frequency shift that impedes the prediction of markers from spectra. It is reasonable to assume that addressing these procedural limitations may allow more robust […]

Results of RS in comparison to clinical diagnostics. Correlations were higher for organic compounds with higher serum concentrations. The most highly concentrated compounds were total protein, albumin, and bilirubin, with serum reference concentrations (RC) of 64-83 g/l, 35-52 g/l, and <10 mg/l, respectively. These parameters appear to be validly measurable by RS. In the case of bilirubin, its intense color and fluorescence may also contribute to the correlation, given that background filtering is still suboptimal. Prealbumin and AFP, with lower serum concentrations (RC 0.2-0.4 g/l and <0.007 g/l, respectively), were measured less validly.

Compared to liver function markers, renal function markers seem to be more difficult to measure. Creatinine is the most routinely used biomarker of renal function. It is a small organic molecule with a reference concentration in serum of <104 µmol/l (∼11.8 mg/l). Because of its small molecular size, only a limited number of specific Raman peaks can be expected, which may explain the rather low correlation between RS and laboratory diagnostics. […] gGT are rather small (e.g. ASAT 34 µg/l for healthy individuals [31]) and, because of their large molecular weight, their molar concentrations are extremely low.

We also found distinct correlations of RS with glycated hemoglobin (HBA1C) […] proportion of glycated albumin is known to correlate strongly with that of glycated hemoglobin [32]. Furthermore, in patients with severe disease, the concentrations of many molecules are significantly altered. Therefore, different biomarkers of liver function are likely to be strongly correlated in patients with end-stage liver disease. Spectra reflect the total composition of serum rather than individual molecules. This may explain why some of these biomarkers can be predicted even though they were filtered out beforehand.
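
The argument about molecular concentrations rests on converting mass concentrations into molar ones, $c_{\text{molar}} = c_{\text{mass}} / M$. A small Python sketch with the molar mass of creatinine (C4H7N3O, about 113.12 g/mol) and an approximate molar mass of about 66.5 kDa for albumin (a textbook approximation, not taken from the study) reproduces the ∼11.8 mg/l quoted above and shows that albumin, even at the lower end of its reference range, is present at roughly five times the molar concentration of creatinine at its upper reference limit.

```python
CREATININE_G_PER_MOL = 113.12   # C4H7N3O
ALBUMIN_G_PER_MOL = 66_500.0    # approximate molar mass of human serum albumin


def umol_per_l_to_mg_per_l(c_umol_per_l: float, molar_mass_g_per_mol: float) -> float:
    """umol/l -> mg/l: multiply by the molar mass (g/mol) and divide by 1000."""
    return c_umol_per_l * molar_mass_g_per_mol / 1000.0


def g_per_l_to_umol_per_l(c_g_per_l: float, molar_mass_g_per_mol: float) -> float:
    """g/l -> umol/l: divide by the molar mass and scale mol -> umol."""
    return c_g_per_l / molar_mass_g_per_mol * 1e6


print(round(umol_per_l_to_mg_per_l(104, CREATININE_G_PER_MOL), 1))   # 11.8
print(round(g_per_l_to_umol_per_l(35, ALBUMIN_G_PER_MOL)))           # 526
```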

Finally, we were interested in RS's ability to predict the prognosis of our patients with end-stage liver disease. The most established prognostic laboratory score for these patients is the MELD score. Our analysis showed especially good correlations with the original MELD score without the incorporation of serum sodium concentration, as well as promising results for predicting hepatic encephalopathy and 90-day outcome by RS.

Outlook. Although the size of our cohort exceeds that of many other studies in this field, the reliability of predictions would immediately benefit from a further increase in cohort size. Given a larger sample size, systematic optimization procedures will most likely improve the signal-to-noise ratio. In particular, the interplay between exposure time, photobleaching, and noise reduction needs further investigation. Different spectrometers will be compared to ensure that repeated measurements under the same conditions lead to identical spectra.

For the subsequent data processing, the background filtering parameters will also require systematic optimization. The application of noise filtering algorithms is likely to be fruitful when used with larger cohorts and optimized parameters. Furthermore, the application of learning algorithms, in combination with our analytic approaches, is expected to boost the prediction quality significantly.

In addition to optimizing RS data collection and evaluation, we plan to explore complementary information from infrared spectroscopy and, potentially, mass spectrometry.

Conclusions. In this report, we investigated RS's potential as a diagnostic tool for blood serum samples using a cohort of 234 patients at different stages of liver disease. Thirty-eight biomarkers were available from the clinical laboratory. After establishing a suitable compensation for the background, we performed linear correlation analyses to identify Raman shifts that correlate reasonably well with the clinical laboratory results.

It has to be emphasized that the results should be considered a first estimation.

A range of technical and methodological aspects with the potential to improve the results require detailed analysis in future studies. We expect that many more markers can be reliably determined from spectra. Our study did not attempt to specifically improve biomarker predictions for individual parameters, but rather to develop an algorithm to identify a large number of biomarkers that can potentially be measured by Raman spectroscopy.

Counterbalancing overfitting was essential: keeping the number of frequencies low proved absolutely necessary. Relatively few frequencies suffice to obtain good correlations with the markers. From our list of 38 collected parameters, 25 reached a correlation larger than 0.6. The weights were derived restrictively to strictly avoid overfitting. Thus, the presented correlations, although very good, can be considered conservative approximations. The applied method makes few assumptions about the quality of the data and can be considered more robust than classical approaches, such as PCA. Further studies with larger cohorts will allow for an advanced optimization of the procedure, leading to further increased correlations while preventing overfitting.

It must be emphasized that we did not compare the spectra with error-free reference methods. Rather, we used other routine measurements as references. As these measurements contain errors themselves, a correlation above ∼90% would be rather surprising; in clinical laboratory practice, different methods of analyzing the same marker do not show greater correlations than this.

We were able to validate our analysis using defined solutions of albumin at different concentrations. Additionally, we performed a large-scale validation study in which about 10 markers showed robust correlations. However, some markers, especially those at lower concentrations, could not be validated. The results substantiate our methodological approach in general. In summary, the results of our study illustrate the potential of RS to be applied as an untargeted diagnostic tool for many different parameters in clinical laboratory diagnostics.

Table 2. Performance analysis for the highly correlating markers, using a minimum weight of 0.2 and nine spectral vectors on background-filtered data with a minimum correlation of 0.6. The significance of the correlation is reported as z-value. The consistency is estimated by the mean value and the standard deviation of the Pearson score for an ensemble of random subsets of 40% size. For each marker, the number of available spectra is reported. Next, the correlation of the values predicted from 238 spectra from an independent cohort with laboratory data is listed, followed by the number of spectra in the validation set. Some of these markers (MELD score, MELD-Na score, INR, HBA1C) cannot be detected in serum by standard diagnostics.

Table 3. Validating the predictive model with well-defined concentrations of albumin.
The predicted value is a linear combination using nine frequencies from the respective spectrum. Both de-ionized water and Ringer solution have an albumin concentration of zero.