Strategies for data normalization and missing data imputation and consequences for potential diagnostic microRNA biomarkers in epithelial ovarian cancer

  • Joanna Lopacinska-Jørgensen,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Pathology, Herlev Hospital, University of Copenhagen, Herlev, Denmark

  • Patrick H. D. Petersen,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft, Writing – review & editing

    Affiliation Department of Pathology, Herlev Hospital, University of Copenhagen, Herlev, Denmark

  • Douglas V. N. P. Oliveira,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Writing – original draft, Writing – review & editing

    Affiliation Department of Pathology, Herlev Hospital, University of Copenhagen, Herlev, Denmark

  • Claus K. Høgdall,

    Roles Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Resources, Writing – original draft, Writing – review & editing

    Affiliation Department of Gynaecology, Juliane Marie Centre, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark

  • Estrid V. Høgdall

    Roles Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    estrid.hoegdall@regionh.dk

    Affiliation Department of Pathology, Herlev Hospital, University of Copenhagen, Herlev, Denmark

Abstract

MicroRNAs (miRNAs) are small non-coding RNA molecules that regulate gene expression and have diagnostic potential in different diseases, including epithelial ovarian carcinomas (EOC). As only a few studies have been published on the identification of stable endogenous miRNAs in EOC, there is no consensus on which miRNAs should be used for standardization. Currently, U6-snRNA is widely adopted as a normalization control in RT-qPCR when investigating miRNAs in EOC, despite reports of its variable expression across cancers. Therefore, our goal was to compare different missing data and normalization approaches and to investigate their impact on the choice of stable endogenous controls and on subsequent survival analysis when performing expression analysis of miRNAs by RT-qPCR in the most frequent subtype of EOC: high-grade serous carcinoma (HGSC). Forty miRNAs were included based on their potential as stable endogenous controls or as biomarkers in EOC. Following RNA extraction from formalin-fixed paraffin-embedded tissues from 63 HGSC patients, RT-qPCR was performed with a custom panel covering 40 target miRNAs and 8 controls. The raw data were analyzed by applying various strategies for choosing stable endogenous controls (geNorm, BestKeeper, NormFinder, the comparative ΔCt method and RefFinder), handling missing data (single/multiple imputation), and normalization (endogenous miRNA controls, U6-snRNA or global mean). Based on our study, we propose hsa-miR-23a-3p and hsa-miR-193a-5p, but not U6-snRNA, as endogenous controls in HGSC patients. Our findings are validated in two external cohorts retrieved from the NCBI Gene Expression Omnibus database. We show that the outcome of stability analysis depends on the histological composition of the cohort, which might suggest a unique pattern of miRNA stability profiles for each subtype of EOC. Moreover, our data demonstrate the challenge of miRNA data analysis by presenting the various outcomes of normalization and missing data imputation strategies on survival analysis.

Introduction

Ovarian cancer, the eighth leading cause of cancer-related death among females worldwide [1], is a heterogeneous disease with several histologic subtypes, with epithelial ovarian carcinoma (EOC) being the most predominant (90–95% of cases) [1–3]. EOC can be further categorized into four main subtypes: serous (75%), endometrioid (10%), clear cell (10%), and mucinous (3%), which together account for more than 95% of cases [2]. Its heterogeneity and the lack of early screening methods account for the high mortality rates [3, 4].

MicroRNAs (miRNAs) are non-coding molecules that play an active role in the regulation of gene expression and have been linked with many diseases, including EOC [5–7]. Despite numerous studies, no miRNA cancer biomarkers are yet available, which inevitably leads to the main question: will it ever be feasible to implement miRNA biomarkers in the clinic? [8]. As there is a lack of consensus regarding optimal methodologies for miRNA detection, data analysis and standardization, the conclusions from various miRNA studies are not consistent [9, 10]. Accurate quantification of miRNAs is a challenging task because of their short length, high sequence similarity, occurrence of isoforms and O-methyl modifications [11]. Moreover, miRNAs account for only a small fraction (ca. 0.01%) of total RNA in a sample, and their expression varies from a few to thousands of copies per cell [12]. Real-time RT-qPCR is the gold standard for miRNA detection and is commonly used to validate findings from large-scale miRNA profiling studies [13, 14]. However, RT-qPCR can be performed with various technologies that employ different strategies, e.g., stem-loop reverse transcriptase-PCR, polyadenylation of RNAs, ligation of adapters, or RT with universal or miRNA-specific qPCR primers and/or probes [15], all of which may further affect the results. Moreover, there is no consensus regarding data handling and analysis, and the impact of various normalization methods has been a subject of discussion [16–19]. Exogenous oligonucleotides, which are added in known amounts to the biological specimens during RNA isolation or cDNA synthesis, can only correct RT-qPCR data for the variability arising from particular technical steps, but not for other variables to which they are not exposed, such as disease stage and progression, medical treatment or sample collection and preservation [17, 18].
Ideally, normalization methods should rely on stably expressed endogenous miRNAs, which potentially eliminates the differences resulting from RNA class origin and sampling and thus might help to discover unique and reproducible changes in miRNA expression levels [18]. U6, a small nuclear RNA (snRNA), is commonly used as the endogenous control in miRNA studies, despite reported high inter-individual variance and expression instability in cancers [20, 21]. Moreover, U6-snRNA belongs to a different class of RNA than miRNAs, and thus its transcription, processing and tissue-specific expression patterns are different [18]. Hence, despite its extensive use, U6-snRNA has a series of limitations that might compromise the results. To date, only a few reports on the identification of endogenous miRNA controls in ovarian cancer (OC) have been published [4, 22–24]. Another challenging step in data analysis is related to the fact that it is often not possible to collect complete data in RT-qPCR experiments. Crossing point (Cp) values for some miRNAs can be missing because of technical failure or for biological reasons. Unfortunately, studies tend not to describe precisely how missing data were handled, e.g., the number or ratio of missing values per variable of interest is not included, or the cut-off for an acceptable amount of missingness is not presented [25, 26]. Therefore, our aim was to validate the stability of previously reported endogenous miRNA controls in OC in a new cohort of patients. As differences between observed miRNA levels in various studies have been reported, we validated our findings in two external cohorts retrieved from the NCBI Gene Expression Omnibus database. Moreover, we compared various missing data approaches (complete cases, single or multiple imputation) and data normalization strategies (endogenous miRNA controls, U6-snRNA, and global mean) in terms of their impact on the survival analysis.

Materials and methods

Patient cohort

Tumor tissues stored as formalin-fixed, paraffin-embedded (FFPE) samples were acquired from two Danish projects: the Pelvic Mass study (2004–2014) and the GOVEC (Gynecological Ovarian Vulva Endometrial Cervix cancer) study (2015–ongoing) through the Bio- and Genome Bank Denmark [27]. The study was approved by the Danish National Committee for Research Ethics, Capital Region (H-17029749/H-15020061) and was performed according to the guidelines of the Declaration of Helsinki, including written informed consent from all patients. Tissues from a total of 63 patients diagnosed with EOC (International Federation of Gynecology and Obstetrics [FIGO] stage III/IV) were included in this study. All patients were followed from the surgery date until death from any cause, emigration or November 11, 2022.

RNA extraction and miRNA profiling by RT-qPCR

Total RNA was extracted from FFPE tissue slices by use of the miRNeasy FFPE kit (Qiagen, cat. no. 217504) and subjected to RT-qPCR as described previously [28]. Briefly, two tissue sections of 5 μm thickness were sliced from each paraffin block and immersed in 160 μl deparaffinization solution (Qiagen), followed by the steps described in the manufacturer’s protocol. 1 μl RNA isolation spike-in control mix (UniSp2, UniSp4, and UniSp5) (Qiagen) was added to each sample. Reverse transcription of RNA was accomplished with the miRCURY LNA RT Kit (Qiagen, cat. no. 339340) by adding 10 ng of total RNA and 0.5 μl cDNA synthesis spike-in control mix containing UniSp6 and cel-miR-39-3p in 10 μl reaction volumes. RT-qPCR reactions were performed using the miRCURY LNA SYBR Green PCR Kit (Qiagen, cat. no. 339347), miRCURY Custom PCR Panels in a 384-well plate format (Qiagen, cat. no. 339332, design no. YCA33809) and a LightCycler 480 (Roche, Hvidovre, Denmark).

The miRCURY Custom PCR panels contained 48 locked nucleic acid (LNA) PCR assays for detection of 1) 40 miRNAs selected based on their stability or biomarker potential, 2) the RNA isolation spike-in controls added at the beginning of each isolation procedure (UniSp2, UniSp4, and UniSp5), 3) the cDNA synthesis spike-in controls (UniSp6 and cel-miR-39-3p), 4) UniSp3, the interplate calibrator (UniSp3_IPC), to identify between-run variation, 5) U6-snRNA, which is commonly used as the endogenous control, and 6) a blank spot to control for unwanted RT-qPCR outcomes. The design of the customized plate can be found in S1 Fig. RT-qPCR reactions were prepared according to the manufacturer’s protocol. Briefly, a large pool containing the 2x miRCURY SYBR Green Master Mix and RNase-free water was prepared, and cDNA was added at a ratio of 1:100 of the total volume. The mixture was then homogenized thoroughly and briefly centrifuged before distributing 10 μl into individual PCR wells. The LightCycler® 480 software version 1.5 (Roche) and the absolute quantification analysis/2nd derivative maximum method with the high confidence setting were employed to determine the baseline and the crossing points (Cps) of the amplification curves for each run. The customized panels were subjected to RT-qPCR experiments, and raw data were collected from 10 panels (8 patients per panel) (S2 Table). After interplate calibration with UniSp3_IPC, the isolation spike-in controls (UniSp2, UniSp4, and UniSp5) and the cDNA synthesis controls (UniSp6 and cel-miR-39-3p) were used to evaluate the efficiency of the process and to exclude any samples of questionable quality, according to a previously published protocol [28]. The isolation spike-in controls are mixed such that UniSp2 is present at a concentration 100-fold higher than UniSp4, and UniSp4 is present at a concentration 100-fold higher than UniSp5.
Therefore, approximately 6.6 cycles difference is expected between UniSp4 and UniSp2, as well as between UniSp5 and UniSp4. As the concentration of UniSp5 reflects weakly expressed miRNAs, its detection might not always be possible. Moreover, since the input RNA amount for cDNA synthesis had to be adjusted to 10 ng in total, the cDNA samples were prepared with different RNA dilution ratios, as the concentrations of isolated RNA varied widely, which led to relatively high variation in observed UniSp2 and UniSp4 levels. Therefore, all samples with UniSp2 Cp values above 30 and outside 5–8 Cp difference range between UniSp4 and UniSp2 were excluded from further analysis (S2 Table).
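
The expected spacing between the spike-ins follows from the dilution factor: with perfect doubling per cycle, a 100-fold concentration difference corresponds to log2(100) ≈ 6.64 cycles. The Python sketch below illustrates that arithmetic and the quality filter described above. The function names are hypothetical, 100% amplification efficiency is an idealization, and the filter assumes that failing either criterion (UniSp2 Cp above 30, or a UniSp4 minus UniSp2 difference outside 5–8) excludes a sample. It is not the authors' code.

```python
import math


def expected_cycle_gap(fold_difference, efficiency=1.0):
    """Cycles separating two templates whose concentrations differ by fold_difference.

    efficiency=1.0 assumes perfect doubling per cycle (an idealization).
    """
    return math.log(fold_difference) / math.log(1.0 + efficiency)


def passes_spike_in_qc(cp_unisp2, cp_unisp4, max_unisp2_cp=30.0, gap_range=(5.0, 8.0)):
    """Return True if a sample satisfies both spike-in thresholds.

    Assumption: a sample is excluded when EITHER criterion fails.
    """
    gap = cp_unisp4 - cp_unisp2
    return cp_unisp2 <= max_unisp2_cp and gap_range[0] <= gap <= gap_range[1]


# Hypothetical Cp values, for illustration only.
print(round(expected_cycle_gap(100), 2))
print(passes_spike_in_qc(cp_unisp2=22.0, cp_unisp4=28.5))
print(passes_spike_in_qc(cp_unisp2=31.0, cp_unisp4=37.5))
```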

Data handling and analysis

All data analyses were performed using R Statistical Software (version 4.1.1; R Foundation for Statistical Computing, Vienna, Austria) [29]. The analysis workflow employed is presented in Fig 1.

Fig 1. Analysis pipeline for comparison of missing data and data normalization handling strategies impact on identification of stable endogenous miRNAs controls and survival analysis.

Only miRNAs without missing values (complete_cases) were included to assess their stability with five algorithms: the comparative delta-Ct method (deltaCt), BestKeeper, NormFinder, geNorm, and RefFinder. Our findings were further validated in external cohorts. Single or multiple imputation methods were applied to miRNAs with a missing value ratio lower than 20%. Three normalization strategies were applied individually to the different input datasets. Finally, survival analysis by Cox regression was performed on fifteen differently processed datasets originating from the same RT-qPCR raw dataset.

https://doi.org/10.1371/journal.pone.0282576.g001

Endogenous control selection

In order to identify stable endogenous controls, only miRNAs without missing values and with Cp values below 35 (complete cases) were included. The stability of the investigated miRNAs was assessed with the online expression stability tool RefFinder [30], implemented in R through the package RefSeeker, developed by our group. The tool enables the identification of stably expressed candidates by several well-known algorithms: BestKeeper [31], NormFinder [32], geNorm [19] and the comparative delta-Ct method [33]. Moreover, RefFinder assigns a weight to the ranking from each of the four statistical algorithms and calculates a geometric mean to determine an overall ranking of the analyzed genes.
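
The aggregation step can be illustrated with a short Python sketch: each candidate receives one rank per algorithm, and the geometric mean of those ranks gives the overall ordering. This is a minimal illustration, not the RefFinder or RefSeeker implementation. The candidate names and ranks are hypothetical, and equal weighting of the four algorithms is an assumption.

```python
from statistics import geometric_mean


def overall_ranking(ranks_by_algorithm):
    """Order candidates by the geometric mean of their per-algorithm ranks.

    ranks_by_algorithm maps a candidate name to a list of ranks (1 = most stable).
    Assumption: every algorithm contributes with equal weight.
    """
    scores = {name: geometric_mean(ranks) for name, ranks in ranks_by_algorithm.items()}
    # Lower geometric mean = more stable overall.
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical ranks in the order deltaCt, BestKeeper, NormFinder, geNorm.
example_ranks = {
    "candidate_A": [1, 8, 1, 1],
    "candidate_B": [2, 1, 3, 4],
    "candidate_C": [3, 2, 2, 3],
}
ranking = overall_ranking(example_ranks)
for name, score in ranking:
    print(name, round(score, 3))
```

In this toy example, a candidate ranked first by three algorithms and eighth by the fourth still comes out on top, because the geometric mean dampens the effect of a single discordant rank.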

Validation in external datasets

We retrieved data from two independent datasets, GSE81873 [34] and GSE43867 [35], from the NCBI Gene Expression Omnibus database. For detailed information regarding biospecimen collection, clinical data, and sample processing, we refer to the original publications for each dataset. We employed the R package miRBaseConverter to unify the miRNA annotation of the two datasets to the latest miRBase version (version 22) [36]. miRNA entries not present in the latest version of the miRBase database were excluded from further analysis. Only targets (miRNAs and controls) without missing values across the entire dataset and with Cp values below 35 were included in the stability analysis, as described in the section “Endogenous control selection”.

Data normalization

Three normalization methods were applied to the complete_cases dataset and to the four datasets from the missing data imputation step (Fig 1):

  • Based on stable endogenous miRNAs (ENDO): the normalized Cp for each miRNA in each sample is calculated by subtracting the geometric mean of the stable endogenous miRNAs in that sample from the raw Cp of that miRNA.
  • Based on U6-snRNA (U6-snRNA): the normalized Cp for each miRNA in each sample is calculated by subtracting the Cp of U6-snRNA in that sample from the raw Cp of that miRNA.
  • Global mean calculated on complete cases or imputed datasets (GMean): the normalized Cp for each miRNA in each sample is calculated by subtracting the mean of all miRNAs (present and imputed) from the raw Cp of that miRNA.
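
The three normalization rules can be written out as a minimal Python sketch for a single sample. The function names and Cp values are hypothetical, and computing the global mean per sample is an assumption, since the text does not state it explicitly. The analysis in this study was performed in R.

```python
from statistics import geometric_mean, mean


def normalize_endo(sample_cp, endogenous_controls):
    """ENDO: subtract the geometric mean Cp of the endogenous controls in this sample."""
    reference = geometric_mean([sample_cp[name] for name in endogenous_controls])
    return {name: cp - reference for name, cp in sample_cp.items()}


def normalize_single_reference(sample_cp, reference_name="U6-snRNA"):
    """U6-snRNA: subtract the Cp of one reference RNA measured in this sample."""
    reference = sample_cp[reference_name]
    return {name: cp - reference for name, cp in sample_cp.items() if name != reference_name}


def normalize_global_mean(sample_cp, mirna_names):
    """GMean: subtract the arithmetic mean Cp of all miRNAs in this sample (assumed per sample)."""
    reference = mean([sample_cp[name] for name in mirna_names])
    return {name: sample_cp[name] - reference for name in mirna_names}


# Hypothetical Cp values for one sample, for illustration only.
sample = {"hsa-miR-23a-3p": 24.0, "hsa-miR-193a-5p": 27.0, "hsa-miR-126-3p": 26.0, "U6-snRNA": 25.0}
mirnas = ["hsa-miR-23a-3p", "hsa-miR-193a-5p", "hsa-miR-126-3p"]

endo = normalize_endo(sample, ["hsa-miR-23a-3p", "hsa-miR-193a-5p"])
u6 = normalize_single_reference(sample)
gmean = normalize_global_mean(sample, mirnas)
print(endo["hsa-miR-126-3p"], u6["hsa-miR-126-3p"], gmean["hsa-miR-126-3p"])
```

The same raw Cp of one miRNA yields three different normalized values, which is the reason the choice of reference can change downstream results.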

Missing data

The pattern of missingness in the incomplete dataset was investigated by use of two R packages: “finalfit” [37] and “ggmice” [38]. The missingness of a single dependent variable (a specific miRNA with missing values) was tested against each explanatory variable individually, as multiple univariate analyses: overall survival (OS), OS status, age at diagnosis, histological subtype, and customized panel number. Continuous data were compared with a Kruskal–Wallis test, whereas discrete data were compared with a chi-squared test [37]. In order to identify stable endogenous controls, only miRNAs without missing values were included. To investigate the possible impact of different imputation methods, Cp values equal to or above 35 were marked as missing values (“NA”). The imputation methods were then applied to miRNAs with a missing value ratio lower than 20%. To fill missing values, simple and multiple imputation approaches were applied. As simple imputation methods, we tested two approaches: replacing missing values with Cp = 35 (imp_na_35) or replacing missing values with the “highest Cp value for the investigated miRNA + 1” (imp_max_one) [39]. To fill missing values by multiple imputation, two strategies were employed:

  • nonparametric missing value imputation using random forest enabled by an R package “missForest” [40] (imp_missF)
  • k-nearest neighbour imputation with an R package “VIM” [41] (imp_VIM).
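
The two simple imputation rules and the preceding filtering steps can be sketched in Python as follows. The function names and Cp values are hypothetical. The random forest and k-nearest neighbour approaches rely on the R packages named above and are not reproduced here.

```python
def mark_missing(cp_values, cutoff=35.0):
    """Treat Cp values at or above the cutoff (or already absent) as missing (None)."""
    return [None if cp is None or cp >= cutoff else cp for cp in cp_values]


def missing_ratio(cp_values):
    """Fraction of samples in which the miRNA has no usable Cp value."""
    return sum(cp is None for cp in cp_values) / len(cp_values)


def impute_na_35(cp_values, fill=35.0):
    """imp_na_35: replace every missing value with Cp = 35."""
    return [fill if cp is None else cp for cp in cp_values]


def impute_max_one(cp_values):
    """imp_max_one: replace missing values with the highest observed Cp for this miRNA + 1."""
    highest = max(cp for cp in cp_values if cp is not None)
    return [highest + 1.0 if cp is None else cp for cp in cp_values]


# Hypothetical Cp values of one miRNA across ten samples.
raw_cp = [28.1, 36.2, 30.4, 29.0, 31.5, 27.9, 30.0, 29.3, 28.8, 30.9]
marked = mark_missing(raw_cp)

# Only miRNAs with a missing value ratio below 20% are imputed.
if missing_ratio(marked) < 0.20:
    print(impute_na_35(marked))
    print(impute_max_one(marked))
```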

Survival analysis

Survival analysis was performed by use of the following R packages: “finalfit” [37], “RegParallel” [42], “survival” [43], and “survminer” [44], on the following datasets, which originated from the same raw data but were processed in various ways (Fig 1): “complete_cases_ENDO”, “complete_cases_U6-snRNA”, “complete_cases_GMean”, “imp_missF_ENDO”, “imp_missF_U6-snRNA”, “imp_missF_GMean”, “imp_VIM_ENDO”, “imp_VIM_U6-snRNA”, “imp_VIM_GMean”, “imp_max_one_ENDO”, “imp_max_one_U6-snRNA”, “imp_max_one_GMean”, “imp_na_35_ENDO”, “imp_na_35_U6-snRNA”, “imp_na_35_GMean”.
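
The fifteen dataset names are the cross product of five input datasets (complete cases plus four imputed versions) and three normalization methods. A short Python sketch, using the names listed above, makes that structure explicit; the variable names are hypothetical.

```python
from itertools import product

# Five input datasets: complete cases plus the four imputed versions.
input_datasets = ["complete_cases", "imp_missF", "imp_VIM", "imp_max_one", "imp_na_35"]
# Three normalization methods.
normalizations = ["ENDO", "U6-snRNA", "GMean"]

workflow_names = [f"{data}_{norm}" for data, norm in product(input_datasets, normalizations)]
print(len(workflow_names))
print(workflow_names)
```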

OS was defined as time in months, counting from the time of diagnosis (surgery) to time of death, or last censored follow-up. To investigate potential impact of various normalization and imputation strategies on survival analysis, univariate Cox proportional hazard analysis of OS (P < 0.05) was employed to identify miRNA candidates with a potential prognostic value.

The coefficient of variation for individual miRNAs (CoV_ind) was calculated as the ratio between the standard deviation and the mean of the Cp values of a given miRNA across all samples in each of the four differently imputed datasets.
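
As a minimal illustration of this definition, the Python sketch below computes the CoV for one miRNA before and after a simple imputation. The Cp values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, as the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(cp_values):
    """Ratio of standard deviation to mean, ignoring missing (None) values.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    observed = [cp for cp in cp_values if cp is not None]
    return stdev(observed) / mean(observed)


# Hypothetical Cp values with one missing measurement.
raw = [28.0, 29.0, 30.0, 31.0, None]
imputed_35 = [35.0 if cp is None else cp for cp in raw]

cov_raw = coefficient_of_variation(raw)
cov_imputed = coefficient_of_variation(imputed_35)
print(round(cov_raw, 4), round(cov_imputed, 4))
```

Filling the gap with a fixed high Cp places the imputed value at the edge of the observed range, which widens the spread and raises the CoV in this toy example.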

Results

The histological subtypes were 48 high-grade serous adenocarcinomas (HGSC) with the median age of 65.6 years (31.4–88.2) and the median overall survival of 31.3 months (13.1–176.1).

To evaluate the performance of nine endogenous candidates, their expression was measured by RT-qPCR alongside the expression of miRNAs that have been reported as potential stable candidate biomarkers in OC (Table 1).

Table 1. Stable endogenous control and biomarker candidates investigated in the study.

https://doi.org/10.1371/journal.pone.0282576.t001

Identification of endogenous miRNAs controls

Only miRNAs without missing values (complete cases, Table 2) were included to assess their stability with five well-known algorithms: BestKeeper (BestK) [31], NormFinder (NormF) [32], geNorm [19], the comparative delta-Ct method (deltaCt) [33] and RefFinder [30] (Table 3). The latter algorithm assigns a weight to the rankings of the former ones and calculates a geometric mean to determine an overall ranking of the analyzed miRNAs. Differences between the rankings from the various algorithms can be seen; for example, hsa-miR-23a-3p is ranked as most stable by deltaCt, NormF, geNorm and RefFinder, but is ranked 8th by BestK. U6-snRNA, a commonly used control, is not among the most stable candidates for the various algorithms.

Table 2. The expression values of miRNAs and controls presented as their mean (mean Cp), standard deviation (sd) and ratio of missing values (ratio NA).

https://doi.org/10.1371/journal.pone.0282576.t002

Table 3. Stability values and rankings for five algorithms used to identify best stable endogenous controls.

The stability values are not directly comparable between different algorithms.

https://doi.org/10.1371/journal.pone.0282576.t003

Validation in external datasets

Our results were validated using two external datasets retrieved from the NCBI Gene Expression Omnibus database: GSE81873 and GSE43867 (Table 4). After filtering out miRNAs with missing values and values above 35, 81 and 162 targets were included in the stability analysis for the cohorts GSE81873 and GSE43867, respectively. Rankings for miRNAs shared between cohorts are presented in Tables 5 and 6, whereas full rankings for both datasets can be found in S3 Table. Differences between rankings were observed; for example, hsa-miR-193a-5p was ranked among the top candidates for our cohort (RefFinder rank 2/21) and the GSE81873 dataset (RefFinder rank 10/89), but not for the GSE43867 cohort (RefFinder rank 72/162), when only HGSC patients were included. U6-snRNA was ranked 12th in our dataset, 45th in GSE81873, and 161st in GSE43867 according to the RefFinder stability rank. Differences between the various algorithms can be seen and are more pronounced when a higher number of miRNAs is included in the analysis. In order to evaluate the effect of the composition of the cohort on the choice of endogenous controls, we performed stability analysis in different subgroups of the external cohorts and present rankings for selected miRNAs in Table 6.

Table 4. Characteristics of external cohorts included in the validation study.

https://doi.org/10.1371/journal.pone.0282576.t004

Table 5. Comparison of stability rankings for shared miRNAs between our cohort and two external datasets: GSE81873 and GSE43867.

https://doi.org/10.1371/journal.pone.0282576.t005

Table 6. Comparison of stability rankings for selected miRNAs in different subsets of two external datasets: GSE81873 and GSE43867.

https://doi.org/10.1371/journal.pone.0282576.t006

Imputation of missing data

After adjusting the raw RT-qPCR data from 10 panels (8 patients per panel) with the interplate calibrator, the pattern of missingness in the incomplete dataset was investigated (Table 2). The imputation methods were then applied to miRNAs with a missing value ratio lower than 20%, which resulted in the exclusion of hsa-miR-1183, hsa-miR-135a-3p, hsa-miR-149-3p, hsa-miR-23a-5p, hsa-miR-27a-3p, hsa-miR-27a-5p, hsa-miR-302d-3p, hsa-miR-506-3p, hsa-miR-595, hsa-miR-802 and hsa-miR-92b-5p. In the next step, the missingness pattern of each miRNA with missing values was tested against explanatory variables: OS, OS status (dead or alive), age at diagnosis, and customized panel number, but no dependence was observed.

In order to fill missing values, simple (na_max_one and na_35) and multiple imputation (missF, VIM) approaches were applied, and their effects on the CoV values of individual miRNAs are presented in Table 7. The results show that the CoV values for individual miRNAs are not affected by multiple imputation methods (missF and VIM); however, single imputation methods lead to higher CoVs compared with the CoVs from the raw dataset (data_raw). For hsa-miR-126-3p and hsa-miR-1234-3p, which had many missing values, the CoVs are approximately 1.6 times higher than the CoVs from the raw dataset.

Table 7. Coefficient of variation analysis on individual miRNAs that presented missing Cp values and were imputed with four methods (missF, VIM, na_max_one, na_35).

CoV, CoV_imp_missF, CoV_imp_VIM, CoV_imp_max_one, and CoV_imp_na_35: coefficient of variation for a particular RNA in the raw dataset, in the missF imputed dataset, in the VIM imputed dataset, in the max_one imputed dataset and in the na_35 imputed dataset, respectively.

https://doi.org/10.1371/journal.pone.0282576.t007

In the next step, three different methods of normalization (ENDO, U6-snRNA, GMean) were performed on complete cases dataset and on each imputed dataset, which resulted in fifteen different datasets. Each miRNA was subjected to univariate Cox regression analyses of OS and miRNA candidate OS predictors were found only in datasets that were imputed, but not in complete cases dataset normalized by three methods (ENDO, U6-snRNA and GMean) (Table 8). Hsa-miR-126-3p was indicated as associated with OS in two workflows: imp_na_35_U6-snRNA and imp_max_one_U6-snRNA, whereas hsa-miR-1301-3p was found in: imp_VIM_GMean, imp_missF_GMean, imp_na_35_U6-snRNA, imp_max_one_U6-snRNA, imp_VIM_U6-snRNA, imp_na_35_GMean, and imp_missF_U6-snRNA. Noticeably, both miRNAs: hsa-miR-126-3p and hsa-miR-1301-3p had missing values in a raw dataset (16.7% of missing values, Table 7).

Table 8. Univariate Cox analyses to identify the candidate predictors for OS.

miRNAs with P-value below 0.05 are presented.

https://doi.org/10.1371/journal.pone.0282576.t008

Discussion

In this study, hsa-miR-23a-3p and hsa-miR-193a-5p were identified as the most stable miRNAs in a cohort of 48 patients with HGSC. Interestingly, these miRNAs were included in our study because they were reported previously as potential biomarkers in EOC (Table 1) [60]: hsa-miR-193a-5p to discriminate between type I and II tumours and hsa-miR-23a-3p to predict progression-free survival. This points towards an explanation of the very different effects of miRNAs observed in cohorts with different compositions of EOC patients. Moreover, as shown in Table 6, the pattern of stability for a particular miRNA (increased or decreased stability rank) depends on the histological composition of the cohort, but also varies between different miRNAs. For example, the stability of hsa-miR-193a-5p increases when only one subtype of EOC (HGSC) is present compared with a cohort that includes SC and benign cases (GSE81873), whereas the opposite effect is observed for hsa-miR-191-5p. In the study that used the cohort GSE43867, two miRNAs were used as stable endogenous controls, hsa-miR-16-5p and hsa-miR-191-5p; however, how they were chosen is not precisely described. According to our stability analysis, these miRNAs were not among the top ten most stable targets for any of the algorithms, and according to RefFinder they were ranked at position 37 (hsa-miR-191-5p) and position 121 (hsa-miR-16-5p) (S3 Table). In the case of GSE81873, the top 10 miRNAs obtained from the geNorm algorithm were used to normalize the data. However, the names of these miRNAs are not mentioned in the description, so it is difficult to compare them with our analysis. Nevertheless, according to our stability analysis, different stability algorithms selected different top 10 miRNAs for this dataset (S3 Table). Therefore, in general, a more detailed description of the various data processing steps is recommended to enable validation of results across different research centers.

U6-snRNA, which is commonly used as the endogenous control in OC studies [61–63], was not in the top 10 most stable candidates for deltaCt, NormFinder, geNorm, and RefFinder (rank 14 or 15 depending on the imputation method; Table 3). Similar findings were made in the external validation cohorts when considering HGSC patients: U6-snRNA was ranked 45th among 81 targets in GSE81873 and at position 161 among 162 targets in GSE43867 (Table 5). These observations indicate that U6-snRNA is not a suitable endogenous candidate for normalization in EOC, which is in line with previous reports showing high inter-individual variance and expression instability of U6-snRNA in cancers [20, 21]. Our results emphasize that it is crucial to select suitable endogenous controls to ensure the reliability of RT-qPCR, especially when working with miRNAs in a clinical context in complex, heterogeneous diseases such as EOC, as the conclusions drawn might depend on the composition of the cohort and might affect clinical decisions.

In our previous study, we collected miRNA microarray data from four datasets, the in-house “Pelvic Mass” dataset and three public datasets with primary EOC patients (The Cancer Genome Atlas, GSE47841, and GSE73581), in order to find endogenous control candidates [22]. We found that two miRNAs, hsa-miR-106b-3p and hsa-miR-92b-5p, were among the top 100 candidates for all datasets when considering only miRNAs common to all datasets. Their stability was not evaluated in this study, as neither of them was available as a complete case (hsa-miR-106b-3p: 2.1% missing data; hsa-miR-92b-5p: 33.3% missing data).

Furthermore, we examined the impact of various data handling strategies on survival analysis in 48 HGSC patients. Each miRNA in each of the 15 differently processed datasets (Fig 1) was subjected to univariate Cox regression analysis, and miRNAs that were predictors of OS (hsa-miR-126-3p and hsa-miR-1301-3p) were found only in the datasets that included miRNAs with missing values (Table 8). Interestingly, these miRNAs were included because of their biomarker potential: hsa-miR-126-3p was associated with OS in EOC and hsa-miR-1301 was shown to be involved in cisplatin resistance (Table 1). However, these miRNAs were not complete cases (Table 2), and further investigation is required to understand their role in EOC. As shown in Table 7, the CoV values for individual miRNAs were mainly affected by single imputation methods, whereas only subtle differences were observed when comparing datasets filled by multiple imputation methods with the raw dataset. This is not surprising, as imputed values are estimated from the detected Cp values. However, such imputation might assign Cp values to some miRNAs that were in fact not detectable for a biological reason (no target in the sample or a very low concentration) rather than because of a technical failure, and might lead to false conclusions [39]. Therefore, one might consider increasing the number of biological/technical replicates per sample in order to decrease the number of miRNAs with missing data or to confirm that their missingness has biological reasons, such as regulatory loops of the expression of other genes depending on the biological context of each EOC subtype. Moreover, sample age and type could additionally be included as potential factors that could affect miRNA studies. Nonetheless, previous studies have shown a good correlation of miRNA levels between fresh-frozen and FFPE samples when including several hundred miRNAs [64, 65].

Conclusions

Our study demonstrates the need for awareness when choosing RT-qPCR missing data handling and data normalization approaches in miRNA biomarker studies. We suggest that endogenous controls other than U6-snRNA, which seems to be unstable in various EOC cohorts, should be considered, and we identified hsa-miR-23a-3p and hsa-miR-193a-5p among the top candidates for HGSC patients. We showed that the pattern of miRNA stability depends on the histological composition of the cohort, which might suggest that each subtype of EOC is characterized by its own unique miRNA expression pattern. Moreover, in order to achieve consensus in miRNA-related studies in EOC, future studies on miRNA data analysis are required; for example, many studies do not describe precisely how missing data were handled, which can lead to biased results and hinder validation efforts.

Supporting information

S1 Fig. The customized design of 48 assays per sample in a 384 well plate format (8 samples in total).

https://doi.org/10.1371/journal.pone.0282576.s001

(DOCX)

S1 Table. The Cp values obtained by miRNA profiling with use of the miRCURY customized PCR panels.

https://doi.org/10.1371/journal.pone.0282576.s002

(XLSX)

S2 Table. The Cp values of spike-ins controls: UniSp2, UniSp4, UniSp5, UniSp6 and cel_miR_39_3p.

https://doi.org/10.1371/journal.pone.0282576.s003

(XLSX)

S3 Table. Stability rankings from five algorithms (the comparative ΔCt method, BestKeeper, NormFinder, geNorm and RefFinder) for two external datasets: GSE81873 and GSE43867.

https://doi.org/10.1371/journal.pone.0282576.s004

(XLSX)

References

  1. 1. Ferlay J, Colombet M, Soerjomataram I, Mathers C, Parkin DM, Piñeros M, et al. Estimating the global cancer incidence and mortality in 2018: GLOBOCAN sources and methods. Int J Cancer. 2019;144(8):1941–53. pmid:30350310
  2. 2. Prat J. Ovarian carcinomas: Five distinct diseases with different origins, genetic alterations, and clinicopathological features. Virchows Arch. 2012;460(3):237–49. pmid:22322322
  3. 3. Bast RC, Hennessy B, Mills GB. The biology of ovarian cancer: new opportunities for translation. Nat Rev Cancer. 2009;9(6):415–28. pmid:19461667
  4. 4. Yokoi A, Matsuzaki J, Yamamoto Y, Yoneoka Y, Takahashi K, Shimizu H, et al. Integrated extracellular microRNA profiling for ovarian cancer screening. Nat Commun. 2018;9(1):2–6.
  5. 5. Alshamrani AA. Roles of microRNAs in Ovarian Cancer Tumorigenesis: Two Decades Later, What Have We Learned? Front Oncol. 2020;10:1084. pmid:32850313
  6. 6. Condrat CE, Thompson DC, Barbu MG, Bugnar OL, Boboc A, Cretoiu D, et al. miRNAs as Biomarkers in Disease: Latest Findings Regarding Their Role in Diagnosis and Prognosis. Cells. 2020;9. pmid:31979244
  7. 7. Yoshida K, Yokoi A, Kato T, Ochiya T, Yamamoto Y. The clinical impact of intra- and extracellular miRNAs in ovarian cancer. Cancer Sci. 2020;111(10):3435–44. pmid:32750177
  8. 8. Saliminejad K, Khorram Khorshid HR, Ghaffari SH. Why have microRNA biomarkers not been translated from bench to clinic? Future Oncol. 2019;15(8):801–3.
  9. 9. Mockly S, Seitz H. Inconsistencies and Limitations of Current MicroRNA Target Identification Methods. In: Methods in Molecular Biology. 2019. p. 291–314. pmid:30963499
  10. 10. Tiberio P, Callari M, Angeloni V, Daidone MG, Appierto V. Challenges in using circulating miRNAs as cancer biomarkers. Biomed Res Int. 2015;2015:731479. pmid:25874226
  11. 11. Leshkowitz D, Horn-Saban S, Parmet Y, Feldmesser E. Differences in microRNA detection levels are technology and sequence dependent. RNA. 2013;19(4):527–38. pmid:23431331
  12. 12. Dong H, Lei J, Ding L, Wen Y, Ju H, Zhang X. MicroRNA: Function, detection, and bioanalysis. Chem Rev. 2013;113(8):6207–33. pmid:23697835
  13. 13. Hong LZ, Zhou L, Zou R, Khoo CM, Chew ALS, Chin CL, et al. Systematic evaluation of multiple qPCR platforms, NanoString and miRNA-Seq for microRNA biomarker discovery in human biofluids. Sci Rep. 2021;11(1):1–11.
  14. 14. Li Y, Yao L, Liu F, Hong J, Chen L, Zhang B, et al. Characterization of microRNA expression in serous ovarian carcinoma. Int J Mol Med. 2014;34(2):491–8. pmid:24939816
  15. 15. Forero DA, González-Giraldo Y, Castro-Vega LJ, Barreto GE. qPCR-based methods for expression analysis of miRNAs. Biotechniques. 2019;67(4):192–9. pmid:31560239
  16. 16. Chekka LMS, Langaee T, Johnson JA. Comparison of Data Normalization Strategies for Array-Based MicroRNA Profiling Experiments and Identification and Validation of Circulating MicroRNAs as Endogenous Controls in Hypertension. 2022;13(March):1–9.
  17. 17. Faraldi M, Gomarasca M, Sansoni V, Perego S, Banfi G, Lombardi G. Normalization strategies differently affect circulating miRNA profile associated with the training status. Sci Rep. 2019;9(1):1–13.
  18. 18. Schwarzenbach H, Da Silva AM, Calin G, Pantel K. Data normalization strategies for microRNA quantification. Clin Chem. 2015;61(11):1333–42. pmid:26408530
  19. 19. Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, et al. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 2002;3(7). pmid:12184808
  20. 20. Morata-Tarifa C, Picon-Ruiz M, Griñan-Lison C, Boulaiz H, Perán M, Garcia MA, et al. Validation of suitable normalizers for miR expression patterns analysis covering tumour heterogeneity. Sci Rep. 2017 Jan;7:39782. pmid:28051134
  21. 21. Lou G, Ma N, Xu Y, Jiang L, Yang J, Wang C, et al. Differential distribution of U6 (RNU6-1) expression in human carcinoma tissues demonstrates the requirement for caution in the internal control gene selection for microRNA quantification. Int J Mol Med. 2015;36(5):1400–8. pmid:26352225
  22. 22. Lopacinska-Joergensen J, Oliveira DVNP, Hoegdall , Hoegdall EV. Identification of Stably Expressed Reference microRNAs in Epithelial Ovarian Cancer. In Vivo (Brooklyn). 2022 May 1;36(3):1059–66. pmid:35478140
  23. 23. Bignotti E, Calza S, Tassi RA, Zanotti L, Bandiera E, Sartori E, et al. Identification of stably expressed reference small non-coding RNAs for microRNA quantification in high-grade serous ovarian carcinoma tissues. J Cell Mol Med. 2016;20(12):2341–8. pmid:27419385
  24. 24. Vilming Elgaaen B, Olstad OK, Haug KBF, Brusletto B, Sandvik L, Staff AC, et al. Global miRNA expression analysis of serous and clear cell ovarian carcinomas identifies differentially expressed miRNAs including miR-200c-3p as a prognostic marker. BMC Cancer. 2014;14(1):1–13. pmid:24512620
  25. 25. Carroll OU, Morris TP, Keogh RH. How are missing data in covariates handled in observational time-to-event studies in oncology? A systematic review. BMC Med Res Methodol. 2020;20(1):1–15. pmid:32471366
  26. 26. Papageorgiou G, Grant SW, Takkenberg JJM, Mokhles MM. Statistical primer: How to deal with missing data in scientific research? Interact Cardiovasc Thorac Surg. 2018;27(2):153–8. pmid:29757374
  27. 27. Danish Gynecologic Cancer Group. Annual Report of the Danish Gynecologic Cancer Database 2016–17. Danish Gynecologic Cancer Database. 2017.
  28. 28. Petersen PHD, Lopacinska-Jørgensen J, Oliveira DVNP, Høgdall CK, Høgdall EV. miRNA Expression in Ovarian Cancer in Fresh Frozen, Formalin-fixed Paraffin-embedded and Plasma Samples. In Vivo (Brooklyn). 2022 Jul 1;36(4):1591–602. pmid:35738639
  29. 29. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2020.
  30. 30. Xie F, Xiao P, Chen D, Xu L, Zhang B. miRDeepFinder: A miRNA analysis tool for deep sequencing of plant small RNAs. Plant Mol Biol. 2012;80(1):75–84. pmid:22290409
  31. 31. Pfaffl MW, Tichopad A, Prgomet C, Neuvians TP. Determination of stable housekeeping genes, differentially regulated target genes and sample integrity: BestKeeper—Excel-based tool using pair-wise correlations. Biotechnol Lett. 2004;26(6):509–15. pmid:15127793
  32. 32. Andersen CL, Jensen JL, Ørntoft TF. Normalization of real-time quantitative reverse transcription-PCR data: A model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Res. 2004;64(15):5245–50. pmid:15289330
  33. 33. Silver N, Best S, Jiang J, Thein SL. Selection of housekeeping genes for gene expression studies in human reticulocytes using real-time PCR. BMC Mol Biol. 2006;7:1–9.
  34. 34. Korsunsky I, Parameswaran J, Shapira I, Lovecchio J, Menzin A, Whyte J, et al. Two microRNA signatures for malignancy and immune infiltration predict overall survival in advanced epithelial ovarian cancer. J Investig Med. 2017;65(7):1068–76. pmid:28716985
  35. 35. Vecchione A, Belletti B, Lovat F, Volinia S, Chiappetta G, Giglio S, et al. A microRNA signature defines chemoresistance in ovarian cancer through modulation of angiogenesis. Proc Natl Acad Sci U S A. 2013;110(24):9845–50. pmid:23697367
  36. 36. Xu T, Su N, Liu L, Zhang J, Wang H, Zhang W, et al. miRBaseConverter: An R/Bioconductor package for converting and retrieving miRNA name, accession, sequence and family information in different versions of miRBase. BMC Bioinformatics. 2018;19:514. pmid:30598108
  37. 37. Harrison E, Drake T, Ots R. finalfit: Quickly Create Elegant Regression Results Tables and Plots when Modelling. 2021.
  38. 38. Oberman H. ggmice: Visualizations for “mice” with “ggplot2.” 2022.
  39. 39. De Ronde MWJ, Ruijter JM, Lanfear D, Bayes-Genis A, Kok MGM, Creemers EE, et al. Practical data handling pipeline improves performance of qPCR-based circulating miRNA measurements. RNA. 2017;23(5):811–21. pmid:28202710
  40. 40. Stekhoven DJ, Buehlmann P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28(1):112–8. pmid:22039212
  41. 41. Kowarik A, Templ M. Imputation with the R Package VIM. J Stat Softw. 2016;74(7):1–16.
  42. 42. Blighe K, Lasky-Su J. RegParallel: Standard regression functions in R enabled for parallel processing over large data-frames. 2021.
  43. 43. Therneau TM. A Package for Survival Analysis in R. 2021.
  44. 44. Kassambara A, Kosinski M, Biecek P. survminer: Drawing Survival Curves using “ggplot2.” 2021.
  45. 45. Jiang W, Pan JJ, Deng YH, Liang MR, Yao LH. Down-regulated serum microRNA-101 is associated with aggressive progression and poor prognosis of cervical cancer. J Gynecol Oncol. 2017 Nov;28(6):e75–e75. pmid:29027393
  46. 46. Prahm KP, Høgdall C, Karlsen MA, Christensen IJ, Novotny GW, Høgdall E. Identification and validation of potential prognostic and predictive miRNAs of epithelial ovarian cancer. PLoS One. 2018 Nov;13(11):1–18. pmid:30475821
  47. 47. Lopacinska-Jørgensen J, Oliveira DVNP, Wayne Novotny G, Høgdall CK, Høgdall E V. Integrated microRNA and mRNA signatures associated with overall survival in epithelial ovarian cancer. PLoS One. 2021 Jul 28;16(7):e0255142. pmid:34320033
  48. 48. Prahm KP, Høgdall CK, Karlsen MA, Christensen IJ, Novotny GW, Høgdall E. MicroRNA characteristics in epithelial ovarian cancer. PLoS One. 2021;16(6):1–18. pmid:34086724
  49. 49. Zhong C, Dong Y, Zhang Q, Yuan C, Duan S. Aberrant Expression of miR-1301 in Human Cancer. Front Oncol. 2022 Jan 5;11:789626. pmid:35070996
  50. 50. Tang W, Jiang Y, Mu X, Xu L, Cheng W, Wang X. MiR-135a functions as a tumor suppressor in epithelial ovarian cancer and regulates HOXA10 expression. Cell Signal. 2014;26(7):1420–6. pmid:24607788
  51. 51. Fukagawa S, Miyata K, Yotsumoto F, Kiyoshima C, Nam SO, Anan H, et al. MicroRNA-135a-3p as a promising biomarker and nucleic acid therapeutic agent for ovarian cancer. Cancer Sci. 2017;108(5):886–96. pmid:28231414
  52. 52. Shi M, Mu Y, Zhang H, Liu M, Wan J, Qin X, et al. MicroRNA-200 and microRNA-30 family as prognostic molecular signatures in ovarian cancer: A meta-analysis. Med (United States). 2018;97(32):1–9. pmid:30095616
  53. 53. Sharma PC, Gupta A. MicroRNAs: potential biomarkers for diagnosis and prognosis of different cancers. Transl Cancer Res. 2020;9(9). pmid:35117940
  54. 54. Tokumaru Y, Asaoka M, Oshi M, Katsuta E, Yan L, Narayanan S, et al. High Expression of microRNA-143 is Associated with Favorable Tumor Immune Microenvironment and Better Survival in Estrogen Receptor Positive Breast Cancer. Int J Mol Sci. 2020;21. pmid:32370060
  55. 55. Wilczyński M, Żytko E, Szymańska B, Dzieniecka M, Nowak M, Danielska J, et al. Expression of miR-146a in patients with ovarian cancer and its clinical significance. Oncol Lett. 2017;14(3):3207–14. pmid:28927067
  56. 56. Oliveira DNP, Carlsen AL, Heegaard NHH, Prahm KP, Christensen IJ, Høgdall CK, et al. Diagnostic plasma miRNA-profiles for ovarian cancer in patients with pelvic mass. PLoS One. 2019;14(11):1–15. pmid:31738788
  57. 57. Wang Q, Ye B, Wang P, Yao F, Zhang C, Yu G. Overview of microRNA-199a Regulation in Cancer. Cancer Manag Res. 2019 Dec 10;11:10327–35. pmid:31849522
  58. 58. Sujamol S, Vimina ER, Krishnakumar U. Improving Recurrence Prediction Accuracy of Ovarian Cancer Using Multi-phase Feature Selection Methodology. Appl Artif Intell. 2021 Feb 23;35(3):206–26.
  59. 59. Li R, Wu H, Jiang H, Wang Q, Dou Z, Ma H, et al. FBLN5 is targeted by microRNA-27a-3p and suppresses tumorigenesis and progression in high-grade serous ovarian carcinoma. Oncol Rep. 2020;44(5):2143–51.
  60. 60. Ren X, Zhang H, Cong H, Wang X, Ni H, Shen X, et al. Diagnostic Model of Serum miR-193a-5p, HE4 and CA125 Improves the Diagnostic Efficacy of Epithelium Ovarian Cancer. Pathol Oncol Res. 2018 Mar;24. pmid:29520570
  61. 61. Gu Y, Zhang S. High-throughput sequencing identification of differentially expressed microRNAs in metastatic ovarian cancer with experimental validations. Cancer Cell Int. 2020;20(1):517. pmid:33100909
  62. 62. Ramalho S, Andrade LADA, Filho CC, Natal R de A, Pavanello M, Ferracini AC, et al. Role of discoidin domain receptor 2 (DDR2) and microRNA-182 in survival of women with high-grade serous ovarian cancer. Tumor Biol. 2019;41(1). pmid:30810094
  63. 63. Wilczyński M, Żytko E, Danielska J, Szymańska B, Dzieniecka M, Nowak M, et al. Clinical significance of miRNA-21, -103, -129, -150 in serous ovarian cancer. Arch Gynecol Obstet. 2018 Mar;297(3):741–8. pmid:29335784
  64. 64. Meng W, McElroy JP, Volinia S, Palatini J, Warner S, Ayers LW, et al. Comparison of MicroRNA Deep Sequencing of Matched Formalin-Fixed Paraffin-Embedded and Fresh Frozen Cancer Tissues. PLoS One. 2013;8(5):1–9. pmid:23696889
  65. 65. Xi Y, Nakajima G, Gavin E, Morris CG, Kudo K, Hayashi K, et al. Systematic analysis of microRNA expression of RNA extracted from fresh frozen and formalin-fixed paraffin-embedded samples. RNA. 2007;13(10):1668–74. pmid:17698639