Testing for Differentially-Expressed MicroRNAs with Errors-in-Variables Nonparametric Regression

MicroRNA is a set of small RNA molecules mediating gene expression at post-transcriptional/translational levels. Most of well-established high throughput discovery platforms, such as microarray, real time quantitative PCR, and sequencing, have been adapted to study microRNA in various human diseases. The total number of microRNAs in humans is approximately 1,800, which challenges some analytical methodologies requiring a large number of entries. Unlike messenger RNA, the majority of microRNA (60%) maintains relatively low abundance in the cells. When analyzed using microarray, the signals of these low-expressed microRNAs are influenced by other non-specific signals including the background noise. It is crucial to distinguish the true microRNA signals from measurement errors in microRNA array data analysis. In this study, we propose a novel measurement error model-based normalization method and differentially-expressed microRNA detection method for microRNA profiling data acquired from locked nucleic acids (LNA) microRNA array. Compared with some existing methods, the proposed method significantly improves the detection among low-expressed microRNAs when assessed by quantitative real-time PCR assay.


Introduction
MicroRNA (miRNA) is a panel of naturally-occurring noncoding RNA molecules in short length (*22 nt on average). To date, a total of 18,226 miRNAs, including 1,523 human miRNAs, has been registered in the miRbase database (Version 18.0). Evolutionarily conserved miRNA is capable of mediating approximately 30% of human genes, and involves many biological processes such as development, cell growth, differentiation, apoptosis, and tumorigenesis through its superior regulatory capability [1].
As a well-established discovery tool for biological and medical research, microarray technology has been migrated to the application of characterizing miRNA. Normalization is an essential matter for discovery experiments using microarray. It can minimize the systematic non-biological variations, and thus improves the identification of differentially-expressed miRNAs. The total number of miRNA is much smaller than messenger RNA (mRNA), which challenges the normalization methods utilizing global profiling information and requiring a large number of entries. Some methods including the cyclic loess method (LOESS) [2,3], the modified-LOESS (LOESS-M) [4], and quantile normalization (QN) [5][6][7] have been applied to miRNA array analysis; however, the unique signature of miRNA, such as the small total number, has reduced the enthusiasm of direct adoption [8].
Measurement errors are introduced in miRNA microarrays from different sources, including sample preparation, dying, microarray hybridization, scanning, image intensity, and equipment errors, among many others. When the majority of miRNAs is weakly expressed, measurement errors dramatically increase the uncertainty in detecting the differentially-expressed miRNAs. In this study, we adopt and generalize the two-component measurement error model for miRNA microarray data, and propose to calibrate the measurement errors using an errors-in-variables nonparametric regression method (EIVNPR hereafter). Simultaneous confidence bands are constructed to test differentiallyexpressed miRNAs. The proposed methods are applied to LNA miRNA microarray profiling data accompanied with validation by qRT-PCR. The performances of the algorithms are evaluated by computing the weighted kappa statistic, which reveals the reproducibility between the two profiling methods, LNA array and qRT-PCR. Results show that EIVNPR efficiently calibrates the measurement errors and achieves better performance than the existing methods being benchmarked.

Results
Signal quality of the LNA miRNA profiling data In miRNA data analysis, it is crucial to assess the signal quality of various profiles before normalization and differentiallyexpressed miRNAs detection. Signal-to-noise ratio (SNR) is a measure that compares the level of a desired signal to the level of background noise. MiRCURY LNA miRNA Array tests a set of 560 miRNAs with four technical replicates on each slide for each miRNA. In raw data processing, one signal intensity measure and one background intensity measure are obtained for each probe, and the negative and empty signals are flagged by the ImaGene 7.0 software. The outliers and poor signals, if the signal intensity is less than two standard deviations from the background intensity, are flagged automatically by the image processing software as well. We compute the SNR of a probe by dividing its net intensity (the background-subtracted signal) by the background. A ratio higher than 1/1 indicates that the signal is larger than the noise. For arrays with high quality signals, the SNRs tend to be large. To summarize the overall signal quality of a profile, a mean SNR is computed by taking the arithmetic average of the SNRs of all probes on a slide.
The results in Figure 1 illustrate that the majority of human miRNAs are weakly or not expressed. Among all 40 human osteosarcoma xenografts profiles, at least 65% of the probes are flagged, and at least half of the profiles have more than 80% of the probes flagged (see panel (a)). All 40 profiles have mean SNR smaller than 10, and more than half of the profiles have mean SNRs smaller than 5.00 (see the panel (b) in Figure 1). Among all 40 profiles, the majority have maximum SNR smaller than 100, and five of which have maximum SNR less than 20 (see the panel (c) in Figure 1).

Intra-and inter-platform reproducibility
The specimens are also evaluated using qRT-PCR. For each specimen treated with a specific chemotherapeutic treatment, a set of 663 miRNAs are tested with TaqMan Array (TLDA), with an overlap of 508 miRNAs tested with both TLDA and LNA miRNA Array. For each of the 508 miRNAs, the relative abundance is measured by the relative quantity (RQ) in qRT-PCR, and a foldchange given by the ratio between the mean intensity in the treated sample and the mean intensity in the control based on the LNA miRNA profiling data. For each of the two array platforms, TLDA and LNA array, the intra-platform reproducibility is assessed using the Spearman's correlation coefficient. That is, under each of the three chemotherapeutic treatments, a Spearman's correlation coefficient is computed between the two profiles of every pair of specimens from the same platform. To assess the inter-platform reproducibility, a Spearman's correlation coefficient is computed between the two profiles from TLDA and LNA array for the same specimen under the same chemotherapeutic treatment, respectively.
The left panel in Figure 2 shows the boxplot of the Spearman's correlation coefficients for all 30 profiles from LNA array, and the middle panel shows the qRT-PCR results from TLDA. We find that intra-platform reproducibility is high for both profiling methods. As revealed in the right panel, most of samples show high coefficients except sample 6. The inter-platform reproduc- ibility is relatively low compared with the intra-platform reproducibility.

Differentially-expressed miRNA detection without replicate arrays
For each specimen under a specific chemotherapeutic treatment, we obtained two profiles: one from the treated sample and one from the control. Due to the fact that the majority miRNAs is weakly expressed, we are facing a dilemma of whether or not to use the probe level measures that are weak. If we filter out the flagged probes, we may have too little available information to evaluate the regulation trends of most miRNAs, and we could dramatically over-estimate their expression levels by dropping the measures not significantly higher than the background noise. Another drawback is that some statistical tests such as the t-test can not be applied to detect the differentially-expressed miRNAs. On the contrary, if we keep measures from all probes, the test could be dominated by the measurement errors.
In this study, we filter out all probes from contaminated regions that are marked as outliers. Then detection of differentiallyexpressed miRNAs are performed based on (a) probes that are not flagged as weakly expressed, or (b) the rest of the probes including those are weakly expressed. When the flagged probes are filtered out, the number of usable probes are different for different miRNAs on each slide. We compute the mean intensity for each miRNA, and various normalization methods are applied to the two profiles of miRNAs with valid measures. As a result, some existing methods such as the t-test is not applicable for differentially-expressed miRNA detection. As a ''poor-man's method'', regulation trends are identified using the fold-change (FC), which is the ratio between the intensity measures in the treated and control samples, or the difference of the logarithms of the intensity measures in the two samples, after normalization. Various FC cutoffs are used for the performance comparisons. Similar results are produced when a cutoff is selected between 1.5 to 2.2. A cutoff of two folds is adopted in the results reported in this study. For each normalization method, a classification table in the format of Table 1 is constructed for each treated sample, and a weighted kappa coefficient is computed to assess the reproducibility (or the degree of agreement) between the TLDA and LNA arrays. When all probes with weak signals are kept in the analysis, the differentially-expressed miRNAs can be detected using a t-test based on the technical replicates on each sides. If there are not enough usable probes, the FC method is used instead. Whichever method is used to detect the differentially-expressed miRNAs, the basel levels of the miRNAs are checked in identifying the regulation trends.
Results based on all 30 treated samples show that filtering out all flagged probes does not improves the reproducibility between the LNA array and TLDA results. The VSN and ''invariants'' methods won't work properly on the filtered data. The QN, LOESS-M and global median normalization methods produce similar results on filtered and unfiltered data. The EIVNPR method outperforms the existing normalization methods being benchmarked (see Table 2). By ignoring the probes flagged by the image processing software, EIVNPR results in slight agreement for 21 samples and no agreement for the other samples (see column marked EIVNPR2 in Table 2 and Table 3). If we keep the measures of all probes, after normalization with EIVNPR, 18 samples show slight agreement, four samples show fair agreement and one sample show moderate agreement (detailed results in column marked as EIVNPR1 in Table 2).
Normalized with the other existing normalization methods without filtering out the flagged probes, the majority of the 30 samples show slight agreement, according to Landis and Koch's interpretation of the Kappa test statistic. To further compare the performances of various normalization methods, we checked the samples having weight kappa coefficients greater than 0.1 (the number of samples are shown in parentheses in column 3 of Table 3). We find that VSN results in 30 weight kappa coefficients that are smaller than 0.1, while this number is 29 for LOESS-M, 28 for ''invariant'' method, median normalization and quantile normalization (QN), respectively. In addition, LOESS-M results in one sample show moderate agreement, and median normalization results in one sample show fair agreement. It is worth noting that LOESS-M results in less no agreement than the other methods, and its performance is close to EIVNPR based on the filtered profiles.

Performance comparisons with replicate measures from multiple profiles
We further pool the arrays for all 10 specimens together under the same chemotherapeutic treatment as biological replicates. Hence, we have 10 replicate arrays from the treated samples for each treatment, and for 10 controls. To apply EIVNPR, we first normalize multiple arrays using the built-in normalizers. Twelve normalizers are provided on each LNA miRNA array for normalization purposes. They are hsa_SNORD2, hsa_SNORD3, hsa_SNORD4A, hsa_SNORD6, hsa_SNORD10, hsa_S-NORD12, hsa_SNORD13, hsa_SNORD14B, hsa_SNORD15A, hsa_SNORD118, U6-snRNA-1, and U6-snRNA-2. These normalizers are supposed to highly and stably express across experiments. We first filter these normalizers using the flag information by the ImaGene 7.0 software. Second, we compute the normalization parameter for each profile using a maximum likelihood based iterative algorithm as in [9]. In the iterative algorithm, each normalizer is tested and will be removed if it has a significantly larger dispersion than the others. In case there are not enough normalizers, we expand the search to include the spike-ins on each array. Third, we compute the average log-transformed intensities and standard errors for all miRNAs under the treatment and control, respectively. Last, we apply EIVNPR to detect the differentially-expressed miRNAs using the 95% confidence bands.
Based on the 10 profiles from the treated samples and the 10 profiles as controls, we applied normalization methods such as QN, LOESS, LOESS-M, and median normalization, respectively. A paired t-test is performed to detect the differentially-expressed miRNAs based on multiple normalized profiles. The first column in Table 4 shows the names of the miRNAs classified as differentially-expressed by various methods, based on the specimens under treatment Ifo. The second column gives the qRT-PCR results: a t-test is applied based on the logarithms of the 10 RQ values to test whether the true RQ value is significantly    Table 4 we find that by applying LOESS normalization and t-test, only hsa-miR-27 is identified as differentially-expressed, which is validated by qRT-PCR results.
If the LOESS-M normalization method is applied instead, hsa-miR-24 will be identified as differentially-expressed as well, which is also validated by qRT-PCR results. With QN, hsa-miR-22 and hsa-miR-143 are detected and validated by qRT-PCR results, but hsa-miR-30e is misclassified at significance level 0.05. The median normalization detected four miRNAs and all are validated by qRT-PCR results, but unfortunately none is consistent with those by QN, LOESS, and LOESS-M. If we simply normalize the profiles using the normalizers, and detect the differentiallyexpressed miRNAs by a t-test, hsa-miR-191, hsa-miRlet-7b, hsa-miR-24 and hsa-miR-130b are correctly detected (column 4 marked as ''ME''). Using the 95% confidence bands method from EIVNPR, a total of 10 miRNAs are detected with seven validated by qRT-PCR, and the other three are misclassified. Among the seven miRNAs validated by qRT-PCR, two miRNAs have p-value v0:0001 with one identified by median normalization, and four mIRNAs have 0:001ƒp-valuev0:01 where hsa-miR-27a is also detected by LOESS and LOESS-M. Table 5 shows the detailed classification results by different methods. The first column shows a sequence of classes for the pvalues from 0 to 0.05. For each method listed in the first row, a pvalue is computed for each miRNA. The number of miRNAs with p-values fall in a specific class is shown as n z =n { , where n z is number of miRNAs that are correctly classified as differentiallyexpressed (TP: true positive), and n { is the number of miRNAs that are incorrectly classified as differentially-expressed (FP: false positive). From Table 5, we see that if we lower the significance level from 0.05 to 0.03, EIVNPR can correctly identify five miRNAs with no FPs. The median normalization can detect one less miRNAs than EIVNPR at significance level 0.03 with no FP as well. If we further lower the significance level to 0.025, we see all methods will produce similar results, except that QN has one FP and LOESS can detect only one differentially-expressed miRNA.

Discussion
The ''invariants'' method is developed specifically for miRNA analysis by finding a set of stably and highly expressed miRNAs for normalization [10]. However, it is challenging to find such a set of ''invariants'', especially when the expression levels of the majority of miRNA are close to the background noise due to their relatively low abundance in the cells. For miRNA profiling data, normalization methods based on the designed ''normalizers'' is feasible   Table 4. Comparisons of differentially-expressed miRNA detection (I). and reliable. However, it is also found that some endogenous normalizers are not stable across experiments [11][12][13].
The relationship between two gene expression profiles is usually non-linear, especially for the genes/miRNAs with extremely high expression levels. In concern of the nonlinearity, several prevalent nonlinear normalization methods such as LOESS, LOESS-M and QN, are adapted for miRNA study with or without modification. LOESS is a method based on the idea of the M-A plot by regressing M~log(W Y =W X ) on A~log(W Y Ã W X )=2 via locally weighted polynomial regression. The LOESS-M is a modification of LOESS by subtracting the median of M from the loess fit to the MA-plot. QN assumes that various profiles have a common distribution, and all profiles are forced to have the same quantiles at all levels. The common distribution assumption is reasonable for mRNA or cDNA data normalization because most genes are strongly and non-differentially expressed, and the total number of entries is very large. However, these assumptions might not hold true for miRNA data. VSN is another popular microarray data normalization method via a variance stabilization transformation to expression data [14]. Literature shows that both the invariant and quantile method achieved satisfying performances for onecolor miRNA microarrays [10,15].
The errors-in-variables nonparametric regression method can effectively calibrate the measurement errors and improve the detection of differentially-expressed miRNAs. It can be applied to multiple profiles normalized by some existing popular normalization methods, or by a measurement error model-based normalization procedure as in [9]. When EIVNPR is applied to two profiles, one treatment and one control, the normalization step can be bypassed for the purpose of differentially-expressed miRNA detection. On the other hand, EIVNPR is computational intensive and sensitive to the following issues. First, the results are sensitive to bandwidth selection. If one prefers to have a more smooth regression curve, the regression model is supposed to be more robust to outliers. A trade-off is that the result for an individual miRNA might be affected too much by the other miRNAs having similar expression levels. If the bandwidth is too small, the fitted curve will become too bumpy. The data-driven adaptive bandwidth selector proposed in this study works pretty well. Second, when no or less replicates are available, finding good estimates of the variances of the measurement errors is challenging. When we have only one treatment and one control, we can estimate the variances using the measures in a close neighborhood.
It is worth noting that the proposed method might be a little bit aggressive. Using simultaneous confidence bands based on nonparametric regression can utilize more global information to a large extent. However, it ultimately increases the risk of misclassification as a trade-off. There is no clear cutoff as for whether a signal is strong or weak; classifications of the differentially-expressed miRNAs should be done by using both the simultaneous confidence bands, and basal levels of the individual miRNAs. Figures 3, 4, 5 show the 95% confidence bands based on the specimens under the three treatments, respectively. We see that the regression curve is pretty smooth in all three figures. But the confidence bands in Figure 4 is not very smooth. In each of these three figures, a vertical line is drawn to mark the position two standard deviations above the mean background noise (log-transformed). When we detect the differentially-expressed miRNAs, this line can be used as a reference to check the strength of the signals. From Figure 6, we see three miRNAs fall outside the 95% simultaneous confidence bands: hsa-miR-19b has strong signal, but it stays very close to the upper band. The other two miRNAs, hsa-miR-101 and hsa-miR-195, stay farther from the confidence bands, but their expression levels are not very high. The qRT-PCR results show that none of these three miRNAs are significantly-expressed.

Conclusions
Data quality assurance is crucial in miRNA array data analysis. Well designed and well performed experiments can alleviate the bias from various sources, but can not completely eliminate the measurement errors. For miRNA microarray data, the signal quality is not as good as that for mRNA/cDNA microarray data. The majority of miRNAs are often weakly or not expressed, and the rest may have overall low SNR, which increases the uncertainty in detecting the differentially-expressed miRNAs. By modeling the measurement errors with a two-component measurement error model, and calibrating the measurement errors with errors-in-variables nonparametric regression, the proposed method using simultaneous confidence bands is more sensitive to detect the differentially-expressed miRNAs. At the same significance level, the proposed method tends to classify more miRNAs are differentially-expressed than the other existing methods, and increases the false positive rate as a trade-off. However, as a conservative solution we can lower the significance level to achieve similar false positive rates (see Table 5). Potentially the proposed method can improve the inter-platform reproducibility and can be applied for cross-platform and/or cross-lab microarray data integration.

Sample preparation and profiling data acquisition
Each of ten specimens are treated with three chemotherapeutic treatments: cisplatin(Cis), Doxorubincin (Dox), and Ifosfamide (Ifo), respectively. In addition, each specimen is treated with saline and is used as a control to detect the differentially-expressed miRNAs under different chemotherapeutic treatments. The 40 human osterosarcoma xenografts were prepared as previously described in [16]. RNA was isolated, purified, and quantified using established protocols [17]. The miRCURY LNA microRNA Array based on miRbase 9.2 (Exiqon Inc., Denmark) and TaqMan Low Density Array (TLDA) Human MicroRNA Panel v2.0 (Applied Biosystems, CA, USA) were employed for miRNA  global profiling and data validation, respectively. The detailed procedures are referred to in our previous publication [12], and raw data are available at http://gauss.usouthal.edu/publ/ada/.

Measurement error models for gene expression data
In gene expression arrays it is observed that the standard deviations of measurements are proportional to the expression levels; and this proportionality cannot continue down for entirely unexpressed genes -the standard deviations of the weakly or nonexpressed genes won't be zero [18]. A two-component measurement error model, which was originally developed in the context of instrumental methods of analytical chemistry, was extended for gene expression arrays [19][20][21]. In the same spirit, we consider the following measurement error models for one-color miRNA microarrays: W ' X j~b X j X j e g X j zE X j W ' Y j~b Y j Y j e g Y j zE Y j , j~1,:::,n, where (W ' X j ,W ' Y j ) is a pair of net median fluorescent intensities (nMFI's), which is the background-subtracted response at concentration (X j ,Y j ), and (b X j ,b Y j ) is a pair of relative expression levels that are usually indiscernible unless extra calibration data are available. In (1), two types of measurement errors are considered: (g X j ,g Y j ) represents the multiplicative error that always exists but is noticeable at concentrations significantly above zero, and (E X j ,E Y j ) represents the additive error that always exists but is noticeable mainly for near-zero concentrations. In this study, we assume independence among the error terms with g X j *N(0,s 2 ). In addition, heteroscedastic errors are assumed for both the additive and multiplicative errors in the models in (1).
Applying a Taylor expansion to the logarithm of (W ' Xj ,W ' Yj ), we get where the higher order terms to the right-hand side of the above two equations are negligible when miRNA-i is not weakly expressed in the two cell populations.
. When the higher order terms are absorbed into the multiplicative errors in (2) and (3), we get where g' X j and g' Y j are independent heteroscedastic normal errors with mean zero and standard deviations s' X j and s' Y j , respectively.

Statistical inference through nonparametric regression with errors-in-variables
In order to identify the differentially-expressed miRNAs, we propose a statistical inference approach through constructing simultaneous confidence bands (SCB) under an errors-in-variables regression model.
We are interested in the nonlinear relationship between the uncontaminated (log-transformed) intensities log X j' and log Y j' . A conventional regression model can be formulated as where f j is the random error with E(f j Dlog X ' j )~0. The regression function r(x) is the expectation of log Y ' on the condition that log X '~x, i.e., r(x)~E(log Y 'Dlog X '~x).
Directly estimating r( : ) is not feasible since X ' j and Y ' j are the true expression levels of miRNA-i in the two cell populations and are unobservable. However, combining (4) and (5) results in an errors-in-variables regression model, where j j~g ' Y j zf j is the random residual error and g' X j is the measurement error. Notice that r(x)~E(log Y 'Dlog X '~x)Ẽ (log W ' Y Dlog X '~x) since g' Y j is independent of log X ' j with mean zero. Therefore, r( : ) can be estimated from the observed contaminated data (log W ' X j ,log W ' Y j ) using the local polynomial deconvolution estimator [22]. In this study, both the random error j and the measurement error g' X are heteroscedastic [23]. For the random error, we simply assume that it has a very general variance function, j~j(x)~s(x)E, where s(x) is the unknown variance function and E has mean 0 and variance 1. For the measurement error, the heteroscedastic variance parameters can be estimated directly from the data.
We consider a local linear deconvolution estimator; it is a special case of local polynomial estimator with degree p~1. It is given byr where For heteroscedastic normal errors, we consider the following kernel for K j , q f j is the characteristic functions of the g' X j , q L (t)~(1{t 2 ) 3 x ½{1,1 (t) with the indicator function x ½{1,1 (t), and h is the smooth parameter [24]. We then construct the (1{a)100% SCB for r(x) to identify the differentially expressed miRNAs. The observations that do not fall into the confidence regions are considered as the differentially expressed miRNAs. One advantage of the approach is that the normalization step is by-passed in identifying the differentially expressed miRNAs. The form of the (1{a)100% confidence bands for r(x) over a subset x[½a,b of the predictor space is taken by for some cw0, where x[ DD : DD denotes the L 2 norm. To obtain I(x) in (8), we need to calculate the critical value c and the residual variance function s(x). c can be found using the tube formula [25,26], where Z j *N(0,1) and T j (x)~m j (x)=DDm(x)DD, and where T'(x)~(T 1' (x),T 2' (x), . . . ,T n' (x)) and T i' (x)~LT i (x)=Lx.
There are two approaches to estimate s(x). We may take the nonparametric estimator proposed by [27]. It is given bŷ whereq q n (x) is the local linear estimate of q(x)~E((log W ' Y ) 2 Dlog X '~x). To avoid zero estimates in (11), one further implements the bagging-type correction algorithm to computê s s(x) [27]. When the level of measurement errors are relatively small, we may use the other simple method by ignoring the measurement error effect on the variance function (See more discussions of the effects of error magnitude in measurement error models in [28]). The following procedure is adopted from [29]: first, define V i~l og(log W ' Y j {r r n (log W ' X j )) 2 ; second, regress the V i 's on the log W ' X j 's using any nonparametric method to get an estimateq q(x) of log s 2 (x) and computeŝ s 2 (x)~exp(q q(x)). For microarray data, the distribution of the intensities after logarithm transformation is usually still skewed. We use a variable bandwidth to choose the smoothing parameter h by following the idea of the conventional adaptive kernel estimator in [30]. First, find a pilot estimatef f of the density function of log X ' based on log W ' Xj , with bandwidth h and with measurement error considered; second, define local bandwidth factor l j by taking , and a[½0,1 is the sensitivity parameter; finally, define a bandwidth h j~lj h. The smoothing parameter h and the sensitivity parameter a are selected by minimizing the leave-one-out cross-validation scorê R R(h) defined aŝ where L jj~'i (log W ' X j ). Details of implementation of the errors-in-variables nonparametric regression and the construction of SCB are described in the R script available at http://gauss.usouthal.edu/publ/ada/.

Normalization methods for benchmarking
All normalization methods are performed in R, an open source statistical scripting language (http://www.r-project.org). Median normalization is performed by dividing each array by its median signal intensity, and then by rescaling them to the global median intensity of all arrays. A function ''normalize.quantile'' from R package affy can be used to perform the quantile normalization [5]. The traditional LOESS normalization method is based on the idea of the M versus A plot, which has been implemented in an R packages codelink and affy [2,3]. The LOESS-M normalization is a modification of the traditional loess normalization by subtracting the median of M from the loess fit to the MA-plot D [4]. R functions were written to implement the LOESS-M normalization. Invariants normalization is performed based on a set of probes that have medium-high mean intensity and low variance across arrays (named ''invariants'') [10]. R script at http://www.unil.ch/ dafl/page58744.html is used. VSN normalization is performed using the ''vsn2'' function from R package ''vsn'' from the Bioconductor project (http://www.bioconductor.org).

Weighted kappa test for platform reproducibility evaluation
Sensitivity and specificity are commonly used to evaluate the reproducibility or consistency between two platforms when interests are focused on whether the miRNAs (genes) being studied are differentially-expressed or not. In this study, instead of classifying the miRNAs as differentially-expressed and nondifferentially expressed, we further identify the regulation trends. When the regulation trends are also of concern, sensitivity and specificity are not convenient to be used to compare the performances of different methods for a three-way classification [9,11]. A three-way classification table is presented in Table 1. We adopt the weighed kappa test to measure the agreement between two qualitative classification schemes: k k w~( P o (w){P e (w))=(1{P e (w)), ð13Þ where P o (w)~P i P j w ij p ij , P e (w)~P i P j w ij p i: p :j , p ij~nij =n :: , p i:~ni: =n :: , p :j~n:j =n :: . We define a distance C ij~D i{jD to quantify the relative difference between categories, and use the Fleiss-Cohen weighting scheme to compute w ij~1 {C 2 ij =4 [11,[31][32][33][34]. The degree of agreement can be interpreted as follows: no agreement ifk k w v0, slight agreement ifk k w [½0,0:2), fair agreement ifk k w [½0:2,0:4), moderate agreement ifk k w [½0:4,0:6), substantial agreement ifk k w [½0:6,0:8), and almost perfect agreement if k k w §0:8 [35].