Reducing the Standard Deviation in Multiple-Assay Experiments Where the Variation Matters but the Absolute Value Does Not

When the value of a quantity x for a number of systems (cells, molecules, people, chunks of metal, DNA vectors, and so on) is measured and the whole set of measurements is then replicated in different trials or assays, scientists often obtain quite different measurements despite their efforts to keep the design nearly identical. As a consequence, some systems’ averages present standard deviations that are too large to render statistically significant results. This work presents a novel correction method of very low mathematical and numerical complexity that can reduce the standard deviation of such results and increase their statistical significance. Two conditions must be met: the inter-system variations of x matter while its absolute value does not, and a similar tendency in the values of x must be present in the different assays (in other words, the results corresponding to different assays must present a high linear correlation). We demonstrate the improvements this method offers with a cell biology experiment, but it can be applied to any problem that conforms to the described structure and requirements, in any quantitative scientific field that deals with data subject to uncertainty.

The last two columns correspond to the average µ of the three assays for each system, and the associated standard deviation (or error) σ. The units are irrelevant for the discussion.
The different systems can be anything, from cities to DNA sequences, from people to chunks of metal. They can even be the same system at different times if the quantity x is expected to evolve in some reproducible manner. The differences among the assays could be due to the experiments being performed by the same researcher on different days, by different (but in principle equally skilled) researchers using the same equipment, by the same researcher using different (but in principle equally accurate) equipment, by different (but in principle equally proficient) laboratories, etc. The only condition is that we expect the different assays to yield the same results.

The problem
The standard deviation of the systems' averages across assays in tab. 1 is comparable to the average itself for most of the systems. Only for a couple of them are you 'lucky' enough that the former is about half the latter. You check the corresponding chart in fig. 1, and you see the same despairing situation. The error bars are humongous, and this will render your results statistically insignificant if you perform, for example, a Student's t-test to check whether or not the observed differences are real.
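To make the problem concrete, the per-system average µ and standard deviation σ across assays can be computed as in the following minimal Python sketch. The numbers are hypothetical (they are not the values of tab. 1), and the use of the sample standard deviation (N−1 denominator) is an assumption, not something stated in the text.

```python
from statistics import mean, stdev

# Hypothetical measurements (NOT the data of tab. 1): one row per system,
# one column per assay. The assays share a trend but differ in scale.
measurements = [
    [10.0, 4.0, 22.0],   # system 1
    [8.0, 3.1, 17.5],    # system 2
    [5.0, 2.2, 11.0],    # system 3
    [2.0, 0.9, 5.0],     # system 4
]

def summarize(rows):
    """Return (mean, sample standard deviation) across assays for each system."""
    return [(mean(row), stdev(row)) for row in rows]

for i, (mu, sigma) in enumerate(summarize(measurements), start=1):
    # A ratio sigma/mu close to 1 means the error bar is as large as the bar itself.
    print(f"system {i}: mu = {mu:.2f}, sigma = {sigma:.2f}, sigma/mu = {sigma / mu:.2f}")
```

With data of this kind, every system shows a standard deviation that is a large fraction of its average, which is the situation described above.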

The requirements
If two requirements about your problem and your results are met, you can apply the correction method in the next section to reduce the standard deviations and increase the statistical significance of your data:
• The absolute value of x for each given system is not really very important to you. What you are really interested in measuring properly is the variation in x from one system to another. For example, whether or not you can safely claim that the value of x corresponding to system 1 is larger than, and approximately double, that associated with system 5.
• Even if you seem to be measuring huge differences in absolute value across the different assays, the 'tendency' of the variations is similarly captured in all of them. You can check this by looking at a graphical representation of your data such as the one in fig. 2, or you can be safer and check for high linear correlation between each pair of assays by performing a number of least-squares linear fits (placing one assay in the x-axis and the other one in the pair in the y-axis).
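The second requirement can be checked numerically. The following Python sketch computes Pearson's correlation coefficient for each pair of assays; the data are hypothetical (not those of tab. 1), and the 0.8 cut-off is only an illustrative assumption, since the text does not prescribe a threshold.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data (NOT tab. 1): one list per assay, one entry per system.
assays = {
    "assay 1": [10.0, 8.0, 5.0, 2.0],
    "assay 2": [4.0, 3.1, 2.2, 0.9],
    "assay 3": [22.0, 17.5, 11.0, 5.0],
}

THRESHOLD = 0.8  # illustrative cut-off: an assumption, not a rule from the text

for (name_k, x_k), (name_l, x_l) in combinations(assays.items(), 2):
    r = pearson(x_k, x_l)
    verdict = "ok" if r >= THRESHOLD else "weak correlation"
    print(f"{name_k} vs {name_l}: r = {r:.3f} ({verdict})")
```

Pairs with a coefficient well below 1 suggest that the assays do not share the same tendency, in which case the correction is unlikely to help.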

The correction method
Perform a linear fit for every pair of assays k and l, with k ≠ l and k, l = 1, . . . , M, placing assay k in the x-axis and assay l in the y-axis. For this, compute the averages, A_k and A_l, of the measured quantity across the N systems for each assay in the pair:

    A_k = \frac{1}{N} \sum_{j=1}^{N} x^k_j    (1)

Of course, A_l is obtained just by changing k to l in this expression. Compute the standard deviation associated with A_k (and A_l):

    \sigma_k = \sqrt{ \frac{1}{N} \sum_{j=1}^{N} (x^k_j - A_k)^2 }    (2)

Compute the covariance between the values in assay k and those in assay l:

    \sigma_{kl} = \frac{1}{N} \sum_{j=1}^{N} (x^k_j - A_k)(x^l_j - A_l)    (3)

Take these quantities to the slope b_kl and the intercept a_kl:

    b_{kl} = \frac{\sigma_{kl}}{\sigma_k^2} , \qquad a_{kl} = A_l - b_{kl} A_k    (4)

defining the best fit line:

    y = a_{kl} + b_{kl} x    (5)

The results of these fits allow you to check for the required high linear correlation mentioned in the previous section. This is done by computing Pearson's correlation coefficient for every pair of assays k and l:

    r_{kl} = \frac{\sigma_{kl}}{\sigma_k \sigma_l}    (6)

In the first three columns of tab. 2, we can see that r_kl is close to 1.0 for all pairs in tab. 1. We can therefore suspect that our correction method will produce sizable improvements in the data.

Now, for each k compute the average correlation coefficient r_k of the k-th assay with respect to all the rest of them:

    r_k = \frac{1}{M-1} \sum_{l \neq k} r_{kl}    (7)

and pick the one with the largest r_k as the reference assay, i.e., the one against which all the other assays will be corrected. The values for the example in tab. 1 are presented in the last column of tab. 2. We see that, in this case, the reference assay is the second one.
          assay 1   assay 2   assay 3   r_k
assay 1   0.000     0.947     0.852     0.900
assay 2   -         0.000     0.881     0.914
assay 3   -         -         0.000     0.867

Table 2: Pearson's correlation coefficient r_kl between each pair of assays in tab. 1. The last column displays the average r_k of each assay with respect to all the rest of them.
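The choice of the reference assay can be reproduced from the off-diagonal coefficients of tab. 2. The following Python sketch is only an illustration; the helper names (`r_pair`, `average_correlation`) are hypothetical and not part of the authors' Perl scripts.

```python
# Off-diagonal Pearson coefficients r_kl taken from tab. 2 (symmetric in k, l).
r = {
    (1, 2): 0.947,
    (1, 3): 0.852,
    (2, 3): 0.881,
}
assays = [1, 2, 3]

def r_pair(k, l):
    """Look up r_kl regardless of the order of the pair."""
    return r[(k, l)] if (k, l) in r else r[(l, k)]

def average_correlation(k):
    """Average of r_kl over all assays l different from k."""
    others = [l for l in assays if l != k]
    return sum(r_pair(k, l) for l in others) / len(others)

r_avg = {k: average_correlation(k) for k in assays}

# The reference assay is the one with the largest average correlation.
reference = max(r_avg, key=r_avg.get)

for k in assays:
    print(f"assay {k}: average r = {r_avg[k]:.4f}")
print(f"reference assay: {reference}")
```

The averages agree with the last column of tab. 2 to within rounding, and the second assay is selected as the reference.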
Finally, denote by f the value of the index k that corresponds to the reference assay (f = 2 in our example) and use x̃^l_j for the corrected value associated with the original quantity x^l_j (system j, assay l). Now, the correction formula reads like this:

    \tilde{x}^l_j = a_{lf} + b_{lf} x^l_j    (8)

where a_lf and b_lf are the intercept and the slope of the fit that places assay l in the x-axis and the reference assay f in the y-axis. In order to produce the whole set of corrected results, you should apply this for all assays l ≠ f, with l = 1, . . . , M, and for all systems with the index j = 1, . . . , N.
In tab. 3 and fig. 3, we show the numerical values and the bar charts for the corrected results obtained from the example in tab. 1 through the application of the correction in eq. (8).
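The whole procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are hypothetical (not those of tab. 1), the function names are invented for the example, and the correction is assumed to map each non-reference assay l onto the reference f through the best-fit line obtained with assay l in the x-axis and assay f in the y-axis.

```python
from math import sqrt
from statistics import stdev

def linear_fit(xs, ys):
    """Least-squares line ys ~ a + b * xs; returns (a, b, r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    b = cov / var_x
    a = my - b * mx
    r = cov / sqrt(var_x * var_y)
    return a, b, r

def correct(assays):
    """Map every assay onto the reference assay (largest average r).

    `assays` is a list of M lists, each holding the N system values.
    Returns (index of the reference assay, list of corrected assays).
    """
    m = len(assays)
    avg_r = [
        sum(linear_fit(assays[k], assays[l])[2] for l in range(m) if l != k) / (m - 1)
        for k in range(m)
    ]
    f = max(range(m), key=avg_r.__getitem__)
    corrected = []
    for l in range(m):
        if l == f:
            corrected.append(list(assays[l]))  # the reference is left untouched
            continue
        # Assumed form of the correction: fit with assay l on the x-axis and
        # the reference f on the y-axis, then evaluate the line at each value.
        a, b, _ = linear_fit(assays[l], assays[f])
        corrected.append([a + b * x for x in assays[l]])
    return f, corrected

# Hypothetical data (NOT tab. 1): 3 assays x 4 systems.
assays = [
    [10.0, 8.0, 5.0, 2.0],
    [4.0, 3.1, 2.2, 0.9],
    [22.0, 17.5, 11.0, 5.0],
]
f, corrected = correct(assays)

for j in range(len(assays[0])):
    before = [a[j] for a in assays]
    after = [c[j] for c in corrected]
    print(f"system {j + 1}: sigma before = {stdev(before):.2f}, after = {stdev(after):.2f}")
```

Because the assays in this toy data set are almost perfectly correlated, the per-system standard deviations shrink markedly after the correction; with weakly correlated assays the gain would be smaller.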
[Table 3: corrected values per system for assay 1, assay 2 and assay 3, and the resulting µ ± σ.]

As you can see, the standard deviations as well as the associated statistical significance have greatly improved. If your data fits into the basic setup and satisfies the requirements, you will probably see a similar improvement. Enjoy!

All the formulae needed to compute the linear fits, the inter-assay correlation coefficients, as well as the correction in eq. (8) are provided in this section and they are very simple. The reader can choose to implement them in any spreadsheet of her liking, or she can use the Perl scripts we have written for the occasion and which can be found here.
For more information, check the complete article at: http://arxiv.org/abs/1309.2462