The authors have declared that no competing interests exist.
Conceived and designed the experiments: AP DT. Performed the experiments: AP JB. Analyzed the data: AP PAN DT. Wrote the paper: AP PAN DT.
Although microarrays are analysis tools in biomedical research, they are known to yield noisy output that usually requires experimental confirmation. To tackle this problem, many studies have developed rules for optimizing probe design and devised complex statistical tools to analyze the output. However, less emphasis has been placed on systematically identifying the noise component as part of the experimental procedure. One source of noise is the variance in probe binding, which can be assessed by replicating array probes. The second source is poor probe performance, which can be assessed by calibrating the array based on a dilution series of target molecules. Using model experiments for copy number variation and gene expression measurements, we investigate here a revised design for microarray experiments that addresses both of these sources of variance.
Two custom arrays were used to evaluate the revised design: one based on 25 mer probes from an Affymetrix design and the other based on 60 mer probes from an Agilent design. To assess experimental variance in probe binding, all probes were replicated ten times. To assess probe performance, the probes were calibrated using a dilution series of target molecules and the signal response was fitted to an adsorption model. We found that significant variance of the signal could be controlled by averaging across probes and removing probes that are nonresponsive or poorly responsive in the calibration experiment. Taking this into account, one can obtain a more reliable signal with the added option of obtaining absolute rather than relative measurements.
The assessment of technical variance within the experiments, combined with the calibration of probes, makes it possible to remove poorly responding probes and yields more reliable signals for the remaining ones. Once an array is properly calibrated, absolute quantification of signals becomes straightforward, alleviating the need for normalization and reference hybridizations.
Microarrays have been extensively used for examining gene expression and for detecting single nucleotide polymorphisms (SNPs) or copy number variations (CNVs) in genomic DNA
We argue that one reason for the uncertainty in the interpretation of the array output is insufficient measurement of experimental noise in current protocols. In the first generation array platforms (spotted arrays), the noise problem was mostly due to uneven surfaces of arrays and variability between arrays (e.g., ref
Another problem for the optimal design of arrays is the uncertainty of probe binding behavior. Although many parameters have been identified that affect probe binding behavior
Our revised design for microarray experiments includes an estimate and control of experimental noise, as well as calibration of probes with a biological sample. Specifically, the calibration of probes allows one to identify poorly responding probes and subsequently remove them from the analysis. In addition, calibration allows one to directly determine target concentrations in biological samples from signal intensity, without the need to use reference hybridizations. To show that these procedures can improve the accuracy of quantitative measurements using microarrays, we use two types of test arrays: one with short (25 mer) probes and another with long (60 mer) probes. Using test hybridization and adjusted statistical procedures, we show that a major improvement of signal reliability can indeed be obtained.
All animal work followed the legal requirements, was registered under number V312-72241.123-34 (97-8/07) and approved by the ethics commission of the Ministerium für Landwirtschaft, Umwelt und ländliche Räume, Kiel (Germany) on 27. 12. 2007.
The general workflow of the revised design is depicted in
Workflows run from top to bottom and equivalent stages are set next to each other. New steps are in blue typeface. Both workflows represent only general schemes and further variations are possible. For example, we also discuss an additional step for the target labeling procedure in the text (denoted by an asterisk in step 3).
Genomic DNA (gDNA) and RNA were labeled according to the manufacturer's recommended protocol (Agilent). For the gDNA and RNA dilution series experiments (
The dilution series was created by pooling the labeled samples, serially diluting the pool, and hybridizing each diluted sample to an independent array. The arrays within the white box were used to calibrate the probes. One array marked with an asterisk (*) was used as a ‘reference’. The independent ‘test’ array ($) is also shown. Numerical values indicate the target concentration in folds of the recommended concentration.
The Freundlich equation is:
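In its standard textbook form, the Freundlich isotherm can be written as shown in this sketch. The symbols are assumptions chosen for illustration and may differ from the authors' notation: y is the signal intensity of a probe, c is the target concentration, and K and n are probe-specific constants obtained from the calibration.

```latex
y = K \, c^{1/n}
\qquad\Longleftrightarrow\qquad
\log y = \log K + \frac{1}{n}\,\log c
```

The logarithmic form is linear in log c, which is why a dilution series plotted on log-log axes is a convenient way to estimate K and n for each probe.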
The purpose of determining the relative error of the mean signal intensity
The differentials are equivalent to standard deviations
The error for calculated concentration is found according to the error propagation theorem
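As a minimal sketch of this step, assume the standard Freundlich form y = K c^(1/n), so that the concentration is recovered as c = (y / K)^n. Assume also that only the measured signal carries uncertainty, with K and n treated as exact. First-order error propagation then gives sigma_c = n · c · sigma_y / y. The function names and example values below are hypothetical and are not taken from the authors' programs.

```python
def concentration_from_signal(signal, k, n):
    """Invert y = K * c**(1/n) to obtain c = (y / K)**n."""
    return (signal / k) ** n


def concentration_error(signal, signal_sd, k, n):
    """First-order propagation: sigma_c = |dc/dy| * sigma_y = n * c * sigma_y / y.

    Assumption: K and n are treated as exact, so the uncertainty of the
    fitted calibration parameters is ignored in this sketch.
    """
    c = concentration_from_signal(signal, k, n)
    return n * c * signal_sd / signal


# Invented calibration parameters and signal for one hypothetical probe.
estimated_c = concentration_from_signal(100.0, k=100.0, n=2.0)
estimated_sd = concentration_error(100.0, signal_sd=5.0, k=100.0, n=2.0)
```

Under these assumptions the relative error of the concentration is n times the relative error of the signal, so probes with a large n amplify measurement noise when the signal is converted back to a concentration.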
The data were stored and analyzed in an MS SQL database. We wrote three C++ programs to analyze the data for users. The program executables, documentation, the programs' code as well as an example dataset are provided in
The probe lists and microarray data were submitted to datadryad.org and are available under doi:10.5061/dryad.57ms3.
Our study was initially motivated by an attempt to use the Affymetrix mouse genome diversity arrays
We analyzed the range of signal intensities of these invariant probes and found that it spans almost four orders of magnitude, i.e., it deviates significantly from the expectation of similar hybridization efficiency. We assessed whether differences in GC composition or Gibbs free energy parameters could explain this, but neither parameter was significantly correlated with the signal intensities of the probes (Figure S1A and B in
To investigate this further, we compared the experimentally measured melting temperatures of five sense-antisense probe pairs on the array and in solution (Table S1 in
We conjectured that there are two major sources that produce uncontrolled variance. The first source is the experimental variance of signal generation, i.e., hybridization and washing, and the second is the poorly known probe responsiveness (as shown above). A further source of error may be the variance in sample preparation (e.g.,
Experimental variance of signal generation can be measured by replicating identical probes on the same array. Assuming homogeneous hybridization conditions across the array (which is mostly the case for today's commercial hybridization systems), one should expect that the variances of the signals coming from these identical probes are a direct measure of technical noise associated with the hybridization itself.
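The idea of using replicated probes as a direct noise estimate can be illustrated with a short sketch that computes the mean signal and the coefficient of variation (CV) per probe. The probe identifiers and intensities are invented for illustration, and three replicates are used here for brevity rather than the ten used in this study.

```python
import statistics


def coefficient_of_variation(replicate_signals):
    """Sample standard deviation divided by the mean for one probe's replicates."""
    mean_signal = statistics.mean(replicate_signals)
    return statistics.stdev(replicate_signals) / mean_signal


def summarize_array(probe_replicates):
    """Map each probe ID to (mean signal, CV) across its replicates."""
    return {
        probe_id: (statistics.mean(signals), coefficient_of_variation(signals))
        for probe_id, signals in probe_replicates.items()
    }


# Invented intensities for two hypothetical probes on one array.
example = {
    "probe_a": [90.0, 100.0, 110.0],
    "probe_b": [500.0, 500.0, 500.0],
}
summary = summarize_array(example)
```

The mean across replicates serves as the averaged signal, while the CV expresses the technical noise of that probe on a scale that is comparable between probes of very different intensity.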
Probe responsiveness can be empirically assessed by hybridizing an array with a dilution series of a given mix of targets, e.g., genomic DNA (gDNA). The individual probe hybridization isotherms can then be obtained by plotting the relationship between the dilutions (target concentrations) and signal intensities. Their shape will reveal whether the isotherms follow a predictable dose-response relationship and can thus be used for quality filtering, e.g., to remove non-responsive probes.
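A minimal sketch of such a quality filter is given below. It assumes the standard Freundlich form y = K c^(1/n), fitted by ordinary least squares on log-transformed data, and uses R2 as the criterion for responsiveness. The threshold of 0.95, the function names, and the example intensities are assumptions for illustration only and do not reproduce the authors' programs or cut-offs.

```python
import math


def fit_freundlich(concentrations, signals):
    """Least-squares fit of log(y) = log(K) + (1/n) * log(c).

    Returns (K, n, r_squared). All inputs must be positive.
    """
    xs = [math.log(c) for c in concentrations]
    ys = [math.log(s) for s in signals]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope equals 1/n
    intercept = mean_y - slope * mean_x    # intercept equals log(K)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Guard against division by zero when the log-signals have no variance.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    n = 1.0 / slope if slope != 0 else math.inf
    return math.exp(intercept), n, r_squared


def is_responsive(r_squared, min_r_squared=0.95):
    """Hypothetical filter: keep probes whose isotherm fit exceeds the threshold."""
    return r_squared >= min_r_squared


# Invented dilution series (in folds of the recommended concentration).
dilutions = [0.25, 0.5, 1.0, 2.0, 4.0]
good_probe = [100.0 * c ** 0.5 for c in dilutions]   # follows the model exactly
flat_probe = [50.0, 55.0, 48.0, 52.0, 49.0]          # barely responds to dilution

k_good, n_good, r2_good = fit_freundlich(dilutions, good_probe)
k_flat, n_flat, r2_flat = fit_freundlich(dilutions, flat_probe)
```

In this example the first probe passes the filter and the second is rejected, because its signal does not change systematically with target concentration.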
Below, we assess a revised experimental design, outlined in
We assessed the extent of measurement error associated with hybridization and probe binding that is inherent in the standard microarray procedure by comparing the signals from ten replicated probes within each array. The arrays were hybridized with genomic DNA (gDNA) using the dilution series depicted in
(A) Typical Agilent array isotherms obtained using a dilution series of genomic mouse DNA, BL6 strain for a single probe and its replicates. Raw data (gray) and predicted isotherm based on the average signal intensity (black). (B) Mean and standard deviation of the coefficient of variation (CV) across all probes at each concentration for the 25 mer array, (C) same for the 60 mer array.
The majority of probes have a variation coefficient of ∼12 to 35% for the 25 mer array (
Calibration can be used to determine the probe response function (i.e., calibration curve) and thus to remove poorly responding probes. In order to obtain calibration parameters of each probe, one has to determine the respective equation parameters, e.g., R2,
Panels A to F, Freundlich model, Panels G to I, Langmuir model. Panels A to C, 25: Distribution of R2 across all probes. Panels B and E: Distribution of
In contrast to gDNA arrays (such as CNV arrays), expression arrays are usually hybridized with mRNA targets. The optimal labelling procedure for RNA involves an RNA synthesis step
There is an additional problem with RNA calibration because different mRNAs occur at different concentrations in a given sample. Specifically, probe signal intensities of mRNAs expressed at low levels (i.e., at low concentrations) will fall below the background level (Figure S3 in
In our first test we used identical DNA samples against each other (a reference and a test array, marked * and $ in
Since the sample is compared against itself, the log2 ratio should be 0, i.e. all values above or below 0 are experimental noise. (A) Classical reference procedure - ratio of signal intensities between all individual probes (
The second test was aimed at assessing signal improvement in an actual experiment. Specifically, we compared the conventional analysis procedure using Agilent software to our calibration approach using a given CNV region in the mouse genome. The CNV analysed consisted of an approximately 5 kb fragment present in variable copy numbers between wild type individuals, but in only one copy in the reference strain (C57BL/6).
Four different wild type mice were analyzed, each represented as a track. Top: output from the ratio analysis implemented in the Agilent software (ratio with respect to DNA from a C57BL/6 inbred mouse strain). The input was the concentrations derived from the ten averaged probes, but without calibration and without removal of non-responding probes. Colored dots represent values larger (blue) or smaller (red) than log2 = 0.5. Bottom: concentration calculations based on the full revised method, with non-responding probes removed (>20% error in any of the experiments on the array). The values were normalized with respect to average intensities on the array and are displayed as a custom track in the UCSC genome browser.
Although the above experiments used 10 replicated probes for averaging, it would be of interest to know whether this is an optimal number. To address this question, we randomly selected 2 to 10 replicated probes from both 25 mer and 60 mer arrays and back-calculated the expected concentration of targets for the standard experiments (target concentration of 1×). The calculation was based on the calibration equations and parameters derived from 10 replicated probes because they are closest to the truth. As expected, we found a higher variance in the estimate of the true concentration when fewer probes were used (
CV averages are displayed, circles, 25
Although the averaging and calibration removed much of the noise, a known additional source of noise comes from target preparation. Specifically, the target fragmentation and labelling procedures involve several enzymatic steps (e.g., PCR and enzymatic digestion), which have previously been reported to introduce variability
(A) Sample $ (
Given the broad application of microarrays in biological research and their success in determining gene expression patterns and structural genomic variation, one might ask:
The results presented above may be considered as a proof of principle that assessing experimental noise and calibration can indeed improve microarray output. It will evidently be necessary to do large scale comparative experiments to fully assess the possible impact. The calibration procedure was already applied in one experimental study and did indeed yield a much better resolution of signals to allow clearer biological conclusions
Our procedure is generally based on a common approach used in physics and analytical chemistry to experimentally determine the performance of a sensor (i.e., probe) and the magnitude of a measurement error. Once the measurement error is known, simple statistics can be used to obtain estimates of the true values. Knowing the error distribution, one can also reject outliers. We have conjectured and experimentally verified that there is indeed such an error distribution at the level of probe binding and target preparation (the labeling procedure). Our results show that the fidelity of estimating the true target concentration increased significantly with multiple replications of the probes and that this fidelity was dependent on probe length. Hence, we recommend that probe replication and averaging (steps 2 and 7 in
An additional element that we introduced into our revised design is the calibration of each individual probe (steps 4 and 8). It is often assumed that bioinformatic procedures for probe design are sufficient to optimize probe behavior. However, while some optimization is certainly possible in this way, it is evident that it does not fully solve the problem of huge differences in binding affinities between different probes (Figure S1 in
However, proper calibration is a challenge, since one needs to know the exact concentration of the target that is used for calibration. In the case of calibration with a complex RNA sample, this concentration is not known, so any measurement can only be made relative to the sample that was used for calibration; i.e., calibration yields only a small advantage over the normal reference hybridization procedure. The situation is somewhat better when gDNA is used for calibration, although the gDNA sample may include regions that are subject to unknown CNV.
Using the biological sample itself for calibration also entails the risk that one is calibrating not only for the specific signal, but also for any unwanted nonspecific hybridization. The problem of cross-hybridization by similar target sequences can usually be addressed by applying algorithms in the probe design phase, provided full genome information is available. It remains a problem, though, that the total signal intensity contains both specific and nonspecific hybridization signal, and that the proportion will be probe-specific. Hence, a remedy would be to design more than one probe for a given region (e.g., a gene) and compare the signals.
The best calibration would therefore be achieved with pure synthetic target DNA or RNA, but for arrays that are designed to record patterns from complex targets, this will evidently be very costly. Still, such an investment should be warranted for standard arrays that are used in many experiments (e.g., in cancer research or for hereditary diseases), since the data quality that can be obtained in this way would not require further verification experiments. We anticipate therefore that properly calibrated arrays will eventually become available. For the time being, one can use a well-defined target preparation for calibration.
We thank E. Blom-Sievers for technical support and the members of the laboratory for discussions. Discussions among the participants of the MPI international conference entitled “Physicochemical fundamentals of DNA hybridization on surfaces as applied to microarrays and bead-based sequencing technologies” at Ploen, Germany on May 9 to 12, 2011(