BeadArray Expression Analysis Using Bioconductor

doi:10.1371/journal.pcbi.1002276

Figure 1.

Overview of the technology and workflow.

(A) A zoomed view of a typical bead (top), with the pixels that contribute to the overall (red square) and local background (yellow squares) signals marked. Many replicate beads carrying the same 50-mer oligo are located on each BeadArray (middle), so that robust measures of expression can be obtained for each probe in a given sample. Around 48,000 different probe types are assayed in this way per sample. These BeadArrays come from a WG-6 BeadChip (bottom), which comprises 12 arrays that are paired so that transcript abundance can be measured in six samples per BeadChip. (B) A summary of the data formats available, along with the Illumina workflow associated with each level of data. Data can be in raw form, where pixel-level data are available from TIFF images, allowing the complete data processing pipeline, including image analysis, to be carried out in R. The next level, bead-level data, provides intensity and location information for individual beads. In this format, a given probe will have a variable number of replicate intensities per sample. Processed (summary) data, in which replicate intensities have been summarized and outliers removed to give a mean, a measure of variability, and a number of observations per probe in each sample, are the most commonly available format. Summary data are usually obtained directly from Illumina's BeadStudio/GenomeStudio software, but can also be retrieved from public repositories such as GEO or ArrayExpress. The right-hand column of the figure indicates the R/Bioconductor packages that can handle data in these different formats. Probe annotation packages are also listed. Abbreviations and footnotes used in this figure: QA, quality assessment; DE, differential expression; ∧, package available from CRAN [46]; *, chip-specific part of the package name, which depends upon platform version (e.g., v1, v2, v3, v4).
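The step from bead-level to summary data described in (B) can be illustrated with a minimal sketch. It is written in plain Python rather than R, and it is not the code used by any Bioconductor package or by Illumina's software. The function name `summarize_probe`, the cutoff of 3 scaled median absolute deviations (MADs) from the median, and the example intensities are all assumptions made for illustration; the caption states only that outliers are removed before a mean, a measure of variability, and a number of observations are reported.

```python
import statistics

def summarize_probe(intensities, n_mads=3.0):
    """Summarize the replicate bead intensities of one probe in one sample.

    Hypothetical helper: the outlier rule (more than n_mads scaled MADs
    from the median) is an assumed choice, not taken from the caption.
    """
    med = statistics.median(intensities)
    # Scale the MAD by 1.4826 so that it estimates the standard deviation
    # for normally distributed data.
    mad = 1.4826 * statistics.median([abs(x - med) for x in intensities])
    if mad == 0:
        kept = list(intensities)  # no spread to judge outliers by; keep all beads
    else:
        kept = [x for x in intensities if abs(x - med) <= n_mads * mad]
    mean = statistics.fmean(kept)
    # The sample standard deviation needs at least two beads.
    sd = statistics.stdev(kept) if len(kept) > 1 else float("nan")
    return {"mean": mean, "sd": sd, "n_beads": len(kept)}

# Made-up intensities for six replicate beads, one of them aberrant.
beads = [100, 102, 98, 101, 99, 500]
summary = summarize_probe(beads)
print(summary)
```

Because the number of replicate beads varies between probes and samples, the bead count is returned alongside the mean and standard deviation, which mirrors the three quantities listed for summary data.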

Table 1.

Summary of the processing methods recommended for different levels of data.

Figure 2.

Various diagnostic plots that are useful for quality assessment.

Where scanner metrics information is available, arrays within a particular experiment can be compared to each other, or to a wider set from the same core facility. In (A), a per-array signal-to-noise value (the 95th percentile of signal divided by the 5th percentile) is plotted for 200 consecutive BeadArrays, with the arrays from the experiment in question highlighted in color (blue or red). Low signal-to-noise ratios indicate a poor dynamic range of intensities and can highlight problems with array processing when they occur sequentially over time. At the individual array level, sub-array artefacts can be detected using spatial plots of the intensities across the BeadArray surface (B) and removed using BASH and outlier removal. For a between-sample display, boxplots of the intensities from different arrays within an experiment can highlight samples with unusual signal distributions (C). The relationships between samples can also be assessed using a multi-dimensional scaling (MDS) plot (D), which can highlight true biological differences between samples (in this example, the difference between UHRR and Brain in dimension 1 and between the pure and mixed samples in dimension 2), as well as technical effects due to lab, experiment date, etc., which may also need to be accounted for in the modelling.
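The signal-to-noise value plotted in (A) can be sketched as follows. This is an illustration in plain Python, not the code behind the figure. The function name `signal_to_noise`, the array names, and the intensities are made up, and the percentile interpolation rule (the `inclusive` method of `statistics.quantiles`) is an assumed choice that may differ from how scanner metrics are actually computed.

```python
import statistics

def signal_to_noise(intensities):
    """Return the 95th percentile of the intensities divided by the 5th."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(intensities, n=20, method="inclusive")
    p05, p95 = cuts[0], cuts[-1]
    return p95 / p05

# Hypothetical intensities for two arrays: one with a wide dynamic range
# and one with a compressed range.
arrays = {
    "array_01": [float(x) for x in range(1, 102)],
    "array_02": [50.0 + 0.1 * x for x in range(101)],
}
ratios = {name: signal_to_noise(values) for name, values in arrays.items()}

# List arrays from lowest to highest ratio, so poor dynamic range comes first.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.2f}")
```

Sorting by ratio is only a convenience for a short list; as the caption notes, plotting the values in processing order is what reveals runs of consecutive low-ratio arrays.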
