Sources of nonlinearity in cDNA microarray expression measurements
© Ramdas et al., licensee BioMed Central Ltd 2001
Received: 23 July 2001
Accepted: 11 September 2001
Published: 18 October 2001
A key assumption in the analysis of microarray data is that the quantified signal intensities are linearly related to the expression levels of the corresponding genes. To test this assumption, we experimentally examined the relationship between signal and expression for the two types of microarrays we most commonly encounter: radioactively labeled cDNAs on nylon membranes and fluorescently labeled cDNAs on glass slides.
We uncovered two sources of nonlinearity. The first, which affected the fluorescent signals and led to discrepancies in analysis, was signal quenching associated with excessive dye concentrations. The second, affecting the radioactive signals, was a nonlinear transformation of the raw data introduced by the scanner. Correction for this transformation was made by some, but not all, image-quantification software packages.
The second type of nonlinearity is more troublesome, because it could not have been predicted a priori. Both types of nonlinearities were detected by simple dilution series, which we recommend as a quality-control step.
DNA microarray technology allows the simultaneous analysis of thousands of genes [1,2,3]. There are two major platforms for cDNA microarrays: membrane-based arrays (porous surfaces like nylon) and chemically coated glass-based arrays. In both cases, thousands of cDNA fragments are robotically deposited on the substrate. The nylon membrane microarrays are hybridized with 32P- or 33P-labeled cDNA targets, and microarrays on glass are hybridized with fluorescent dye-labeled cDNA targets. After hybridization, the radioactive or fluorescent signal intensities are measured using a phosphorimager or laser scanner, respectively. The signal intensities are surrogates for the expression levels of the genes in the samples being tested and are used to make biological inferences.
A key assumption in the analysis of microarray data is that the quantified signal intensities are linearly related to the expression of the corresponding genes in the target sample. We experimentally examined this relationship. Our investigations uncovered two sources of nonlinearity: signal quenching and a nonlinear (square-root) transformation of the raw data introduced by the scanner. Users presented with the same image but using different software packages may arrive at quite different conclusions about levels of differential expression. In both cases, the nonlinearities were revealed by serial dilution experiments. Given the lack of an absolute scale for microarray measurements, we recommend serial dilution experiments as a quality-control step.
Measurement of fluorescent signals from glass-based microarrays
Measurement of radioactive signals on a membrane array
The membrane had been scanned by the STORM PhosphorImager at a 45° angle (not by design). Because neither GLEAMS nor ArrayVision copes well with microarray images at this angle, we loaded the GEL file into ImageJ [5], an image-editing program available from the National Institutes of Health (NIH). We rotated and cropped the image, and saved it as a Tagged Image File Format (TIFF) file (Figure 4b), which was loaded into both commercial software packages. The results from both packages indicated that the signal intensities were proportional to the square root of the true concentrations (Figure 5), in disagreement with both theory and the ImageQuant results. In fact, the pixel-by-pixel intensity data are square-root-transformed before being saved as a GEL file. When an image-editing program (such as ImageJ) processes these data, tags describing this transformation are not preserved in the resulting TIFF file.
This study was conducted to assess the response linearity of measurements from cDNA microarray experiments using the two most frequently used systems. The study was performed not only because of the general need for quality control, but also because of the complexity of the process of acquiring data from microarrays. Images and data are often transferred between different computer programs, and many instruments used for microarray research are new and insufficiently tested. Thus, it is rather optimistic to take the numbers generated from a series of machines and software at face value. Simple dilution experiments revealed problems that have implications for the biological interpretation of gene expression data produced from microarray experiments.
Our experiments on glass provided an assessment of the degree of signal quenching for the two fluorescent/glass microarrays. In dilute solutions, fluorescence intensity is linearly proportional to the concentration, with all other parameters being constant. However, in a sample with absorbance exceeding 0.05 at the emission wavelength, the relationship becomes nonlinear and the measurements are distorted (by self-absorption, inner filter effect, quenching) [4,6]. Fluorescence properties of such labeled DNA probes have been studied [7,8].
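The loss of linearity at higher absorbance can be illustrated with a simplified model in which the emitted fluorescence is proportional to the fraction of light absorbed, 1 - 10^(-A), where A is the absorbance. This is a minimal sketch under that assumption only; it ignores self-absorption and true quenching, and the function names are hypothetical.

```python
import math

def absorbed_fraction(absorbance):
    """Fraction of incident light absorbed at a given absorbance (Beer-Lambert)."""
    return 1.0 - 10.0 ** (-absorbance)

def linearity_ratio(absorbance):
    """Ratio of the modeled signal to its linear (dilute-limit) approximation.

    In the dilute limit, 1 - 10**(-A) is approximately ln(10) * A, so a ratio
    close to 1 means the signal is still proportional to concentration.
    """
    return absorbed_fraction(absorbance) / (math.log(10) * absorbance)

ratio_dilute = linearity_ratio(0.001)       # very close to 1
ratio_threshold = linearity_ratio(0.05)     # about 0.94
ratio_concentrated = linearity_ratio(0.5)   # about 0.59
```

Under this model the signal at an absorbance of 0.05 is already roughly 5 to 6% below the linear prediction, and the shortfall grows quickly as the dye concentration increases.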
Our experiments on membranes provide instances where different microarray-specific image-analysis programs were applied to the same images and produced divergent results. In each instance, at least one of the software packages produced results that were linearly related to the square root of the results produced by another package. The significance of this finding for the biological interpretation of gene expressions is very clear. Where users of software package 1 might detect, for example, a four-fold change in gene expression, users of software package 2 would see only a two-fold change. If two-fold change is set as a threshold, the same data can be viewed as significant or insignificant, depending on which software package is used.
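The arithmetic behind the four-fold versus two-fold example can be sketched as follows. The baseline intensity and the function name are arbitrary choices for illustration, not values from our experiments.

```python
import math

def apparent_fold_change(true_fold, transform, baseline=100.0):
    """Fold change seen after a transformation is applied to both intensities."""
    return transform(baseline * true_fold) / transform(baseline)

# Package 1 works on linear intensities; package 2 works on
# square-root-transformed values without correcting for them.
fold_package_1 = apparent_fold_change(4.0, lambda x: x)   # 4.0
fold_package_2 = apparent_fold_change(4.0, math.sqrt)     # 2.0

# With a two-fold threshold, a true three-fold change is called
# significant by package 1 but not by package 2.
threshold = 2.0
call_package_1 = apparent_fold_change(3.0, lambda x: x) >= threshold
call_package_2 = apparent_fold_change(3.0, math.sqrt) >= threshold
```

Because the square root of a ratio is the ratio of the square roots, the result does not depend on the baseline chosen.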
The explanation for the divergent results in our experiments is simple: the hardware (scanner) applied a mathematical transformation to the data before writing them to the image file. The nature of this transformation was not communicated to the software (image-quantifying program) that analyzed the data. Consequently, the software assumed (incorrectly) that the values in the file were linearly related to the original intensity levels.
In our case, the STORM PhosphorImager produced a GEL file. This file contained numerical values for each pixel, which need to be squared to exhibit the proper linear relationship. The problem lies with the fact that the internal structure of a GEL image file is essentially identical to that of a TIFF image file, so any program that can read a TIFF file can read a GEL file and even manipulate the contents as if it were a TIFF file. But, if the file is then saved as a TIFF file, its GEL file origins are lost. This leads to two scenarios for bad data. In the first scenario, a GEL file is loaded into two software packages. Software package 1 recognizes that a GEL file includes a nonlinear transformation and corrects for it. Software package 2 treats the GEL file as a TIFF file and does not correct for nonlinearity. The results from the two packages therefore disagree. In the second scenario, the GEL file is saved as a TIFF file after editing. Software package 1, which correctly dealt with a GEL file, now sees a TIFF file and applies no transformation because none is generally needed for TIFF files. Software package 2 sees a TIFF file and deals with it as before. The results from the two packages now agree, but both are wrong because we have removed the information the packages need to perform correctly.
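The two scenarios can be simulated with a toy model. The functions below are hypothetical stand-ins for the scanner and the two software packages; they do not reproduce the behavior of any real program or read any real file.

```python
import math

def scanner_store(raw_intensity):
    """Value written to the image file: the square root of the raw intensity."""
    return math.sqrt(raw_intensity)

def package_1(stored_value, file_format):
    """Hypothetical package that squares GEL values and leaves TIFF values alone."""
    return stored_value ** 2 if file_format == "GEL" else stored_value

def package_2(stored_value, file_format):
    """Hypothetical package that treats every file as a plain TIFF."""
    return stored_value

raw = [100.0, 400.0, 1600.0]              # true intensities, four-fold steps
stored = [scanner_store(v) for v in raw]  # what the file actually contains

# Scenario 1: the GEL file is loaded directly. The packages disagree.
scenario_1_p1 = [package_1(s, "GEL") for s in stored]   # [100.0, 400.0, 1600.0]
scenario_1_p2 = [package_2(s, "GEL") for s in stored]   # [10.0, 20.0, 40.0]

# Scenario 2: the file was re-saved as TIFF. The packages agree, and both are wrong.
scenario_2_p1 = [package_1(s, "TIFF") for s in stored]  # [10.0, 20.0, 40.0]
scenario_2_p2 = [package_2(s, "TIFF") for s in stored]  # [10.0, 20.0, 40.0]
```

Agreement between packages in the second scenario is therefore no evidence of correctness; only comparison against known input values reveals the error.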
It is worth pointing out another common instance where a square-root transformation is applied to microarray data. In a two-color fluorescence experiment, the microarray is scanned twice, at different wavelengths corresponding to the different dyes used in the assay. Each scan is saved as a separate 16-bit grayscale image. It is possible to combine the two grayscale images into a single 24-bit color image, sometimes called a false-color image. One simply imports the first image into the red channel and the second image into the green channel. However, a 24-bit full color image allocates only 8 bits to each channel. In order to pack a 16-bit number representing the scanned intensity into an 8-bit space, some information must be discarded. For instance, the software operating the GenePix 4000A Microarray Scanner (Axon Instruments, Foster City, CA) provides four packing options (note that the Axon manual says that packing is a bad idea if investigators want to get numbers from the image later). The default option is to perform a square-root operation. The remaining options preserve linearity, but truncate the data, either by preserving low values, preserving high values or preserving middle values. Although it is tempting to discard the two grayscale images and save only the full-color image, doing so would unavoidably discard essential aspects of the data.
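The information lost by each kind of packing can be illustrated with a short sketch. The four functions below are our own assumptions about how a 16-bit value might be squeezed into 8 bits under each option; they are not taken from the GenePix software or its manual.

```python
import math

def pack_sqrt(value):
    """Square-root packing: 0..65535 maps onto 0..255, but nonlinearly."""
    return math.isqrt(value)

def pack_high(value):
    """Keep the high 8 bits: linear, but low intensities collapse to 0."""
    return value >> 8

def pack_low(value):
    """Keep low values: linear up to 255, saturated above that."""
    return min(value, 255)

def pack_middle(value):
    """Keep middle values (assumed: drop 4 low bits, saturate at 255)."""
    return min(value >> 4, 255)

intensities = [100, 1000, 40000]  # hypothetical 16-bit pixel values

packed_sqrt = [pack_sqrt(v) for v in intensities]      # [10, 31, 200]
packed_high = [pack_high(v) for v in intensities]      # [0, 3, 156]
packed_low = [pack_low(v) for v in intensities]        # [100, 255, 255]
packed_middle = [pack_middle(v) for v in intensities]  # [6, 62, 255]
```

In this sketch only the square-root option keeps all three spots distinguishable and unsaturated, which may explain its appeal as a default for display, but the 400-fold range of the input is compressed to 20-fold.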
The primary data produced by a microarray experiment is the original scanned image, which is stored as a computer file. Any processing of this image file has the potential to change, lose or otherwise corrupt data. We have seen that square-root transformations are incorporated in some programs. All general-purpose image-editing programs provide multitudes of additional transformations that can be used to brighten, sharpen or smooth images. Even though the square-root transformation appears to be the only transformation in common use among current generations of scanners, it is conceivable that other transformations may be introduced in the future.
In summary, when designing a protocol for a set of microarray experiments, researchers should perform dilution series as one of their standard calibration experiments. Processing of the array through the scanner and quantification software that will be used in the experiments can confirm that the reported results are linearly related to the known input values.
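One simple way to evaluate such a dilution series is to fit a straight line to log(signal) versus log(concentration): a slope near 1 indicates a linear response, and a slope near 0.5 indicates a square-root transformation. The sketch below uses synthetic, noise-free data and a hypothetical helper function; it is an illustration, not the analysis used in this study.

```python
import math

def loglog_slope(concentrations, signals):
    """Least-squares slope of log(signal) against log(concentration)."""
    xs = [math.log(c) for c in concentrations]
    ys = [math.log(s) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic two-fold dilution series (relative concentrations).
conc = [1 / 2 ** k for k in range(8)]
linear_signal = [1000.0 * c for c in conc]          # ideal linear response
sqrt_signal = [1000.0 * math.sqrt(c) for c in conc] # square-root-transformed

slope_linear = loglog_slope(conc, linear_signal)  # close to 1.0
slope_sqrt = loglog_slope(conc, sqrt_signal)      # close to 0.5
```

With real data, background subtraction and saturation at the extremes of the series will pull the slope away from these ideal values, so the fit is best restricted to the range of intensities that will be used in the experiments.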
For the experiments on glass, cyanine 3-labeled (Cy3), cyanine 5-labeled (Cy5) and unlabeled 30mer oligonucleotides were synthesized (Synthegen, Houston, TX). Plain glass slides from Fisher Scientific were coated with polylysine according to the published procedure [9]. An arrayer from Genomic Solutions (Ann Arbor, MI) was used to spot the oligonucleotides onto the treated glass slides. A 48-pin head from Genomic Solutions was used to create an array design of a 2 × 5 grid of 8 × 8 patches with a spot spacing of about 400 μm.
The slides were scanned on a GeneTac LS IV laser scanner (Genomic Solutions) with laser energy sources for measuring the Cy3 and Cy5 fluorophores. Data from the two lasers were collected as separate TIFF files, one for each laser.
The images were processed using the analysis software program ArrayVision, version 5.1 (Imaging Research, Inc., St Catherine's, Ontario, Canada) and GLEAMS version 2.0 (NuTec Sciences, Houston, TX). Background-corrected intensity was determined for each element of each array.
For the experiments on membranes, 1 μl 32P-α-dATP stock solution (NEN Life Science Products, Inc., Boston, MA) was first diluted 100-fold, then 5 μl of this mixture was diluted two-fold by adding 5 μl water. This process was repeated to generate a serial dilution. Next, 1 μl of each diluted sample was spotted onto a nylon membrane. After hybridization, the nylon membrane was exposed to the STORM PhosphorImager from Molecular Dynamics (Sunnyvale, CA), which produced a GEL image file. ImageQuant analysis software (Molecular Dynamics) was used to quantify the images.
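The relative concentrations expected from this dilution scheme can be tabulated as follows. The number of steps is an assumption made for illustration, as is the function name.

```python
def dilution_series(initial_dilution=100, fold=2, steps=8):
    """Relative concentrations (stock = 1) after an initial dilution
    followed by repeated fold-dilutions. `steps` is an assumed value."""
    first = 1.0 / initial_dilution
    return [first / fold ** k for k in range(steps)]

series = dilution_series()
# series[0] is the 100-fold dilution of the stock; each later entry
# is half the concentration of the one before it.
ratios = [series[k] / series[k + 1] for k in range(len(series) - 1)]
```

These known relative inputs are the values against which the quantified spot intensities are compared when checking for linearity.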
For the follow-up experiment, a GF200 Human GeneFilter microarray was purchased from Research Genetics (Huntsville, AL). Total RNA was isolated from a GA-10 Burkitt lymphoma cell line (a kind gift of Aaron Rapaport, University of Maryland). Ten μg total RNA were reverse-transcribed and 33P-labeled following the standard procedure. The labeled cDNAs were hybridized to the GeneFilter.
We thank Kenneth Hess, Jing Wang, Mini Kapoor and David Stivers for their critical comments on this study and Marla Bordelon for editorial assistance. This work was partially supported by Tobacco Settlement Funds appropriated to M.D. Anderson Cancer Center by the Texas Legislature, a donation from the Kadoorie Foundation, and a grant from Texas Higher Education Coordinating Board under grant number 003657-0039-1999.
1. Lockhart DJ, Winzler EA: Genomics, gene expression and DNA arrays. Nature. 2000, 405: 827-836. 10.1038/35015701.
2. DeRisi JL, Iyer VR: Genomics and array technology. Curr Opin Oncol. 1999, 11: 76-79. 10.1097/00001622-199901000-00015.
3. Khan J, Saal LH, Bittner ML, Chen Y, Trent JM, Meltzer PS: Expression profiling in cancer using cDNA microarrays. Electrophoresis. 1999, 20: 223-229. 10.1002/(SICI)1522-2683(19990201)20:2<223::AID-ELPS223>3.0.CO;2-A.
4. Kubista M: Experimental correction for the inner-filter effect in fluorescence spectra. Analyst. 1994, 119: 417-
5. ImageJ. [http://rsb.info.nih.gov/ij/]
6. Cantor CR, Schimmel PR: Biophysical Chemistry: Techniques for the Study of Biological Structure and Function. San Francisco: W.H. Freeman and Co; 1980.
7. Mujumdar RB, Ernst LA, Mujumdar SR, Lewis CJ, Waggoner AS: Cyanine dye labeling reagents: sulfoindocyanine succinimidyl esters. Bioconjug Chem. 1993, 4: 105-111.
8. Randolph JB, Waggoner AS: Stability, specificity and fluorescence brightness of multiply-labeled fluorescent DNA probes. Nucleic Acids Res. 1997, 25: 2923-2929. 10.1093/nar/25.14.2923.
9. Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995, 270: 467-470.