

  • Method
  • Open Access

Combined histomorphometric and gene-expression profiling applied to toxicology

  • 1,
  • 2,
  • 2,
  • 1,
  • 2,
  • 3,
  • 3, 4,
  • 3 and
  • 1
Genome Biology 2003, 4:R32

  • Received: 12 September 2002
  • Accepted: 26 March 2003
  • Published:


We have developed a unique methodology for the combined analysis of histomorphometric and gene-expression profiles amenable to intensive data mining and multisample comparison for a comprehensive approach to toxicology. This hybrid technology, termed extensible morphometric relational gene-expression analysis (EMeRGE), is applied in a toxicological study of time-varied vehicle- and carbon-tetrachloride (CCl4)-treated rats, and demonstrates correlations between specific genes and tissue structures that can augment interpretation of biological observations and diagnosis.


  • Vascular Endothelial Growth Factor
  • CCl4
  • Additional Data File
  • Tissue Feature
  • Clear Space


Recent reports describe the use of gene-expression profiling for the identification of molecular markers of toxicity [1-3]. This technique alone does not account for the morphological changes in tissues that pathologists have traditionally used to discriminate between types and severity of toxicological responses [4-6]. For a comprehensive approach to toxicological evaluation, we developed a methodology, termed extensible morphometric relational gene-expression analysis (EMeRGE), that uses histomorphometric profiles derived from machine vision in conjunction with gene-expression profiles. The method was evaluated on an established, extreme model of liver toxicity using carbon tetrachloride (CCl4) in rats that were dosed for 3 days and allowed to recover. The liver is relevant in toxicology as the primary organ of metabolism and detoxification, and it is a recurrent target of chronic drug toxicity.

A fully automated analytical microscope equipped with machine-vision hardware and software was used to generate quantitative information about the structure and heterogeneity of liver. The histomorphometric profiles could be used to evaluate heterogeneity across each tissue section, including regions of hepatocellular necrosis. Representative images of tissue sections from control and treated livers are shown in Figure 1. Examples of processed image tiles are shown in Figure 2, where a control liver (Figure 2a) can be compared with a treated liver (Figure 2b), illustrating the substantial structural damage induced by CCl4. Gene-expression profiles were generated from the same livers using DNA microarrays, which measured mRNA levels of genes important in absorption, distribution, metabolism and excretion (ADME). Previous studies of these genes, which include markers of toxic stress, apoptosis, growth regulation and repair, were consistent with documented toxicologic responses to CCl4: expression of cytochrome P450 components and other metabolic enzymes (Cyp2C, Cyp3A18, Cyp3A9, SCS and Fmo1) decreased, whereas expression of certain genes involved in the inflammatory response and signal transduction (CD44 and Lgals3) increased [7].
Figure 1

Comparison of control and treated livers. Image montages based on image tiles of (a) control and (b) treated liver.

Figure 2

Microscopic comparison of control and treated livers. Image tiles of (a) the control liver treated with corn oil and (b) a CCl4-treated liver analyzed by an automated microscope system. Identified structures including hepatocyte nuclei (blue), other nuclei (black), clear space (yellow) and vacuoles (green) are indicated by the overlay on the right side of each panel.

Previous studies relating gene expression to pathologic or tissue data have evaluated only qualitative tissue information [8, 9], or have focused on visually identified tissue subareas or on specific cells isolated by laser microdissection (LMD) [10-12]. In LMD, specific cells are collected by interactively defining areas of interest in a microscope image; these areas are then excised by a laser for subsequent gene-expression analysis. LMD is subjective, destroys the specimen and has limited ability to account for tissue heterogeneity. Progress in digital microscopy has allowed quantitative image analysis to generate data that objectively and completely describe tissue phenotype, free of observer disagreement [13], and with the potential to detect subtle changes that are undetectable to the human eye [14]. In EMeRGE, tissue histomorphometric profiles are correlated statistically with gene-expression profiles, characterizing each sample through both top-down phenotypic information and complementary bottom-up genomic data. Spearman's rank order correlation identified significant monotonic relationships that illuminate important connections between structural features of tissue elements and genes reported as significantly up- or downregulated by CCl4 treatment in previous studies [7]. Principal component analysis (PCA), as described in Materials and methods, was carried out to reduce the complexity of the data [15] and then to relate individual animals to specific phenotypic groups. A quadratic regression classifier was used to develop a scheme that separated treated and control groups in three datasets: gene expression, tissue features and the combination of both. Unlike previously published approaches, this method required no pre-selection or filtering of invariantly expressed genes to classify the groups. Although adding tissue features to gene expression did not improve classification, analysis of the combined data revealed different outlier animals for each dataset, presenting a more complete picture of the damage and regeneration.

Results and discussion

Liver tissues from animals treated for three days with CCl4 or corn-oil vehicle were harvested 4, 7 and 14 days after administration of the first dose. Treated livers showed various degrees of hydropic degeneration, individual hepatocyte necrosis and hepatocellular fatty change, along with other, less significant structural changes compared with control animals. Structural changes induced by CCl4 were accompanied by glycogenation of cells caused by the corn-oil vehicle. An overall decrease in vacuoles and glycogenation at day 7 in livers from CCl4-treated animals suggested decreased metabolism and toxin depletion over time. Increased variance in hepatocyte size was observed in livers from CCl4-treated animals, suggesting cellular proliferation, a marker of recovery from the hepatic injury, which approached normality by day 14.

Correlation between tissue features and gene expression

Relationships between gene expression and tissue features can be a rich source of information about the mechanism of toxicity. Using Spearman's rank order correlation (see Materials and methods), we tested for monotonic relationships between gene-expression values and tissue features. The numbers of significant correlations were determined at three significance levels: 0.05, 0.01 and 0.001 (Table 1). These levels are the cutoff probabilities of making a type I error, that is, of concluding that a correlation exists when in reality there is none.
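As an illustration of how counts like those in Table 1 can be obtained, the minimal sketch below tests every gene against one tissue metric at the three significance levels and, for comparison, reports the number of false positives expected if no gene were truly correlated (alpha × number of genes, that is 52, 10.4 and about 1 for 1,040 genes). Reading the table's "Expected false positives" row this way is an assumption. The matrix layout (animals in rows, genes in columns) and the function name `count_significant` are hypothetical, and SciPy's `spearmanr` p-values come from a t-distribution approximation rather than the permutation approach described in Materials and methods.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical helper, not the authors' code.
ALPHAS = (0.05, 0.01, 0.001)


def count_significant(genes, metric, alphas=ALPHAS):
    """Count genes significantly rank-correlated with one tissue metric.

    genes  : array (n_animals, n_genes), one normalized value per gene per animal
    metric : array (n_animals,), one tissue metric value per animal
    Returns {alpha: (number significant, expected false positives)}.
    """
    genes = np.asarray(genes, dtype=float)
    metric = np.asarray(metric, dtype=float)
    n_genes = genes.shape[1]
    # One Spearman test per gene; p-values use SciPy's t approximation.
    pvals = np.array([spearmanr(genes[:, j], metric).pvalue for j in range(n_genes)])
    # If no gene were truly correlated, about alpha * n_genes would pass by chance.
    return {a: (int(np.sum(pvals < a)), a * n_genes) for a in alphas}
```

Repeating the call for each of the 11 tissue metrics would fill one row of such a table per metric.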
Table 1

Number of genes correlated with tissue metrics at three levels of confidence, ranked by the number of significant correlations at alpha = 0.01

Tissue metric | Confidence level (alpha = 0.05, 0.01, 0.001)
% Clear space (non-stained tissue and cellular elements)
Area % vacuoles
% H&E stained elements
Area % sinusoids
Area % other nuclei
Hepatocyte nuclei/mm2
Other nuclei/mm2
Total nuclei/mm2
Area % hepatocyte nuclei
Cytoplasmic texture
Expected false positives
Number determined significant

For details see Additional data files.

Genes that correlated with tissue features with a p-value of 0.01 or lower were chosen for additional inspection (see Additional data files). This analysis highlighted known markers of toxicity as well as many genes identified in previous CCl4 studies, most notably that for LRF-1 (M63282), a known regulator of proliferation during liver regeneration following injury, as well as the genes for vitronectin (U44845) and α glutathione-S-transferase Mu isoform (U86635) [7, 16, 17].

A number of genes correlating with tissue features were distinct from those found by a twofold-ratio analysis [7]. Assessment of these genes revealed associations with genes known or suspected to be involved in the biology of CCl4 toxicity. Correlations with three structural metrics, each yielding a substantial number of genes, were chosen for examination: '% clear space', 'area % vacuoles' and 'vacuoles/mm2'. The metrics % clear space and area % vacuoles were negatively correlated with at least one isoform of cytochrome P450 (3A9: U46118, 2B12: X63545 and 2B3: M20406), which may be a hallmark of recovery. One gene common to the clear-space-related metrics (non-stained tissue, including vacuoles and hydropic degeneration) was α-tubulin (V01227). Tubulin levels do not change during the acute phase of CCl4 poisoning, but have been shown to decrease in association with the development of fatty liver as a result of chronic CCl4 exposure [18, 19]. In the present study, tubulin was positively correlated with all three clear-space metrics, perhaps indicating cellular reorganization. S-adenosylmethionine synthetase (AdoMet synthetase; X60822) was also strongly correlated with all three clear-space metrics. Suppression of AdoMet synthetase activity has been observed in a model of acetaminophen toxicity [20]. The negative correlation of AdoMet synthetase mRNA with these metrics, strongest for % clear space, suggests that CCl4 toxicity also suppresses its message.

Expression of disintegrin metalloprotease (Z48444) was positively correlated with the two vacuole-related metrics. This ADAM10 homolog is known to be a tumor necrosis factor-α (TNFα) convertase [21]. TNFα produced by Kupffer cells in the liver stimulates production of TNFα, a mitogen for hepatocytes that helps generate the new cells needed to rebuild damaged areas of the liver [22]. Kupffer cells have a significant role as mediators of acute inflammation after CCl4 treatment; however, they also produce factors that can cause secondary injury in the liver through fibrogenic responses. An increase in a TNFα convertase would ensure that any residual cytokine present is in the active, stimulatory form. TNFα and its receptor, along with interleukin-6 (IL-6) and its receptor, were upregulated in the present study, but neither pair was correlated with a tissue metric below the 0.01 p-value threshold.

Different spectra of cytokines are associated with toxicity or inflammation compared with tissue repair. In the present study, genes encoding many different cytokines were affected; however, not all were significantly correlated with the tissue metrics examined. A striking exception was the family of vascular endothelial growth factors (VEGF). The role of VEGF in the recovery of necrotic liver following CCl4 treatment is well established [23, 24]. Here, several VEGF isoforms were found to correlate with clear-space-related metrics. Isoforms B and D (AF022952 and AF014827) were positively correlated with % clear space itself, whereas isoform C (AF010302) was negatively correlated with both area % vacuoles and vacuoles/mm2. VEGF-C has been reported to be involved in cancer metastasis and tumor-cell invasion [25, 26], whereas isoforms B and D are believed to have roles in proliferation and tissue remodeling [27-30], highlighting the diverse roles of these cytokines. VEGF mRNA peaks 72 hours after CCl4 treatment in Kupffer cells, but at 7 days in hepatocytes, well after necrosis has resolved and when vascular and sinusoidal endothelial cells are reappearing in the tissue [23, 31]. The correlation of VEGF isoforms with specific tissue metrics suggests that they have different roles in the regeneration and revascularization of severely damaged areas of the treated livers.

Both the heat-stable enterotoxin receptor (M55636) and the contrapsin-like protease inhibitor CPi-26 (D00753) correlated with % clear space and area % vacuoles. The enterotoxin receptor, a marker of regeneration after both partial hepatectomy and CCl4 toxicity [32], was positively correlated with these tissue metrics, whereas CPi-26 was negatively correlated. SPI-3 has been shown to be a downstream target of both of the inflammatory acute-phase regulators TNFα and IL-6 [33]. Positive correlation with the intermediate-conductance calcium-activated potassium channel SMIK (AF190458) was common to both vacuole-related metrics, area % vacuoles and vacuoles/mm2. CCl4 treatment perturbs the balance of many serum electrolytes [34], and an increase in the SMIK channel might help re-establish normal potassium homeostasis as the liver recovers from the toxic stress.

Correlations between genes and tissue metrics may also reveal differences between related histomorphometric findings. For example, each of the vacuolar metrics (area % vacuoles and vacuoles/mm2) has genes uniquely correlated with it, suggesting a possible genetic basis for the difference between the total area occupied by vacuoles and their density, or packing. Vacuole density was linked to the expression of wee1 tyrosine kinase (D31838) and a nuclear co-repressor (AF059311), as well as IL-5 and IL-12 (AJ011299 and AF083329), all involved in cell-cycle regulation and signaling. In contrast, area % vacuoles was linked to expression of interferon regulatory factor 1 (IRF-1; M34253), which has been reported to be upregulated by TNFα during sepsis [35]. The correlation of IRF-1 with area % vacuoles but not with vacuoles/mm2 may reflect the dynamic balance between the primary response to injury and tissue proliferation and repair. This illustrates how the genes associated with each metric could help define the molecular factors behind complex cellular mechanisms.

In this study, the toxic injury visible on day 4 was subsequently reversed by days 7 and 14, as indicated by decreasing impairment in the expression of metabolic genes. By day 14, % clear space had decreased in livers from treated animals, whereas the other metrics, area % vacuoles and vacuoles/mm2, remained constant over time (data not shown). These findings hint at an important role for genes correlated with % clear space in the balance between toxicity, repair and proliferation in treated livers.

PCA and classification

PCA was used to reduce the complexity in the gene, tissue and combined profiles of each rat liver. From these analyses, we determined the principal component that identified the day-4-treated animals. The second principal component of the gene-expression data separated the day-4-treated animals from all others, including the remaining CCl4-treated days 7 and 14 (Figure 3a). The first principal component separates the two groups for the tissue-only data (Figure 3b), and both the first and second principal components track the differences between the two groups in the combined datasets (Figure 3c).
Figure 3
Figure 3

Principal component analysis. (a) First principal component versus second principal component of the gene-expression data. The first principal component describes variance that tracks a single outlier rat in the control day-14 group. The second principal component captures the variance between the day-4 CCl4-treated group and the remaining animals. (b) First versus second principal component of the tissue data. The first principal component primarily captures the variability that distinguishes the CCl4-treated day-4 group from the other treatment and vehicle control groups. The second principal component describes variance between individual animals in all groups. (c) First versus second principal component of the combined data. Both the first and the second principal components capture the difference between the day-4 CCl4-treatment group and all other animals in the complete set. The treated day-7 and day-14 animals completely overlap the control animals, indicating that the tissue appears to have returned to a nearly normal state.

Using the first two principal components, we classified the animals as treated day-4 versus control and other groups, using a quadratic regression classifier for each of the three datasets. A cross-validation estimate of the probability of correct classification was used to determine the diagnostic strength of each type of dataset. At each step of the cross-validation, the principal components were calculated for all but one animal and used to determine the parameters of the quadratic regression classifier, and the remaining animal was then classified. This was repeated until every animal had been left out and classified. The resulting scheme identifies which animals were correctly classified or misclassified, and estimates how accurately a classification rule developed in this way would perform. In practice, a discrimination rule would be created on a training set of data and applied to an unknown set; a high cross-validation probability of correct classification is therefore important for developing a well-defined discrimination rule.

This analysis was carried out on gene expression and tissue features separately and on the combined dataset. The probability of correct classification was 96.43% using gene-expression data only, 92.86% using tissue-feature data only, and 96.43% using the combined data. The strong correlations between tissue features and genes allowed the tissue metrics to be predicted from the gene dataset but not the genes from the tissue-feature data, because it is easier to predict 11 tissue metrics from 1,040 genes than the reverse. Therefore, including tissue data in the combined set did not necessarily improve on classification using the gene-expression data alone. Comparing the misclassified animals, however, revealed that different animals were misclassified by the gene-expression and tissue-feature data. These differences can be used for diagnostic purposes, by comparing the classification of outlier animals, and to examine more closely the underlying components of both the molecular and phenotypic responses to a toxin.

The introduction of robust machine-vision techniques that can consistently and comprehensively discern tissue histomorphology will support pathologic assessments, allowing a more quantitative evaluation of phenotypic responses to compounds. Investigations can be carried out in the automated, unsupervised fashion necessary for high-throughput analyses. The EMeRGE method demonstrated correlations between specific genes and tissue structures that can augment interpretation of biological observations and diagnosis. These correlations illustrate a new paradigm that phenotypically anchors changes in gene expression to structural features in tissue. Improved outlier detection may be attained by combining gene-expression and tissue phenotypic data. The improved objectivity and reproducibility of phenotypic assessments correlated with gene-expression data is a step forward in tissue analysis that should inform the definition of therapeutic margins, help refine dose levels and improve identification of atypical responses. Other covariates, such as clinical chemistry parameters, can be added to this analysis for an even deeper view [1]. Combined analyses such as EMeRGE can be applied to any experimental or pathologic condition in which gene expression and tissue histology are integral to the interpretation of toxic or pharmacologic events, as well as to the pathophysiology of disease.

Materials and methods

Experimental protocol

Thirty male, age- and weight-matched Sprague-Dawley rats were divided into six groups of five: three groups of vehicle controls and three groups of CCl4-treated animals. Each rat was dosed once daily on days 1, 2 and 3 by intraperitoneal injection of either pure corn-oil vehicle or CCl4 dissolved in corn oil (approximately 15% v/v) at a dose of 1,000 mg/kg/day [1]. Animals were euthanized by CO2 asphyxiation and exsanguination following a 24 h fast on day 4, 7 or 14, and tissues were harvested for analysis. The central lobe of each liver was harvested at necropsy, and approximately 1 g was flash frozen in liquid nitrogen. RNA was extracted using TRIzol reagent according to the manufacturer's protocol (Gibco-BRL). The remaining liver fraction was fixed in 10% formalin.
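To make the dosing arithmetic concrete, the sketch below converts the 1,000 mg/kg dose and the approximately 15% v/v solution into an injection volume. The CCl4 density (about 1.59 g/mL at room temperature), the 250 g body weight and the function name are assumptions for illustration; individual body weights are not reported here.

```python
# Hypothetical helper illustrating the dosing arithmetic; not from the study.
CCL4_DENSITY_G_PER_ML = 1.59  # approximate density of CCl4 at room temperature (assumed)


def injection_volume_ml(body_weight_kg, dose_mg_per_kg=1000.0, ccl4_fraction_v_v=0.15):
    """Volume (mL) of CCl4-in-corn-oil solution that delivers the given dose."""
    ccl4_g = body_weight_kg * dose_mg_per_kg / 1000.0  # mg -> g of CCl4
    ccl4_ml = ccl4_g / CCL4_DENSITY_G_PER_ML           # g -> mL of neat CCl4
    return ccl4_ml / ccl4_fraction_v_v                 # mL of ~15% v/v solution


print(f"{injection_volume_ml(0.25):.2f} mL")  # hypothetical 250 g rat; prints 1.05 mL
```

Under these assumptions each daily dose corresponds to roughly 4.2 mL of solution per kg of body weight.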

Sample preparation, hybridization and scanning

Total RNA was quantified and assessed for quality on a Bioanalyzer RNA chip (Agilent, Palo Alto, CA). Each chip contains a set of interconnected gel-filled channels that enables molecular sieving of nucleic acids. Pin-electrodes in the chip create electrokinetic forces that drive molecules through these microchannels and carry out electrophoretic separations. Ribosomal RNA peaks are measured by fluorescence signal and displayed in an electropherogram. A successful total RNA sample featured two distinct ribosomal peaks (18S and 28S rRNA). First-strand cDNA was prepared, labeled and processed as described in the Motorola CodeLink system protocols. Processed arrays were scanned using a GenePix scanner (Axon Instruments, Foster City, CA), and array images were acquired using the Motorola CodeLink™ Analysis Software (Amersham Biosciences, Piscataway, NJ).

ADME rat bioarray system and data preparation

The Motorola CodeLink™ ADME Rat Bioarray consists of 1,137 oligonucleotide probes, corresponding to 1,040 unique clusters and 97 control probes, selected from GenBank Rodent and RefSeq build number 122. Each 30 base-pair (bp) probe was spotted in triplicate. Each sample was hybridized to two microarrays, resulting in six data points per probe.

The Motorola CodeLink™ Analysis Software gives an integrated optical density (IOD) value for every spot; a unique background value for that spot is subtracted, resulting in 'raw' data points.

Histomorphometric profiling

The remaining portion of the central lobe was fixed in formalin, embedded in paraffin, sectioned (5 μm) and stained with hematoxylin and eosin (H&E). An automated analytical microscope system (ARTIS™, TissueInformatics) was used to scan the entire slide at a low resolution to identify tissue locations. The smallest possible rectangular region fully encompassing each tissue section was captured at a high resolution at 0.64 μm/pixel on a tile-by-tile basis, generating 100-400 individual digital image tiles per slide. Each individual image tile was automatically focused, captured, digitized, corrected for shading and analyzed.

The image-segmentation methodology was developed, and object classification was trained, using a representative set of images of control and treated rat liver specimens from various studies, before the system was applied to the specimens of this study. As a precursor to any segmentation operation, the input 24-bit RGB images of H&E-stained tissue were converted using a stain-space conversion based on PCA. This technique effectively mapped each pixel in the image to its respective stain-absorption values, giving an invariant representation of the histologic properties of the image. Once the stain value at each pixel was determined, a generalized grouping was used to account for regional features, such as nuclei, that bind the stain. Similar high-level segmentation accounted for the other object types within the tissue, namely clear space, vacuoles, sinusoids and cytoplasmic material, which generally tend to absorb (or not absorb) one type of stain.
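The PCA-based stain-space conversion is not specified in detail. One common way to realize such a conversion, sketched below as an assumption rather than the ARTIS implementation, is to convert RGB intensities to optical density (Beer-Lambert law) and project each pixel onto the principal axes of the optical-density cloud, which for a two-stain H&E image is dominated by two directions. The function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a PCA-based stain-space conversion; not the ARTIS algorithm.


def rgb_to_od(rgb, background=255.0):
    """Convert RGB intensities (N, 3) to optical density via the Beer-Lambert law."""
    rgb = np.clip(np.asarray(rgb, dtype=float), 1.0, background)  # avoid log(0)
    return -np.log(rgb / background)


def stain_space(rgb, n_components=2):
    """Project each pixel's optical-density vector onto the leading principal axes.

    Returns (scores of shape (N, n_components), axes of shape (n_components, 3)).
    """
    od = rgb_to_od(rgb)
    centered = od - od.mean(axis=0)
    # Rows of vt are orthonormal principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_components]
    return centered @ axes.T, axes
```

Because principal axes are orthogonal while the hematoxylin and eosin absorption vectors are not, the scores are stain-related coordinates rather than pure stain concentrations; approaches such as Macenko's method add a step that estimates the individual stain vectors within this plane.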

Hepatocytes make up approximately 80% of the volume of the liver and 60% of its cell count. They vary in shape and texture (of both nuclei and cytoplasm) from one liver zone to another. Hyperplasia, hypertrophy and other metabolic conditions resulting from tissue injury may alter the appearance of the nuclei and the surrounding cytoplasm.

Definition of the invariant characteristics of hepatic nuclei made a robust segmentation possible; these characteristics include the following:
  • Hepatic nuclei have a dark-blue appearance as a result of the absorption of hematoxylin.
  • They may contain nucleoli or chromatin figures, which appear as a conic boundary with a clear internal structure highlighted by dark markings.
  • They are generally round or polyhedral in shape.
  • They fall within a characteristic size range (which depends on the image-capture magnification), although size varies between zones of the liver.
  • They stand out from the background tissue because the differential staining properties of hematoxylin offset nuclear material from the surrounding cytoplasm.

For classification, a probability model was used in which each pixel expressing hematoxylin stain carries a vector of attributes that denotes the probability of the pixel belonging to a nucleus object. Candidate objects can be readily found by locating local maxima in the hematoxylin image (expressed as the ratio of hematoxylin to eosin, with a spatial filter applied to reduce noise). The vector is composed of elements such as stain-value deviation, size, shape, and texture measurements of internally visible chromatin and/or nucleoli. Appropriate values for these vectors were learned by a simple neural network model and used to classify objects into categories (for example, hepatic nucleus or non-hepatic nucleus). At times, individual elements of this vector were misleading because of artifacts of histologic preparation and sectioning. These errors were resolved by pre-filtering each vector element before it was passed to the neural network. For example, nuclei with a 'half-moon' silhouette caused by the sectioning process are misleading; in this case, fitting a polygon to the boundary and extracting convex-hull properties for classification allows a reliable differentiation.
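The candidate-finding step can be sketched with SciPy: form the hematoxylin-to-eosin ratio image, smooth it, and keep pixels that equal the maximum of their neighborhood and exceed a threshold. The Gaussian smoothing, window size, threshold and function name are illustrative assumptions, as is the input (per-pixel hematoxylin and eosin stain values from the stain-space conversion described earlier).

```python
import numpy as np
from scipy import ndimage

# Hypothetical helper; parameter defaults are illustrative assumptions.


def nucleus_candidates(h_img, e_img, sigma=2.0, size=7, min_ratio=1.0, eps=1e-6):
    """Return (row, col) positions of candidate nuclei.

    h_img, e_img : 2-D arrays of per-pixel hematoxylin and eosin stain values.
    """
    ratio = h_img / (e_img + eps)                         # hematoxylin-to-eosin ratio
    smooth = ndimage.gaussian_filter(ratio, sigma=sigma)  # spatial filter against noise
    is_peak = ndimage.maximum_filter(smooth, size=size) == smooth  # local maxima
    return np.argwhere(is_peak & (smooth > min_ratio))    # drop flat, weakly stained background
```

The threshold matters because flat background regions are trivially equal to their own neighborhood maximum; requiring a minimum ratio removes them.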

Once the non-hepatic nuclei were separated from the surrounding tissue, they were further differentiated by applying spatial characteristics derived from the architecture of the liver. Size and eccentricity both acted as differentiators, as did encapsulation by, or proximity to, sinusoids (clear space). These rules were fairly invariant and had proved robust in validation protocols in which pathologists graded how well each feature was discriminated. The cellular architecture of the liver allowed robust training data to be characterized automatically by the system. Objects were assigned 'rankings' of their probability of belonging to a given class, plotted against spatial relationships, allowing quick identification and correction of errors; for example, an object classified as a Kupffer cell nucleus (other nucleus) should not lie beyond a certain distance from a sinusoid.
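The proximity rule for Kupffer cell nuclei can be expressed with a Euclidean distance transform of a sinusoid mask: any 'other nucleus' whose centroid lies farther than a chosen distance from the nearest sinusoid pixel is flagged for review. The distance cutoff and the function name are hypothetical; no cutoff value is given in the text.

```python
import numpy as np
from scipy import ndimage

# Hypothetical helper; the distance cutoff is an assumption, not a published value.


def flag_distant_kupffer(sinusoid_mask, centroids, max_dist_px):
    """Flag 'other nucleus' objects lying too far from any sinusoid.

    sinusoid_mask : 2-D boolean array, True on sinusoid pixels.
    centroids     : sequence of (row, col) integer pixel positions of nuclei.
    Returns a boolean array, True where an object violates the proximity rule.
    """
    mask = np.asarray(sinusoid_mask, dtype=bool)
    # The EDT gives each non-zero pixel its distance to the nearest zero pixel,
    # so inverting the mask yields the distance to the nearest sinusoid pixel.
    dist = ndimage.distance_transform_edt(~mask)
    rows, cols = np.asarray(centroids, dtype=int).T
    return dist[rows, cols] > max_dist_px
```

At the stated capture resolution of 0.64 μm/pixel, a cutoff in micrometers would be divided by 0.64 to obtain `max_dist_px`.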

Other elements in the liver could be derived with similar morphometric comparisons, using simple pattern-recognition classifiers for optimal performance. Micro- and macrovesicular change were detected by thresholding the resulting clear-space image (derived by identifying tissue pixels that contain no stain) in the saturation plane and locating 'round' objects. Similarly, detection performance was improved by localizing red blood cells, identified by strong expression in the eosin stain band, within oblong white objects, because red blood cells are typically found in the intrasinusoidal space.
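A minimal sketch of locating 'round' objects in a binary clear-space mask follows. As a roundness score it uses object area divided by the area of a centroid-centered circle reaching the object's farthest pixel (close to 1 for a disk, small for elongated shapes); this score, the thresholds and the function name are assumptions, and the saturation-plane thresholding that produces the mask is not shown.

```python
import numpy as np
from scipy import ndimage

# Hypothetical helper; the roundness score and thresholds are assumptions.


def round_objects(clear_mask, min_area=20, min_roundness=0.6):
    """Label connected clear-space objects and return the labels judged round.

    Roundness = area / (pi * r_max**2), where r_max is the largest distance from
    the object's centroid to one of its pixels: about 1 for a disk, small if elongated.
    """
    labels, n_objects = ndimage.label(clear_mask)
    round_ids = []
    for idx in range(1, n_objects + 1):
        coords = np.argwhere(labels == idx)
        area = len(coords)
        if area < min_area:
            continue  # too small to assess shape reliably
        centroid = coords.mean(axis=0)
        r_max = np.sqrt(((coords - centroid) ** 2).sum(axis=1)).max()
        roundness = area / (np.pi * max(r_max, 0.5) ** 2)
        if roundness >= min_roundness:
            round_ids.append(idx)
    return labels, round_ids
```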

All nuclei were then masked out of the original tissue mask, and clear-space areas were identified by performing an automatic threshold in the intensity band of the masked image. Binary clear-space areas were classified as vacuoles or as general white-space objects on the basis of circularity and size. A texture measure, based on a gradient-filtered image, was developed as an indicator of fine spatial changes in the cytoplasm [36]. The cytoplasm of control livers had a smooth and uniform appearance, whereas that of treated animals exhibited more roughness, mainly caused by vacuoles.
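The texture measure is described only as based on a gradient-filtered image [36]. One simple version, assumed here for illustration, is the mean Sobel gradient magnitude over cytoplasm pixels: smooth cytoplasm gives values near zero, whereas rough, vacuolated cytoplasm gives larger values. The function name is hypothetical.

```python
import numpy as np
from scipy import ndimage

# Hypothetical texture score; the measure cited in the text may differ.


def cytoplasm_texture(gray, cytoplasm_mask):
    """Mean Sobel gradient magnitude over cytoplasm pixels (higher = rougher)."""
    gray = np.asarray(gray, dtype=float)  # avoid integer overflow in the Sobel filter
    gx = ndimage.sobel(gray, axis=1)      # horizontal gradient
    gy = ndimage.sobel(gray, axis=0)      # vertical gradient
    magnitude = np.hypot(gx, gy)
    return float(magnitude[np.asarray(cytoplasm_mask, dtype=bool)].mean())
```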

After training, the system was applied to the specimens of this study, and all analytical steps were executed fully automatically without further user interaction. For the basic features (hepatocyte nuclei, other nuclei, sinusoids, and vacuoles including conglomerates of microvesicular change), the derived tissue metrics included counts per mm2 and area as a percentage of the total tissue area of each section; further metrics were total % clear space (non-stained cellular and tissue elements), % hematoxylin-stained objects (nuclei) and a texture feature. A reference set of 24 images was selected for validation: two images were randomly selected from each of two animals in every group, control and treated. The system was validated for detection of the basic tissue elements: hepatocyte nuclei, other nuclei, vacuoles and sinusoids. Two pathologists independently inspected the classified objects, indicated by colored overlays on the digital images. A typical image contains about 250-450 hepatocyte nuclei, 20-130 non-hepatocyte nuclei, 50-300 vacuoles and 70-200 sinusoids. False-positive 'hits' and missed (false-negative) nuclei were counted. For hepatocyte nuclei, results were less than 3% false negative and less than 2% false positive; for other nuclei, less than 7% false negative and less than 15% false positive. For vacuoles (including conglomerates of microvesicular change) and sinusoids, the percentage of correct detections was estimated rather than counted. For vacuoles and conglomerates of microvesicular change, results were in the range of less than 40% false negative and less than 10% false positive; smaller vacuoles could not be identified correctly because of resolution limits, but the heterogeneity in the cytoplasm could be detected by the texture analysis. For sinusoids, the estimate was less than 5% false negative and less than 20% false positive.
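Error rates like those above can be computed from pathologists' counts as sketched below. The denominators are an assumption, since they are not stated: the false-positive rate is taken relative to all objects the system detected, and the false-negative rate relative to all objects actually present. The function name and counts are hypothetical.

```python
# Hypothetical helper; the denominators are assumptions (not stated in the text).


def detection_error_rates(n_correct, n_false_hits, n_missed):
    """Return (false-positive rate, false-negative rate) from validation counts.

    false-positive rate = false hits / all objects the system detected
    false-negative rate = missed objects / all objects actually present
    """
    detected = n_correct + n_false_hits
    present = n_correct + n_missed
    if detected == 0 or present == 0:
        raise ValueError("need at least one detected and one present object")
    return n_false_hits / detected, n_missed / present


fp, fn = detection_error_rates(294, 5, 6)  # hypothetical counts for one image
print(f"false positive {fp:.1%}, false negative {fn:.1%}")
```

For an image with 300 hepatocyte nuclei, 294 correct detections, 5 false hits and 6 misses, this gives about 1.7% false positive and 2.0% false negative.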

The field-specific features (clear space, hematoxylin-stained areas and texture) have no directly quantifiable visual correlate, and the same applies to the geometric and density features derived from the basic tissue elements; these features therefore cannot be validated by human observation.

Statistical data analysis

The 'raw' microarray data, consisting of six readouts per gene, were prepared for analysis by outlier removal and normalization. Outliers generally appear as expression values that are much larger than expected. To minimize their effects, an outlier-detection filter was applied to the data. First, the median of each set of six data points was calculated; then the absolute difference between each data point and the median of its set was calculated, and the median of these absolute differences, the median absolute deviation (MAD), was determined. A modified z-score for each original gene-expression value was derived as the absolute difference between the data point and the median of its set, multiplied by 0.6745 and divided by the MAD of the set [37]. If the largest gene-expression value in a given set of six had a modified z-score larger than 8.7, it was labeled an outlier and removed from the raw dataset. Only 0.3% of all raw data points were identified as outliers by this method. After outlier removal, the resulting datasets were normalized with respect to the median of all raw data points on a given microarray, to adjust for array-to-array variability. The median of the remaining values for each probe was then calculated and used for further analysis. This data preparation resulted in one gene-expression value per gene per animal, referred to as the 'normalized' dataset.
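The outlier filter and normalization can be sketched in NumPy as below. The input layout (probes × two arrays × three replicate spots for one animal), the function names and the handling of a zero MAD (any deviation then counts as an outlier) are assumptions not specified in the text; the 0.6745 factor, the 8.7 cutoff, testing only the largest value, division by each array's median and the final per-probe median follow the description.

```python
import numpy as np

# Hypothetical sketch of the outlier filter and normalization described in the text.


def modified_z(values):
    """Modified z-scores: 0.6745 * |x - median| / MAD."""
    values = np.asarray(values, dtype=float)
    dev = np.abs(values - np.median(values))
    mad = np.median(dev)
    with np.errstate(divide="ignore", invalid="ignore"):
        z = 0.6745 * dev / mad
    # Assumption: if MAD is 0, any non-zero deviation is treated as infinite.
    return np.where(dev == 0, 0.0, z)


def normalize_animal(raw, threshold=8.7):
    """raw: (n_probes, n_arrays, n_reps) background-subtracted values for one animal.

    Returns (one normalized value per probe, boolean mask of retained data points).
    """
    raw = np.asarray(raw, dtype=float)
    n_probes = raw.shape[0]
    flat = raw.reshape(n_probes, -1)            # six readouts per probe
    keep = np.ones(flat.shape, dtype=bool)
    for p in range(n_probes):
        i = int(np.argmax(flat[p]))             # only the largest value is tested
        if modified_z(flat[p])[i] > threshold:
            keep[p, i] = False
    array_medians = np.median(raw, axis=(0, 2))  # median of all raw points per array
    scaled = (raw / array_medians[None, :, None]).reshape(n_probes, -1)
    values = np.array([np.median(scaled[p][keep[p]]) for p in range(n_probes)])
    return values, keep
```

Running `normalize_animal` once per animal and stacking the results would give the animals × genes matrix used in the downstream analyses.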

Spearman's rank-order correlation is a rank-based version of Pearson's correlation, which measures the strength of a linear relationship between two variables [38]. Spearman's correlation ranks the data in each variable and then calculates Pearson's correlation on the paired ranks, thereby measuring the strength of a monotonic relationship; p-values were based on permutation methods.
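A minimal NumPy sketch of Spearman's correlation with a permutation p-value follows. The number of permutations, the two-sided test, the random seed and the (hits + 1)/(n_perm + 1) estimate are assumptions, since the text does not state them, and the function names are hypothetical. In this setting, x could hold one gene's 'normalized' expression across rats and y one tissue metric.

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_permutation(x, y, n_perm=2000, seed=0):
    """Spearman's rho (Pearson's r on ranks) with a two-sided permutation p-value."""
    rx, ry = average_ranks(x), average_ranks(y)
    rho = np.corrcoef(rx, ry)[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        # Shuffling one variable breaks any association while keeping its ranks.
        r = np.corrcoef(rx, rng.permutation(ry))[0, 1]
        if abs(r) >= abs(rho) - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return rho, p_value
```

Because only ranks enter the calculation, a strictly increasing but nonlinear relationship such as y = x**3 gives rho = 1, which is the sense in which the statistic tests for a monotonic rather than a linear relationship.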

PCA is a statistical technique that reduces the dimensionality of a dataset while retaining as much of its variability as possible [15]. Each principal component is a linear combination of the original variables that captures as much of the remaining variability as possible while being uncorrelated with all other principal components; the principal-component scores of the samples are thus a projection of the dataset onto these new axes. PCA was conducted separately on three datasets. For the gene-expression-only dataset, we used the 'normalized' gene-expression data (1,040 genes) for all rats. For the tissue-only dataset, we used all 11 tissue metrics for all rats. Finally, for the combined dataset, we concatenated for each rat the 11 tissue metrics and the 1,040 'normalized' gene-expression values, giving a total of 1,051 measurements, and conducted PCA on the result.
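The following sketch computes principal-component scores by singular value decomposition of the mean-centered data and shows how the combined 11 + 1,040 = 1,051-column dataset can be assembled. The text does not say whether variables were scaled before PCA, so this sketch only centers them; with tissue metrics and expression values on different scales, standardizing each column is a common alternative. The function names and the random inputs are illustrative only.

```python
import numpy as np

def combined_dataset(tissue_metrics, expression):
    """Concatenate per-rat tissue metrics and expression values column-wise."""
    tissue_metrics = np.asarray(tissue_metrics, float)
    expression = np.asarray(expression, float)
    assert tissue_metrics.shape[0] == expression.shape[0], "one row per rat"
    return np.hstack([tissue_metrics, expression])

def pca_scores(X, n_components=2):
    """Scores, loadings (eigenvectors) and column means from PCA via SVD."""
    X = np.asarray(X, float)
    mean = X.mean(axis=0)
    Xc = X - mean  # centering only; no scaling (assumption)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]      # rows are principal axes
    scores = Xc @ components.T          # projection of each rat onto the axes
    return scores, components, mean

# Illustrative shapes: 10 rats, 11 tissue metrics, 1,040 genes (random data).
rng = np.random.default_rng(0)
combined = combined_dataset(rng.normal(size=(10, 11)), rng.normal(size=(10, 1040)))
scores, components, mean = pca_scores(combined)
```

The returned `components` and `mean` are kept so that a new sample can later be projected onto the same axes, which is what the cross-validation step requires.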

The quadratic regression classifier [39] is given by

$$y = \beta_0 + \beta_1 (e_1 - \bar{e}_1) + \beta_2 (e_2 - \bar{e}_2) + \beta_3 (e_1 - \bar{e}_1)^2 + \beta_4 (e_2 - \bar{e}_2)^2 + \beta_5 (e_1 - \bar{e}_1)(e_2 - \bar{e}_2) + \varepsilon$$

where $e_1$ and $e_2$ are the first and second principal components, respectively, $\bar{e}_1$ and $\bar{e}_2$ are the arithmetic means of the first and second principal components over all samples, and $\varepsilon$ is assumed to be independently and identically normally distributed with mean 0 and variance $\sigma^2$. The model was estimated by ordinary least squares, with the dependent variable, $y$, an indicator variable for CCl4 toxicity.
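The quadratic regression classifier can be sketched as an ordinary least-squares fit on the six-column design matrix implied by the model given above (intercept, two centered linear terms, two squares and one cross-product), followed by the 0.5 threshold used in the text. This is an illustration, not the authors' implementation; the function names are hypothetical, and treating a prediction of exactly 0.5 as non-toxic is an assumption because the text does not cover that case.

```python
import numpy as np

def quadratic_design(e1, e2, m1, m2):
    """Design matrix whose columns correspond to beta0..beta5."""
    d1 = np.atleast_1d(np.asarray(e1, float)) - m1
    d2 = np.atleast_1d(np.asarray(e2, float)) - m2
    return np.column_stack([np.ones_like(d1), d1, d2, d1 ** 2, d2 ** 2, d1 * d2])

def fit_quadratic_classifier(e1, e2, y):
    """Least-squares estimates of beta0..beta5, plus the centering means."""
    m1, m2 = float(np.mean(e1)), float(np.mean(e2))
    X = quadratic_design(e1, e2, m1, m2)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, float), rcond=None)
    return beta, m1, m2

def predict_toxic(beta, m1, m2, e1, e2, threshold=0.5):
    """True where the fitted value exceeds the threshold (CCl4 toxicity)."""
    return quadratic_design(e1, e2, m1, m2) @ beta > threshold
```

In use, `y` would be 1 for CCl4-treated rats and 0 for controls, and `e1`, `e2` the first two principal-component scores of each rat.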

The cross-validated probability of correct classification was estimated as follows. First, the first two principal components were calculated from the 'normalized' data of all rats except one. Using these two principal components and the CCl4-toxicity classification, the quadratic regression classifier was fit to estimate the coefficients β0 through β5. The same eigenvectors (that is, the projection estimated in the PCA) were then applied to the left-out observation to obtain its first two principal components; because the left-out rat contributed to neither the projection nor the classifier, this avoids the optimistic bias that would arise from predicting a sample used in fitting. These principal components were then substituted into the estimated quadratic regression classifier. If the resulting prediction, y, was above 0.5, the sample was classified as suffering from CCl4 toxicity; if it was below 0.5, it was classified as not suffering from CCl4 toxicity. To estimate the classification accuracy, this procedure was repeated so that every rat was left out once and then predicted.
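The leave-one-out procedure can be sketched by refitting both the PCA projection and the quadratic classifier without the held-out rat, then projecting that rat with the training mean and eigenvectors. Assumptions: PCA via SVD on centered, unscaled data; labels coded 1 for CCl4 toxicity and 0 otherwise; a prediction of exactly 0.5 counted as non-toxic; the helper names are hypothetical.

```python
import numpy as np

def _pca_fit(X, k=2):
    """Column means and the first k principal axes of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def _design(scores, m):
    """Quadratic design matrix (beta0..beta5) on centered PC scores."""
    d1, d2 = scores[:, 0] - m[0], scores[:, 1] - m[1]
    return np.column_stack([np.ones_like(d1), d1, d2, d1 ** 2, d2 ** 2, d1 * d2])

def loo_accuracy(X, y, threshold=0.5):
    """Fraction of rats classified correctly when each is left out in turn."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        mean, comps = _pca_fit(X[train])            # PCA without rat i
        s_train = (X[train] - mean) @ comps.T
        m = s_train.mean(axis=0)                    # means of training PC scores
        beta, *_ = np.linalg.lstsq(_design(s_train, m), y[train], rcond=None)
        s_test = (X[i:i + 1] - mean) @ comps.T      # same projection for rat i
        y_hat = (_design(s_test, m) @ beta)[0]
        correct += int((y_hat > threshold) == (y[i] == 1))
    return correct / n
```

Refitting the PCA inside the loop, rather than once on all rats, is what keeps the held-out rat out of the projection as well as the classifier.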

PCA, Spearman's rank correlations, quadratic regression classification and cross-validation were programmed in MATLAB Release 12. The scripts are freely available from the authors upon request.

Additional data files

Additional data file 1 is an Excel spreadsheet listing genes that correlate with tissue features. The same information is provided in four PDF files containing tables of genes correlating with % clear space (Additional data file 2), area % vacuoles (Additional data file 3), vacuoles/mm2 (Additional data file 4) and all other metrics (Additional data file 5). These tables include gene descriptions for the ADME chip [40], updated with descriptions provided by GenBank. Genes found to be associated with the biology of CCl4 toxicity are highlighted.



We thank Scott Spear and Loey Healy for technical assistance, and Ed Klein, Veterinary Diagnostic Associates, Murrysville, PA, and Rajiv Dhir, University of Pittsburgh, for support with histopathology. We are indebted to Drew Lesniak and Mark DiSilvestro for continual software development, and Sheila Dela Cruz and Keith Boyce for assistance with data preparation. We thank David Pot for helpful discussions.

Authors’ Affiliations

Tissue Informatics Inc., 711 Bingham Street, Suite 200, Pittsburgh, PA 15203, USA
InforMax - Invitrogen Life Science Software, 7305 Executive Way, Frederick, MD 21704, USA
Motorola Life Sciences, 4088 Commercial Avenue, Northbrook, IL 60062, USA
Amersham Biosciences, 3200 West Germann Rd., Chandler, AZ 85248, USA


  1. Waring JF, Jolly RA, Ciurlionis R, Lum PY, Praestgaard JT, Morfitt DC, Buratto B, Roberts C, Schadt E, Ulrich RG: Clustering of hepatotoxins based on mechanism of toxicity using gene expression profiles. Toxicol Appl Pharmacol. 2001, 175: 28-42. 10.1006/taap.2001.9243.
  2. Farr S, Dunn RT: Concise review: gene expression applied to toxicology. Toxicol Sci. 1999, 50: 1-9. 10.1093/toxsci/50.1.1.
  3. Nuwaysir EF, Bittner M, Trent J, Barrett JC, Afshari CA: Microarrays and toxicology: the advent of toxicogenomics. Mol Carcinog. 1999, 24: 153-159. 10.1002/(SICI)1098-2744(199903)24:3<153::AID-MC1>3.0.CO;2-P.
  4. House DE, Berman E, Seely JC, Simmons JE: Comparison of open and blind histopathologic evaluation of hepatic lesions. Toxicol Lett. 1992, 63: 127-133. 10.1016/0378-4274(92)90003-3.
  5. Iatropoulos MJ: Appropriateness of methods for slide evaluation in the practice of toxicologic pathology. Toxicol Pathol. 1984, 12: 305-306.
  6. Goodman DG: Factors affecting histopathologic interpretation of toxicity-carcinogenicity studies, carcinogenicity: the design, analysis, and interpretation of long-term animal studies. ILSI Monographs. Washington, DC: International Institute of Life Sciences. 1988, 109-118.
  7. Young MB, DiSilvestro MR, Sendera TJ, Kriete A, Magnuson SR: Analysis of gene expression in carbon tetrachloride-treated rat livers using a novel bioarray technology. Pharmacogenomics J. 2003, 3: 41-52. 10.1038/sj.tpj.6500147.
  8. Hamadeh HK, Knight BL, Haugen AC, Sieber S, Amin RP, Bushel PB, Stoll R, Blanchard K, Jayadev S, Tennant R, et al: Methapyrilene toxicity: anchorage of pathologic observations to gene expression alterations. Toxicol Pathol. 2002, 30: 470-482.
  9. Bissell MJ, Weaver VM, Lelievre SA, Wang F, Petersen OW, Schmeichel KL: Tissue structure, nuclear organization, and gene expression in normal and malignant breast. Cancer Res. 1999, 59 (Suppl 7): 1757-1763.
  10. Cole KA, Krizman DB, Emmert-Buck MR: The genetics of cancer - a 3D model. Nat Genet. 1999, 21 (1 Suppl): 38-41. 10.1038/4466.
  11. Klimek F, Bannasch P: Biochemical microanalysis of pyruvate kinase activity in preneoplastic and neoplastic liver lesions induced in rats by N-nitrosomorpholine. Carcinogenesis. 1990, 11: 1377-1380.
  12. Best CJ, Emmert-Buck MR: Molecular profiling of tissue samples using laser capture microdissection. Expert Rev Mol Diagn. 2001, 1: 53-60.
  13. Morris JA: Information and observer disagreement in histopathology. Histopathology. 1994, 25: 123-128.
  14. Furness PN: The use of digital images in pathology. J Pathol. 1997, 183: 253-263. 10.1002/(SICI)1096-9896(199711)183:3<253::AID-PATH927>3.0.CO;2-P.
  15. Mardia KV, Kent JT, Bibby JM: Multivariate Analysis. 1979, London: Academic Press.
  16. Koukoulis GK, Shen J, Virtanen I, Gould VE: Vitronectin in the cirrhotic liver: an immunomarker of mature fibrosis. Hum Pathol. 2001, 32: 1356-1362. 10.1053/hupa.2001.29675.
  17. Clarke H, Egan DA, Heffernan M, Doyle S, Byrne C, Kilty C, Ryan MP: Alpha-glutathione S-transferase (alpha-GST) release, an early indicator of carbon tetrachloride hepatotoxicity in the rat. Hum Exp Toxicol. 1997, 16: 154-157.
  18. Panduro A, Shalaby F, Weiner FR, Biempica L, Zern MA, Shafritz DA: Transcriptional switch from albumin to alpha-fetoprotein and changes in transcription of other genes during carbon tetrachloride induced liver regeneration. Biochemistry. 1986, 25: 1414-1420.
  19. Selan FM, Evans MA: The role of microtubules in chlorinated alkane-induced fatty liver. Toxicol Lett. 1987, 36: 117-127. 10.1016/0378-4274(87)90175-5.
  20. Shirota FN, DeMaster EG, Shoeman DW, Nagasawa HT: Acetaminophen-induced suppression of hepatic AdoMet synthetase activity is attenuated by pro-drugs of L-cysteine. Toxicol Lett. 2002, 132: 1-8. 10.1016/S0378-4274(01)00549-5.
  21. Lunn CA, Fan X, Dalie B, Miller K, Zavodny PJ, Naruka SK, Lundell D: Purification of ADAM 10 from bovine spleen as a TNFalpha convertase. FEBS Lett. 1997, 400: 333-335. 10.1016/S0014-5793(96)01410-X.
  22. Luckey SW, Peterson DR: Activation of Kupffer cells during the course of carbon tetrachloride-induced liver injury and fibrosis in rats. Exp Mol Pathol. 2001, 71: 226-240. 10.1006/exmp.2001.2399.
  23. Ishikawa K, Mochida S, Mashiba S, Inao M, Matsui A, Ikeda H, Ohno A, Shibuya M, Fujiwara K: Expression of vascular endothelial growth factor in nonparenchymal as well as parenchymal cells in rat liver after necrosis. Biochem Biophys Res Commun. 1999, 254: 587-593. 10.1006/bbrc.1998.9984.
  24. Mochida S, Ishikawa K, Toshima K, Inao M, Ikeda H, Matsui A, Shibuya M, Fujiwara K: The mechanisms of hepatic sinusoidal endothelial cell regeneration: a possible communication system associated with vascular endothelial growth factor in liver cells. J Gastroenterol Hepatol. 1998, 13 Suppl: S1-S5.
  25. Nakashima T, Kondoh S, Kitoh H, Ozawa H, Okita S, Harada T, Shiraishi K, Ryozawa S, Okita K: Vascular endothelial growth factor-C expression in human gallbladder cancer and its relationship to lymph node metastasis. Int J Mol Med. 2003, 11: 33-39.
  26. Kaio E, Tanaka S, Kitadai Y, Sumii M, Yoshihara M, Haruma K, Chayama K: Clinical significance of angiogenic factor expression at the deepest invasive site of advanced colorectal carcinoma. Oncology. 2003, 64: 61-73. 10.1159/000066511.
  27. Gunningham SP, Currie MJ, Han C, Turner K, Scott PA, Robinson BA, Harris AL, Fox SB: Vascular endothelial growth factor-B and vascular endothelial growth factor-C expression in renal cell carcinomas: regulation by the von Hippel-Lindau gene and hypoxia. Cancer Res. 2001, 61: 3206-3211.
  28. Aase K, Lymboussaki A, Kaipainen A, Olofsson B, Alitalo K, Eriksson U: Localization of VEGF-B in the mouse embryo suggests a paracrine role of the growth factor in the developing vasculature. Dev Dyn. 1999, 215: 12-25. 10.1002/(SICI)1097-0177(199905)215:1<12::AID-DVDY3>3.3.CO;2-E.
  29. Byzova TV, Goldman CK, Jankau J, Chen J, Cabrera G, Achen MG, Stacker SA, Carnevale KA, Siemionow M, Deitcher SR, DiCorleto PE: Adenovirus encoding vascular endothelial growth factor-D induces tissue-specific vascular patterns in vivo. Blood. 2002, 99: 4434-4442. 10.1182/blood.V99.12.4434.
  30. Olofsson B, Korpelainen E, Pepper MS, Mandriota SJ, Aase K, Kumar V, Gunji Y, Jeltsch MM, Shibuya M, Alitalo K, Eriksson U: Vascular endothelial growth factor B (VEGF-B) binds to VEGF receptor-1 and regulates plasminogen activator activity in endothelial cells. Proc Natl Acad Sci USA. 1998, 95: 11709-11714. 10.1073/pnas.95.20.11709.
  31. Lough J, Rosenthall L, Arzoumanian A, Goresky CA: Kupffer cell depletion associated with capillarization of liver sinusoids in carbon tetrachloride-induced rat liver cirrhosis. J Hepatol. 1987, 5: 190-198.
  32. Laney DW Jr., Bezerra JA, Kosiba JL, Degen SJ, Cohen MB: Upregulation of Escherichia coli heat-stable enterotoxin receptor in regenerating liver. Am J Physiol. 1994, 266: G899-G906.
  33. Kordula T, Bugno M, Lason W, Przewlocki R, Koj A: Rat contrapsins are the type II acute phase proteins: regulation by interleukin 6 on the mRNA level. Biochem Biophys Res Commun. 1994, 201: 222-227. 10.1006/bbrc.1994.1692.
  34. Mikhail TH, Awadallah R, El-Dessoukey EA: Effect of AMP on serum minerals in carbon tetrachloride hepatotoxicity. Z Ernahrungswiss. 1978, 17: 47-51.
  35. Geller DA, Nguyen D, Shapiro RA, Nussler A, Di Silvio M, Freeswick P, Wang SC, Tweardy DJ, Simmons RL, Billiar TR: Cytokine induction of interferon regulatory factor-1 in hepatocytes. Surgery. 1993, 114: 235-242.
  36. Kriete A, Schäffer R, Harms H, Aus HM: Computer based cytophotometry analysis of thyroid tumors in imprint. J Cancer Res Clin Oncol. 1985, 109: 252-256.
  37. Iglewicz B, Hoaglin DC: How to Detect and Handle Outliers (The ASQC Basic Reference in Quality Control). Milwaukee, WI: American Society for Quality, Statistics Division. 1993, 16.
  38. Siegel S, Castellan NJ: Nonparametric Statistics for the Behavioral Sciences. 1988, Columbus, OH: McGraw-Hill.
  39. Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning - Data Mining, Inference and Prediction. 2002, Berlin: Springer.
  40. Amersham Biosciences: CodeLink reference material.