Host-pathogen studies in the post-genomic era
© GenomeBiology.com 2000
Published: 4 August 2000
Several studies are starting to show the power of DNA microarrays to identify interactions between animal hosts and their pathogens, and have revealed interesting correlations between host responses to different infectious agents.
Post-genomic research is now firmly established as a major scientific discipline in the new millennium. The first working draft of the human genome is now available, and predictions of the human gene content will be available soon. Virology has been in the post-genomic era since 1977, with the sequencing of the ΦX174 genome, and GenBank now holds more than 1,000 complete viral genomes. Bacteriology has also been post-genomic since completion of the Haemophilus influenzae genome sequence in 1995. Parallel to the sequencing of large genomes has been the rapid development of methods for studying the expression of the information they encode. With the advent of DNA microarray and chip technologies, gene expression can now truly be explored on a 'genome scale'. In research into infectious disease, we are now rapidly approaching the time when it will be possible to study gene expression of both host and pathogen at the whole-genome level. Realizing the promise of the post-genomic era is, however, largely dependent on harnessing expertise from all aspects of biology, underpinned in an integrative manner by computational biology. This is particularly relevant in host-pathogen studies, which, as well as 'post-genomic' scientists, require virologists, bacteriologists, parasitologists, immunologists and cell biologists.
Ways of studying gene expression
Large-scale expression studies now make it possible to define an organism's phenotypic state in any given condition according to which genes are expressed. This set of expressed genes has been termed the 'transcriptome'. Large-scale gene-expression mapping using arrays is motivated by the premise, based on the central dogma of molecular biology, that the functional state of the organism is largely determined by the information carried by its expressed genes. In reality, things are not that simple: for some proteins, the absolute amount of protein does not scale linearly with the level of the corresponding transcript. Nevertheless, much can be gained from this type of study.
There are several different methods of measuring gene expression, including quantitative RT-PCR, serial analysis of gene expression (SAGE), Affymetrix-type oligonucleotide microarray 'chips' and DNA-based microarrays (Table 1). I concentrate here on the use of microarrays (see Box 1).
One current problem with the different methods of quantifying gene expression is the lack of systematic assessment of the comparability of results. Each method tends to produce a different representation of a gene's expression level. It is widely acknowledged that experiments using the same samples but a range of methods are urgently required in order to understand the relative merits of each system. This is important, as it is unlikely that one method of measuring gene expression will be universally accepted. Over time, however, there may be a gradual shift to the use of one broad type of methodology, as occurred with the widespread preference for Sanger dideoxy-chain terminator sequencing over Maxam and Gilbert chemical degradation sequencing. Currently, DNA arrays seem to be the method of choice for monitoring large-scale gene expression.
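One simple way to compare two quantification methods on the same samples is to ask whether they rank genes in the same order, which sidesteps the fact that each method reports values on a different scale. The sketch below computes a Spearman rank correlation in plain Python. The measurements, the gene order and the choice of Spearman correlation are illustrative assumptions, not data or methods taken from the studies discussed here.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical measurements of the same five genes by two methods (invented values).
array_signal = [120.0, 950.0, 40.0, 400.0, 2300.0]  # e.g. array fluorescence units
tag_counts = [3, 15, 1, 20, 41]                      # e.g. SAGE tag counts

rho = spearman(array_signal, tag_counts)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based measure is used here only because it makes no assumption about how the two scales relate; it says nothing about which method is closer to the true transcript abundance.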
Host and pathogen gene expression
Although still in their infancy, DNA arrays have been used to study host and pathogen gene expression profiles for four viruses - human cytomegalovirus (HCMV) [5,6], human herpesvirus 8 (HHV8) (R.G. Jenner, M. Mar Albà, C. Boshoff, and P. Kellam, unpublished observations), human immunodeficiency virus type-1 (HIV-1) and human papillomavirus type 31 (HPV31), as well as two bacterial pathogens - Listeria monocytogenes and Salmonella. Two studies focused on the complete gene expression profiles of the pathogen ( and our unpublished observations) with the rest focusing on the expression of subsets of host genes (Table 2). Most of these studies experienced the problems inherent in dealing with the masses of data produced with DNA arrays and confined their analysis to listing genes that were up- or downregulated. Our study of HHV8 gene expression used cluster analysis, a tool for rationalizing gene expression patterns into groups of coordinately expressed genes. Cluster analysis has been used to group genes involved in similar processes and provides an insight into the biology of the system studied [12,13]. In our study of HHV8, this analysis provided further information on the coordination of viral gene expression during replication.
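The idea of grouping coordinately expressed genes can be illustrated with a minimal sketch of average-linkage hierarchical clustering, using one minus the Pearson correlation as the distance between expression profiles. This is a toy version written for illustration, not the software used in the studies cited above; the gene names and expression values are invented, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_genes(profiles, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    clusters = [[gene] for gene in profiles]

    def distance(c1, c2):
        # Average linkage: mean of (1 - r) over all between-cluster gene pairs.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical expression ratios at four time points (invented values).
profiles = {
    "geneA": [0.1, 1.0, 2.1, 3.0],  # rising over time
    "geneB": [0.0, 0.9, 2.0, 3.2],  # rising over time
    "geneC": [3.0, 2.0, 1.1, 0.2],  # falling over time
    "geneD": [2.8, 2.1, 0.9, 0.0],  # falling over time
}

groups = cluster_genes(profiles, n_clusters=2)
print(groups)
```

With these invented profiles the two rising genes end up in one group and the two falling genes in the other. A correlation-based distance groups genes by the shape of their profile rather than by absolute expression level, which is usually what 'coordinately expressed' is taken to mean.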
Common patterns of host gene expression in response to different pathogens are difficult to determine from the current studies. This is mainly due to the different systems used and inconsistencies in the annotation of host genes. Many responses of the host to different pathogens are already known, but a more comprehensive whole-genome analysis may have far-reaching effects on understanding the pathogenesis of different infections. From the five studies focusing on the host response (Table 2), it is possible to determine a small number of genes that are consistently detected as up- or down-regulated (Table 3). Infection with both bacterial pathogens upregulates expression of the chemokines interleukin-8 (IL-8) and GROβ (macrophage inflammatory protein 2α, MIP2α) and of the cytokine leukemia inhibitory factor (LIF). IL-8 is released by several cell types in response to an inflammatory stimulus and is a chemoattractant for neutrophils, basophils and T cells. GROβ is also known to be expressed at sites of inflammation, and LIF is able to induce hematopoietic differentiation of myeloid progenitor cells. Therefore, expression of these factors is consistent with the need to attract leukocytes to bacterially infected tissues and to activate them there.
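Finding genes reported in common across studies amounts to intersecting gene lists after reconciling inconsistent names. The sketch below shows one way this could be done. The synonym GROβ/MIP2α and the three shared genes come from the text above; the alias table as a whole, the study labels and the placeholder gene names are assumptions made for illustration and do not reproduce the published gene lists.

```python
# Illustrative alias table mapping alternative names to one preferred name.
ALIASES = {
    "MIP2α": "GROβ",
    "interleukin-8": "IL-8",
    "leukemia inhibitory factor": "LIF",
}


def normalize(genes):
    """Map each reported name to its preferred name and drop duplicates."""
    return {ALIASES.get(name, name) for name in genes}


def common_genes(studies):
    """Return the genes reported by every study after name normalization."""
    gene_sets = [normalize(genes) for genes in studies.values()]
    return set.intersection(*gene_sets)


# Hypothetical lists of up-regulated genes; placeholder names are invented.
studies = {
    "bacterial study 1": ["IL-8", "MIP2α", "LIF", "placeholder_gene_1"],
    "bacterial study 2": ["interleukin-8", "GROβ", "LIF", "placeholder_gene_2"],
}

shared = common_genes(studies)
print(sorted(shared))
```

Without the normalization step the intersection of these two lists would contain only LIF, which shows how annotation inconsistencies can hide genuine overlap between studies.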
Tyrosine phosphorylation and interaction of signaling proteins are the foundation of many signaling pathways. General control of tyrosine phosphorylation of signaling molecules is accomplished through the action of phosphotyrosine phosphatases (PTPs). Cells must maintain a physiological balance between PTPs and protein tyrosine kinases in order to sustain normal regulation of events dependent on phosphorylated tyrosine residues. Inhibitors of certain PTPs have been shown to inhibit the growth of the protozoan pathogen Leishmania, owing, in part, to increased sensitivity of host cells to interferon-γ stimulation. On the other hand, inhibitors of PTPs have also been shown to activate the replication of HIV-1 by both NFκB-dependent and -independent pathways. Taken together, these findings suggest a reason for pathogen modulation of different PTP genes, as indicated in Table 3, and indicate that pathogens may exploit PTPs during their replicative cycle.
It will be interesting to determine whether the host produces a consistent broad response to viral or bacterial infections, or if the host is able to discriminate and tailor its response to different types of virus - for example, poliovirus, with a single-stranded mRNA sense genome, compared with herpesviruses, with double-stranded DNA genomes - and bacteria - for example, Gram-positive versus Gram-negative. In addition, post-genomic research may help to answer complex questions about pathogen persistence. For example, the quite closely related yellow fever virus and hepatitis C virus result in very different pathologies: yellow fever virus produces an acute, sometimes fatal, infection, whereas hepatitis C virus forms a long-term persistent infection that can ultimately lead to liver cancer. Also, no attempts have yet been made to incorporate host and pathogen genes into the same DNA array to determine the coordinated interactions between host and pathogen. These sorts of studies are likely to reveal much new information and may ultimately lead to better targeted anti-infective therapeutics and enhanced vaccination strategies.
Data analysis and integration
To address many questions about host-pathogen interactions, methods of data analysis and integration must improve. Post-genomic studies, by their very nature, produce vast amounts of data. The true potential of methods such as DNA arrays will, however, only be realized by careful data management and bioinformatics analysis. A new breed of biologist is emerging who not only understands his or her particular biological system but is also computer literate and able to handle, analyze and conceptualize vast amounts of biological data. This has led to the realization that carefully designed and maintained databases are now a must for many laboratories, and data-warehousing of additional related information is likely to be essential for discovering underlying patterns and relationships in the data.
Most DNA array laboratories have in-house databases for their own array experiments. Of greater value would be public expression databases such as ArrayExpress, envisaged by the European Bioinformatics Institute [17,18], and the National Cancer Institute's ArrayDB. These will function as repositories for array data analogous to the sequence databases EMBL, GenBank and DDBJ. In the future, it is likely that publication of expression data in journals will require the submission of data to a public expression database and the assignment of an accession number prior to publication, again analogous to submission of new sequence data. Gene expression data are at present far from suitable for such databases, however. In comparison to DNA sequence or protein structure data, gene expression data are stored mainly as unstructured flat-files with no uniform standards of data reporting. Different methodologies report different types of quantitation of gene expression, and the relationships between the different methods are not yet fully understood. This has led the array community to propose a minimum information standard and data format for expression data to facilitate the construction of a public database [4,18,19].
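To make the contrast with unstructured flat-files concrete, the sketch below shows what a structured, self-describing expression record might look like. Every class and field name here is hypothetical and chosen only for illustration; it does not reproduce the minimum information standard proposed by the array community or the schema of any existing database.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExpressionMeasurement:
    gene_id: str       # hypothetical identifier, e.g. a sequence accession
    value: float       # the reported expression value
    quantitation: str  # what the value means, e.g. "log2 ratio"


@dataclass
class ArrayExperiment:
    organism: str
    sample: str
    array_type: str
    measurements: list = field(default_factory=list)

    def to_json(self):
        """Serialize the experiment, including nested measurements."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Rebuild an experiment from the JSON produced by to_json."""
        raw = json.loads(text)
        raw["measurements"] = [
            ExpressionMeasurement(**m) for m in raw["measurements"]
        ]
        return cls(**raw)


# Invented example record; none of these values come from a real experiment.
experiment = ArrayExperiment(
    organism="Homo sapiens",
    sample="hypothetical infected cell culture, 24 hours",
    array_type="PCR products on nylon",
    measurements=[
        ExpressionMeasurement("HYPOTHETICAL_0001", 1.5, "log2 ratio"),
        ExpressionMeasurement("HYPOTHETICAL_0002", -0.7, "log2 ratio"),
    ],
)

restored = ArrayExperiment.from_json(experiment.to_json())
print(restored == experiment)
```

Recording the type of quantitation alongside each value addresses the point made above that different methodologies report different kinds of measurement, so a number without its context cannot be compared across experiments.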
DNA array terminology
Type of ORF probe
PCR product or cloned DNA
Table 2. Host and pathogen DNA array studies
Array type and support
Human foreskin fibroblasts; Oligonucleotides on glass slides; Virus gene expression during lytic replication
Human herpesvirus 8; PCR products on nylon; Virus gene expression during latent and lytic replication
Human immunodeficiency virus type 1; PCR products on glass slides; Host gene expression during 72 hours of virus infection
Normal human keratinocytes; Human papillomavirus type 31; PCR products on glass slides; Host gene expression following transfection of the viral genome
Human foreskin fibroblasts; Host gene expression during 24 hours of virus infection
Human colorectal‡ and colon§ epithelial cells; PCR products on nylon; Host gene expression during 20 hours of bacterial infection
6,800 genes, 18,367 genes, 588 genes; Affymetrix microarrays, PCR products on nylon; Host gene expression during 2 hours of bacterial infection
Table 3. Common genes up- or down-regulated during infection by bacteria and viruses
GenBank accession number
GROβ/macrophage inflammatory protein 2α
Leukemia inhibitory factor
Receptor phosphotyrosine phosphatase, PCP-2
Type IVA phosphotyrosine phosphatase
Phosphotyrosine phosphatase-BAS Type 1
Interferon α-inducible p27 protein
References
- Sanger F, Air GM, Barrell BG, Brown NL, Coulson AR, Fiddes CA, Hutchison CA, Slocombe PM, Smith M: Nucleotide sequence of bacteriophage phi X174 DNA. Nature. 1977, 265: 687-695.
- Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, et al: Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 1995, 269: 496-512.
- The chipping forecast. Nat Genet. 1999, 21(Suppl): [http://www.nature.com/ng/chips_interstitial.html]
- Aach J, Rindone W, Church GM: Systematic management and analysis of yeast gene expression data. Genome Res. 2000, 10: 431-445. 10.1101/gr.10.4.431.
- Chambers J, Angulo A, Amaratunga D, Guo H, Jiang Y, Wan JS, Bittner A, Frueh K, Jackson MR, Peterson PA, Erlander MG, Ghazal P: DNA microarrays of the complex human cytomegalovirus genome: profiling kinetic class with drug sensitivity of viral gene expression. J Virol. 1999, 73: 5757-5766.
- Zhu H, Cong J-P, Mamtora G, Gingeras T, Shenk T: Cellular gene expression altered by human cytomegalovirus: global monitoring with oligonucleotide arrays. Proc Natl Acad Sci USA. 1998, 95: 14470-14475. 10.1073/pnas.95.24.14470.
- Geiss GK, Bumgarner RE, An MC, Agy MB, van 't Wout AB, Hammersmark E, Carter VS, Upchurch D, Mullins JI, Katze MG: Large-scale monitoring of host cell gene expression during HIV-1 infection using cDNA microarrays. Virology. 2000, 266: 8-16. 10.1006/viro.1999.0044.
- Chang YE, Laimins LA: Microarray analysis identifies interferon-inducible genes and Stat-1 as major transcriptional targets of human papillomavirus type 31. J Virol. 2000, 74: 4174-4182. 10.1128/JVI.74.9.4174-4182.2000.
- Cohen P, Bouaboula M, Bellis M, Baron V, Jbilo O, Poinot-Chazel C, Galiegue S, Hadibi E-H, Casellas P: Monitoring cellular responses to Listeria monocytogenes with oligonucleotide arrays. J Biol Chem. 2000, 275: 11181-11190. 10.1074/jbc.275.15.11181.
- Eckmann L, Smith JR, Housley MP, Dwinell MB, Kagnoff MF: Analysis of high density cDNA arrays of altered gene expression in human intestinal epithelial cells in response to infection with the invasive enteric bacteria Salmonella. J Biol Chem. 2000, 275: 14084-14094. 10.1074/jbc.275.19.14084.
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998, 95: 14863-14868. 10.1073/pnas.95.25.14863.
- Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, et al: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000, 403: 503-511. 10.1038/35000501.
- Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown PO, Herskowitz I: The transcriptional program of sporulation in budding yeast. Science. 1998, 282: 699-705. 10.1126/science.282.5389.699.
- Luster AD: Chemokines - chemotactic cytokines that mediate inflammation. N Engl J Med. 1998, 338: 436-445. 10.1056/NEJM199802123380706.
- Olivier M, Romero-Gallo B-J, Matte C, Blanchette J, Posner BI, Tremblay MJ, Faure R: Modulation of interferon-γ induced macrophage activation by phosphotyrosine phosphatases inhibition. J Biol Chem. 1998, 273: 13944-13949. 10.1074/jbc.273.22.13944.
- Barbeau B, Bernier R, Dumais N, Briand G, Olivier M, Faure R, et al: Activation of HIV-1 long terminal repeat transcription and virus replication via NF-κB dependent and independent pathways by potent phosphotyrosine phosphatase inhibitors, the peroxovanadium compounds. J Biol Chem. 1997, 272: 12968-12977. 10.1074/jbc.272.20.12968.
- Abbott A: Bioinformatics institute plans public database for gene expression data. Nature. 1999, 398: 646. 10.1038/19363.
- The ArrayExpress database. [http://www.ebi.ac.uk/arrayexpress/]
- Ermolaeva O, Rastogi M, Pruitt KD, Schuler GD, Bittner ML, Chen Y, Simon R, Meltzer P, Trent JM, Boguski MS: Data management and analysis for gene expression arrays. Nat Genet. 1998, 20: 19-23. 10.1038/1670.