Using linear models, we have identified genes differentially expressed in airway epithelium between never and current smokers and have characterized the expression levels of these genes in former smokers who quit smoking for different periods of time. The majority (79%) of genes differentially expressed between current and never smokers are rapidly reversible upon smoking cessation, while the remainder are either slowly reversible or irreversible. Differences between the rapidly reversible genes and the slowly reversible or irreversible genes further suggest that their expression might be regulated through different mechanisms. The rapidly reversible genes have different biological functions from the slowly reversible or irreversible genes, suggesting that the two groups might distinguish an acute response to tobacco smoke from a more long-lasting response to tobacco smoke-induced epithelial cell damage. The gene expression consequences of tobacco smoke exposure we identified are similar to gene expression changes observed in other human bronchial airway gene expression datasets involving tobacco smoke. Commonalities with human bronchial airway datasets involving other exposures suggest that the response to tobacco smoke exposure involves a number of common bronchial airway pathways. The accuracy, in additional samples, of a biomarker of tobacco smoke exposure built from irreversible genes suggests that the irreversibility of these gene expression changes may provide a useful tool for assessing past exposure to tobacco smoke.
Many of the rapidly reversible genes are up-regulated by smoking and are involved in a protective or adaptive response to tobacco exposure and in the detoxification of tobacco smoke components. The cytochrome P450s CYP1A1 and CYP1B1, for example, are among the rapidly reversible genes and are involved in the oxidation of many compounds, including fatty acids, steroids, and xenobiotics. CYP1A1 and CYP1B1 have previously been described as up-regulated in response to smoke, and CYP1B1 polymorphisms can influence the risk of developing lung cancer among never smokers. Several aldo-keto reductases, such as AKR1B10 and AKR1C1, are also rapidly reversible upon smoking cessation. Aldo-keto reductases are soluble NADPH oxidoreductases that are involved in the activation of polycyclic aromatic hydrocarbons present in tobacco smoke and in the detoxification of highly carcinogenic nicotine-derived nitrosamino-ketone (NNK) compounds. Another class of rapidly reversible genes is the aldehyde dehydrogenases, such as ALDH3A1, which are involved in the oxidation of toxic aldehydes produced by oxidative stress and exposure to tobacco smoke. Both the cytochrome P450s and the aldehyde dehydrogenases have been found to be up-regulated in respiratory tissue from rats exposed to smoke, and the aldo-keto reductases are up-regulated in normal bronchial epithelium and non-small cell lung tumor tissue from smokers compared with non-smokers. All of the genes listed above, as well as most of the differentially expressed genes that are members of the GO molecular function category 'oxidoreductase activity', are among the most highly reversible genes, suggesting that their up-regulation is driven by acute exposure to smoke-related toxins and that expression returns to baseline soon after exposure to these compounds ceases. The induction of these genes in airway epithelial cells after 15 minutes of exposure to tobacco smoke (GSE2302) lends further support to this hypothesis.
In contrast to the rapidly reversible genes, the slowly reversible and irreversible genes reflect a more permanent host response to tobacco smoke. Interestingly, several of these genes have been associated with the development of cancers of epithelial origin. CEACAM5 (carcinoembryonic antigen-related cell adhesion molecule 5) is irreversibly up-regulated by smoking and is elevated in the serum of patients with lung adenocarcinoma and colorectal cancer. SULF1 (sulfatase 1), a gene irreversibly down-regulated by smoking, influences the sulfation state of residues present on heparan sulfate proteoglycans, which are involved in cell adhesion and mediate growth factor signaling. SULF1 was found to be down-regulated in ovarian, breast, pancreatic, renal, and hepatocellular carcinoma cell lines and in head and neck squamous carcinomas. UPK1B (uroplakin 1B) plays a role in strengthening and stabilizing the apical cell surface through interactions with the cytoskeleton. UPK1B is irreversibly down-regulated by smoking and has been shown to be reduced or absent in bladder carcinomas through CpG methylation of the proximal promoter [38, 39].
The enrichment of down-regulated genes among the irreversible, slowly reversible, and least rapidly reversible genes suggests that genetic or epigenetic mechanisms, such as chromosomal loss [7, 8] or changes in promoter methylation status [11, 12], might account for the relative permanence of these gene expression differences. Given the rather rapid turnover of airway epithelial cells, the persistence of these changes after smoking cessation may result from a clonal growth advantage for airway epithelial cells harboring these changes. Several of the down-regulated slowly reversible genes are present in cytoband 16q13, where a number of metallothioneins are located. Metallothioneins have the ability to bind both essential metals, like copper and iron, and toxic metals, such as cadmium and mercury. They also have detoxification and antioxidant properties and may be involved in cell proliferation and differentiation. MT3 has been shown to be down-regulated by hypermethylation in non-small cell lung tumors and cell lines. In addition, metallothioneins are thought to regulate some zinc-dependent transcription factors, such as the tumor suppressor p53, by donating zinc. Potential loss or methylation of the chromosomal locus containing several metallothionein genes may impair the ability of epithelial cells to protect against or repair cellular injury from future environmental exposures that occur after smoking cessation.
To confirm the observed effects of smoking and smoking cessation described above, we compared our dataset with other publicly available human bronchial epithelial cell datasets involving a variety of exposures. Testing the reproducibility of findings across different microarray datasets with similar experimental conditions and cell types has not traditionally been common practice because the overlap between differentially expressed gene sets is often surprisingly small. New methodologies for comparing datasets make the task more feasible and provide more powerful means of determining commonalities between the observed responses of a particular cell type under one or more conditions. The tobacco exposure-associated gene expression changes we observed were concordant in three other datasets involving tobacco smoke exposures. The most significant similarity involved the gene expression consequences of tobacco smoke exposure in the small airway epithelium of never and current smokers (GSE3320). This suggests that the field of injury in response to tobacco smoke is similar throughout both the large and small airways. There was also significant similarity between the genes we found to be up-regulated by smoking and the immediate gene expression changes resulting from acute tobacco exposure (GSE2302). This similarity was significant for both the rapidly reversible and the irreversible or slowly reversible up-regulated genes (data not shown). The lack of similarity between genes down-regulated by smoking in our dataset and in GSE2302 may reflect differences between acute and chronic cigarette smoke exposure, and suggests that up- and down-regulated irreversible gene expression may occur through different biological mechanisms. Additional large datasets of acute and chronic tobacco smoke exposure are needed to further explore these hypotheses.
There were also significant similarities between genes up- and down-regulated by smoking and the gene expression differences in additional datasets such as GSE5264 (cells undergoing mucociliary differentiation) and GSE1815 (interferon gamma-treated cells). These similarities may provide biological insights into the nature of the airway epithelial response to tobacco smoke exposure. The gene expression program that accompanies mucociliary differentiation has led to the hypothesis that cultured 'undifferentiated' epithelial cells may more closely resemble damaged epithelium or neoplastic lesions in vivo, because many genes associated with normal squamous epithelia, squamous cell carcinomas, or epidermal growth factor receptor signaling are more highly expressed in undifferentiated cells. The similarity between genes up-regulated by smoking in our dataset and genes that are more highly expressed early in mucociliary differentiation, together with the similarity between genes down-regulated by smoking in our dataset and genes that are more highly expressed late in mucociliary differentiation, might therefore reflect the cellular damage induced by smoke exposure. In addition, there was similarity between genes up-regulated by smoking in our dataset and genes down-regulated by treatment with interferon gamma. As interferon gamma plays a role in lung inflammatory responses, this similarity suggests that tobacco smoke exposure may suppress inflammatory responses in the airway. The relationships between our dataset and the other datasets, described above and presented in the results, are confirmed at the pathway level and suggest that oxidoreductase activity and electron transporter activity are among the important molecular functions of the bronchial epithelium that are regulated in response to a wide range of carcinogenic, inflammatory, and toxic exposures.
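As a rough illustration of what testing for similarity between two differentially expressed gene lists can involve, the sketch below computes a hypergeometric (one-sided Fisher) p-value for the overlap between two gene sets drawn from a shared universe of measured genes. This is a simple stand-in chosen for clarity and is not the comparison methodology used in this study; the function names and the placeholder gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_overlap_pvalue(universe_size, set_a_size, set_b_size, overlap):
    """Probability of seeing at least `overlap` shared genes by chance.

    Models drawing set_b_size genes without replacement from a universe
    in which set_a_size genes are 'marked' (members of set A).
    """
    max_overlap = min(set_a_size, set_b_size)
    total = comb(universe_size, set_b_size)
    tail = sum(
        comb(set_a_size, k) * comb(universe_size - set_a_size, set_b_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def overlap_test(universe, genes_a, genes_b):
    """Return the shared genes and the overlap p-value for two gene lists."""
    universe = set(universe)
    a = set(genes_a) & universe  # ignore genes not measured on both platforms
    b = set(genes_b) & universe
    shared = a & b
    p = hypergeom_overlap_pvalue(len(universe), len(a), len(b), len(shared))
    return sorted(shared), p


# Toy example with placeholder identifiers (not real data).
toy_universe = [f"G{i}" for i in range(1, 11)]
toy_up_in_smokers = ["G1", "G2", "G3", "G4", "G5"]
toy_up_in_other_dataset = ["G1", "G2", "G3", "G4", "G5"]
shared_genes, p_value = overlap_test(
    toy_universe, toy_up_in_smokers, toy_up_in_other_dataset
)
print(shared_genes, p_value)
```

A list-overlap test like this depends on the significance cutoffs used to define each gene list, which is one reason rank-based enrichment approaches are often preferred when comparing datasets.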
As an additional validation of the gene expression changes observed in response to smoking and smoking cessation, we developed a biomarker of tobacco smoke exposure. Using genes irreversibly altered by cigarette smoke, we were able to classify an independent sample set of former and current smokers (GSE4115) and a sample set of smokers and non-smokers (GSE5372) with high accuracy. Other datasets examining additional inhaled toxins (for example, ozone or fumes from charcoal stoves) are needed to determine whether the persistent genomic changes we have identified are specific to tobacco smoke. However, our preliminary biomarker results demonstrate the potential for developing a useful epidemiological tool if the gene expression biomarker can ultimately be extended to less invasive sites, such as the buccal and nasal epithelium, which are also directly exposed to tobacco smoke. Biomarkers of exposure are frequently used to improve upon or validate information about tobacco smoke exposure obtained by questionnaire; however, current biomarkers of tobacco exposure (for example, cotinine and NNAL, a metabolite of the tobacco-specific nitrosamine NNK [46, 47]) are limited to detecting recent exposure. Development of a gene expression biomarker for long-term past exposure could have widespread epidemiological utility. We are further interested in determining whether there is sufficient similarity in the gene expression differences caused by distant and low-level tobacco smoke exposure that a biomarker of past exposure could also detect current or past passive smoke exposure.
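To make the idea of an expression-based exposure biomarker concrete, the sketch below trains a nearest-centroid classifier on the expression of a small set of signature genes and scores it on held-out samples. The classification algorithm behind the biomarker is not specified in this section, so the nearest-centroid rule is an assumption swapped in for illustration; the function names, class labels, and expression values are hypothetical toy inputs, not study data.

```python
from statistics import mean


def fit_centroids(samples, labels):
    """Return {label: mean expression vector} over the signature genes."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        # zip(*rows) iterates gene by gene across the samples of this class
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, sample):
    """Assign the sample to the class with the closest centroid."""
    def squared_distance(centroid):
        return sum((x - c) ** 2 for x, c in zip(sample, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(centroids, samples, labels):
    """Fraction of samples whose predicted class matches the true label."""
    hits = sum(predict(centroids, s) == l for s, l in zip(samples, labels))
    return hits / len(labels)


# Toy training data: rows are samples, columns are two hypothetical
# signature genes (one up-regulated, one down-regulated by exposure).
train_samples = [[8.0, 3.0], [7.5, 3.5], [5.0, 6.0], [5.5, 6.5]]
train_labels = ["exposed", "exposed", "never", "never"]

# Toy independent test set, standing in for an external validation cohort.
test_samples = [[7.9, 3.1], [5.1, 6.2]]
test_labels = ["exposed", "never"]

model = fit_centroids(train_samples, train_labels)
print(accuracy(model, test_samples, test_labels))
```

Scoring the classifier only on samples that played no part in training mirrors the use of independent sample sets for validation described above.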