Analysis of the Macaca mulatta transcriptome and the sequence divergence between Macaca and human
© Magness et al.; licensee BioMed Central Ltd. 2005
Received: 18 January 2005
Accepted: 23 May 2005
Published: 30 June 2005
We report the initial sequencing and comparative analysis of the Macaca mulatta transcriptome. Cloned sequences from 11 tissues, nine animals, and three species (M. mulatta, M. fascicularis, and M. nemestrina) were sampled, resulting in the generation of 48,642 sequence reads. These data represent an initial sampling of the putative rhesus orthologs for 6,216 human genes. Mean nucleotide diversity within M. mulatta and sequence divergence among M. fascicularis, M. nemestrina, and M. mulatta are also reported.
The sequencing of genes and genomes has become a hallmark of modern molecular biology. The resulting wealth of nucleotide sequence information has fostered advances in gene discovery, the development of genome-based technologies to study gene expression and function, and a growing interest in comparative genomics. The comparison of the human genome with the genomes of closely related species has particular appeal, and there is considerable interest in identifying genomic traits that set humans apart from other primate species [1–4]. The recent growth in sequence information for the chimpanzee has fueled this interest. However, beyond that generated for chimpanzee, there has been remarkably little sequence information developed for other nonhuman primate species.
The rhesus macaque (Macaca mulatta) is a widely used small primate model of human disease, development, and behavior. Throughout the United States, National Institutes of Health (NIH)-supported facilities house more than 25,000 nonhuman primates, including more than 15,000 rhesus macaques. Each year, approximately 13,000 nonhuman primates are used for NIH-funded research, 65% of which are rhesus. These animals are used principally for infectious disease, pharmacology, and neuroscience research. In particular, the rhesus model is an essential tool for acquired immunodeficiency syndrome (AIDS) research and for the development of new drugs and vaccines against human immunodeficiency virus (HIV) [7, 8].
We report here on our initial efforts to sequence the rhesus macaque transcriptome. The close evolutionary relationship between rhesus and human, and its widespread use as a model for human reproduction, development, and disease, make it an ideal candidate for cDNA and genome sequencing. We have constructed cDNA libraries from a selection of diverse macaque tissues and multiple animals, and we have performed single-pass sequencing on 48,642 independent clones. This sequence information has been used to generate a rhesus macaque oligonucleotide microarray and to perform comparative analyses with human.
Sequence data collection and preliminary analysis
We prepared cloned cDNA libraries from 11 M. mulatta tissues derived from nine separate animals. In addition, the liver was independently sampled from one animal each of the M. mulatta, M. nemestrina, and M. fascicularis species. cDNA libraries were prepared by directional lambda-based cloning into Escherichia coli and sequenced using standard fluorescent dye-terminator chemistry. Sequencing was performed from the vector-insert junction distal to the polyadenylate sequence.
We compared each macaque sequence to the mRNA RefSeq component of GenBank using the MEGABLAST algorithm. The most similar human sequence was identified as the reference sequence with the most significant match by bit score. In some cases, this method will identify matches between macaque and human sequences that are not orthologs, and so the assignments should be interpreted with caution. For all subsequent analyses, macaque sequences with equally probable matches to more than one distinct human UniGene cluster have been excluded. The entire dataset taken together provides a sampling of the putative macaque orthologs for 6,216 human genes (unique human LocusLink IDs), representing approximately 25% of the human gene content by a recent estimate.
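The best-match rule described above can be illustrated with a minimal sketch. It assumes the alignment results have already been reduced to (query, subject, bit score) tuples, and it treats each subject identifier as standing in for a distinct human UniGene cluster; the function name and the example identifiers are hypothetical and are not taken from the original analysis pipeline.

```python
from collections import defaultdict

def assign_best_hits(hits):
    """Return {query_id: subject_id} for queries with a unique top-scoring subject.

    hits: iterable of (query_id, subject_id, bit_score) tuples.
    Queries whose highest bit score is shared by more than one distinct
    subject are left unassigned, mirroring the exclusion of ambiguous matches.
    """
    by_query = defaultdict(list)
    for query_id, subject_id, bit_score in hits:
        by_query[query_id].append((bit_score, subject_id))

    assigned = {}
    for query_id, scored in by_query.items():
        top_score = max(score for score, _ in scored)
        top_subjects = {subject for score, subject in scored if score == top_score}
        if len(top_subjects) == 1:
            assigned[query_id] = top_subjects.pop()
    return assigned

# Hypothetical example: read1 has a clear best hit, read2 is tied and dropped.
example_hits = [
    ("read1", "NM_A", 410.0),
    ("read1", "NM_B", 120.0),
    ("read2", "NM_C", 300.0),
    ("read2", "NM_D", 300.0),
]
best = assign_best_hits(example_hits)
print(best)
```

In this toy input only read1 receives an assignment, because read2 matches two subjects equally well.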
Although libraries were constructed from poly(dT)-primed cDNAs, the dataset includes a significant amount of coding sequence. Of the 6,216 unique human LocusLink IDs that were sampled in macaque, 69.3% include coding sequence (mean aligned coding length = 602 bp), whereas 30.7% include only 5' or 3' untranslated region (UTR) sequence (mean aligned UTR length = 485 bp). Of those 69.3% of genes with sampled coding sequence, the average extent of coding sequence coverage in the macaque database is 49.9% (data not shown).
Similarity of Macaca transcripts with human
In order to determine if local regions of poor data quality contribute to biases in the computed degree of sequence similarity, we recomputed the histogram using alignments composed of only high-quality (Q ≥ 20) sequence. Constraining the dataset to include only high-quality bases (n = 633 sequences) did not result in significant differences in either the shape or the mean of the distributions (Figure 1).
To provide a reference dataset with which to evaluate the current results, we computed the degree of sequence similarity between human and Pan troglodytes (chimpanzee) using the same method as above. This analysis was performed using chimpanzee expressed sequence tag (EST) and cDNA sequences, as most currently available chimpanzee reference sequences are computationally predicted and therefore lack data from the 3' UTR. However, our chimpanzee-human analysis was hampered by the relative paucity of chimpanzee full-length cDNA and EST sequence in the public databases. There are currently only 209 full-length chimpanzee cDNA sequences and 6,930 EST sequences of varying quality in GenBank.
These data together provide a sampling of the 150 bp proximal and distal to the stop codon for only 134 human genes. On the basis of this small dataset, the degree of nucleotide identity between human and chimpanzee for coding and 3' UTR sequences is 98.3 ± 3.0% and 97.65 ± 3.2% respectively (Additional data file 1). As expected, the distribution of sequence similarity is strongly biased toward larger values, with 59.0% of sampled chimpanzee coding sequences and 46.3% of 3' UTR sequences identical to their best human match over the 150-bp window. The distribution of sequence identity between human and chimpanzee is presented in Additional data file 2.
Macaque sequences showing weak identity with best human match
Amino-acid identity (%)†
LocusLink/Gene ID*
Pregnancy specific beta-1-glycoprotein 11
Pregnancy specific beta-1-glycoprotein 5
Angiogenin, ribonuclease, RNase A family, 5
Leukocyte-associated Ig-like receptor 2
Crystallin, lambda 1
Hypothetical protein LOC151174
Growth hormone 2
Apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3C
NADH dehydrogenase (ubiquinone) 1, subcomplex unknown, 2
Serum amyloid A4
Selenoprotein P, plasma, 1
Granzyme B (cytotoxic T-lymphocyte-associated serine esterase 1)
Interferon induced transmembrane protein 1
Growth hormone 1
Transmembrane protein 14B
Mitochondrial ribosomal protein L40
We also identified ten placentally expressed pregnancy-related transcripts with very weak similarity to their putative human ortholog. Prominent among these are the pregnancy-specific glycoproteins (PSG5 and PSG11). For example, the best macaque match to human PSG11 shows only 68% identity and is not better matched to any other member of the human PSG family. Other placentally expressed weak orthologs include the growth mediators angiogenin (ANG) and growth hormone 1 and 2 (GH1 and GH2). Episodic accelerated evolution has previously been reported for both angiogenin and the growth hormones, although its biological and developmental implications are not well understood [21, 22].
Mean amino-acid identity by GO ontology
Biological process group
Mean identity (%)*
Negative regulation of cell proliferation
Regulation of cell cycle
Response to oxidative stress
Proteolysis and peptidolysis
Positive regulation of cell proliferation
G-protein coupled receptor protein signaling pathway
Cell growth and/or maintenance
Regulation of transcription from Pol II promoter
Antimicrobial humoral response (sensu Vertebrata)
Ubiquitin-dependent protein catabolism
Regulation of transcription, DNA-dependent
Response to stress
Intracellular protein transport
Nuclear mRNA splicing, via spliceosome
Small GTPase mediated signal transduction
Intracellular signaling cascade
These data share similarity with recent comparative analyses between human and chimpanzee [4, 24]. For example, in chimpanzee, a high degree of sequence conservation and low rates of nonsynonymous substitution were found for several biological classes, including protein transport, small GTPase-mediated signal transduction, regulation of DNA-dependent transcription, intracellular signaling, and glycolysis. However, not all biological functional groups demonstrate consistent conservation among the three species. For example, the signal transduction biological class is highly conserved between chimpanzee and human, whereas its conservation between macaque and human does not significantly deviate from the mean over all classes.
Sequence divergence within and among macaque species
Estimate of Macaca mulatta nucleotide diversity
Number of animals
Nucleotide diversity (π)
Interspecies substitution rates
Number of reads
Frequency per kilobase
m vs n*
m vs f*
n vs f*
Putative rhesus sequences without human orthologs
Macaque sequences without apparent human ortholog
Ortholog by MEGABLAST*
PCR product length†
As above, we used MEGABLAST to test each macaque nucleotide sequence for one or more significant hits to the human EST or genome databases. The absence of an orthologous human sequence was defined as either no significant MEGABLAST hit in the human subset of GenBank or hits whose sequence identity fell more than three standard deviations below the mean as measured over the entire dataset (Figure 1). Because the data were not normally distributed, the identity cutoff (approximately 92.2%) was computed using the geometric mean, which relies on a logarithmic transformation of the data. All sequences meeting this cutoff definition were also outliers based on Tukey's test.
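One way to read the cutoff described above is as a mean and standard deviation taken on log-transformed identities and then transformed back to a percentage. The sketch below follows that reading, which is an assumption rather than a statement of the exact calculation used. The Tukey fence uses the conventional multiplier of 1.5 times the interquartile range, which is also an assumption, and the function names and example values are hypothetical.

```python
import math
import statistics

def log_scale_cutoff(identities, n_sd=3.0):
    """Cutoff n_sd standard deviations below the mean on the log scale.

    With n_sd=0 this returns the geometric mean of the input values.
    """
    logs = [math.log(x) for x in identities]
    mean_log = statistics.fmean(logs)
    sd_log = statistics.stdev(logs)  # sample standard deviation of the logs
    return math.exp(mean_log - n_sd * sd_log)

def tukey_lower_fence(values, k=1.5):
    """Lower outlier fence: first quartile minus k times the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q1 - k * (q3 - q1)

# Hypothetical percent-identity values, for illustration only.
example_identities = [99.0, 98.0, 97.0, 98.5, 96.0]
cutoff = log_scale_cutoff(example_identities)
fence = tukey_lower_fence(example_identities)
print(round(cutoff, 2), round(fence, 2))
```

A sequence would then be flagged when its identity falls below the cutoff, and the fence gives an independent check that the same sequence is an outlier.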
We selected eight of the resulting macaque sequences for PCR-based analysis using a number of primate and human genomes (Table 6, Figure 2). The purpose of this analysis was simply to verify the presence or absence of the observed sequences in a panel of primate genomes. Selected primers had an average computed annealing temperature of 59.6 ± 0.9°C with an average amplified length of 108 ± 12 bp (Materials and methods). For each primer pair, PCR analysis was conducted at several annealing temperatures between 55 and 60°C. Genomic DNA was selected from independent M. nemestrina and M. mulatta animals in order to confirm the presence of these sequences in multiple independent genomes. Of the eight tested primer pairs, two resulted in amplification of consistent bands in both human and macaque genomic DNA, two were indeterminate in human but present in the macaques, and four, while obviously present in the macaque genomes, resulted in no consistent human-specific product under any cycling conditions.
The eight tested sequences fall generally into three categories: those with weak sequence similarity to the human genome or human-derived ESTs (class I), those with weak sequence similarity only to genes and proteins from nonhuman species (class II), and those with no significant amino-acid or nucleotide sequence similarity to any GenBank nucleic acid or protein sequence (class III).
Likewise, CX078592 from brain demonstrated 88-90% nucleotide similarity to the IL15RA gene and other immune-derived transcripts, as well as to a region of human chromosome 10 containing IL15RA. PCR primers derived from this sequence amplified multiple specific products from macaque, human, and other primates (data not shown). Similarly, CX078596 from placenta, although having no significant match to any human EST, demonstrated significant similarity to a region of human chromosome 22. CX078596 contained a clear mammalian polyadenylation signal and poly(A) tail, and primers derived from this sequence amplified an appropriately sized product from macaque. Alignment of this sequence with human chromosome 22 revealed a 284-bp insertion in human relative to macaque, which was reflected by amplification of a proportionately larger product in two human genomic DNA samples (data not shown). Finally, although CB552301 from spleen demonstrated significant sequence identity to regions of human chromosomes 4 and 15 and multiple ESTs from UniGene cluster Hs.459311, we failed to amplify a specific product from any primate species using primers derived from this sequence (data not shown).
The second class of sequences (class II) in Table 6 had no identified human match, while demonstrating weak sequence identity to nucleic acid or protein sequences from other species. For example, CX078598, a 670-bp transcript from PBMCs, demonstrated weak amino-acid identity (67%) to the endogenous retrovirus (ERV)-BabFcenv envelope polyprotein, a member of the ERV-F/H family of primate retroviruses. PCR with primers derived from CX078598 under a variety of thermal cycling conditions resulted in the consistent amplification of a product of expected size from only M. mulatta and M. nemestrina (Figure 2b). Similarly, CX078591 from macaque brain demonstrated weak amino-acid identity (20-45%) to ariadne homolog 2 (ARIH2/TRIAD1) from rodents and to two unnamed proteins from the puffer fish Tetraodon nigroviridis. Primers derived from this sequence amplified the appropriately sized product only from macaque genomic DNA (data not shown).
The last class of sequences (class III) in Table 6 demonstrated no significant similarity to any protein or nucleotide sequence in GenBank (represented by CB555845 and CB552531). Both showed evidence of a mammalian polyadenylation consensus sequence near their 3' terminus, with CB552531 additionally demonstrating a clear poly(A) tail. CB555845, a 485-bp sequence from spleen, amplified expected products from both M. nemestrina and M. mulatta. However, this clone was ultimately scored as indeterminate because of its consistently weak amplification of a discrete product from all hominids including human (Figure 2c). CB552531 amplified products of the expected size from macaque species and from Ateles geoffroyi and Lemur catta, but not from human (data not shown).
It is important to note that PCR-based analysis of divergent sequences is subject to a variety of influences and may result in different conclusions under different conditions. Furthermore, we cannot rule out the possibility that one or more of the sequences in Table 6 are alternatively spliced relative to human, pseudogenes, or genomic DNA contamination. However, each clone sequence in Table 6 demonstrated similarity to known expressed sequences or a polyadenylation consensus sequence and poly(A) tail at their 3' terminus upon complete sequencing of the clones.
Development of a macaque-specific expression microarray resource
Genome-based technologies such as DNA microarrays are now commonplace in human biomedical research. Similarly, species-specific arrays exist for model organisms such as the mouse and rat, for which a considerable amount of genome information is available. In contrast, researchers wishing to carry out gene-expression analyses on nonhuman primate cells or tissues are currently forced to use human DNA microarrays. As part of our effort to bring genome-based technologies to researchers using nonhuman primates, we have used ESTs generated by this project to construct a rhesus macaque-specific oligonucleotide microarray.
Oligonucleotides were designed as described in Materials and methods and arrayed onto glass slides by Agilent Technologies. Briefly, macaque cDNA sequences were assembled into 9,344 distinct clusters using The Institute for Genome Research (TIGR) clustering tools. From these, 7,973 macaque-specific oligonucleotide probes were identified for inclusion on the array. These probes represent the putative macaque equivalent of 3,519 unique human UniGene clusters and 3,045 unique human RefSeqs. As a quality-control check of the microarray, we measured tissue-specific differences in gene expression as a means of evaluating whether the oligonucleotides were successfully binding target sequences. For these experiments, we hybridized the microarray with probes derived from RNA isolated from various rhesus macaque tissues. Probes were paired in different combinations and two dye-flipped technical replicates were performed for each pair of samples. Of the 7,973 rhesus macaque oligonucleotides present on the microarray, 6,215 showed differential expression (equal to or greater than twofold; P ≤ 0.01) in at least one of the three experiments.
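The differential-expression criterion above (at least twofold change with P ≤ 0.01 in at least one experiment) can be expressed as a simple filter. The sketch below assumes each probe carries one expression ratio and one P value per experiment; the data layout, function names, and example values are hypothetical, and the statistical model that produced the P values is not reproduced here.

```python
def is_differential(ratio, p_value, min_fold=2.0, max_p=0.01):
    """True if the fold change (in either direction) and P value pass the thresholds.

    ratio must be a positive expression ratio between the two paired samples.
    """
    fold = ratio if ratio >= 1.0 else 1.0 / ratio
    return fold >= min_fold and p_value <= max_p

def count_differential(probes):
    """Count probes passing the thresholds in at least one experiment.

    probes: {probe_id: [(ratio, p_value), ...]} with one tuple per experiment.
    """
    return sum(
        any(is_differential(ratio, p) for ratio, p in experiments)
        for experiments in probes.values()
    )

# Hypothetical measurements for three probes across three experiments.
example_probes = {
    "probe_a": [(1.1, 0.40), (0.4, 0.001), (1.0, 0.90)],  # 2.5-fold down in one
    "probe_b": [(1.8, 0.001), (1.2, 0.20), (0.9, 0.50)],  # never reaches twofold
    "probe_c": [(3.0, 0.05), (2.0, 0.01), (1.0, 0.80)],   # passes in the second
}
n_differential = count_differential(example_probes)
print(n_differential)
```

Applied to the reported counts, 6,215 of 7,973 probes corresponds to roughly 78% of the macaque oligonucleotides on the array.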
Primate models are essential to the study of human biology and disease and to the development of new pharmaceutical products, many of which require primate testing before approval for use in humans. The closest living primate relatives to human are the chimpanzee and other great apes. Human and chimp lineages diverged from a common ancestor 5-7 million years ago (Mya) and the genomes of the two species are highly conserved [4, 24, 34–36]. Experimental research using chimpanzees and other great apes is, however, significantly hampered by their size, maintenance costs, and endangered species status. The human-like qualities of the chimpanzee also make research using this animal generally unacceptable for ethical reasons. For the most part, chimpanzees are rarely used for invasive studies except, for example, when investigating diseases for which there is no other animal model (for example, hepatitis C infection).
Old World monkeys, a group that includes macaque, baboon, and African green monkey, are our closest non-ape relatives. Old World monkeys and humans shared a common ancestor around 25 Mya, and the genomes of these organisms are highly conserved with human [33, 35, 38]. Furthermore, the biology of these organisms is such that they are an appropriate primate model for human physiology and disease. For this and other reasons, Old World monkeys are widely used in biomedical research, with members of the Macaca genus most frequently used.
We report here on the first phase of a study to sequence the rhesus macaque transcriptome. Our group has collected sequence data from 48,642 cDNA clones from nine animals and 11 tissues. For the current study, standard cDNA sequencing methods were used, with an emphasis on large clone-inserts and long sequence read lengths. Alternative methods could have been used for data collection that would have resulted in less 3'-end bias (for example, ORESTES) or reduced redundancy in the collected data (for example, library normalization).
We determined the average sequence divergence between human and macaque to be 2.21% for coding and 4.90% for noncoding sequence. An identical analysis of transcribed chimpanzee sequences demonstrated divergences of 1.70% and 2.35% for coding and noncoding sequence respectively. This is in comparison to a recently reported mean 1.44% divergence between human chromosome 21 and chimpanzee chromosome 22 over their entire length. The continued analysis of sequence divergence between the macaque and human species will be important for translating data collected in this primate model to human biology. Recent evidence suggests that even minor inter-species sequence variation can result in large phenotypic differences between macaque models and human disease [8, 41, 42].
In addition, we have identified gene functional groups with higher than average sequence divergence at the amino-acid level. In one example, we observe 15% amino-acid sequence divergence between putative human and macaque orthologs of the cytidine deaminase APOBEC3C. Consistent with this observation, Sawyer et al. have reported evidence for accelerated evolution of the primate APOBEC gene family, probably under the selective pressure of viruses. Members of this family (for example, APOBEC3G) have antiviral activity against lentiviruses and specifically against HIV. APOBEC3G is packaged into nascent virions and delivered together with the viral genome into newly infected host cells. The cytidine deaminase cargo results in hypermutation of the replicating virus in target cells, thereby inhibiting virus infection. The Vif proteins of HIV and other lentiviruses bind APOBEC3G and inhibit its antiviral activity. However, the interaction between Vif and APOBEC3G is highly species and virus specific. HIV Vif can inhibit the function of human but not simian APOBEC3G. Likewise, Yu and colleagues have recently reported that human APOBEC3B and APOBEC3C can inhibit SIV but not HIV-1 infection of human cells. Our observation of poor sequence conservation between macaque and human APOBEC3C is consistent with a model of accelerated evolution under selective pressure for this gene family.
This dataset has further enabled us to conduct a preliminary analysis of nucleotide diversity within the M. mulatta species and the degree of divergence among M. nemestrina, M. fascicularis, and M. mulatta. Mean nucleotide diversity computed over 24 genes is 15.8 ± 12.5 × 10⁻⁴, approximately twofold greater than that computed for human transcribed sequences by several recent comprehensive studies [26, 44]. Excess nucleotide diversity in macaque versus human is consistent with observations from other primate species. In general, numerous groups have observed increased nucleotide diversity in mitochondrial [45–47], sex chromosome [48–51], and autosomal DNA [28, 52, 53] sequences from chimpanzee, bonobo, and gorilla. Consistent with other primate species, this observation is likely to reflect a larger effective population size for macaque throughout evolution relative to human. Our analysis also confirms a high degree of sequence similarity among macaque species, with pairwise divergence estimates (0.380-0.588%) exceeding intraspecies heterozygosity. M. mulatta and M. fascicularis appear more closely related to each other than either one is to M. nemestrina, although these differences did not reach significance.
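For readers unfamiliar with the statistic, nucleotide diversity (π) is conventionally defined as the mean number of differences per site over all pairs of sampled sequences. The sketch below implements that textbook definition for equal-length, gap-free aligned sequences; it is not the estimation procedure used in this study, which had to account for read quality and coverage, and the function name and example sequences are hypothetical.

```python
from itertools import combinations

def nucleotide_diversity(aligned_seqs):
    """Mean pairwise differences per site across all pairs of sequences.

    aligned_seqs: list of at least two equal-length, gap-free strings.
    """
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    total_differences = sum(
        sum(a != b for a, b in zip(s, t))
        for s, t in combinations(aligned_seqs, 2)
    )
    n_pairs = len(aligned_seqs) * (len(aligned_seqs) - 1) // 2
    return total_differences / (n_pairs * length)

# Hypothetical alignment: three sequences, one variable site out of four.
example_alignment = ["ACGT", "ACGA", "ACGT"]
pi = nucleotide_diversity(example_alignment)
print(pi)
```

In this toy alignment two of the three pairs differ at one of four sites, giving π = 2/12.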
We describe a small number of macaque sequences without apparent human ortholog. Confirmation of this observation will require a complete sequence of the rhesus genome, but these preliminary data are consistent with recent comparative human-chimpanzee analyses demonstrating many small insertions/deletions and rearrangements between these species, some of which contain open reading frames or expressed sequences [4, 29, 54].
Finally, we report on the development of a first-generation rhesus-specific oligonucleotide microarray to support gene expression analyses of cells and tissues from this animal. Previously, investigators have used human DNA microarrays to measure gene expression changes in macaque tissues. Although the high degree of nucleotide sequence identity between humans and macaques makes this cross-species hybridization feasible, it is not clear to what extent sequence divergence between these species may affect gene expression measurements. Our observation of a small number of macaque sequences without apparent human ortholog also suggests the importance of using species-specific arrays. The rhesus microarray should therefore facilitate the use of the macaque model for future gene expression profiling experiments and may also be useful for studying similarities and differences in gene expression between macaque and human tissues. To this end, we have included on the microarray 1,014 human oligonucleotide sequences, many of which were chosen because they are orthologs of macaque sequences also present on the array. In addition, because we anticipate this array will be widely used for infectious disease research, many of the human sequences have relevance to cytokine signaling, apoptosis, or the immune response, and we have included oligonucleotides corresponding to genes from 20 different viruses.
While the macaque species are widely used primate models of human physiology and disease, there are few species-specific genomic resources available to the research community. Furthermore, the applicability of the macaque model to human disease will be highly dependent on the degree of sequence divergence between macaque and human, among the macaque species, and among animals of divergent geographic origin. Comprehensive genome-wide analysis has begun to characterize inter-species differences and to provide resources, such as the rhesus-specific microarray, that will enable a more efficient use of this model organism in the future.
Materials and methods
Animal tissues and blood were provided by the Tissue Distribution Programs of the Washington and Oregon National Primate Research Centers. All M. mulatta animals were of Indian origin and had wild-caught parents. No mitochondrial DNA or major histocompatibility complex (MHC) typing was performed.
Rhesus macaque tissues used for RNA isolation were harvested at necropsy and immediately placed in RNAlater stabilization and storage solution (Ambion). Tissues were then homogenized in Solution D either by hand or mechanically using a Polytron tissue homogenizer, and total RNA was isolated by guanidinium isothiocyanate-phenol-chloroform extraction and further purified using RNeasy purification columns (Qiagen). Extraction of mRNA was performed using the FastTrack 2.0 mRNA extraction kit (Invitrogen). RNA quality and quantity were determined by spectrophotometry and by capillary electrophoresis using an Agilent Technologies BioAnalyzer.
cDNA library construction and sequencing
cDNA libraries were constructed using two alternative methods. Spleen mononuclear lymphocyte, brain, lung, activated PBMC, and two placental libraries (from male and female fetuses) were constructed with the Uni-ZAP cDNA library construction kit (Stratagene) using 3-5 μg high-quality mRNA for each library. Clones were isolated by ampicillin resistance and grown in 96-well plates containing LB-ampicillin medium. Liver, duodenum, ileum, jejunum, testes, ovary, and activated PBMC libraries were constructed with the CloneMiner cDNA construction kit (Invitrogen), again using 3-5 μg high-quality mRNA for each library. Clones were isolated by kanamycin resistance and grown in 96-well plates containing LB-kanamycin medium. All libraries were constructed with size-fractionated RNA, resulting in a mean insert size of approximately 1.5 kbp for each library as determined by PCR. Clone inserts were sequenced from the vector-insert junction distal to the poly(A) tail such that most resulting sequences do not include a poly(A) tail at their 3' terminus. For each clone, inserts were amplified by PCR directly from 0.2 μl of the glycerol stock using the following primers: for the Stratagene pBluescript SK (+/-) vector: 5'-CCCTCACTAAAGGGAACAAAA (the sequencing primer) and 5'-CACTATAGGGCGAATTGGGTA; for the Invitrogen pDONR222 vector: 5'-GACGTTGTAAAACGACGGC (the sequencing primer) and 5'-GCCAGGAAACAGCTATGACC. PCR products were sequenced using standard fluorescent dye-terminator chemistries on an Applied Biosystems 3700 capillary sequencer.
Sequence data analysis
cDNA sequences were first base-called using a modified version of the phred algorithm ([9, 57] and C.M., unpublished data) and then screened for cloning vector, lambda-phage, and E. coli contamination using the program cross_match. Sequences exhibiting multiple cloning sites or any contamination with lambda-phage or E. coli sequence were removed from further analysis. Leading and trailing cloning-vector sequence was masked from all remaining sequences. Putative polyadenylation was identified by the presence of a consensus mammalian polyadenylation signal followed by an (A)10 tract within 50 bp. The remaining sequences were analyzed using MEGABLAST against rhesus mitochondrial sequence (GenBank accession AY612638.1) and against the human mRNA RefSeq collection to identify putative human orthologs. Sequences with a significant match to any putative human ortholog were selected for GenBank submission. Of these sequences, 36,921 met minimum quality criteria for submission to GenBank. Each sequence was assigned a putative human ortholog if there was a unique maximally scoring match by MEGABLAST bit score comparison; sequences with multiple maximal matches were not assigned an ortholog. Sequences with no significant RefSeq match were further analyzed by similar MEGABLAST comparisons against EST, genomic, and protein databases. Sequences were also analyzed for human repetitive sequence families using cross_match.
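The polyadenylation rule described above (a consensus signal followed by an (A)10 tract within 50 bp) can be sketched as a pattern search. The consensus is assumed here to be the canonical AATAAA hexamer or its common ATTAAA variant, since the exact motif set used is not stated in the text; the function name and example sequences are hypothetical.

```python
import re

def has_putative_polya(seq, max_gap=50, tract_len=10):
    """True if a polyadenylation signal is followed by a poly(A) tract.

    Looks for AATAAA or ATTAAA (an assumed consensus), then requires a run
    of at least tract_len A's beginning no more than max_gap bases
    downstream of the end of the signal.
    """
    pattern = re.compile(
        rf"A[AT]TAAA(?=.{{0,{max_gap}}}A{{{tract_len}}})"
    )
    return pattern.search(seq.upper()) is not None

# Hypothetical sequences, for illustration only.
with_tail = "GGCC" + "AATAAA" + "CGT" * 5 + "A" * 12     # signal, 15-bp gap, tail
tail_too_far = "GGCC" + "AATAAA" + "CG" * 40 + "A" * 12  # 80-bp gap exceeds limit
tail_only = "CGCG" + "A" * 12                            # tail without a signal
print(has_putative_polya(with_tail),
      has_putative_polya(tail_too_far),
      has_putative_polya(tail_only))
```

The lookahead keeps the match anchored on the signal itself, so the position of the hexamer can be reported if needed.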
Rhesus-human similarity analyses
For the nucleotide comparisons, macaque sequences were selected for inclusion that were assigned a putative human ortholog and that spanned the human ortholog's final coding nucleotide by at least 150 nucleotides in each direction (as determined by initial MEGABLAST results). Selected sequences were then realigned independently against two subsequences of the corresponding human ortholog: one containing the final 150 coding nucleotides and the other containing the first 150 nucleotides in the 3' UTR. Results across all sequences were grouped by ortholog and the maximal bit-score match in each region was selected. For the amino-acid comparisons, macaque sequences were selected that had been assigned a putative human ortholog and contained at least 450 high-quality bases spanning the 3' end of the putative coding region (as determined by initial MEGABLAST results). Selected sequences were realigned independently (by translated MEGABLAST) against the protein sequence corresponding to the assigned ortholog. Results from all sequences were grouped by ortholog and the maximal bit-score match was selected. Grouping by protein classes was completed by cross-reference of each ortholog against its GO biological process assignment.
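The two 150-nucleotide windows used for the nucleotide comparisons can be sketched as follows. The sketch assumes the end of the coding region is available as a 0-based index one past the final coding nucleotide of the human reference sequence, and it scores identity over aligned columns that contain no gap; both conventions, along with the function names and example data, are assumptions made for illustration rather than details taken from the analysis itself.

```python
def stop_codon_windows(mrna, cds_end, flank=150):
    """Return (last `flank` coding bases, first `flank` 3' UTR bases).

    cds_end is the 0-based index one past the final coding nucleotide.
    Returns None when either side of the boundary is shorter than `flank`.
    """
    if cds_end < flank or len(mrna) - cds_end < flank:
        return None
    return mrna[cds_end - flank:cds_end], mrna[cds_end:cds_end + flank]

def percent_identity(aligned_a, aligned_b):
    """Percent identical positions over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

# Hypothetical reference: 200 coding bases followed by 200 UTR bases.
example_mrna = "A" * 200 + "C" * 200
coding_window, utr_window = stop_codon_windows(example_mrna, cds_end=200)
print(len(coding_window), len(utr_window), percent_identity("ACGT", "ACGA"))
```

Each macaque read would then be aligned separately to the coding window and to the UTR window, and the best-scoring alignment per ortholog retained for each region.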
Genomic PCR analysis of macaque sequences
PCR reactions (10 μl) included 0.132 U of Platinum Taq polymerase (PerkinElmer), 0.5 μM each primer, 0.132 μM each dNTP, 13 mM Tris-HCl (pH 8.4), 33 mM KCl, and 1 mM MgCl2. Thermal cycling was conducted in a PerkinElmer 9700 as follows: 95°C for 5 min (one cycle); 95°C for 30 sec, 55°C for 30 sec, 72°C for 1 min (40 cycles); and 72°C for 1 min (one cycle). Amplifications were evaluated under a variety of annealing temperatures between 55 and 60°C. Primer sequences are as follows (target: forward, reverse):
CX078591: 5'-GGAGAATCCAGTTAACGGCT-3', 5'-CTCTCATCCAGCCTAACGTG-3';
CX078602: 5'-GTTTTCAAAGAGCCCAGCAA-3', 5'-CTTTGGCATAGCTTCGGTTC-3';
CX078598: 5'-GGCAACAAGTGGGAATCAAC-3', 5'-GAGGAATCGGGATGGTCATA-3';
CB552301: 5'-CCTCCTTGGACTTGGACCTT-3', 5'-AGGACAGGAGTCTTGCCAAA-3';
CB555845: 5'-GTCAACAGGCTGGCATTTTA-3', 5'-CAATTATTGACCCCAAGGCTA-3';
CX078592: 5'-CAAAGCCATCAGACAGCAGA-3', 5'-GAGACCAGGAAAGTCGAAGG-3';
CB552531: 5'-CTGGAATAAGGCCAGAAGCA-3', 5'-ATTCCTCAGGTCTGGTGGAG-3';
CX078596: 5'-CCTCATGGTGTGGCTATGTG-3', 5'-ACACAAGGCGAGCTCTGGTA-3';
OAS1: 5'-GAGCCAAGAAGTACAGATGC-3', 5'-AGGACAGAGCTGTCCAATAG-3'.
Oligonucleotide microarray design
To design sequences for a rhesus macaque oligonucleotide microarray, we began with over 20,000 EST reads from clones derived from six cDNA libraries (spleen mononuclear lymphocyte, brain, lung, PBMC, and male and female placenta). After base-calling and quality filtering, sequences were processed using TIGR clustering tools and compared by BLASTN with Human UniGene cluster representatives (build 167). High-quality reads that had at least one strong hit to Human UniGene were carried forward for oligonucleotide design. An additional 584 rhesus macaque sequences were provided by Robert Norgren (University of Nebraska) and Eliot Spindel (Oregon National Primate Research Center).
Oligonucleotides based on these sequences were designed by Agilent Technologies. Repeat sequences were identified, masked, and excluded. Candidate oligonucleotides were selected from the 3' end of each target sequence, filtered according to optimal base-composition profiles, and screened on the basis of predicted hybridization properties and potential cross-hybridization with other sequences. Four 60-mer oligonucleotides were initially designed for each target sequence that passed quality-control checks. To estimate specificity, each oligonucleotide was compared with Human UniGene build 167. Oligonucleotides with strong similarity to more than one UniGene cluster were then manually checked for cross-hybridization against the July 2003 assembly of the human genome (hg16) using the University of California Santa Cruz Genome Browser. Oligonucleotides that hit more than one region of the human genome were discarded as ambiguous.
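The candidate-selection step can be illustrated with a small sketch. It is not Agilent's design procedure, whose base-composition profiles and hybridization models are not given here: the GC window, the convention that masked repeats appear as `N` or lowercase bases, and the one-base scanning step are all assumptions made for illustration.

```python
OLIGO_LEN = 60            # 60-mers, as in the text
N_CANDIDATES = 4          # four candidates per target, as in the text
GC_RANGE = (0.35, 0.55)   # assumed window; the real profile is not specified


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def candidate_oligos(target, n=N_CANDIDATES, length=OLIGO_LEN, gc_range=GC_RANGE):
    """Scan windows from the 3' end toward the 5' end and keep the first `n`
    that contain no masked bases and fall inside the GC window.
    Returns a list of (start_index, oligo) tuples, 0-based."""
    picked = []
    for start in range(len(target) - length, -1, -1):
        window = target[start:start + length]
        if "N" in window or window != window.upper():
            continue  # window overlaps a masked repeat
        if gc_range[0] <= gc_fraction(window) <= gc_range[1]:
            picked.append((start, window))
            if len(picked) == n:
                break
    return picked


# Toy targets: a repeat-free sequence and one whose 3' half is masked.
clean = "ACGT" * 30
masked = "ACGT" * 15 + "N" * 60
print([start for start, _ in candidate_oligos(clean)])
print([start for start, _ in candidate_oligos(masked)])
```

A real design would also space candidates apart and score predicted hybridization behavior; this sketch only shows the 3'-biased scan and the composition and masking filters.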
Because target sequences were not filtered by annotation before oligonucleotide design, multiple oligonucleotides were often designed to different regions of the same gene. Oligonucleotides were therefore mapped to UniGene cluster sequences and two high-scoring oligonucleotides were selected for each underlying transcript represented. This resulted in a final set of 7,973 macaque oligonucleotides representing approximately 3,944 unique genes. In addition to the macaque oligonucleotides, 1,014 oligonucleotides corresponding to 894 human genes and 96 oligonucleotides corresponding to genes from 20 different viruses were also selected for inclusion on the microarray. Duplicate 60-mer oligonucleotides were arrayed onto glass slides by Agilent Technologies. The array is commercially available from Agilent Technologies (Agilent Microarray Design Identification (AMADID) number 012650).
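The reduction to two oligonucleotides per transcript amounts to a grouped top-k selection. The sketch below assumes each oligonucleotide already carries a UniGene cluster assignment and a numeric design score; the tuple layout, the identifiers, and the score values are hypothetical, since the scoring scheme is not described in the text.

```python
import heapq
from collections import defaultdict


def select_per_cluster(oligos, per_cluster=2):
    """oligos: iterable of (oligo_id, cluster_id, score) tuples.
    Returns {cluster_id: [oligo_id, ...]} with at most `per_cluster`
    oligonucleotides per cluster, highest score first."""
    by_cluster = defaultdict(list)
    for oligo_id, cluster_id, score in oligos:
        by_cluster[cluster_id].append((score, oligo_id))
    return {
        cluster_id: [oligo_id for _, oligo_id in heapq.nlargest(per_cluster, items)]
        for cluster_id, items in by_cluster.items()
    }


# Toy input; identifiers and scores are placeholders.
oligos = [
    ("oligo1", "cluster_A", 0.9),
    ("oligo2", "cluster_A", 0.7),
    ("oligo3", "cluster_A", 0.8),
    ("oligo4", "cluster_B", 0.5),
]
selected = select_per_cluster(oligos)
print(selected)
```

A cluster with fewer than two candidates simply keeps what it has, which is consistent with the final set of 7,973 oligonucleotides for approximately 3,944 genes being slightly more than two per gene on average only because of rounding in the gene count, not a guarantee of exactly two each.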
Labeled probe synthesis and microarray hybridization
For microarray analysis, total RNA was extracted from spleen, brain, and placental tissues. Each tissue was obtained from a different animal. RNA quality and quantity were determined by spectrophotometry and by capillary electrophoresis using an Agilent Technologies BioAnalyzer. Labeled cRNA probes were generated using the Low Input RNA Probe Synthesis Kit (Agilent Technologies) according to the manufacturer's protocol for 11K postage-stamp oligonucleotide microarrays. The probes were hybridized in replicate to the rhesus macaque oligonucleotide microarray according to the manufacturer's protocol. Slides were scanned with an Agilent DNA microarray scanner, and image analysis was performed using Agilent feature extraction software. All data were entered into a custom-designed gene-expression database, Expression Array Manager, and then uploaded into Resolver 4.0 (Rosetta Biosoftware) and DecisionSite for Functional Genomics (Spotfire) for analysis.
Data submission and databases
There are 36,921 GenBank accession numbers associated with this manuscript. They are cross-referenced and publicly available at the project website. Expression microarray data have been submitted to EBI ArrayExpress, accession number E-TABM-9.
Additional data files
Acknowledgements
This work was supported by Public Health Service grants P51RR00166 and R24RR16354 from the National Center for Research Resources. The Oregon National Primate Research Center Tissue Distribution Program is supported by Public Health Service grant P51RR00163. We thank Robert Norgren and Eliot Spindel for supplying additional rhesus macaque sequences; Barney Saunders, Charlie Nelson, and Christopher Hopkins at Agilent Technologies for oligonucleotide design; and Lianne Okada and Katie Woodard for excellent technical assistance. We also thank Roger Bumgarner and Ted Holzman for assistance with early phases of nucleotide sequence data development and analysis.
- Carroll SB: Genetics and the making of Homo sapiens. Nature. 2003, 422: 849-10.1038/nature01495.
- Enard W, Paabo S: Comparative primate genomics. Annu Rev Genomics Hum Genet. 2004, 5: 351-378. 10.1146/annurev.genom.5.061903.180040.
- Olson MV, Varki A: Sequencing the chimpanzee genome: insights into human evolution and disease. Nat Rev Genet. 2003, 4: 20-28. 10.1038/nrg981.
- Watanabe H, Fujiyama A, Hattori M, Taylor TD, Toyoda A, Kuroki Y, Noguchi H, BenKahla A, Lehrach H, Sudbrak R, et al: DNA sequence and comparative analysis of chimpanzee chromosome 22. Nature. 2004, 429: 382-388. 10.1038/nature02564.
- Rhesus monkey demands in biomedical research: a workshop report. Workshop on Rhesus Monkey Demands In Biomedical Research. 2002, Washington DC: National Academy of Sciences, 31-
- Carlsson HE, Schapiro SJ, Farah I, Hau J: Use of primates in research: a global overview. Am J Primatol. 2004, 63: 225-237. 10.1002/ajp.20054.
- Nath BM, Schumann KE, Boyer JD: The chimpanzee and other nonhuman-primate models in HIV-1 vaccine research. Trends Microbiol. 2000, 8: 426-431. 10.1016/S0966-842X(00)01816-3.
- Sauermann U: Making the animal model for AIDS research more precise: the impact of major histocompatibility complex (MHC) genes on pathogenesis and disease progression in SIV-infected monkeys. Curr Mol Med. 2001, 1: 515-522. 10.2174/1566524013363555.
- Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998, 8: 186-194.
- Fitzgerald M, Shenk T: The sequence 5'-AAUAAA-3' forms parts of the recognition site for polyadenylation of late SV40 mRNAs. Cell. 1981, 24: 251-260. 10.1016/0092-8674(81)90521-3.
- Macaque.org. [http://www.macaque.org]
- Pruitt KD, Maglott DR: RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 2001, 29: 137-140. 10.1093/nar/29.1.137.
- Zhang Z, Schwartz S, Wagner L, Miller W: A greedy algorithm for aligning DNA sequences. J Comput Biol. 2000, 7: 203-214. 10.1089/10665270050081478.
- Wheeler DL, Church DM, Federhen S, Lash AE, Madden TL, Pontius JU, Schuler GD, Schriml LM, Sequeira E, Tatusova TA, et al: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2003, 31: 28-33. 10.1093/nar/gkg033.
- International Human Genome Sequencing Consortium: Finishing the euchromatic sequence of the human genome. Nature. 2004, 431: 931-945. 10.1038/nature03001.
- Hellmann I, Zollner S, Enard W, Ebersberger I, Nickel B, Paabo S: Selection on human genes as revealed by comparisons to chimpanzee cDNA. Genome Res. 2003, 13: 831-837. 10.1101/gr.944903.
- Geraghty DE: Genetic diversity and genomics of the immune response. Immunol Rev. 2002, 190: 5-8. 10.1034/j.1600-065X.2002.19001.x.
- Klein J, Satta Y, O'HUigin C, Takahata N: The molecular descent of the major histocompatibility complex. Annu Rev Immunol. 1993, 11: 269-295. 10.1146/annurev.iy.11.040193.001413.
- Sheehy AM, Gaddis NC, Choi JD, Malim MH: Isolation of a human gene that inhibits HIV-1 infection and is suppressed by the viral Vif protein. Nature. 2002, 418: 646-650. 10.1038/nature00939.
- Sawyer SL, Emerman M, Malik HS: Ancient adaptive evolution of the primate antiviral DNA-editing enzyme APOBEC3G. PLoS Biol. 2004, 2: E275-10.1371/journal.pbio.0020275.
- Forsyth IA, Wallis M: Growth hormone and prolactin - molecular and functional evolution. J Mammary Gland Biol Neoplasia. 2002, 7: 291-312. 10.1023/A:1022804817104.
- Zhang J, Rosenberg HF: Diversifying selection of the tumor-growth promoter angiogenin in primate evolution. Mol Biol Evol. 2002, 19: 438-445.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
- Deloukas P, Earthrowl ME, Grafham DV, Rubenfield M, French L, Steward CA, Sims SK, Jones MC, Searle S, Scott C, et al: The DNA sequence and comparative analysis of human chromosome 10. Nature. 2004, 429: 375-381. 10.1038/nature02462.
- Nei M, Li WH: Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc Natl Acad Sci USA. 1979, 76: 5269-5273. 10.1073/pnas.76.10.5269.
- Livingston RJ, von Niederhausern A, Jegga AG, Crawford DC, Carlson CS, Rieder MJ, Gowrisankar S, Aronow BJ, Weiss RB, Nickerson DA: Pattern of sequence variation across 213 environmental response genes. Genome Res. 2004, 14: 1821-1831. 10.1101/gr.2730004.
- Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, Sherry S, Mullikin JC, Mortimore BJ, Willey DL, et al: A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature. 2001, 409: 928-933. 10.1038/35057149.
- Yu N, Jensen-Seaman MI, Chemnick L, Ryder O, Li WH: Nucleotide diversity in gorillas. Genetics. 2004, 166: 1375-1383. 10.1534/genetics.166.3.1375.
- Frazer KA, Chen X, Hinds DA, Pant PV, Patil N, Cox DR: Genomic DNA insertions and deletions occur frequently between humans and nonhuman primates. Genome Res. 2003, 13: 341-346. 10.1101/gr.554603.
- Tukey J: Exploratory Data Analysis. 1977, Reading, MA: Addison-Wesley, 29:
- Benit L, Calteau A, Heidmann T: Characterization of the low-copy HERV-Fc family: evidence for recent integrations in primates of elements with coding envelope genes. Virology. 2003, 312: 159-168. 10.1016/S0042-6822(03)00163-6.
- Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee Y, White J, Cheung F, Parvizi B, et al: TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics. 2003, 19: 651-10.1093/bioinformatics/btg034.
- Stewart CB, Disotell TR: Primate evolution - in and out of Africa. Curr Biol. 1998, 8: R582-588. 10.1016/S0960-9822(07)00367-3.
- Crouau-Roy B, Service S, Slatkin M, Freimer N: A fine-scale comparison of the human and chimpanzee genomes: linkage, linkage disequilibrium and sequence analysis. Hum Mol Genet. 1996, 5: 1131-1137. 10.1093/hmg/5.8.1131.
- Page SL, Goodman M: Catarrhine phylogeny: noncoding DNA evidence for a diphyletic origin of the mangabeys and for a human-chimpanzee clade. Mol Phylogenet Evol. 2001, 18: 14-25. 10.1006/mpev.2000.0895.
- Schmutz J, Martin J, Terry A, Couronne O, Grimwood J, Lowry S, Gordon LA, Scott D, Xie G, Huang W, et al: The DNA sequence and comparative analysis of human chromosome 5. Nature. 2004, 431: 268-274. 10.1038/nature02919.
- Bukh J: A critical role for the chimpanzee model in the study of hepatitis C. Hepatology. 2004, 39: 1469-1475. 10.1002/hep.20268.
- Kumar S, Hedges SB: A molecular timescale for vertebrate evolution. Nature. 1998, 392: 917-920. 10.1038/31927.
- de Souza SJ, Camargo AA, Briones MR, Costa FF, Nagai MA, Verjovski-Almeida S, Zago MA, Andrade LE, Carrer H, El-Dorry HF, et al: Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags. Proc Natl Acad Sci USA. 2000, 97: 12690-12693. 10.1073/pnas.97.23.12690.
- Soares MB, Bonaldo MF, Jelene P, Su L, Lawton L, Efstratiadis A: Construction and characterization of a normalized cDNA library. Proc Natl Acad Sci USA. 1994, 91: 9228-9232. 10.1073/pnas.91.20.9228.
- Billick E, Seibert C, Pugach P, Ketas T, Trkola A, Endres MJ, Murgolo NJ, Coates E, Reyes GR, Baroudy BM, et al: The differential sensitivity of human and rhesus macaque CCR5 to small-molecule inhibitors of human immunodeficiency virus type 1 entry is explained by a single amino acid difference and suggests a mechanism of action for these inhibitors. J Virol. 2004, 78: 4134-4144. 10.1128/JVI.78.8.4134-4144.2004.
- Bogerd HP, Doehle BP, Wiegand HL, Cullen BR: A single amino acid difference in the host APOBEC3G protein controls the primate species specificity of HIV type 1 virion infectivity factor. Proc Natl Acad Sci USA. 2004, 101: 3770-3774. 10.1073/pnas.0307713101.
- Yu Q, Chen D, Konig R, Mariani R, Unutmaz D, Landau NR: APOBEC3B and APOBEC3C are potent inhibitors of simian immunodeficiency virus replication. J Biol Chem. 2004, 279: 53379-53386. 10.1074/jbc.M408802200.
- Cargill M, Altshuler D, Ireland J, Sklar P, Ardlie K, Patil N, Shaw N, Lane CR, Lim EP, Kalyanaraman N, et al: Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat Genet. 1999, 22: 231-238. 10.1038/10290.
- Ferris SD, Brown WM, Davidson WS, Wilson AC: Extensive polymorphism in the mitochondrial DNA of apes. Proc Natl Acad Sci USA. 1981, 78: 6319-6323. 10.1073/pnas.78.10.6319.
- Garner KJ, Ryder OA: Mitochondrial DNA diversity in gorillas. Mol Phylogenet Evol. 1996, 6: 39-48. 10.1006/mpev.1996.0056.
- Jensen-Seaman MI, Kidd KK: Mitochondrial DNA variation and biogeography of eastern gorillas. Mol Ecol. 2001, 10: 2241-2247. 10.1046/j.0962-1083.2001.01365.x.
- Gagneux P: The genus Pan: population genetics of an endangered outgroup. Trends Genet. 2002, 18: 327-330. 10.1016/S0168-9525(02)02695-1.
- Kaessmann H, Wiebe V, Paabo S: Extensive nuclear DNA sequence diversity among chimpanzees. Science. 1999, 286: 1159-1162. 10.1126/science.286.5442.1159.
- Kitano T, Schwarz C, Nickel B, Paabo S: Gene diversity patterns at 10 X-chromosomal loci in humans and chimpanzees. Mol Biol Evol. 2003, 20: 1281-1289. 10.1093/molbev/msg134.
- Stone AC, Griffiths RC, Zegura SL, Hammer MF: High levels of Y-chromosome nucleotide diversity in the genus Pan. Proc Natl Acad Sci USA. 2002, 99: 43-48. 10.1073/pnas.012364999.
- Deinard AS, Kidd K: Identifying conservation units within captive chimpanzee populations. Am J Phys Anthropol. 2000, 111: 25-44. 10.1002/(SICI)1096-8644(200001)111:1<25::AID-AJPA3>3.3.CO;2-I.
- Morin PA, Moore JJ, Chakraborty R, Jin L, Goodall J, Woodruff DS: Kin selection, social structure, gene flow, and the evolution of chimpanzees. Science. 1994, 265: 1193-1201. 10.1126/science.7915048.
- Ruvolo M: Comparative primate genomics: the year of the chimpanzee. Curr Opin Genet Dev. 2004, 14: 650-656. 10.1016/j.gde.2004.08.007.
- Khaitovich P, Muetzel B, She X, Lachmann M, Hellmann I, Dietzsch J, Steigele S, Do HH, Weiss G, Enard W, et al: Regional patterns of gene expression in human and chimpanzee brains. Genome Res. 2004, 14: 1462-1473. 10.1101/gr.2538704.
- Chomczynski P, Sacchi N: Single-step method of RNA isolation by guanidinium thiocyanate-phenol-chloroform extraction. Analyt Biochem. 1987, 162: 156-10.1016/0003-2697(87)90021-2.
- Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8: 175-185.
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res. 2002, 12: 996-10.1101/gr.229102. Article published online before print in May 2002.
- Agilent Technologies Inc. [http://www.chem.agilent.com]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.