Genomic diversity of the human intestinal parasite Entamoeba histolytica
© Weedall et al.; licensee BioMed Central Ltd. 2012
Received: 14 November 2011
Accepted: 25 May 2012
Published: 25 May 2012
Entamoeba histolytica is a significant cause of disease worldwide. However, little is known about the genetic diversity of the parasite. We re-sequenced the genomes of ten laboratory cultured lines of the eukaryotic pathogen Entamoeba histolytica in order to develop a picture of genetic diversity across the genome.
The extreme nucleotide composition bias and repetitiveness of the E. histolytica genome provide a challenge for short-read mapping, yet we were able to define putative single nucleotide polymorphisms in a large portion of the genome. The results suggest a rather low level of single nucleotide diversity, although genes and gene families with putative roles in virulence are among the more polymorphic genes. We did observe large differences in coverage depth among genes, indicating differences in gene copy number between genomes. We found evidence indicating that recombination has occurred in the history of the sequenced genomes, suggesting that E. histolytica may reproduce sexually.
E. histolytica displays a relatively low level of nucleotide diversity across its genome. However, large differences in gene family content and gene copy number are seen among the sequenced genomes. The pattern of polymorphism indicates that E. histolytica reproduces sexually, or has done so in the past, which has previously been suggested but not proven.
Entamoeba histolytica is a parasite of the human large intestine, commonly contracted by ingesting contaminated water or food. The parasite has a two stage life-cycle consisting of a cyst, the infective stage outside of the host, and a trophozoite, the reproductive stage within the host. Invasive amoebiasis results when trophozoites attack the gut wall, leading to diarrhoea, dysentery and in some cases dissemination to organs (commonly the liver) where abscesses result . If untreated, amoebic liver abscess can be fatal.
E. histolytica infection is endemic in many parts of the world where sanitation infrastructure is poor; elsewhere infection tends to be restricted to certain at-risk groups such as residents in institutions for the mentally handicapped and men who have sex with men . In endemic areas, exposure can be extremely high: an annual incidence of 40% was estimated among children in an urban slum in Bangladesh . The global prevalence of infection was estimated to be approximately 10% of the world's population in the 1980s . Of these, approximately 90% were estimated to be asymptomatic carriers while 10% developed invasive amoebiasis, leading to 100,000 deaths per year . This estimate was made prior to the redescription in 1993 of E. histolytica as two separate species, E. histolytica and E. dispar . E. histolytica causes invasive amoebiasis while E. dispar is avirulent, and it was thought that E. dispar infection might account for the predominance of asymptomatic carriers. However, even considering only E. histolytica infections, invasive amoebiasis appears still to be a relatively rare outcome of infection.
Understanding the nature of amoebic virulence motivates a substantial body of research. Several lines of evidence suggest that genetic factors of the parasite affect its virulence: the morphologically identical but genetically distinct E. dispar appears to be avirulent; distinct E. histolytica genotypes have been detected in liver abscess and feces from the same infected person ; different endemic populations show different rates of liver abscess ; a short-term transmission chain of virulent parasites has been reported .
The E. histolytica genome has been sequenced to a draft level with 12.5-fold coverage genome assembly consisting of approximately 24 megabases in 888 scaffolds . Subsequent reassembly with additional data produced a genome assembly of approximately 21 megabases in 1,496 scaffolds . The ploidy of E. histolytica is uncertain though some studies suggest that it may be tetraploid . As part of the effort to identify genetic determinants of virulence and to gain an understanding of the diversity of the parasite, we re-sequenced the complete genomes of a number of E. histolytica strains. We show that single nucleotide diversity is relatively low among these parasites, but that gene copy number variation may be a significant contributor to genome diversity. We also provide evidence that E. histolytica may reproduce sexually and define a set of high quality SNPs that could be used as markers for genotyping E. histolytica isolates.
Results and discussion
Sequencing and mapping of Entamoebastrains to the HM-1:IMSS reference genome
Entamoeba histolytica strains sequenced in this study
The E. histolytica genome is known to be very A + T rich (approximately 75% A + T) and repetitive, which both make sequence alignment difficult. In order to assess this repetitiveness and estimate how much unique mapping we might expect from short sequencing reads, we made a set of 50 bp reads from the reference genome and mapped them to the reference: 72.36% aligned to only one location, 12.46% aligned to two different locations, 10.83% aligned to between 3 and 10 locations and 4.35% aligned to more than 10 locations. As reads were derived from the reference genome, these proportions relate directly to proportions of the genome, so we expected up to 14 megabases of sequence to be unique enough for unique alignment to 50 bp reads. This shows that the majority of the reference genome is unique enough for accurate alignment of 50 bp reads. When we aligned the read libraries, we saw that approximately 75% of reads that aligned uniquely and approximately 13 megabases of the reference genome were covered by uniquely aligned reads, broadly consistent with our expectations.
Alignment and SNP-calling statistics
c50, c95 covb
Divergence (hom only)f
SNP calling was carried out, using thresholds for minimum and maximum coverage and minimum mapping quality. The number of sites passing these thresholds ranged from approximately 9 to 14 megabases. At these positions, bases were predicted and SNPs called (Additional file 2). SNP calls for heterozygotes will be affected by the unknown ploidy (a ploidy of 4 was assumed based on the estimate of Willhoeft and Tannich ). At least 89.55% of the homozygous SNPs in the Rahman strain were also called as variants when sequenced independently using different sequencing technology (454 sequencing; Additional file 3). However, as few as 43.02% of the heterozygous SNPs were similarly validated. It is difficult to ascertain whether the unvalidated SNPs are false positives or are due to the lower coverage of the 454 sequencing. Due to the greater uncertainty associated with heterozygous SNP calls, homozygous SNP calls showing inter-strain variation and a base call in every strain were tabulated as a set of 'high quality candidate markers' (Additional file 4). These SNPs were used to infer a phylogeny of sequenced strains and to test for evidence of meiotic recombination.
Genealogy of E. histolyticastrains
Evidence of meiotic recombination among E. histolyticastrains
SNPs among E. histolyticastrains
Overall genetic diversity appears to be rather low among these genomes. For these data it is difficult to make an estimate of diversity such as the average pairwise divergence because both the ploidy and the phase of variant bases are unknown. Also, all data are mapped to a single reference genome, so comparisons to each other may not account for differences in genome coverage. Therefore, we estimated the divergence of each genome from the HM-1:IMSS reference genome as the proportion of sites that could be called as SNPs that were called as SNPs. We used both the number of putative homozygous SNPs and the total number of both homozygous and heterozygous SNPs to estimate a lower and an upper boundary for divergence. The estimates are shown in Table 2. Divergence estimates counting only putative homozygous positions were between 0.3120 and 0.4147 SNPs per kilobase, while estimates counting both putative homozygous and heterozygous positions were between 0.6439 and 0.8573 SNPs per kilobase.
All SNP calls estimated for each strain were tabulated and can be viewed in Additional file 2. In order to identify polymorphic genes that may be under diversifying selection, the numbers of nonsynonymous and synonymous homozygous SNPs per gene (compared to the HM-1:IMSS reference genome) were counted and tabulated in Additional file 5. Putative nonsense mutations in genes are tabulated in Additional file 6. In these Additional files, only homozygous variants were counted in order to reduce the chance of counting false heterozygous SNPs caused by alignment of paralogous sequence. Genes with five or more nonsynonymous homozygous SNPs in one or more strain are tabulated in Additional file 7. This rather crude method was used instead of estimating dN (nonsynonymous divergence per nonsynonymous site) and dS (synonymous divergence per synonymous site), despite its bias towards longer genes and genes with no closely related paralogues (that is, genes with a greater proportion of unique coverage), because the small number of variant positions per gene makes dN and dS estimates very sensitive to small stochastic differences in the number of variants and not robust. Fifty-three genes contained five or more homozygous nonsynonymous SNPs relative to HM-1:IMSS. Among these genes are several encoding proteins involved in host-parasite interactions, including the intermediate chains of the galactose/N-acetyl-D-galactosamine-inhibitable lectin (Gal/GalNAc lectin) igl1 (EHI_006980) and igl2 (EHI_065330), which were divergent in the HK-9 strain, the light chain Gal/GalNAc lectin gene lgl (EHI_035690), which was divergent in the Rahman and IULA:1092:1 strains, and three serine-threonine-isoleucine-rich protein (STIRP) genes (EHI_073630, EHI_004340 and EHI_012330). The Gal/GalNAc lectin complex is a major virulence factor expressed on the surface of the trophozoite that has multiple functions, including binding to the mucus layer and to epithelial cells and involvement in host cell killing . STIRPs are putative trophozoite surface-expressed proteins, some or all of which bind to host epithelial cells and cause cell death .
Gene copy number variation among E. histolyticastrains
Genes with coverage depth more than twofold the average of HM-1A and HM-1B in one or more strain were designated putative high copy number genes (Additional file 8). Putative missing genes (reads per kilobase per million mapped reads (RPKM) < 1 and < 50% of the average of HM-1A and HM-1B) were also tabulated (Additional file 9). Among the putative high copy number genes in one or more strain were genes encoding ribosomal proteins (EHI_081320, EHI_104660, EHI_108500, EHI_124300, EHI_124330, EHI_137870, EHI_152080, EHI_152570, EHI_152610, EHI_153070, EHI_179980, EHI_181560, EHI_182920, EHI_193330, EHI_199970). Also represented were members of large gene families encoding AIG1 (avirulence induced gene 1)-like proteins (EHI_022500, EHI_079610, EHI_115170, EHI_126560, EHI_144270, EHI_157260, EHI_176590, EHI_195250, EHI_195260), BspA (Bacteroides forsythus surface protein A)-like leucine rich repeat-containing proteins (EHI_002120, EHI_005660, EHI_008340, EHI_049160, EHI_094080, EHI_102380, EHI_113190, EHI_123820, EHI_137910, EHI_147680, EHI_163960, EHI_189090, EHI_191510, EHI_192600) and ariel family proteins (EHI_028430, EHI_057430, EHI_080200). Among the genes absent from different strains were genes on a duplicated genome region previously described in HM-1:IMSS , indicating variation in the copy number of these duplications among strains. Many putative missing genes were unannotated, and a proportion of them may be incorrect gene models. The results suggest that strains vary in their gene family content and that duplication and loss of members of these large gene families is a dynamic process.
We report the genome-wide diversity of the enteric parasite E. histolytica. Overall, diversity among the sequenced genomes was rather low: divergence from the reference genome was between 0.312 and 0.857 SNPs per kilobase, depending on whether heterozygous SNPs were counted or not. An analogous measure of genomic diversity in Plasmodium falciparum estimated divergence from the reference strain 3D7 of a clinical isolate from Ghana (divergence = 1.31 SNPs per kilobase); of a strain (IT) derived from central America (divergence = 1.10 SNPs per kilobase); and divergence between P. falciparum 3D7 and the related species Plasmodium reichenowi of 20.30 SNPs per kilobase . Previous studies of genetic diversity among E. histolytica isolates have produced conflicting results. In general, studies using repetitive DNA as markers of diversity, such as the serine-rich E. histolytica protein (SREHP) gene and the short tandem repeats separating tRNA genes, have reported high diversity among genomes [17–21]. In contrast, studies of SNPs, though few in number, have reported minimal diversity [22–24]. Our results agree with the latter studies, suggesting that SNP is low in E. histolytica. However, our results also suggest that recombination and reassortment of alleles may have shuffled these alleles to create new haplotypes. Therefore, recombination and reassortment may be relatively more important than point mutation in generating diversity among parasite lineages.
The highly polymorphic STIRP gene EHI_073630 showed a pattern of polymorphism consistent with allelic dimorphism, as described in P. falciparum. It is difficult to say with certainty that the gene is truly dimorphic since it is a member of a gene family so it is possible that the aligned sequences are paralogous rather than allelic. However, the alignment clearly shows that the HM-1-like form of the gene is present in HM-1A and HM-1B and not present in the other strains. The causes and significance of allelic dimorphism in P. falciparum are not fully understood, but several genes that show this pattern of polymorphism (merozoite surface proteins MSP-1 , MSP-2 [25, 26], MSP-3 , and MSP-6 , and erythrocyte binding antigen EBA-175 ) are involved in host cell invasion. Immune recognition of some of these proteins has been associated with protection [30–33], and a proportion of antibodies may recognize allele-specific epitopes [31, 33, 34]. Whether any of this applies to the E. histolytica gene EHI_073630 remains to be seen. More extensive sampling of this region of the genome from more parasite isolates might indicate whether or not EHI_073630 is truly dimorphic and if the HM-1-like form of the gene is geographically restricted to Mexico, or segregates in other populations.
Sexual reproduction has not been observed in Entamoeba. However, since the publication of the genome of the E. histolytica HM-1:IMSS strain showed that it possesses the majority of a set of genes involved in meiosis, it has been suggested that E. histolytica may reproduce sexually [9, 35, 36]. Gene conversion, a process involving homologous recombination, appears to have occurred among paralogous genes in the E. histolytica and E. dispar lineages . The evidence from patterns of polymorphism presented here suggests that historical recombination events have broken up linkage between polymorphic sites and linkage has declined with increasing distance between sites on the same chromosome, consistent with the occurrence of meiotic recombination. Taken together, these lines of evidence strongly suggest that sexual reproduction has occurred in E. histolytica and adds it to the list of sexual protozoal parasites previously believed to be clonal, such as species of Leishmania  and Giardia . Whether sexual reproduction is common in parasite populations could be tested by genotyping large numbers of individuals from endemic populations and measuring linkage disequilibrium.
Large differences in the depth of coverage among genes, indicative of gene copy number variation, were seen among the sequenced genomes, suggesting a degree of plasticity and variation in gene family content. A number of examples of polymorphism in the copy number of segmental duplications of regions spanning multiple genes were seen, such as that seen on scaffold DS571330 that appears to have been duplicated many times in the Rahman strain. Segmental duplications previously described in the HM-1:IMSS genome  appear to be present in the HM-1A and HM-1B clones but to show differential copy number among the other sequenced genomes (Additional file 9). Genomic plasticity can be an important generator of diversity and may be an important mechanism of molecular evolution in Entamoeba. Genomic plasticity is an important form of genome diversity among Leishmania species and strains . Extensive size variation among homologous chromosomes is seen in Entamoeba  and in other parasitic protozoa such as trypanosomes . If we consider the Rahman duplication in scaffold DS571330, 25 duplications of a 12.4 kilobase region would add around 0.3 megabases to the chromosome containing scaffold DS571330, a significant contribution to chromosomal size variation among strains. It remains to be seen whether such plasticity is a feature of natural parasite lines or a phenomenon arising during strain axenization, which is known to affect genome content .
In conclusion, our results suggest that among E. histolytica parasites, genomic plasticity and recombination may be relatively more important mechanisms of generating genome diversity than single nucleotide mutation.
Materials and methods
Origins and growth of E. histolyticastrains and extraction of genomic DNA
Several E. histolytica isolates were sequenced. Isolates PVBM08F, PVBM08B, 2592100, MS84-1373 and MS27-5030 were axenized specifically for genome sequencing. The other isolates studied here are long-established laboratory strains.
PVBM08F and PVBM08B represent two culture isolations from the same individual, from feces and an intestinal biopsy, respectively. The male patient had recently spent 2 months in Liberia and presented with bloody diarrhoea and mucus, consistent with amoebic dysentery, and was strongly seropositive for E. histolytica. The cultures were kindly provided by Dra. Simonetta Gatti, Laboratory of Parasitology, Foundation IRCCS Policlinico San Matteo, Pavia, Italy.
MS84-1373 and MS27-5030 were established using routine monthly samples from asymptomatic children that were part of a long-term study of amoebiasis in Mirpur, Dhaka in Bangladesh . Isolate 2592100 was from a case of amoebic dysentery seen at a local hospital in Dhaka. These cultures were kindly provided by Dr Rashidul Haque, International Centre for Diarrhoeal Disease Research, Bangladesh.
All five cultures were established in Robinson's medium ; on arrival in London they were adapted to growth in LYSGM with 5% adult bovine serum . Cultures were then axenized via monoxenic cultures in LYI-S-2 (liver extract, yeast extract, iron, serum growth medium) with 15% adult bovine serum and Crithidia fasciculata as an associate, as described .
To generate large amounts of genomic DNA for genome sequencing, parasites were cultured axenically in LYI-S-2. Cells were grown to a high density and harvested, usually after 72 hours of growth. Cell cultures were centrifuged and the culture medium removed, then cells were washed twice in phosphate-buffered saline before lysis in 300 μl of QIAGEN cell lysis buffer (QIAGEN, Crawley, UK). Genomic DNA was purified using a modified version of the CTAB extraction protocol described elsewhere [46, 47], using two rounds of phenol:chloroform:isoamyl alcohol (25:24:1) extraction. DNA was re-suspended in nuclease-free water, ready for preparation of sequencing libraries.
SOLiD library preparation and sequencing
Purified genomic DNA was used to prepare SOLiD™ fragment libraries, according to the manufacturer's protocols (Life Technologies, Foster City, CA, USA). Briefly, genomic DNA was fragmented by sonication on a Covaris S2 sonicator (Covaris Inc., Woburn, MA, USA) and end-repaired in preparation for P1 and P2 adaptor ligation. Adaptors were ligated and the samples size selected and amplified by standard PCR. DNA was bound to SOLiD™ P1 beads and amplified by emulsion PCR, followed by enrichment for templated beads. The DNA was 3' modified before deposition on the sequencing slide, ensuring attachment of the beads to the slide. All libraries were sequencing on SOLiD™ 4 sequencer (Life Technologies), to produce 50 bp reads.
SOLiD read mapping
Sequence reads were mapped to the genome sequence of E. histolytica strain HM-1:IMSS using the Burrows-Wheeler Aligner (BWA) mapping software . The reference sequence matched that in data release 1.3 of the AmoebaDB database ; its 1,496 scaffolds contained 20,734,772 bases (excluding unknown bases represented by 'N'). Mapping parameters allowed up to four colorspace mismatches (that is, equivalent to up to two SNPs) per 48 bp read (after trimming off 5' and 3' terminal color calls), which represents up to approximately 4.2% divergence from the reference. The alignments were filtered to retain only uniquely aligned reads. These alignments were used for all further analyses.
Analysis of repetitiveness of the reference genome
Non-unique mapping to the reference genome will lead to gaps in coverage of the genome. In order to assess mapping to the reference, the relative repetitiveness and uniqueness of the reference genome was estimated. The reference sequence was used to generate 50 bp sequence fragments representing the entire sequence, each offset from the last by 1 bp. These fragments were mapped to the reference genome using the Bowtie v0.12.1 short read mapper , allowing 0 mismatches. The number of times each fragment exactly matched a genomic region was recorded and the proportions of reads mapped to 1, 2, 3 to 10, 11 to 100 and > 100 different positions were calculated.
Analysis of SNP variants
Nucleotide variants between mapped strain and reference were called separately for each strain using the SAMtools pileup software . Bases were called for each strain using the MAQ algorithm, assuming a ploidy of 4 . Putative SNP positions were filtered to remove those with coverage < 5, coverage in the top 5% (to remove potential artifactual heterozygous SNPs caused by 'stacking-up' of paralogous sequence), SNP quality < 20 or mapping quality < 20 (SNP and mapping quality scores are phred-scaled: 20 represents 99% accuracy of the call) or within 5 bp of a putative insertion or deletion. The effect of varying these threshold parameters on the predicted number of SNPs (in the Rahman strain) was assessed (Additional file 3). An independently generated dataset of Roche 454 reads representing the Rahman genome (manuscript in preparation) was used to validate SNP calls from the SOLiD data (Additional file 3). These reads were mapped to the same reference genome as the SOLiD data. The mean coverage across SNPs was 24×. All SNP calls were tabulated and are displayed in Additional file 2. In order to define a subset of high confidence candidate SNP markers to estimate a phylogeny and to test for evidence of recombination among strains, 3,696 SNPs that varied between strains but were putatively homozygous within each strain, occurred in coding sequences and had a base called in every strain were designated 'high quality candidate marker' SNPs (Additional file 4).
Estimating a phylogeny of strains
The 3,696 'high quality candidate marker' SNPs were used to estimate a phylogeny of the sequenced E. histolytica strains. E. histolytica-only phylogenies (based on 3,696 positions) were estimated using two phylogenetic approaches: distance-based analysis (neighbor-joining) and maximum likelihood (ML). Both methods were implemented in MEGA 5 . ML used the General Time Reversible model of evolution, selected as best using the model testing function in MEGA 5, while neighbor-joining used the maximum composite likelihood model. Statistical support for distance and ML trees was evaluated using bootstrapping with 1,000 replicates.
In order to root these phylogenies, a set of E. dispar variants at orthologous positions was defined. This was done by aligning regions flanking E. histolytica SNP positions (100 bp on either side) to the E. dispar genome assembly using NUCmer  and filtering the alignments to remove any that were not unambiguous best alignments from both species. This resulted in a set of 1,897 positions at which an E. dispar base could be confidently determined. This set of positions was used to build a bootstrapped neighbor-joining phylogeny using programs from the PHYLIP software package , which placed the E. dispar branch along the branch linking PVBM08B and PVBM08F to the other E. histolytica strains. The E. histolytica-only phylogenies were rooted along this branch.
Testing for evidence of recombination among lineages
The four-gametes (or four-haplotypes) test was applied to test for evidence of recombination during the evolutionary history of the E. histolytica strains. Under the infinite sites model of evolution, nucleotide sites mutate only once in the evolutionary history of a sample of related sequences, with no subsequent or back-mutation. Therefore, the maximum number of possible haplotypes formed by a pair of nucleotide sites in the absence of recombination is three. Four haplotypes can only be produced by a recombination event occurring between the two positions. Moreover, if recombination has happened, the occurrence of four haplotypes should increase in frequency with greater distance between variable nucleotide sites.
In order to detect this, 100,000 pairs of polymorphic 'high quality candidate marker' positions on the same scaffold (that is, putatively physically linked) were randomly sampled and the proportion of pairs showing four haplotypes was calculated for 100 distance intervals corresponding to centiles of the distance (each of 1,000 pairs of polymorphic sites). The proportion of pairs showing four haplotypes was plotted against the median distance for each interval. The correlation between this proportion and the distance between positions was tested by Spearman's rank correlation.
To strengthen the conclusion of meiotic recombination based on the results of the four-haplotypes test, we carried out simulations of samples undergoing recombination or gene conversion to compare the expected patterns of polymorphism. Simulations were carried out using Hudson's 'ms' software , which generates samples of sequence by coalescent simulation. In this case, two mutations were simulated (randomly assigned to branches of the coalescent tree) 100,000 times. Recombination was modeled by specifying a recombination parameter (4.Ne.r, where Ne is the effective population size and r is the per generation probability of a crossover occurring in the sequence). Gene conversion was modeled by specifying a parameter similar to the recombination parameter (4.Ne.f, where f is the per generation probability of a gene conversion event in the sequence) as well as the length of the 'converted' region. The proportion of four-haplotype SNP pairs per centile interval (that is, 1,000 simulations) was plotted.
Analysis of gene copy number variation
In order to detect variation in gene copy number among strains (including duplicated and missing genes), read counts were estimated for each gene in each strain. These counts were normalized for different library sizes between strains and for different gene lengths by expressing each as RPKM. In order to detect copy number variation, each strain was compared to HM-1:IMSS (in this case, the average RPKM for the HM-1A and HM-1B strains). Genes where the average HM1 RPKM was very high (top 12.5%) or low (bottom 12.5%) were not considered in the analysis, to avoid calling genes as missing when the lack of unique mapping is actually due to repetitiveness, leaving a set of 6,228 genes. Genes with RPKM values two-fold higher or two-fold lower were considered to have a variable number of copies. In addition, putative missing genes had to have an RPKM value of < 1. The analysis was run both for the original alignments and for alignments filtered so that only reads with unique start positions were considered. This second analysis was done to remove the possibility that high coverage was due to PCR duplicates arising during sequencing library preparation. However, with short read fragment libraries such as these, and particularly across short genes with deep coverage, many non-unique start positions are expected by chance, so removing them will dampen the signal of a high copy number. Therefore, the results of both analyses were shown.
Analysis of SNP polymorphism in gene coding sequences
SNPs in the coding regions of genes were classified as nonsynonymous (amino acid-altering) or synonymous (causing no change to the protein sequence) using custom perl scripts. Counts of the numbers of homozygous nonsynonymous and synonymous SNPs in polymorphic genes were tabulated (Additional file 5).
All sequence read data have been deposited in the EBI short read archive and are publicly available. All read data are labeled with the study accession number ERP001383 and individual samples are labeled with the following sample accession numbers: HM-1A, ERS132616; HM-1B, ERS132617; Rahman, ERS132618; 2592100, ERS132619; PVBM08B, ERS132620; PVBM08F, ERS132621; IULA:1092:1, ERS132622; HK-9, ERS132623; MS84-1373, ERS132624; MS27-5030, ERS132625.
merozoite surface protein
reads per kilobase per million mapped reads
single nucleotide polymorphism
serine: threonine and isoleucine rich protein.
This work was supported by a project grant from the Wellcome Trust. NH is supported by a Wolfson Merit Award from the Royal Society of Great Britain. We are grateful to Dra. Simonetta Gatti for providing parasite isolates PVBM08B and PVBM08F for axenization and to Dr Rashidul Haque and Dr Bill Petri for providing parasite isolates MS84-1373 and MS27-5030 (collected with funding from NIH grant no:2R01AI43596-13) for axenization. We also thank Miaomiao He for help with some analyses and two anonymous reviewers for their comments on the original manuscript.
- Pritt BS, Clark CG: Amebiasis. Mayo Clin Proc. 2008, 83: 1154-1159. 10.4065/83.10.1154.PubMedView ArticleGoogle Scholar
- Nozaki T, Kobayashi S, Takeuchi T, Haghighi A: Diversity of clinical isolates of Entamoeba histolytica in Japan. Arch Med Res. 2006, 37: 277-279.PubMedView ArticleGoogle Scholar
- Haque R, Ali IM, Sack RB, Farr BM, Ramakrishnan G, Petri WA: Amebiasis and mucosal IgA antibody against the Entamoeba histolytica adherence lectin in Bangladeshi children. J Infect Dis. 2001, 183: 1787-1793. 10.1086/320740.PubMedView ArticleGoogle Scholar
- Walsh JA: Problems in recognition and diagnosis of amebiasis: estimation of the global magnitude of morbidity and mortality. Rev Infect Dis. 1986, 8: 228-238. 10.1093/clinids/8.2.228.PubMedView ArticleGoogle Scholar
- Diamond LS, Clark CG: A redescription of Entamoeba histolytica Schaudinn, 1903 (Emended Walker, 1911) separating it from Entamoeba dispar Brumpt, 1925. J Eukaryot Microbiol. 1993, 40: 340-344. 10.1111/j.1550-7408.1993.tb04926.x.PubMedView ArticleGoogle Scholar
- Ali IK, Solaymani-Mohammadi S, Akhter J, Roy S, Gorrini C, Calderaro A, Parker SK, Haque R, Petri WA, Clark CG: Tissue invasion by Entamoeba histolytica: evidence of genetic selection and/or DNA reorganization events in organ tropism. PLoS Negl Trop Dis. 2008, 2: e219-10.1371/journal.pntd.0000219.PubMedPubMed CentralView ArticleGoogle Scholar
- Blessmann J, Van Linh P, Nu PA, Thi HD, Muller-Myhsok B, Buss H, Tannich E: Epidemiology of amebiasis in a region of high incidence of amebic liver abscess in central Vietnam. Am J Trop Med Hyg. 2002, 66: 578-583.PubMedGoogle Scholar
- Salit IE, Khairnar K, Gough K, Pillai DR: A possible cluster of sexually transmitted Entamoeba histolytica: genetic analysis of a highly virulent strain. Clin Infect Dis. 2009, 49: 346-353. 10.1086/600298.PubMedView ArticleGoogle Scholar
- Loftus B, Anderson I, Davies R, Alsmark UC, Samuelson J, Amedeo P, Roncaglia P, Berriman M, Hirt RP, Mann BJ, Nozaki T, Suh B, Pop M, Duchene M, Ackers J, Tannich E, Leippe M, Hofer M, Bruchhaus I, Willhoeft U, Bhattacharya A, Chillingworth T, Churcher C, Hance Z, Harris B, Harris D, Jagels K, Moule S, Mungall K, Ormond D, et al: The genome of the protist parasite Entamoeba histolytica. Nature. 2005, 433: 865-868. 10.1038/nature03291.PubMedView ArticleGoogle Scholar
- Lorenzi HA, Puiu D, Miller JR, Brinkac LM, Amedeo P, Hall N, Caler EV: New assembly, reannotation and analysis of the Entamoeba histolytica genome reveal new genomic features and protein content information. PLoS Negl Trop Dis. 2010, 4: e716-10.1371/journal.pntd.0000716.PubMedPubMed CentralView ArticleGoogle Scholar
- Willhoeft U, Tannich E: The electrophoretic karyotype of Entamoeba histolytica. Mol Biochem Parasitol. 1999, 99: 41-53. 10.1016/S0166-6851(98)00178-9.PubMedView ArticleGoogle Scholar
- Petri WA, Haque R, Mann BJ: The bittersweet interface of parasite and host: lectin-carbohydrate interactions during human invasion by the parasite Entamoeba histolytica. Annu Rev Microbiol. 2002, 56: 39-64. 10.1146/annurev.micro.56.012302.160959.PubMedView ArticleGoogle Scholar
- MacFarlane RC, Singh U: Identification of an Entamoeba histolytica serine-, threonine-, and isoleucine-rich protein with roles in adhesion and cytotoxicity. Eukaryot Cell. 2007, 6: 2139-2146. 10.1128/EC.00174-07.PubMedPubMed CentralView ArticleGoogle Scholar
- Tanabe K, Mackay M, Goman M, Scaife JG: Allelic dimorphism in a surface antigen gene of the malaria parasite Plasmodium falciparum. J Mol Biol. 1987, 195: 273-287. 10.1016/0022-2836(87)90649-8.PubMedView ArticleGoogle Scholar
- Polley SD, Weedall GD, Thomas AW, Golightly LM, Conway DJ: Orthologous gene sequences of merozoite surface protein 1 (MSP1) from Plasmodium reichenowi and P. gallinaceum confirm an ancient divergence of P. falciparum alleles. Mol Biochem Parasitol. 2005, 142: 25-31. 10.1016/j.molbiopara.2005.02.012.PubMedView ArticleGoogle Scholar
- Jeffares DC, Pain A, Berry A, Cox AV, Stalker J, Ingle CE, Thomas A, Quail MA, Siebenthall K, Uhlemann AC, Kyes S, Krishna S, Newbold C, Dermitzakis ET, Berriman M: Genome variation and evolution of the malaria parasite Plasmodium falciparum. Nat Genet. 2007, 39: 120-125. 10.1038/ng1931.PubMedPubMed CentralView ArticleGoogle Scholar
- Ayeh-Kumi PF, Ali IM, Lockhart LA, Gilchrist CA, Petri WA, Haque R: Entamoeba histolytica: genetic diversity of clinical isolates from Bangladesh as demonstrated by polymorphisms in the serine-rich gene. Exp Parasitol. 2001, 99: 80-88. 10.1006/expr.2001.4652.PubMedView ArticleGoogle Scholar
- Escueta-de Cadiz A, Kobayashi S, Takeuchi T, Tachibana H, Nozaki T: Identification of an avirulent Entamoeba histolytica strain with unique tRNA-linked short tandem repeat markers. Parasitol Int. 2010, 59: 75-81. 10.1016/j.parint.2009.10.010.PubMedView ArticleGoogle Scholar
- Haghighi A, Kobayashi S, Takeuchi T, Masuda G, Nozaki T: Remarkable genetic polymorphism among Entamoeba histolytica isolates from a limited geographic area. J Clin Microbiol. 2002, 40: 4081-4090. 10.1128/JCM.40.11.4081-4090.2002.PubMedPubMed CentralView ArticleGoogle Scholar
- Samie A, Obi CL, Bessong PO, Houpt E, Stroup S, Njayou M, Sabeta C, Mduluza T, Guerrant RL: Entamoeba histolytica: genetic diversity of African strains based on the polymorphism of the serine-rich protein gene. Exp Parasitol. 2008, 118: 354-361. 10.1016/j.exppara.2007.09.008.PubMedView ArticleGoogle Scholar
- Zaki M, Reddy SG, Jackson TF, Ravdin JI, Clark CG: Genotyping of Entamoeba species in South Africa: diversity, stability, and transmission patterns within families. J Infect Dis. 2003, 187: 1860-1869. 10.1086/375349.PubMedView ArticleGoogle Scholar
- Beck DL, Tanyuksel M, Mackey AJ, Haque R, Trapaidze N, Pearson WR, Loftus B, Petri WA: Entamoeba histolytica: sequence conservation of the Gal/GalNAc lectin from clinical isolates. Exp Parasitol. 2002, 101: 157-163. 10.1016/S0014-4894(02)00113-3.PubMedView ArticleGoogle Scholar
- Bhattacharya D, Haque R, Singh U: Coding and noncoding genomic regions of Entamoeba histolytica have significantly different rates of sequence polymorphisms: implications for epidemiological studies. J Clin Microbiol. 2005, 43: 4815-4819. 10.1128/JCM.43.9.4815-4819.2005.PubMedPubMed CentralView ArticleGoogle Scholar
- Ghosh S, Frisardi M, Ramirez-Avila L, Descoteaux S, Sturm-Ramirez K, Newton-Sanchez OA, Santos-Preciado JI, Ganguly C, Lohia A, Reed S, Samuelson J: Molecular epidemiology of Entamoeba spp.: evidence of a bottleneck (Demographic sweep) and transcontinental spread of diploid parasites. J Clin Microbiol. 2000, 38: 3815-3821.PubMedPubMed CentralGoogle Scholar
- Smythe JA, Peterson MG, Coppel RL, Saul AJ, Kemp DJ, Anders RF: Structural diversity in the 45-kilodalton merozoite surface antigen of Plasmodium falciparum. Mol Biochem Parasitol. 1990, 39: 227-234. 10.1016/0166-6851(90)90061-P.PubMedView ArticleGoogle Scholar
- Thomas AW, Carr DA, Carter JM, Lyon JA: Sequence comparison of allelic forms of the Plasmodium falciparum merozoite surface antigen MSA2. Mol Biochem Parasitol. 1990, 43: 211-220. 10.1016/0166-6851(90)90146-D.PubMedView ArticleGoogle Scholar
- Okenu DM, Thomas AW, Conway DJ: Allelic lineages of the merozoite surface protein 3 gene in Plasmodium reichenowi and Plasmodium falciparum. Mol Biochem Parasitol. 2000, 109: 185-188. 10.1016/S0166-6851(00)00245-0.PubMedView ArticleGoogle Scholar
- Trucco C, Fernandez-Reyes D, Howell S, Stafford WH, Scott-Finnigan TJ, Grainger M, Ogun SA, Taylor WR, Holder AA: The merozoite surface protein 6 gene codes for a 36 kDa protein associated with the Plasmodium falciparum merozoite surface protein-1 complex. Mol Biochem Parasitol. 2001, 112: 91-101. 10.1016/S0166-6851(00)00350-9.PubMedView ArticleGoogle Scholar
- Ware LA, Kain KC, Sim Lee BK, Haynes JD, Baird JK, Lanar DE: Two alleles of the 175-kilodalton Plasmodium falciparum erythrocyte binding antigen. Mol Biochem Parasitol. 1993, 60: 105-109. 10.1016/0166-6851(93)90033-T.PubMedView ArticleGoogle Scholar
- Cavanagh DR, Dodoo D, Hviid L, Kurtzhals JA, Theander TG, Akanmori BD, Polley S, Conway DJ, Koram K, McBride JS: Antibodies to the N-terminal block 2 of Plasmodium falciparum merozoite surface protein 1 are associated with protection against clinical malaria. Infect Immun. 2004, 72: 6492-6502. 10.1128/IAI.72.11.6492-6502.2004.PubMedPubMed CentralView ArticleGoogle Scholar
- Conway DJ, Cavanagh DR, Tanabe K, Roper C, Mikes ZS, Sakihama N, Bojang KA, Oduola AM, Kremsner PG, Arnot DE, Greenwood BM, McBride JS: A principal target of human immunity to malaria identified by molecular population genetic and immunological analyses. Nat Med. 2000, 6: 689-692. 10.1038/76272.PubMedView ArticleGoogle Scholar
- Metzger WG, Okenu DM, Cavanagh DR, Robinson JV, Bojang KA, Weiss HA, McBride JS, Greenwood BM, Conway DJ: Serum IgG3 to the Plasmodium falciparum merozoite surface protein 2 is strongly associated with a reduced prospective risk of malaria. Parasite Immunol. 2003, 25: 307-312. 10.1046/j.1365-3024.2003.00636.x.PubMedView ArticleGoogle Scholar
- Polley SD, Tetteh KK, Lloyd JM, Akpogheneta OJ, Greenwood BM, Bojang KA, Conway DJ: Plasmodium falciparum merozoite surface protein 3 is a target of allele-specific immunity and alleles are maintained by natural selection. J Infect Dis. 2007, 195: 279-287. 10.1086/509806.PubMedView ArticleGoogle Scholar
- Osier FH, Murungi LM, Fegan G, Tuju J, Tetteh KK, Bull PC, Conway DJ, Marsh K: Allele-specific antibodies to Plasmodium falciparum merozoite surface protein-2 and protection against clinical malaria. Parasite Immunol. 2010, 32: 193-201. 10.1111/j.1365-3024.2009.01178.x.PubMedPubMed CentralView ArticleGoogle Scholar
- Stanley SL: Entamoeba histolytica genome: something old, something new, something borrowed and sex too?. Trends Parasitol. 2005, 21: 451-453. 10.1016/j.pt.2005.08.006.PubMedPubMed CentralView ArticleGoogle Scholar
- Ramesh MA, Malik SB, Logsdon JM: A phylogenomic inventory of meiotic genes; evidence for sex in Giardia and an early eukaryotic origin of meiosis. Curr Biol. 2005, 15: 185-191.PubMedGoogle Scholar
- Weedall GD, Sherrington J, Paterson S, Hall N: Evidence of gene conversion in genes encoding the Gal/GalNac lectin complex of Entamoeba. PLoS Negl Trop Dis. 2011, 5: e1209-10.1371/journal.pntd.0001209.PubMedPubMed CentralView ArticleGoogle Scholar
- Rougeron V, De Meeûs T, Kako Ouraga S, Hide M, Bañuls AL: "Everything you always wanted to know about sex (but were afraid to ask)" in Leishmania after two decades of laboratory and field analyses. PLoS Pathog. 2010, 6: e1001004-10.1371/journal.ppat.1001004.PubMedPubMed CentralView ArticleGoogle Scholar
- Birky CW: Giardia sex? Yes, but how and how much?. Trends Parasitol. 2010, 26: 70-74. 10.1016/j.pt.2009.11.007.PubMedView ArticleGoogle Scholar
- Rogers MB, Hilley JD, Dickens NJ, Wilkes J, Bates PA, Depledge DP, Harris D, Her Y, Herzyk P, Imamura H, Otto TD, Sanders M, Seeger K, Dujardin JC, Berriman M, Smith DF, Hertz-Fowler C, Mottram JC: Chromosome and gene copy number variation allow major structural change between species and strains of Leishmania. Genome Res. 2011, 21: 2129-2142. 10.1101/gr.122945.111.PubMedPubMed CentralView ArticleGoogle Scholar
- Callejas S, Leech V, Reitter C, Melville S: Hemizygous subtelomeres of an African trypanosome chromosome may account for over 75% of chromosome length. Genome Res. 2006, 16: 1109-1118. 10.1101/gr.5147406.PubMedPubMed CentralView ArticleGoogle Scholar
- Mukherjee C, Clark CG, Lohia A: Entamoeba shows reversible variation in ploidy under different growth conditions and between life cycle phases. PLoS Negl Trop Dis. 2008, 2: e281-10.1371/journal.pntd.0000281.PubMedPubMed CentralView ArticleGoogle Scholar
- Haque R, Mondal D, Karim A, Molla IH, Rahim A, Faruque AS, Ahmad N, Kirkpatrick BD, Houpt E, Snider C, Petri WA: Prospective case-control study of the association between common enteric protozoal parasites and diarrhea in Bangladesh. Clin Infect Dis. 2009, 48: 1191-1197. 10.1086/597580.PubMedPubMed CentralView ArticleGoogle Scholar
- Clark CG, Diamond LS: Methods for cultivation of luminal parasitic protists of clinical importance. Clin Microbiol Rev. 2002, 15: 329-341. 10.1128/CMR.15.3.329-341.2002.PubMedPubMed CentralView ArticleGoogle Scholar
- Stechmann A, Hamblin K, Pérez-Brocal V, Gaston D, Richmond GS, van der Giezen M, Clark CG, Roger AJ: Organelles in Blastocystis that blur the distinction between mitochondria and hydrogenosomes. Curr Biol. 2008, 18: 580-585. 10.1016/j.cub.2008.03.037.PubMedPubMed CentralView ArticleGoogle Scholar
- Clark CG, Diamond LS: The Laredo strain and other 'Entamoeba histolytica-like' amoebae are Entamoeba moshkovskii. Mol Biochem Parasitol. 1991, 46: 11-18. 10.1016/0166-6851(91)90194-B.PubMedView ArticleGoogle Scholar
- DNA Isolation. [http://entamoeba.lshtm.ac.uk/dnaisoln.htm]
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25: 1754-1760. 10.1093/bioinformatics/btp324.PubMedPubMed CentralView ArticleGoogle Scholar
- Aurrecoechea C, Barreto A, Brestelli J, Brunk BP, Caler EV, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Iodice J, Kissinger JC, Kraemer ET, Li W, Nayak V, Pennington C, Pinney DF, Pitts B, Roos DS, Srinivasamoorthy G, Stoeckert CJ, Treatman C, Wang H: AmoebaDB and MicrosporidiaDB: functional genomic resources for Amoebozoa and Microsporidia species. Nucleic Acids Res. 2011, 39: D612-D619. 10.1093/nar/gkq1006.PubMedPubMed CentralView ArticleGoogle Scholar
- Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10: R25-10.1186/gb-2009-10-3-r25.PubMedPubMed CentralView ArticleGoogle Scholar
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25: 2078-2079. 10.1093/bioinformatics/btp352.PubMedPubMed CentralView ArticleGoogle Scholar
- Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011, 28: 2731-2739. 10.1093/molbev/msr121.PubMedPubMed CentralView ArticleGoogle Scholar
- Delcher AL, Phillippy A, Carlton J, Salzberg SL: Fast algorithms for large-scale genome alignment and comparision. Nucleic Acids Res. 2002, 30: 2478-2483. 10.1093/nar/30.11.2478.PubMedPubMed CentralView ArticleGoogle Scholar
- PHYLIP. [http://evolution.genetics.washington.edu/phylip.html]
- Hudson RR: Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics. 2002, 18: 337-338. 10.1093/bioinformatics/18.2.337.PubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.