Evaluating short-read sequence data from the highly redundant, novel transcriptome of Polarella glacialis

Gibbons, Theodore R; Concepcion, Gregory T; Bachvaroff, Tsvetan R; Delwiche, Charles F

doi:10.1186/1465-6906-12-S1-P5

Poster presentation
Published: 19 September 2011


Genome Biology volume 12, Article number: P5 (2011)


Background

Dinoflagellates are a diverse group of ecologically important eukaryotic algae whose global impact ranges from large-scale primary production of oxygen [1] to devastating toxic algal blooms [2]. These organisms have exceptionally large genomes (10⁹ to 10¹¹ bases) [3] and highly duplicated genes, some of which occur in thousands of copies within a single genome [4]. These and other unusual characteristics have made dinoflagellates difficult to study with traditional molecular biology techniques. Sequence data for dinoflagellates are correspondingly sparse, and not a single dinoflagellate genome sequence has been published to date.

As part of our project Assembling the Dinoflagellate Tree of Life (DAToL), our laboratory has sequenced the transcriptome of Polarella glacialis, whose genome is estimated at only 3 Gb, making it one of the smallest known dinoflagellate genomes. Because we had to rely on de novo assemblers that had been tested with data from organisms extremely divergent from dinoflagellates, we took special care to validate the data. Before extending our analyses to additional dinoflagellates, we compared the results of different sequencing and assembly methods.

Methods

Total RNA was extracted from cultured P. glacialis. The sample was then divided and shipped to Macrogen for rRNA degradation, library preparation and sequencing. One library was sequenced on one-eighth of a Roche/454 GS FLX PicoTiterPlate using Titanium chemistry. A second library was sequenced on one lane of an Illumina GAIIx for 78 cycles in each direction (paired-end). The reads were assembled with Newbler, MIRA, Oases and Trinity, and the results were analyzed with various custom scripts.

Results

The total length of the unassembled 454 sequence data amounted to less than one-third of the combined length of only those Trinity transcripts that had a significant BLAST hit against a sequence in GenBank, indicating that our 454 data did not achieve complete coverage.
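The comparison above is a ratio of total read bases to a target length, which is the same as the mean sequencing depth over that target: a value below 1.0 means the reads cannot cover every base of the target even once. The Python sketch below shows one way such a ratio might be computed; it is not the authors' custom script. It assumes the 454 reads and Trinity transcripts are available as FASTA text and the BLAST results as default tabular output (-outfmt 6, e-value in column 11). The e-value cutoff of 1e-5 used to define a "significant" hit and all input data are hypothetical.

```python
from typing import Dict, Iterable, Set


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first word of the '>' header) to its sequence length.

    Handles sequences wrapped over several lines and skips blank lines.
    """
    lengths: Dict[str, int] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def ids_with_significant_hit(blast_tab: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Collect query IDs with at least one hit at or below max_evalue.

    Assumes BLAST -outfmt 6 with default columns: column 1 is the query ID and
    column 11 is the e-value. The default cutoff is a hypothetical choice.
    """
    hits: Set[str] = set()
    for raw in blast_tab:
        fields = raw.rstrip("\n").split("\t")
        if len(fields) >= 11 and float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def mean_depth_over_hits(reads_fasta, transcripts_fasta, blast_tab, max_evalue=1e-5) -> float:
    """Total read bases divided by the combined length of transcripts with a significant hit."""
    read_bases = sum(fasta_lengths(reads_fasta).values())
    transcript_len = fasta_lengths(transcripts_fasta)
    hit_ids = ids_with_significant_hit(blast_tab, max_evalue)
    target = sum(transcript_len[t] for t in hit_ids if t in transcript_len)
    if target == 0:
        raise ValueError("no transcripts with a significant hit")
    return read_bases / target


# Toy, made-up inputs for illustration only (not data from this study).
reads = [">r1", "ACGTACGT", ">r2", "ACGT"]
transcripts = [">t1 len=40", "A" * 20, "A" * 20, ">t2", "C" * 30, ">t3", "G" * 20]
blast = [
    "t1\tsubjA\t90.0\t40\t4\t0\t1\t40\t1\t40\t1e-20\t70.0",
    "t3\tsubjB\t50.0\t20\t10\t0\t1\t20\t1\t20\t0.5\t20.0",
]
ratio = mean_depth_over_hits(reads, transcripts, blast)
print(f"read bases / hit-transcript length = {ratio:.2f}")
```

With these toy inputs only t1 passes the cutoff, so 12 read bases are compared with 40 transcript bases, giving a ratio of 0.30. Restricting the denominator to transcripts with database hits makes the test conservative: the full transcriptome can only be larger.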

Conclusions

Our primary hypothesis was that the longer read lengths of the 454 data might allow the corresponding assemblers to better resolve repetitive sequences, which could be instrumental for assembling conserved regions within highly duplicated genes. Our failure to obtain complete coverage with the 454 dataset undermined our ability to test this hypothesis, although we made several other interesting observations. Notably, despite the vast disparity in coverage depth between the 454 and Illumina assemblies, we observed unique, apparently real sequences within some of the 454 contigs.
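The hypothesis rests on a simple geometric argument: a single read can place a particular copy of a repeated region only if it covers the whole repeat and extends into unique sequence on both sides. The Python sketch below encodes that rule. The 20-base minimum anchor, the repeat lengths, and the 400-base read length are hypothetical illustrative values, not figures from this study, and the sketch ignores paired-end information and the graph-based strategies that Newbler, MIRA, Oases and Trinity actually use.

```python
from typing import Sequence


def max_spannable_repeat(read_len: int, min_anchor: int = 20) -> int:
    """Longest repeat one read can cover while keeping min_anchor unique bases on each side."""
    return max(read_len - 2 * min_anchor, 0)


def read_spans_repeat(read_len: int, repeat_len: int, min_anchor: int = 20) -> bool:
    """True if one read can contain the whole repeat plus unique flanking sequence on both sides."""
    return repeat_len <= max_spannable_repeat(read_len, min_anchor)


def fraction_spannable(read_len: int, repeat_lens: Sequence[int], min_anchor: int = 20) -> float:
    """Fraction of the given repeat lengths that a read of length read_len could span."""
    if not repeat_lens:
        raise ValueError("repeat_lens must not be empty")
    spanned = sum(read_spans_repeat(read_len, r, min_anchor) for r in repeat_lens)
    return spanned / len(repeat_lens)


# Hypothetical repeat lengths (bases) shared among gene copies; not measured values.
repeats = [30, 100, 250, 500]
for label, length in [("Illumina read, 78 cycles", 78), ("hypothetical 400-base 454 read", 400)]:
    print(label, max_spannable_repeat(length), fraction_spannable(length, repeats))
```

Under these assumptions a 78-base read spans only repeats of up to 38 bases (one of the four example lengths), whereas a 400-base read spans repeats of up to 360 bases (three of four). This advantage only materializes if the longer reads cover the transcripts in the first place, which is why incomplete 454 coverage limits the test.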

References

1. Yang EJ, Choi JK, Hyun JH: Distribution and structure of heterotrophic protist communities in the northeast equatorial Pacific Ocean. Mar Biol 2004, 146:1–15.
2. Wang DZ: Neurotoxins from marine dinoflagellates: a brief review. Mar Drugs 2008, 6:349–371.
3. Hou Y, Lin S: Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes. PLoS ONE 2009, 4:e6978.
4. Bachvaroff TR, Place AR: From stop to start: tandem gene arrangement, copy number and trans-splicing sites in the dinoflagellate Amphidinium carterae. PLoS ONE 2008, 3:e2929.


Author information

Authors and Affiliations

Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, 20742, USA
Theodore R Gibbons, Gregory T Concepcion & Charles F Delwiche
Smithsonian Environmental Research Center, Edgewater, MD, 21037, USA
Tsvetan R Bachvaroff

About this article

Cite this article

Gibbons, T.R., Concepcion, G.T., Bachvaroff, T.R. et al. Evaluating short-read sequence data from the highly redundant, novel transcriptome of Polarella glacialis. Genome Biol 12 (Suppl 1), P5 (2011). https://doi.org/10.1186/1465-6906-12-S1-P5

