Web Apollo: a web-based genomic annotation editing platform
© Lee et al.; licensee BioMed Central Ltd. 2013
Received: 10 May 2013
Accepted: 30 August 2013
Published: 30 August 2013
Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world.
Keywords: GENOME, COLLABORATIVE, EDITOR
The multitude of genome browsers in genomics all grew out of the need to 'see' the full array of predictions and alignments, their relative positions and their component parts. Among these are a small number of more sophisticated genome 'editors' which allow users to go beyond passive viewing to interactively modifying and refining precise locations and structures of genome functional elements. The desktop version of Apollo, Artemis, and FMAP are all examples of such tools. The genome sequencing and annotation paradigm typically involved a large, national genome center that undertook the raw sequencing in coordination with gene prediction pipelines and subsequent manual curation (for example, RefSeq, Ensembl, FlyBase, WormBase, Saccharomyces Genome Database, The Arabidopsis Information Resource, and Mouse Genome Informatics). The Model Organism Databases (MODs) often include staff members (that is, biocurators) who review and amend the gene structures. The Human and Vertebrate Analysis and Annotation (HAVANA) team at the Sanger Institute manually annotates the human, mouse, and zebrafish genomes. The amended predictions are subsequently used either as training sets or as empirical standards whose alignments are used to improve prediction software's accuracy. For example, the HAVANA team uses their in-house genome editor (Otterlace) to manually annotate, and then the improved annotations are fed back into the Ensembl pipeline during subsequent quarterly runs.
Unfortunately, while this model of a central biocuration team is considered the gold standard for genome annotation, it scales poorly. Technical advances have made sequencing faster and cheaper, thereby democratizing genome-scale sequencing and allowing a rapidly growing number of researchers to launch sequencing projects ranging from population, to evolutionary, to phenotype, to disease, to classroom projects across a huge spectrum of organisms. And, while next generation sequencing technology provides annotators with significantly more information, this, perhaps paradoxically, actually increases the need for manual review because there are more biological data points to assess and integrate. Individual researchers and small research groups do not have access to a centralized biocuration team, but their need for hand curation is often greater than that of a large genome center due to their focused interest in a particular gene family, pathway or evolutionary relationship, and the generally lower quality of the genome assembly.
An ideal solution would conceptually be a 'genome wiki', where curators could collaboratively edit genome annotations online, much like the distributed curators of a wiki document. Biological text corpora, successfully exposed to 'crowd-sourced' curation via the wiki-type model, include Wikipedia pages directly associated with human genes along with pages for protein and RNA domain families. Other projects offer similar wiki-like editing features for text, including revision control. However, while editable textboxes have been present in browsers since the earliest days of the web, a completely integrated genome editor that operates seamlessly in the web browser (and saves annotations to a persistent data store in a client-server model) has been lacking. The natural user interface for genome data is the genome browser, and a true 'genome wiki' should allow curators to edit annotations seamlessly from within the genome browser.
Our development team included investigators representing multiple genome research communities who carried out usability testing to evaluate the effectiveness of Web Apollo's interface and annotation management. We took this user-centered design approach to ensure real-world usability was built into the system from the ground up. These investigators evaluated usability by revising annotations for honeybee (Apis mellifera) and by using Web Apollo from the outset for community annotation of insect genomes such as ants (Cardiocondyla obscurior, Pogonomyrmex barbatus, and Wasmannia auropunctata). This work led to a better understanding of the biology of these insects while simultaneously testing the effectiveness of the software.
This section briefly explains Web Apollo's core operations for importing data, editing, and exporting protein-coding gene models. We also describe further features that support annotating corrections to lower quality genome assemblies, importing and visualizing transcriptome data, and collaborating in real time.
Protein-coding gene annotation
To annotate a gene, curators commonly proceed by: (1) locating the region of interest; (2) inspecting all available gene predictions and biological evidence aligned to the region; (3) creating a gene model; (4) if necessary, modifying these gene models using the editing functions; (5) corroborating the accuracy of the annotation by comparing the resulting annotation with available homologs; and (6) ensuring that correct naming conventions and relevant comments have been added, utilizing available literature as needed.
Importing genomic data: Using server-side middleware, the system can load data tracks from a variety of sources, including the UCSC genome database, Chado databases, Ensembl DAS, and GenBank XML. In our recent experience, however, the most common sources of genomic information are the laboratories of individual researchers themselves, and we therefore focused our attention on direct loading of genomic data files. The system accepts results of computational genomic analyses in the standard, widely used file formats GFF3 (Generic Feature Format version 3, a de facto standard for sharing analysis results), SAM (Sequence Alignment/Map, the accepted standard for efficient representation of high throughput sequencing alignments), BAM (the binary version of SAM), and BigWig (a binary index of 'wiggle' formatted files for the storage of dense, continuous data). The initial server for an organism is typically primed with data using the combined output from a full genome analysis pipeline, such as MAKER. Working with the MAKER developers, we implemented a feature that dynamically instantiates a Web Apollo server as the final step in a MAKER run. In addition, users may augment pipeline results with other data, either during the initial installation and configuration process (in which case the data are stored on the server), or by loading them dynamically from a local file or URL during a session. The URL alternative makes it possible for a group of users to share their data without having to add it to the central server, for example to share and display the output from a Galaxy process.
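To make the GFF3 input concrete, the following is a minimal Python sketch of how a single GFF3 feature line breaks down into its nine tab-separated columns. It is an illustration only and is not taken from the Web Apollo code base; the class name Gff3Feature, the function parse_gff3_line, and the example scaffold and feature identifiers are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass
class Gff3Feature:
    seqid: str
    source: str
    type: str
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    strand: str
    attributes: dict  # parsed column 9, e.g. {"ID": "...", "Parent": "..."}


def parse_gff3_line(line):
    """Parse one GFF3 feature line; return None for blank lines, comments and directives."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, _score, strand, _phase, attr_text = cols
    attributes = {}
    if attr_text != ".":
        # Column 9 holds semicolon-separated key=value pairs, URL-escaped where needed.
        for pair in attr_text.split(";"):
            if pair:
                key, _, value = pair.partition("=")
                attributes[unquote(key)] = unquote(value)
    return Gff3Feature(seqid, source, ftype, int(start), int(end), strand, attributes)


# Hypothetical exon record, as a gene prediction pipeline might emit it.
example = "scaffold_12\texample\texon\t1200\t1450\t.\t+\t.\tID=exon1;Parent=mRNA1"
feature = parse_gff3_line(example)
print(feature.type, feature.start, feature.end, feature.attributes["Parent"])
```

The sketch ignores the score and phase columns and multi-valued attributes, which a production loader would need to handle.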
Locating the region of interest: Due to the highly fragmented nature of low-coverage genome assemblies with hundreds or thousands of scaffolds, selecting a chromosomal region of interest is not always a straightforward task. To assist in locating a region of interest, users may deploy the 'Search Sequence' tool, which queries the assembled genome with a gene or chromosomal region of interest using a BLAT search (BLAST-like Alignment Tool). This feature was implemented using a plug-in architecture, allowing support for search tools other than BLAT with minor additions to the source code. BLAT may point to multiple potential regions containing the query sequence when paralogs are present, and/or when the gene of interest is split across two or more genomic fragments. The search returns a list of regions from which a user can choose by simply clicking on a region's row to display that region in the browser.
Creating a gene model: Curators begin the manual annotation process by selecting and dragging the most appropriate computational results into the 'User-created Annotations' area, a writable 'white board' track where they can modify transcripts and individual exons. Alternately there is also the option to automatically promote one of the computational prediction sets. Due to the redundancy of available evidence for highly expressed transcripts, and the fluid growth of the available evidence, we expressly decided not to include any meta-data listing the evidence tracks used to create an annotation. The former would cause the meta-data captured to balloon, and the latter would make it extremely difficult to maintain data integrity. In our experience it is more effective to keep track of dates. If the annotation itself is dated (both for creation and for modification) as well as the evidence, then it is a straightforward operation to compare these and flag discrepancies. It is also important to use the available screen area optimally, particularly as the volume of information increases. Towards this end we added the capacity to restrict the view to a single strand, and to lock the editable white-board track into position so it is visible regardless of whether the user scrolls vertically.
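The date-based bookkeeping described above can be sketched in a few lines of Python. This is a hedged illustration of the idea rather than Web Apollo's implementation: the Annotation and EvidenceTrack classes, the flag_stale function, and the example names and dates are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Annotation:
    name: str
    created: date
    modified: date


@dataclass
class EvidenceTrack:
    name: str
    loaded: date  # date the evidence became available on the server


def flag_stale(annotations, evidence_tracks):
    """Map each annotation name to the evidence tracks added after its last modification."""
    flagged = {}
    for ann in annotations:
        newer = [t.name for t in evidence_tracks if t.loaded > ann.modified]
        if newer:
            flagged[ann.name] = newer
    return flagged


# Hypothetical data: geneA was last edited before the RNA-Seq track arrived.
annotations = [
    Annotation("geneA", date(2013, 2, 1), date(2013, 3, 1)),
    Annotation("geneB", date(2013, 6, 1), date(2013, 6, 15)),
]
evidence = [
    EvidenceTrack("est_alignments", date(2013, 1, 10)),
    EvidenceTrack("rnaseq_larva", date(2013, 5, 20)),
]
print(flag_stale(annotations, evidence))
```

Because only dates are compared, nothing needs to be stored about which evidence a curator actually consulted, which is the point of the design choice.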
Modifying a gene model: Basic editing operations such as deleting, merging, splitting, or duplicating a transcript or part of one can be accessed from a pop-up menu available for each feature using a right-click of the mouse. To modify exon boundaries, users click to select the subject exon and drag either one of the edges. Apollo facilitates correct determination of exon boundaries by highlighting matching edges across the annotation and evidence tracks, by coloring the CDS annotation and evidence features according to their reading frame (that is, the frame of each exon is indicated by its color, and thus any features with conflicting frames display in a different color), and by flagging non-canonical splice-sites in the user's annotations. The resulting protein sequence can be used to determine the biological credibility of a gene model by querying highly curated protein databases. Editing requests from different users arrive at the server one at a time (because of the network) and are handled in their order of arrival. The unit of operation includes all the additional edits that are intrinsic to the original operation, that is, if an exon is deleted or shortened then the parent transcript and parent gene are modified as well. The second edit request will either overwrite the first edit, which the first user will be able to see immediately, or in very rare cases of a contradictory edit (for example, an exon being deleted by the first user and then a request to change its boundary by the second user) the second user will receive an error warning, and the annotation will remain as edited by the first user. All operations performed in the 'User-created Annotations' track are recorded in the history and can be reversed or repeated with the 'Undo' and 'Redo' options.
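The ordering rule for concurrent edits can be illustrated with a small Python sketch. It models only the behavior described above (requests handled in arrival order, later edits overwriting earlier ones, contradictory edits rejected with an error) and omits cascading updates to parent features and the undo history. The names ExonStore, EditConflict, and process are hypothetical and do not come from the Web Apollo source.

```python
from collections import deque


class EditConflict(Exception):
    """Raised when an edit targets a feature that no longer exists."""


class ExonStore:
    def __init__(self, exons):
        self.exons = dict(exons)  # exon id -> (start, end)

    def apply(self, edit):
        exon_id = edit["exon"]
        if exon_id not in self.exons:
            raise EditConflict(f"{exon_id} no longer exists")
        if edit["op"] == "delete":
            del self.exons[exon_id]
        elif edit["op"] == "set_bounds":
            self.exons[exon_id] = (edit["start"], edit["end"])
        else:
            raise ValueError(f"unknown operation {edit['op']!r}")


def process(store, requests):
    """Handle requests strictly in arrival order; report one result per request."""
    results = []
    queue = deque(requests)
    while queue:
        edit = queue.popleft()
        try:
            store.apply(edit)
            results.append((edit["user"], "ok"))
        except EditConflict as err:
            results.append((edit["user"], f"error: {err}"))
    return results


store = ExonStore({"exon1": (100, 200), "exon2": (300, 400)})
results = process(store, [
    {"user": "alice", "op": "set_bounds", "exon": "exon1", "start": 100, "end": 220},
    {"user": "bob", "op": "set_bounds", "exon": "exon1", "start": 90, "end": 220},   # overwrites alice
    {"user": "alice", "op": "delete", "exon": "exon2"},
    {"user": "bob", "op": "set_bounds", "exon": "exon2", "start": 310, "end": 400},  # contradictory
])
print(results)
print(store.exons)
```

In the example, the second boundary change simply replaces the first, while the attempt to resize an already deleted exon is refused and the store is left as the first user edited it.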
Exporting data: To conduct further analyses, users may export their annotations as FASTA-formatted sequences, GFF3 files, or record them in a Chado database.
Visualizing stage and cell-type specific transcription
List of currently known servers.
Server set up in progress
Server set up and analysis in progress
Used one contig to teach a GMOD course
Server set up in progress
Server set up in progress
Server set up and analysis in progress
Leaf cutting ant
Server is available for ongoing annotation
Leaf cutting ant
Eastern bumble bee
Used internally to test deployment
Used internally; new assembly in progress
Ant, tramp species
Community is currently annotating
Server is available for ongoing annotation
Red harvester ant
Community is currently annotating
Computational gene prediction has begun
Community is currently annotating
Nine spine stickleback
Analysis in progress
Over 40 species
Analysis in progress
Given that manual annotation is critical to achieving accurate and reliable gene models, the issue now becomes how this process can be scaled up to meet the needs of the growing number of genome research projects taking place at smaller facilities and in individual labs. With the shift in sequence data generation, the burden of curation is falling largely on research consortia or ad hoc community efforts. Some sequencing centers have supported consortium annotation efforts, either by providing websites for community members to submit annotations (for example, [34–38]), by collaborating with a centralized, external genome annotation group (for example, [39–42]), or by providing Otterlace. However, more and more often research communities are organizing manual curation efforts among themselves, independent of sequencing centers.
Desktop Apollo gained popularity among smaller groups and over time it became one of the standards used by smaller-scale genome projects in research communities dispersed throughout the world. However, its original design legacy did not make it a perfect fit for the needs of these smaller genome projects. Installation was at times an insurmountable technical hurdle for groups lacking an on-site bioinformaticist. Furthermore, there was no support for automatically sharing annotations among members of the research team. Groups were constrained to saving files to disk and e-mailing these to one another, which was slow and inconvenient, and created additional bookkeeping work because conflicts were resolved by database curators taking the time to contact the disagreeing annotators individually. With the need to provide a seamlessly integrated annotation flow for smaller teams of researchers in mind, we built Web Apollo with a focus on support for collaborative annotation efforts. Because it is browser-based, it allows users to see changes made by collaborators working on the same region in real time, which enables community annotators to quickly resolve issues among themselves directly. Early in the project we made the decision to build the Web Apollo client using the visualization techniques of an existing web-based genome browser, JBrowse, the best of the genome browser alternatives available, thereby lowering overall development costs.
Web Apollo also addresses two key requirements that are particular to the smaller community annotation projects [44, 45]. First, recent research communities tend to organize into teams based on areas of biological expertise, often preferring to annotate specific genes or gene families, rather than entire scaffolds. Web Apollo allows users to quickly identify their specific loci of interest by integrating BLAT sequence comparison as an optional entry point. Second, the norm for smaller sequencing efforts is fragmented rather than polished assemblies. Web Apollo scaffold list sorting features provide easy access to scaffolds based on identifiers, even when the assembly consists of tens of thousands of scaffolds.
The establishment of best practices and quality control becomes increasingly important with the wide range of genomic expertise available within different research communities. Research projects must develop appropriate standards given their data and offer some training to assure the success of any community annotation project. The built-in quality control features of Web Apollo are similar to those used in desktop Apollo and other annotation editors such as Otterlace. These include flagging non-consensus splice sites and validating translation of coding sequences. In addition we have developed tutorials and a demonstration site to train users in the gestures required for annotation. Accessibility over the web makes it easy to hold long-distance training sessions.
Perhaps most importantly for the continued improvement of the annotations, Web Apollo allows continued input to gene annotation as long as a server is maintained for the genome; researchers can thus continue to improve annotations as more data are collected over time. If a research community chooses to follow the 'gatekeeper' approach to community annotation, Web Apollo also makes it easy for the gatekeeper to view and revise annotations.
As sequencing technologies advance and analytical packages improve, the software providing the visualization and annotation tools needed for iterative refinement will necessarily have to keep pace. There are a number of natural and powerful extensions to a tool like Web Apollo that will enable more analysis functions to be carried out within a browser.
In the immediate future, enhancing the convenience and curatorial utilities for biologists is of central importance. We propose to add the capability to annotate further genomic feature types including cis-regulatory regions, transcription factor binding sites, and non-coding RNAs, along with providing an intuitive way to browse, navigate and visualize these. Another improvement is extending the current methods of accessing data to include data from UCSC and Ensembl by adding support for UCSC data hubs and the Ensembl REST API via the basic JBrowse platform. In addition, we propose to introduce composite tracks that can utilize multiple data files by integrating metadata about how the files are related, for example sequencing read alignment data in a BAM file and coverage plots derived from those alignments in a BigWig file. This will enable a single track to show sequence read alignments at high zoom levels and transition to showing derived coverage plots at lower resolutions, without the high overhead of dynamically calculating the coverage plot from the alignments. The ability to compose integrated tracks of closely related data, independent of particular input formats, will be extremely useful in other situations, such as a single track combining variant data with background population frequency data. Biologists will also be empowered by enriching feature meta-data to include other attributes, such as description and status flags, in the user's dialog box for editing textual and related identifier information. For example, a status flag could be used to signal that a team member requests a review of their annotation. The choice of attributes a curator can edit would be configurable so that each project can decide precisely what meta-data attributes are appropriate for their needs.
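The proposed composite track amounts to choosing a data file according to the current zoom level. A minimal Python sketch of that selection follows. Because the feature is described here only as future work, everything in the sketch is hypothetical: the CompositeTrack class, its field names, the file names, and the threshold of 10 base pairs per pixel are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CompositeTrack:
    alignments_file: str   # e.g. a BAM file of read alignments
    coverage_file: str     # e.g. a BigWig file of coverage derived from those alignments
    max_bp_per_pixel_for_reads: float = 10.0  # assumed threshold, not a Web Apollo setting

    def source_for(self, region_length_bp, viewport_width_px):
        """Pick the file to draw from, based on how many base pairs each pixel covers."""
        bp_per_pixel = region_length_bp / viewport_width_px
        if bp_per_pixel <= self.max_bp_per_pixel_for_reads:
            return self.alignments_file   # zoomed in: show individual reads
        return self.coverage_file         # zoomed out: show the precomputed coverage plot


track = CompositeTrack("larva_rnaseq.bam", "larva_rnaseq.bw")
print(track.source_for(region_length_bp=5_000, viewport_width_px=1_000))      # 5 bp per pixel
print(track.source_for(region_length_bp=2_000_000, viewport_width_px=1_000))  # 2,000 bp per pixel
```

The metadata linking the two files is what lets the track avoid recomputing coverage from the alignments when the user zooms out.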
Other enhancements would offer increased assistance to dispersed research teams by supporting fine-grained, track-by-track sharing options controlled by the user on the client side, rather than sharing access coarsely genome-by-genome. This way a researcher can choose with whom to share their individual data tracks (this is available now, but limited to the server side). Most importantly, there are several seemingly disparate problems that can be addressed with the same technical solution: challenges such as the fragmented nature of some assemblies, the length of the intronic regions for some genes, and the desire to annotate a single gene family or set of duplicated genes simultaneously. Each of these requires that distant regions of the genome be brought into the same visual field, which can be done by synthetically splicing the different regions into a single virtual genome sequence as was done in the Integrated Genome Browser, and which our current team of developers has the expertise to implement. As an open-source project we welcome contributions from the community to address these and other natural enhancements to provide a feature-rich, powerful genomic research environment.
Our two over-arching aims are actually two perspectives on the same work. Integration with related community annotation projects whose aims are complementary will enrich the feature set available to the user. Specific integration examples include: (1) establishing interactive, dynamic re-analysis of a particular genomic region using Galaxy or SeqWare, for example, rerunning with different analysis parameters; (2) placing a newly predicted protein into a protein family using PANTHER services; (3) using protein family information to examine possible roles a protein may have in particular pathways through interactions with the Reactome pathway annotator; and (4) offering connections to resources such as WikiGenes or RFAM:Wikipedia which focus on capturing more textual types of information.
From the point of view of target audiences, actively working with researchers in a wide variety of domains will ensure that Apollo is responsive to biologists' requirements and meets their needs. For smaller genome research investigations, ease of installation, an enriched set of annotation capabilities and integration with other community annotation projects are key. We also envision Apollo's increased use in educational and classroom settings. This is one motivation for emphasizing integration with analytical pipeline services such as Galaxy and providing tutorials, training, and annotation guidelines. Lastly, Apollo can support research groups whose focus is exploring genotype to phenotype correlations for the study of human disease. For this group we have already implemented some initial prototypes for enhanced visualization of sequence polymorphisms and variation data, and mockups for allelic frequency and dynamic visualization of the effect or impact a set of variants may have on functional genomic elements. For each of these domains we will continue to take a user-centered design approach and directly engage with the researchers in these areas through future iterations of the framework, as well as with software developers who can contribute to the overall platform.
The current challenge is scaling to accommodate the growing amount of work. These projects must operate using a new paradigm, requiring new software workflows and training in the nuances of genomic annotation. What is needed is a framework that enables any individual researcher to generate their own sequence data, run an analysis pipeline using a remote service to analyze their organism of interest, and ultimately generate their own models to publish. Web Apollo represents a major step toward achieving the goal of an integrated genomic analysis environment. It provides a comprehensive toolbox to biologists for manually annotating the features of the genome(s) they are investigating.
Web Apollo comprises three components: a web-based client, an annotation editing server, and a server-side data service that provides the client with data from different files and databases (Figure 1). These three software components are open source and available free of charge.
The Annotation-Editing Engine is written in Java. It handles all the necessary logic for editing and deals with the complexities of modifications in a biological context, where a single change can have multiple cascading effects (for example, splitting or merging transcripts). The Annotation-Editing Engine currently supports: (1) adding and deleting transcripts; (2) merging and splitting transcripts; (3) manually setting the translation start for a transcript (otherwise the longest ORF is automatically calculated with every edit); (4) flipping the strand for a transcript; (5) adding and deleting exons from existing transcripts; (6) changing exon boundaries; and (7) merging and splitting exons, including the ability to search for canonical splice sites to create a biologically relevant intron when splitting an exon. The Annotation-Editing Engine uses a plug-in architecture, which assists in the identification of isoforms wherever overlapping transcripts are present; the architecture allows groups to configure customized rules to determine whether two transcripts should come from the same gene or from separate ones. Currently, we provide options for 'no overlap' (every transcript comes from a separate gene regardless of whether it overlaps another transcript), 'simple overlap' (a transcript is considered an isoform if it has any overlap with an existing transcript), and 'ORF overlap' (a transcript is considered an isoform only if it overlaps another transcript's coding region, in the same frame). Lastly, as previously described in the 'Sequence alterations' section of the Results, the Annotation-Editing Engine also supports editing of genomic insertions, deletions, and substitutions.
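The three isoform rules can be expressed as interchangeable predicates, which is the essence of a plug-in design. The Python sketch below is an illustration under stated assumptions and is not the engine's Java implementation: all names (Transcript, RULES, assign_genes) are hypothetical, transcripts are assumed to lie on the forward strand, 'simple overlap' is interpreted as overlap of transcript spans, and 'same frame' is checked by comparing the codon phase of shared coding positions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


@dataclass
class Transcript:
    name: str
    exons: List[Interval]  # sorted by start
    cds: List[Interval]    # coding portions of the exons, sorted by start

    @property
    def span(self) -> Interval:
        return self.exons[0][0], self.exons[-1][1]


def _frames(t: Transcript) -> Dict[int, int]:
    """Map each coding position to its codon phase (0, 1 or 2)."""
    frames, offset = {}, 0
    for start, end in t.cds:
        for pos in range(start, end + 1):
            frames[pos] = offset % 3
            offset += 1
    return frames


def no_overlap(a: Transcript, b: Transcript) -> bool:
    return False  # every transcript becomes its own gene


def simple_overlap(a: Transcript, b: Transcript) -> bool:
    return a.span[0] <= b.span[1] and b.span[0] <= a.span[1]


def orf_overlap(a: Transcript, b: Transcript) -> bool:
    fa, fb = _frames(a), _frames(b)
    return any(fb.get(pos) == phase for pos, phase in fa.items())


RULES = {"no overlap": no_overlap, "simple overlap": simple_overlap, "ORF overlap": orf_overlap}


def assign_genes(transcripts, rule):
    """Group transcripts into genes; a transcript joins every gene holding a matching transcript."""
    genes = []
    for t in transcripts:
        merged, rest = [t], []
        for gene in genes:
            if any(rule(t, other) for other in gene):
                merged.extend(gene)
            else:
                rest.append(gene)
        genes = rest + [merged]
    return [sorted(t.name for t in gene) for gene in genes]


transcripts = [
    Transcript("t1", [(100, 200), (300, 400)], [(150, 200), (300, 350)]),
    Transcript("t2", [(100, 200), (300, 420)], [(150, 200), (300, 380)]),
    Transcript("t3", [(380, 500)], [(401, 460)]),  # overlaps t1 and t2 only outside their CDS
    Transcript("t4", [(600, 700)], [(610, 690)]),
]
for name, rule in RULES.items():
    print(name, assign_genes(transcripts, rule))
```

In this example t3 is grouped with t1 and t2 under 'simple overlap' but becomes a separate gene under 'ORF overlap', because its coding region shares no in-frame positions with theirs.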
Edits are stored persistently in the server, allowing users to quickly recover their data in the event of unexpected browser or server crashes. We employ a two-stage editing approach. First, data are stored in a BerkeleyDB database for live edits, which provides very responsive storage and retrieval of annotations. Edit histories are also stored in the BerkeleyDB database. Later, after they have been reviewed, these edits can be exported to different formats for further analysis or for non-Web Apollo specific storage. The data exporters also implement a plug-in based architecture that allows easy addition of new exporters. Currently, we support exporting annotations to FASTA, GFF3, and Chado.
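A plug-in based exporter layer can be sketched as a registry that maps a format name to a function. The Python example below is a simplified illustration, not the Web Apollo exporters: the registry EXPORTERS, the decorator exporter, the export function, and the dictionary layout used for features are all assumptions, and the Chado exporter is omitted because it would require a database connection.

```python
import textwrap

EXPORTERS = {}  # format name -> function taking a list of feature dicts


def exporter(name):
    """Register a function as the exporter for the given format name."""
    def register(func):
        EXPORTERS[name] = func
        return func
    return register


@exporter("gff3")
def to_gff3(features):
    lines = ["##gff-version 3"]
    for f in features:
        attrs = f"ID={f['id']}"
        if f.get("parent"):
            attrs += f";Parent={f['parent']}"
        # Score and phase are written as '.' (undefined) in this sketch.
        lines.append("\t".join([f["seqid"], f["source"], f["type"],
                                str(f["start"]), str(f["end"]), ".", f["strand"], ".", attrs]))
    return "\n".join(lines) + "\n"


@exporter("fasta")
def to_fasta(features):
    records = []
    for f in features:
        if f.get("sequence"):
            body = "\n".join(textwrap.wrap(f["sequence"], 60))
            records.append(f">{f['id']}\n{body}")
    return "\n".join(records) + "\n"


def export(fmt, features):
    if fmt not in EXPORTERS:
        raise ValueError(f"no exporter registered for {fmt!r}")
    return EXPORTERS[fmt](features)


features = [{"id": "mRNA1", "seqid": "scaffold_12", "source": "example", "type": "mRNA",
             "start": 1200, "end": 1450, "strand": "+", "sequence": "ATGGCCAAATGA"}]
print(export("gff3", features), end="")
print(export("fasta", features), end="")
```

Adding a new output format then only requires registering one more function, without touching the code that calls export.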
In a multiple user environment, user permissions and authentication are important. The server offers multiple levels of user permissions, allowing project owners to decide with whom to share their work, and whether to allow read-only or read-and-write access. User authentication implements a plug-in based architecture, allowing users to adopt their own authentication back-end if needed. We currently support authentication through either a Web Apollo specific SQL database or through Mozilla's Persona authentication service. The server supports multiple, concurrent users through synchronized updates over multiple browser instances, so that every edit is immediately visible to all users who are viewing or editing the same region. The server employs the Comet model to push data to clients in real time. The client and server use a long-held HTTP connection and, when edits are made, the server pushes these updates to the client without the client having to explicitly request them.
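The long-held connection used in the Comet model can be mimicked in-process with a condition variable: a waiting client blocks until an edit is published or a timeout expires. The Python sketch below shows only this waiting pattern and involves no HTTP; the EditChannel class and its methods are hypothetical and are not part of the Web Apollo server.

```python
import threading


class EditChannel:
    """In-memory stand-in for a long-held connection: pollers block until edits arrive."""

    def __init__(self):
        self._edits = []
        self._cond = threading.Condition()

    def publish(self, edit):
        with self._cond:
            self._edits.append(edit)
            self._cond.notify_all()  # wake every waiting client

    def poll(self, last_seen, timeout=30.0):
        """Wait for edits beyond index last_seen; return (new_edits, new_last_seen)."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._edits) > last_seen, timeout=timeout)
            new = self._edits[last_seen:]
            return new, last_seen + len(new)


channel = EditChannel()
received = []


def client():
    # The client asks once and waits; it does not repeatedly request updates.
    edits, _ = channel.poll(last_seen=0, timeout=5.0)
    received.extend(edits)


worker = threading.Thread(target=client)
worker.start()
channel.publish({"user": "alice", "op": "set_bounds", "exon": "exon1"})
worker.join()
print(received)
```

If the timeout expires with no new edits, poll returns an empty list and the client would simply reconnect, which is how long polling keeps the connection effectively open.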
The server also allows searching of genomic sequences. Its plug-in based architecture allows any number of searching strategies to be used without having to modify the searching framework. Currently Web Apollo supports BLAT for nucleotide and translating searches.
Server-side genomic data service
The second data service we have implemented is a server-side component called Trellis that supports dynamic queries to genomic data sources over HTTP. Trellis is implemented as a Java servlet and uses a plug-in architecture for both data sources and output formats. Data source plug-ins are implemented for directly querying the UCSC MySQL database, the Chado Postgres database, and servers supporting the Distributed Annotation System (DAS) protocol. An output plug-in converts responses to the JBrowse JSON format used by the Web Apollo client. This service is considered dynamic because if the data source is updated with new data, the JSON returned will reflect this.
We tested server installation and the user interface using new genome assemblies and computed evidence data for Apis mellifera (honey bee) and Bombus impatiens (bumble bee), contributed by the Honey Bee and Bumble Bee Genome Sequencing Consortiums. We performed additional testing and created a demonstration instance, available at , using published bovine genome data. The test datasets from real consortiums allowed us to develop solutions to several formatting issues that may otherwise be problematic in future installations. The sources of gene prediction evidence included NCBI Gnomon, Ensembl, GLEAN, MAKER, N-SCAN, Fgenesh, Fgenesh++ [58, 59], Augustus, Geneid, and SGP2. Protein homolog alignments had been generated by Exonerate. Alignments of Sanger-sequenced ESTs and contigs were generated by Exonerate, GMAP or Splign. Alignments of RNASeq data were from TopHat.
The original process for setting up the Web Apollo server requires familiarity with server administration, with database administration, and with the applications used by Web Apollo. To facilitate the installation process and assist researchers in overcoming these requirements, we recently developed two solutions. The first is 'GMOD-in-the-Cloud', a virtual machine for deployment on the cloud, which comes with Web Apollo (among other GMOD tools) already installed. This provides a convenient solution for researchers who do not have any restrictions on hosting their instances and data elsewhere. In addition, for those who manage sensitive data that may need to be kept away from shared spaces and the cloud, we have provided a virtual machine that can be deployed locally.
This work was supported by the National Institutes of Health grant numbers 5R01GM080203 from the National Institute of General Medical Sciences and 5R01HG004483 from the National Human Genome Research Institute, and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
We would especially like to thank Mitch Skinner for the development of JBrowse, Nomi Harris for preliminary work on the Web Apollo client, Thomas Down for permitting the reuse of his binary parsing libraries for BAM and BigWig files (Down et al. 2011), and Carson Holt for providing a direct connection to MAKER.
We would also like to thank Sue Brown, Sanjay Chellapilla, Daniel Ence, Juergen Gadau, Nicolae Herndon, Elisabeth Huguet, Carolyn Lawrence, Dan Lawson, Sasha Mikheyev, Barry Moore, Jan Oettler, Xiang Qin, Lukas Schrader, Kim Worley, Mark Yandell, and Jing-Jiang Zhou for feedback on server installation, data management, and/or user interface. We thank the Honey Bee and Bumble Bee Genome Sequencing Consortiums for genome assemblies and accompanying computed gene prediction evidence for honey bee (Amel_4.5) and Bombus impatiens (Bimp_1.0), which we used in development and beta testing. We thank Anna Bennett for input on formatting of test datasets.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.