DA: In cancer genetics the biggest challenge is integrating data from genome sequencing, transcriptomes and the epigenome so that they make sense together. The problem is no longer acquiring the data but 'embracing the chaos'. Some of the Boolean logic approaches are making inroads in this area. In mouse genetics, it's all about engineering the mouse genome faster. The TAL nucleases, which can be used to tailor the mouse genome with base-pair precision, potentially represent a big advance in this area, but we need to understand whether they have off-target activity. In genome sequencing, read length really matters for assembly, and while the last few years of short-read sequencing have been amazing, we really need long-read technology that is truly scalable and accurate (and cheap).
BB: The mission of computational biology is to answer biological and biomedical questions by using computation in support of or in place of laboratory procedures, with one goal being to get more accurate answers at a greatly reduced cost. Three major emerging challenges are: how to make sense of massively accumulating data, how to develop reasonable gold standards for testing our algorithms, and how best to integrate computational studies with real biological experiments (on both sides).
The past two decades have seen an exponential increase in genomic and biomedical data, which will soon outstrip the computing power available to analyze them with current methods. Extracting new science from these massive datasets will require not only faster computers; it will also require smarter algorithms. Moore's Law has been a great friend to computational biologists: the amount of processing you can do per dollar of compute hardware is more or less doubling every year. Back in the 1990s, the growth rate of genomic data was balanced by the growth rate of computing speeds. However, that balance has been disrupted by the advent of next-generation sequencing. The size of genomic databases is going up by a factor of 10 every year, far outstripping the growth in our computational capacity. It's tempting to think that cloud computing is going to solve this problem, but that's not the case: it doesn't change the fact that the data are growing exponentially faster than computing power per dollar. The only solution is to discover fundamentally better algorithms for processing these databases, and better algorithms can make an enormous difference. In fact, you have to devise algorithms that are so fast that, in some cases, their running time does not even grow linearly with the size of the databases.
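As an illustrative aside, one familiar way an algorithm avoids scaling linearly with database size is to pay an up-front indexing cost so that each subsequent query depends only on the query itself. The toy sketch below builds a k-mer hash index over a tiny sequence 'database'; all function names, the k-mer length and the example sequences are invented for illustration and are not taken from any particular tool.

```python
# Toy illustration: a one-time k-mer index makes each query's cost depend on
# the query length, not on the total size of the sequence database searched.
from collections import defaultdict

K = 11  # k-mer length; real tools choose k based on error rates and genome size

def build_kmer_index(sequences):
    """Map every k-mer to the (sequence id, offset) positions where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for i in range(len(seq) - K + 1):
            index[seq[i:i + K]].append((seq_id, i))
    return index

def find_seeds(index, query):
    """Find exact k-mer seed matches for a query read via hash lookups."""
    hits = []
    for i in range(len(query) - K + 1):
        hits.extend((seq_id, pos, i) for seq_id, pos in index.get(query[i:i + K], []))
    return hits

# Tiny invented 'database' of reference sequences
database = ["ACGTACGTGGTACCGTTAGCATCGA", "TTGACCGTAGCATCGATTACGGTAA"]
index = build_kmer_index(database)
print(find_seeds(index, "GTAGCATCGAT"))  # seed hits as (sequence id, offset, query offset)
```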
Another big challenge in computational biology is the determination of gold-standard datasets for training computational techniques. For example, consider the problem of determining orthology relationships across species. What we really want to identify are functional orthologs (that is, genes that perform the same functions across various species). Direct experimental data about this are scarce; the most commonly available datasets capture it only indirectly, by looking at, say, sequence similarity between the genes. There are many computational approaches that use these indirect data to predict orthology relationships, but determining which one works best is difficult. One direction we have been exploring is using protein and genetic interaction data to improve orthology prediction by better capturing functional correspondence. It would really help if we had even a limited set of proteins for which gold-standard orthology information was available; all the computational techniques could then be trained and tested more rigorously. However, we are still some distance away from having any gold-standard orthology sets. There are many other problem domains where being able to generate good gold-standard datasets would significantly improve our ability to use computational methods.
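To make the point concrete, the minimal sketch below shows how such a gold-standard set, if one existed, could be used to benchmark competing orthology predictors by precision and recall. Every gene pair and both predictor outputs are invented purely for illustration.

```python
# Hypothetical illustration: scoring two orthology predictors against a small
# gold-standard set of functional ortholog pairs. All pairs are invented.

def precision_recall(predicted, gold):
    """Precision and recall of predicted ortholog pairs against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = predicted & gold
    precision = len(true_positives) / len(predicted) if predicted else 0.0
    recall = len(true_positives) / len(gold) if gold else 0.0
    return precision, recall

# Invented gold-standard functional ortholog pairs (species A gene, species B gene)
gold_standard = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}

# Outputs of two hypothetical predictors: one based on sequence similarity alone,
# one that also uses protein/genetic interaction data
seq_similarity_calls = {("a1", "b1"), ("a2", "b9"), ("a3", "b3"), ("a5", "b5")}
interaction_aware_calls = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a5", "b5")}

for name, calls in [("sequence similarity", seq_similarity_calls),
                    ("interaction-aware", interaction_aware_calls)]:
    p, r = precision_recall(calls, gold_standard)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```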
The final challenge is the need to improve the integration of biological and computational methods. In some domains, algorithmic thinking is already very tightly integrated into the process of experiment design, execution and interpretation; genome sequencing is a great example of where such integration has yielded great success. In other cases, however, biological methods use computational analysis only as an afterthought. For example, many studies of cell signaling could benefit greatly from having innovative computational techniques applied early in the design stage, so that the right data are available to enable the full power of these methods to be applied. The converse of this criticism also applies to computer scientists: we need a better understanding of the subtleties of various biological experiments. Far too often, so many biological details are abstracted away that the solution loses its biological relevance.
OH: Access to adequate clinical research samples is one of the most important challenges in cancer genetics. Collection of samples by biopsy or surgical resection has traditionally been performed for clinical care only. It is currently extremely hard to use the same samples for research, for reasons ranging from preparation methods to logistics and consent. Several institutions, such as the University of California, San Diego, are developing master protocols to systematically consent a majority of oncology patients and collect samples from surgery or biopsy for investigational purposes. Resistance is high, generally justified on the grounds of patient protection, but people are starting to understand that this is the only way to eventually deliver on the promise of personalized diagnostics and care.
Another challenge is to educate people about genomics and to tone down the natural hype of the genomics field. Investigators involved in clinical and translational projects are in some ways victims of the hype created by the fantastic recent technological advances. I frequently talk to clinicians who are enthusiastic about sequencing their samples, but many projects fall short owing to ignorance of the requirements for sample number or quality. For example, there is no good rationale for sequencing the whole genomes of thousands of samples except to make them a general resource for the community; biological questions might be better addressed with a more focused approach, such as sequencing exons or candidate regions in properly selected patients. The hype of the sequencing field is a wonderful catalyst of novel ideas and provides much-needed public exposure for our field, but we have to regularly educate prospective collaborators on basic notions of genetics and the realities of sample preparation and data analysis. At our institution, the University of California, San Diego, my function in the Clinical and Translational Research Institute (a Clinical and Translational Science Awards funded entity) is to do just that: consult with people and help them with the design, preparation and analysis of their translational genomic experiments.
Finally, the last challenge is to transform the academic review system in our institutions. Traditional institutions expect faculty to lead independent projects, typically funded through NIH R01 grants. However, genomics has traditionally functioned differently, following the principle of team science, in which multiple principal investigators contribute to a large endeavor. This was the case for the Human Genome Project and the HapMap Project, and today for the 1000 Genomes Project and The Cancer Genome Atlas, for example. This 'big science' is usually financed through alternative sources of funding requiring collaborations and multiple principal investigators, and the results do not always lead to first- or last-author publications for the majority of the participants, despite their essential roles. Traditional institutions that promote faculty on the basis of R01 awards and last-author publications do not always recognize this. This divergence does not favor the retention of brilliant researchers in academic genomic research. Some institutions, such as Harvard and the Ontario Institute for Cancer Research, have established alternative academic review criteria that recognize participation in team science and allow investigators to grow successfully in this environment. At a time when funding is becoming scarce and more directed to specific projects, let's hope that more institutions will follow these examples.
CH: (i) Understanding the systems-level ecological rules governing microbial community structure, (ii) relating the human microbiome to health and disease, and (iii) streamlining methods for turning next-generation sequencing data into actionable biology. Addressing the combination of the first two challenges will help us realize some of the human microbiome's potential as a means of diagnosis and therapeutic intervention. Investigating the first challenge in particular should let us leverage systems biology's successes in molecular biology during studies of microbial communities. The second will likewise feed back into the broader metagenomics community by identifying 'interesting' microbiome properties, environments and phenotypes on which to focus. Finally, the third challenge includes finding ways to collaborate on biological 'big science' projects, to collectively analyze sequence data (of all sorts, not just metagenomic), and to leverage shared computing resources. All of these continue to be necessary to solve the considerable data management and interpretation challenges brought about by next-generation sequencing technologies. These technologies will continue to accelerate biological discovery, but there are still many opportunities for computational methods to accelerate that acceleration.
SL: Now that it is possible to profile transcription factor binding using ChIP-seq, an important remaining challenge is predicting the target genes and the direction of their expression changes upon factor activation or inactivation. There are often thousands of genes near a factor's binding sites, but only a minority of these genes show differential expression, and we don't know why. Also, for transcription factors with multiple functions, such as CTCF (for example, transcriptional repressor and insulator), the challenge is whether we can differentiate their functions from ChIP-seq of other factors or histone marks. Approaches such as Hi-C and ChIA-PET can identify genome-wide higher-order chromatin interactions, which have the potential to answer this question, although there are still technical and cost challenges for these techniques to be widely adopted.
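As a concrete aside, the minimal sketch below shows the usual first-pass version of this analysis: assign each ChIP-seq peak to genes whose transcription start site lies within a fixed window, then ask which of those candidate targets are actually differentially expressed. The coordinates, gene names, window size and expression calls are all invented for illustration, and the choice of window is itself a modeling decision.

```python
# Assign ChIP-seq peaks to nearby genes, then intersect with a differential
# expression call set. All coordinates and gene sets below are invented.

WINDOW = 50_000  # bp around the peak summit

peaks = [("chr1", 120_000), ("chr1", 480_000), ("chr2", 75_000)]  # (chrom, summit)
tss = {"geneA": ("chr1", 100_000), "geneB": ("chr1", 140_000),
       "geneC": ("chr1", 900_000), "geneD": ("chr2", 60_000)}
differentially_expressed = {"geneB"}  # from a matched expression experiment

candidate_targets = set()
for chrom, summit in peaks:
    for gene, (g_chrom, g_tss) in tss.items():
        if chrom == g_chrom and abs(summit - g_tss) <= WINDOW:
            candidate_targets.add(gene)

responding = candidate_targets & differentially_expressed
print(f"{len(candidate_targets)} genes near peaks, "
      f"{len(responding)} of them differentially expressed: {sorted(responding)}")
```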
Performing ChIP-seq or DNase-seq with a small amount of starting material is a challenge. Currently one needs 100,000 to 500,000 cells for a histone mark ChIP-seq, and 1 million to 2 million cells for a transcription factor ChIP-seq. To make ChIP-seq or DNase-seq work well on tissues or tumors, it is important to be able to start from smaller numbers of cells. The laboratories of Peggy Farnham and Brad Bernstein, and many other laboratories, have explored this issue. Recently the Gronemeyer group published a new method to linearly amplify picogram quantities of DNA. Commercial companies such as Illumina are developing kits for library construction from <1 ng of DNA, and third-generation sequencing techniques promise to offer a better solution for working with small amounts of starting material.
Finally, there are many transcription factors, chromatin-modifying enzymes and histone marks functioning together to regulate gene expression. The specificity (for example, which transcription factors specifically recruit which histone marks or histone-modifying enzymes) and the cooperativity (for example, which factors are pioneer factors for the binding of other factors) of these factors in different cells or conditions are still poorly understood. Without understanding this, the effects of epigenetic drugs could be hard to interpret. As sequencing throughput increases, multiplexed ChIP-seq will allow us to investigate many more conditions in combination, and we might then have a better answer to this question.
CM: In my specific area of interest, genetic interaction networks, there are a few challenges we face as a community. (i) Scalable technology for mapping genetic interactions for other phenotypes, conditions and organisms, especially higher eukaryotes. The yeast community has been very successful over the past several years at developing technology for rapid construction of combinatorial mutants to map genetic interactions. Specifically, the typical approach is to look for combinations of mutations that result in a surprising phenotype (usually a fitness defect) given the phenotypes of the mutations introduced independently (a minimal scoring sketch of this idea follows this answer). These efforts have produced global interaction maps covering millions of combinatorial mutants, which have proven to be quite useful for understanding gene function and the general organization of the cell. In yeast, efforts are underway to expand these maps to other phenotypes and other conditions; this requires new scalable technologies given the space of possible experiments. Such maps in higher eukaryotes will be important for understanding the genetic basis of complex phenotypes and disease and for developing new therapeutic approaches, but the technology for mapping these interactions is still relatively limited in throughput. Several exciting efforts are underway to address this challenge, most of them leveraging RNA interference technology. The past year has produced new successes in Drosophila and human cell lines, but continued focus on improving and scaling the technology will be fruitful. (ii) Translating insights about genetic interactions from perturbation studies to questions in population genomics. The focus of the genetic interaction community has largely been on precise combinatorial genetic perturbation in single individuals (for example, standard lab strains). This approach is attractive because the effects of perturbations can be studied in a controlled genetic background. However, we would ultimately like to leverage this knowledge about how genetic variants combine to influence phenotype to understand the link between genotypic and phenotypic variation across individuals in a population. The latter challenge is of course the main goal of genome-wide association studies in humans, which to date have struggled to explain large portions of the heritable phenotypic variation. Applying insights derived from large-scale perturbation studies to these population genomics questions will be an interesting direction, especially as the mapping technologies become more feasible in higher eukaryotes. There are new opportunities, and the necessary data, to make progress on this front in yeast with the recent sequencing and phenotyping of several Saccharomyces cerevisiae strains. This information on genomic and phenotypic variation, combined with extensive functional studies on the reference genome, will provide a good testing ground for new methods in this area. (iii) Leveraging functional genomic data across species. As I noted above, a more general challenge is the problem of leveraging functional genomic data across species to speed the process of functional characterization. Even the most basic question in systems biology, 'What are all of the genetic components related to biological process X?', has not been answered comprehensively in most species, particularly in higher eukaryotes. Enormous resources have been spent generating functional data in model organisms, but these data are relatively underutilized for mapping functions in other species.
The paper from McGary et al. provides a nice demonstration of how insights from relatively data-rich model systems can be used to direct experimental investigation of genes related to specific phenotypes in more complex organisms, and I suspect similar approaches can be developed in other settings. Accomplishing this will require new computational infrastructure and tools to support integration and comparative analysis of functional genomic data.
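The scoring sketch referred to in (i) above follows. It illustrates the multiplicative model commonly used to quantify a 'surprising' double-mutant phenotype: the expected double-mutant fitness is the product of the single-mutant fitnesses, and the interaction score is the deviation from that expectation. The fitness values below are invented for illustration.

```python
# Minimal sketch of genetic interaction scoring under a multiplicative null
# model: epsilon = observed double-mutant fitness minus the product of the
# single-mutant fitnesses. All fitness values below are invented.

def interaction_score(f_a, f_b, f_ab):
    """Deviation of the double mutant from the multiplicative expectation."""
    return f_ab - f_a * f_b

examples = {
    # (single-mutant fitness a, single-mutant fitness b, double-mutant fitness)
    "no interaction":                     (0.9, 0.8, 0.72),
    "negative (synthetic sick/lethal)":   (0.9, 0.8, 0.30),
    "positive (e.g. same pathway)":       (0.9, 0.8, 0.90),
}

for label, (f_a, f_b, f_ab) in examples.items():
    eps = interaction_score(f_a, f_b, f_ab)
    print(f"{label}: expected {f_a * f_b:.2f}, observed {f_ab:.2f}, epsilon {eps:+.2f}")
```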
AO: Getting a handle on the propensity and type of RNA editing that is occurring is a fascinating area which, as yet, has not been fully resolved. It has been documented that the sequence of RNA can be modified post-transcriptionally, resulting in an RNA sequence that differs from the DNA from which it was derived. High-throughput sequencing technologies give us the opportunity to study RNA editing on a genome-wide scale, and there have been several recent publications on this topic [41, 42]. However, there is quite a debate about how frequently this actually occurs. In my view, the results are probably influenced by biases in mapping procedures (see Joe Pickrell's blog and the recent paper by Schrider et al.), and it will be fascinating to see how this debate gets resolved in the near future and what the results mean for the diversity of the transcriptome.
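As an illustrative aside, the minimal sketch below shows the kind of crude filtering used to reduce mapping-bias artifacts when calling candidate RNA-DNA differences, such as requiring adequate coverage and discarding sites supported mostly by read ends. All site coordinates, counts and thresholds are invented, and real pipelines apply many additional filters (for example, excluding repeats and splice-site ends).

```python
# Minimal sketch: call candidate RNA editing sites from per-site tallies of
# RNA-seq mismatches against the DNA genotype, with crude anti-artifact filters.
# All numbers below are invented for illustration.

MIN_COVERAGE = 20          # require reasonable read depth
MIN_EDIT_FRACTION = 0.10   # require the variant allele in a clear fraction of reads
MAX_END_FRACTION = 0.50    # discard sites supported mostly by read ends (mapping bias)

# site: (chrom, pos, DNA base, RNA variant base, coverage, variant reads, variant reads near read ends)
sites = [
    ("chr1", 1_000_101, "A", "G", 60, 18, 3),   # plausible A-to-I(G) candidate
    ("chr1", 1_000_250, "C", "T", 12, 5, 1),    # too little coverage
    ("chr2",   500_777, "A", "G", 80, 9, 8),    # variant reads pile up at read ends
]

for chrom, pos, dna, rna, cov, var, var_at_ends in sites:
    enough_depth = cov >= MIN_COVERAGE
    enough_signal = var / cov >= MIN_EDIT_FRACTION
    not_end_biased = (var_at_ends / var if var else 1.0) <= MAX_END_FRACTION
    if enough_depth and enough_signal and not_end_biased:
        print(f"candidate editing site {chrom}:{pos} {dna}->{rna} ({var}/{cov} reads)")
```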
There are many projects producing massive amounts of sequencing data. One of the major scientific challenges right now is the integration of different types of data to explain a biological phenomenon. For example, the ENCODE project is producing genome-wide expression, transcription factor and epigenetic data on many different cell types. Making sense of even just a small fraction of these data sets is extremely challenging and will require major breakthroughs in analysis and interpretation. In particular, I believe the integration of epigenetic and expression data will be a major challenge over the next few years, and there are many specific questions that remain unresolved. Two questions that I think are particularly interesting in the area of data integration are the following. (i) How can we describe the epigenetic landscape, and how is it related to development and disease? There are over 100 epigenetic histone modifications known to date, and more are being discovered all the time. There are therefore millions of possible epigenetic combinations that could be predictive of expression and function, although most probably only a small fraction of these are important. Recently there has been some excellent work published on combining epigenetic marks to annotate the genome [46, 47]. (ii) How is alternative splicing controlled, and what is the role of epigenetics? Next-generation sequencing has shown that many genes in the genome have multiple isoforms; however, the mechanisms that control the switching between alternative transcripts are not well understood. Recently there have been fascinating observations showing an important role for epigenetics in controlling splicing events (for example, [48, 49]). There is still a long way to go to integrate epigenetic and expression data on a genome-wide scale.
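To give a flavor of question (i), the minimal sketch below asks whether combinations of histone-mark signals at promoters can predict whether a gene is expressed. The data are simulated (three marks loosely standing in for activating and repressive modifications), the model is a plain logistic regression, and numpy and scikit-learn are assumed to be available; the chromatin-state work cited above uses far richer models over many more marks.

```python
# Minimal sketch: predict expressed vs. not-expressed genes from simulated
# promoter histone-mark signals. All data are simulated for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_genes = 2_000

# Simulated promoter signal for three marks per gene
marks = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, 3))
# Simulated "truth": two activating marks push expression up, one repressive mark down
logit = 1.5 * marks[:, 0] + 1.0 * marks[:, 1] - 2.0 * marks[:, 2] - 1.0
expressed = (rng.random(n_genes) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    marks, expressed, test_size=0.25, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", round(model.score(X_test, y_test), 3))
print("per-mark coefficients:", np.round(model.coef_[0], 2))
```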
JR: (i) What properties of large non-coding RNA genes would identify subfamilies and classes? Imagine the textbook had already been written for non-coding RNAs and someone had only recently discovered protein genes. One of the first things to do would be to identify functional domains (for example, helix-loop-helix, DNA-binding domains, and so on) that could be extrapolated to families related by functional properties. With RNA it's a bit trickier, but initial progress is being made for large non-coding RNAs using co-expression with proteins, a process termed 'guilt by association'. We recently got a glimpse of some of the first emerging global properties after mapping and characterizing 8,000 long non-coding RNAs. They are strikingly more tissue-specific than protein-coding genes, an interesting feature that we could potentially use in medical diagnostics. (ii) Why is there so much non-coding RNA? It's clear that there are numerous functional large non-coding RNAs, but almost the entire genome is transcribed. Progress is being made with more global loss-of-function and gain-of-function experiments. (iii) What do these non-coding RNAs do and how do they do it? We have identified an emerging theme of non-coding RNAs interacting with proteins and modulating their function. These RNA-protein complexes are important for maintaining cellular identity. We need to further understand the structural and functional elements that drive these interactions. If we could learn how these RNAs work, we could envision engineering them to guide stem cells into distinct cell types.
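As an illustrative aside on the 'guilt by association' idea in (i), the minimal sketch below ranks protein-coding genes by how well their expression across tissues correlates with a lincRNA of interest and reports the annotations of the closest neighbors. All expression profiles, gene names and annotations are invented, and numpy is assumed to be available.

```python
# Minimal 'guilt by association' sketch: correlate a lincRNA's expression
# profile across tissues with protein-coding genes. All values are invented.
import numpy as np

tissues = ["brain", "liver", "testis", "heart", "muscle", "spleen"]
lincRNA = np.array([9.0, 0.5, 0.8, 0.3, 0.2, 0.4])  # strikingly tissue-specific

coding_genes = {
    "neuro_gene1":  np.array([8.0, 0.4, 0.6, 0.2, 0.3, 0.5]),
    "neuro_gene2":  np.array([6.5, 1.0, 0.9, 0.5, 0.4, 0.6]),
    "housekeeping": np.array([5.0, 5.2, 4.8, 5.1, 5.0, 4.9]),
    "liver_gene":   np.array([0.3, 7.5, 0.4, 0.6, 0.5, 0.7]),
}
annotations = {"neuro_gene1": "synaptic signaling", "neuro_gene2": "neuron differentiation",
               "housekeeping": "translation", "liver_gene": "bile acid metabolism"}

# Rank coding genes by Pearson correlation with the lincRNA profile
ranked = sorted(coding_genes,
                key=lambda g: np.corrcoef(lincRNA, coding_genes[g])[0, 1],
                reverse=True)

for gene in ranked[:2]:  # the lincRNA's closest co-expression neighbors
    r = np.corrcoef(lincRNA, coding_genes[gene])[0, 1]
    print(f"{gene}: r={r:.2f} ({annotations[gene]})")
```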
MW: (i) Single-cell systems biology: many labs are making good progress toward this, from cell biological imaging screens to measuring parameters in single cells. This will yield higher-resolution biological information and will provide important insights into cell-to-cell variability in cellular networks. (ii) Dynamic networks: currently, many available networks collapse all measured interactions into a single graph. However, it is clear that only parts of the network are active in different cells or under particular conditions. Therefore, we need to start including the spatiotemporal components of networks and their activity. Visualization is an important component of this, as is measuring reaction kinetics and the concentrations of different biomolecules. (iii) Developing integrative networks that combine metabolism, protein-protein interactions, genetics and regulatory networks. So far, most studies focus on a single type of network. However, it is clear that many biological processes are controlled by a flow of information through different types of molecules, and thus networks, and often result in differences in cellular and organismal metabolism. To better capture the events important to a biological process, it will be important to combine all relevant, active networks into a single graph.
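As an illustrative aside on the dynamic-network point in (ii), the minimal sketch below restricts a static interaction graph to the part that is 'active' in a given condition, using expression above a threshold as a crude proxy for activity. The edges, node names, expression values and threshold are invented, and the networkx library is assumed to be available.

```python
# Minimal sketch: extract condition-specific 'active' subnetworks from a static
# interaction graph, using expression as a crude activity proxy. All values invented.
import networkx as nx

static_network = nx.Graph()
static_network.add_edges_from([
    ("kinaseA", "tfB"), ("tfB", "targetC"), ("kinaseA", "scaffoldD"),
    ("scaffoldD", "enzymeE"), ("enzymeE", "metabolite_handlerF"),
])

expression = {  # per-condition expression levels (arbitrary units)
    "stress": {"kinaseA": 8.0, "tfB": 6.5, "targetC": 7.2, "scaffoldD": 0.4,
               "enzymeE": 0.3, "metabolite_handlerF": 0.2},
    "growth": {"kinaseA": 1.0, "tfB": 0.5, "targetC": 0.4, "scaffoldD": 6.0,
               "enzymeE": 7.5, "metabolite_handlerF": 6.8},
}
ACTIVE_THRESHOLD = 2.0

for condition, levels in expression.items():
    active_nodes = [n for n in static_network if levels.get(n, 0.0) >= ACTIVE_THRESHOLD]
    active_subnetwork = static_network.subgraph(active_nodes)
    print(condition, sorted(active_subnetwork.edges()))
```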