NetGrep: fast network schema searches in interactomes
© Banks et al.; licensee BioMed Central Ltd. 2008
Received: 11 May 2008
Accepted: 18 September 2008
Published: 18 September 2008
NetGrep (http://genomics.princeton.edu/singhlab/netgrep/) is a system for searching protein interaction networks for matches to user-supplied 'network schemas'. Each schema consists of descriptions of proteins (for example, their molecular functions or putative domains) along with the desired topology and types of interactions among them. Schemas can thus describe domain-domain interactions, signaling and regulatory pathways, or more complex network patterns. NetGrep provides an advanced graphical interface for specifying schemas and fast algorithms for extracting their matches.
High-throughput experimental and computational approaches to characterize proteins and their interactions have resulted in large-scale biological networks for many organisms. These complex networks are composed of a number of distinct types of interactions: these include interactions between proteins that interact physically, that participate in a synthetic lethal or epistatic relationship, that are coexpressed, or where one phosphorylates or regulates another (for a review, see ). Though incomplete and noisy, these networks provide a holistic view of the functioning of the cell, and with appropriate computational analysis and experimental work have significant potential for helping to uncover precisely how complex biological processes are accomplished.
We have developed a network analysis system based on querying interactomes using templates corresponding to network patterns of interest. Searching for recurring patterns in biological data has been the backbone of much research in computational biology; for example, within the context of sequence analysis, it has given rise to extensive work on sequence alignments and sequence motif discovery and has resulted in large sequence motif libraries. Not surprisingly, within the burgeoning field of biological network analysis, considerable effort has been focused on uncovering recurring patterns within interactomes. Mapping homologous proteins with conserved interaction patterns in different interactomes has revealed shared modules and complexes recurring across a range of organisms [2–6]. Analysis of the wiring diagrams of interactomes has uncovered network motifs that occur more frequently than expected by chance [7–13]. Additionally, there has been much work on uncovering recurring domain-domain interactions in physical interactomes [14–23], both to suggest a physical basis for known interactions and to help predict new interactions. Most closely related to the work described here are previous attempts to query biological networks using particular user-supplied subgraphs [24–29].
In addition to allowing a broad range of network schema queries, NetGrep has an easy-to-use graphical interface for inputting schemas. For each user-input schema, NetGrep finds all of its matches in the chosen interactome. Although the search problem is a case of the computationally difficult subgraph isomorphism problem, we have been able to develop algorithms that take advantage of the characteristics of schemas for biological networks. As a result, NetGrep's core algorithms are extremely fast in practice for queries with up to several thousand matches in the interactomes studied. Speed is useful for individual user queries, but it also makes it possible to systematically enumerate and query many network interaction patterns. For example, here we have systematically tested NetGrep's underlying algorithms by enumerating >100,000 schema queries with proteins described via GO molecular function terms, and have found that for schemas with up to tens of thousands of matches, NetGrep can rapidly uncover all instances. Our algorithms can thus enable new analyses that characterize networks with respect to the types and numbers of interaction patterns found (for example, see ).
Relationship to previous work
There are several previously developed tools for querying biological networks. While none of them have the functionality of NetGrep, we briefly review them here. Previous approaches fall broadly into the categories of network alignment, network motif finding, and specific subgraph queries, although these categories overlap.
Network alignment tools [4, 5, 37, 40] align protein-protein interaction networks by combining interaction topology and protein sequence similarity to identify conserved pathways. These tools can be used to identify schemas for which the criterion for matching a query protein to a target protein is sequence similarity. Network alignment has also been applied to metabolic networks , with proteins characterized by their enzyme classification. Algorithmically, these approaches are designed for aligning entire interactomes, and several of them are based on local alignments restricted to simpler linear or tree topologies. NetGrep, in contrast, is designed and optimized for general network schema queries and has faster algorithms for this task.
Several tools exist for uncovering network motifs or over-represented topological patterns in graphs [41, 42], and these could be used to find schemas consisting solely of unannotated proteins. These approaches do not, however, provide a mechanism for utilizing specific protein annotations, nor do they allow user-defined queries. We note that while NetGrep can obtain instances of network motif queries, our algorithms are optimized for schemas that utilize protein descriptions and have up to tens of thousands of instances. Alternative algorithms specifically developed for counting or approximating the total number of instances of network motifs [43, 44] may be more suitable if network motif queries are desired.
[Table: feature comparison of network query tools. Rows: allows arbitrary protein annotations; 1 per node; Boolean combination of annotations; multiple edge types in a network; Boolean combination of edge types; UI for searching/choosing annotations; can be used with Cytoscape; can be used as a standalone; custom data sets provided.]
We have implemented NetGrep in Java so that it is easily portable among different operating systems. Users have the option of running a feature-limited version of the software on our server  or of downloading the fully featured program and running it locally. NetGrep can be used either as a standalone application or as a Cytoscape plugin, if visualization of the results in network form is desired. A detailed description of how to use NetGrep is provided online . More formal descriptions of schemas, their instances in the interactome, and the algorithms used to uncover the instances are given in the 'Model and algorithm' section below.
Packaged data files
Data files for use with NetGrep are provided for the following model organisms: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. These files contain all the information necessary to run NetGrep, including protein information (names and aliases), interaction maps, and protein features.
One important feature of NetGrep is that none of the data are hard-coded into the program. Users can therefore use any node features or edge types desired when constructing networks; for example, custom or newly defined interaction types can be added. Additionally, creating data files for other, non-supported organisms is a straightforward process.
Describing proteins and interactions
Specifying inexact matches
The schemas described thus far are rigid in their structure. Occasionally, a user might prefer to specify that any number of proteins with a particular feature set interact in a cascade or that a given node in the schema not be absolutely required. NetGrep achieves this flexibility by allowing nodes in the schema to be designated as optional. When a schema contains an optional node, NetGrep will find matches both with and without the given protein. For example, to represent a signaling pathway as 'a protein in the membrane, which interacts with a succession of between one and three kinases, the last of which interacts with a transcription factor', one would build the given linear five-node pathway and designate two of the kinases as optional (Figure 2a). NetGrep would then find all three-, four-, and five-node matches within the network. Note that single nodes with more than two interactions cannot be designated as optional. When an optional node has two interactions, the interaction types are logically ORed for instances of the schema that have the optional node excluded.
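To make the optional-node semantics concrete, the following is a minimal Python sketch that expands a schema with optional nodes into the concrete schemas it stands for. It assumes a hypothetical representation in which undirected schema edges are `frozenset` node pairs mapped to the set of allowed interaction types (read as a logical OR). The function names are illustrative, not NetGrep's API, and directed edges are ignored.

```python
from itertools import combinations


def remove_optional(nodes, edges, node):
    """Drop `node` from the schema. If it had exactly two interactions, bridge its
    two neighbours with one edge whose allowed types are the OR (union) of both."""
    incident = {e: t for e, t in edges.items() if node in e}
    if len(incident) > 2:
        raise ValueError("nodes with more than two interactions cannot be optional")
    rest = {e: t for e, t in edges.items() if node not in e}
    if len(incident) == 2:
        (e1, t1), (e2, t2) = incident.items()
        a = next(x for x in e1 if x != node)
        b = next(x for x in e2 if x != node)
        rest[frozenset((a, b))] = t1 | t2
    return [n for n in nodes if n != node], rest


def schema_variants(nodes, edges, optional):
    """Yield every (nodes, edges) schema obtained by excluding a subset of optional nodes."""
    for k in range(len(optional) + 1):
        for dropped in combinations(optional, k):
            ns, es = list(nodes), dict(edges)
            for node in dropped:
                ns, es = remove_optional(ns, es, node)
            yield ns, es


# The membrane-kinase-kinase-kinase-TF pathway with the last two kinases optional.
chain = ["membrane", "kinase1", "kinase2", "kinase3", "TF"]
edges = {frozenset(p): frozenset({"physical"}) for p in zip(chain, chain[1:])}
print(sorted(len(ns) for ns, _ in schema_variants(chain, edges, ["kinase2", "kinase3"])))
# [3, 4, 4, 5]
```

The four variants correspond to the five-node schema, the two four-node schemas, and the three-node schema that NetGrep searches for in this example.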
A related problem is that current interaction datasets are incomplete. NetGrep addresses this by also allowing interactions in a schema to be designated as optional. When a schema contains an optional interaction, NetGrep will report matches even if the given interaction is not found in the network.
Matches and reliabilities
NetGrep has a user-set threshold that limits the number of matches reported for an input schema (Figure 3b). As a typical user is not expected to look through tens of thousands of matches, this threshold can be set anywhere from 100 to 50,000. A lower threshold gives faster run times and also limits memory usage. If the total number of instances is greater than the highest allowed threshold, an advanced (somewhat slower) option computes the total number of instances without explicitly enumerating them.
The instances of a query schema are returned by NetGrep, up to the user-defined threshold, and are sorted according to how confident we are of the underlying interactions. In particular, for each pair of proteins, we have a single precomputed reliability value between 0 and 1 that assesses how likely these two proteins are to interact (see 'Interaction reliabilities' section below). For each of the matches found by NetGrep, its overall reliability is computed by multiplying together the reliabilities corresponding to protein pairs that have interactions in the matches. The matches are sorted based on the negative log of this value, beginning with the most reliable (Figure 3d).
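Because the logarithm turns the product into a sum, this ranking amounts to summing negative log reliabilities. A minimal sketch, assuming a hypothetical `reliability` dictionary keyed by unordered protein pairs and matches given as lists of their interacting protein pairs:

```python
import math


def match_score(match_pairs, reliability):
    """Negative log of the product of the reliabilities of a match's interacting pairs.
    Lower scores mean more reliable matches; a zero reliability gives infinity."""
    score = 0.0
    for u, v in match_pairs:
        r = reliability.get(frozenset((u, v)), 0.0)
        if r <= 0.0:
            return math.inf
        score -= math.log(r)
    return score


def rank_matches(matches, reliability):
    """Sort matches from most to least reliable."""
    return sorted(matches, key=lambda m: match_score(m, reliability))


rel = {frozenset(("a", "b")): 0.9, frozenset(("b", "c")): 0.5, frozenset(("a", "c")): 0.8}
print(rank_matches([[("a", "b"), ("b", "c")], [("a", "b"), ("a", "c")]], rel)[0])
# [('a', 'b'), ('a', 'c')]
```

Summing logs rather than multiplying raw reliabilities also avoids floating-point underflow for large schemas.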
[Table: running time comparisons, running time (s), for the queries signaling pathway 1, signaling pathway 2, SH3 domain interaction, and ACT1 genetic interaction.]
[Table: GO MF running time comparisons, running time (s). Queries are listed by the GO molecular function terms describing their proteins.]
GO:0003677, GO:0004386, GO:0004672
GO:0004386, GO:0004672, GO:0030528
GO:0003723, GO:0003723, GO:0003723
GO:0004386, GO:0003677, GO:0016874, GO:0016829
GO:0016787, GO:0030234, GO:0005515, GO:0008233
GO:0003677, GO:0003723, GO:0005515, GO:0005198
GO:0016787, GO:0005198, GO:0003677, GO:0016779
GO:0016787, GO:0016740, GO:0016779, GO:0030528
GO:0008233, GO:0016874, GO:0030234, GO:0005215
GO:0005515, GO:0004721, GO:0008233, GO:0016740
GO:0005515, GO:0008233, GO:0005198, GO:0005215
GO:0030528, GO:0005515, GO:0016740, GO:0005215
GO:0016740, GO:0005515, GO:0030528, GO:0005215
Model and algorithm
We give a formal specification of the problem. Let $L$ be the set of possible protein labels (for example, Pfam motifs, protein IDs, and so on) and let $T$ be the set of possible edge types (for example, physical, regulatory, and so on). An interaction network is represented as a mixed graph $G = (V_N, E_N, A_N)$, where $V_N$ is the set of vertices, with a vertex $v \in V_N$ for each protein; $E_N \subseteq V_N \times V_N$ is the set of undirected edges; and $A_N \subseteq V_N \times V_N$ is the set of arcs, or directed edges. Vertices correspond to proteins, and edges and arcs correspond to interactions. Each vertex $v$ in the interaction network is associated with a set of features $l(v) \subseteq L$ (specifying protein features), each edge $(u,v)$ is associated with a set of types $t_e(u,v) \subseteq T$ (specifying the undirected interactions between the proteins), and each arc $(u,v)$ is associated with a set of types $t_a(u,v) \subseteq T$ (specifying the directed interactions between the proteins). If there is no edge between $u$ and $v$, $t_e(u,v) = \emptyset$, and if there is no arc between $u$ and $v$, $t_a(u,v) = \emptyset$.
A network schema is a mixed graph $H = (V_S, E_S, A_S)$ such that: (1) each vertex $v \in V_S$ is associated with a description set $D_v$ such that each $d \in D_v$ is a subset of $L$; and (2) for every pair of vertices $u$ and $v$ such that $(u,v) \in E_S \cup A_S$, there is an associated description set $D'_{u,v}$ such that each $d' \in D'_{u,v}$ is a subset of $T$. In NetGrep, the set $D_v$ is constructed from individual protein features in $L$ by taking either intersections or unions over these features. For example, if a union is taken over individual feature types for a particular vertex $v \in V_S$, then $D_v$ consists of one singleton set for each of these features; in the case of a wildcard vertex, $D_v$ consists of a single set, the empty set. Similarly, the set $D'_{u,v}$ is constructed from individual interaction types by requiring either all of them or just one of them. For example, if all interactions are required for a particular pair of vertices $u$ and $v$ with desired edges or arcs between them, then $D'_{u,v}$ consists of a single set containing all desired interaction types.
An instance of a network schema $H$ in an interaction network $G$ (that is, a match in the network for the schema) is a subgraph $(V_I, E_I, A_I)$, where $V_I \subseteq V_N$, $E_I \subseteq E_N$, and $A_I \subseteq A_N$, such that there is a one-to-one mapping $f: V_S \to V_I$ where: (1) for each $v \in V_S$, there exists a $d \in D_v$ such that $d \subseteq l(f(v))$; and (2) for each pair of vertices $u, v \in V_S$ with $(u,v) \in E_S \cup A_S$, there exists a $d' \in D'_{u,v}$ such that $d' \subseteq t_e(f(u),f(v)) \cup t_a(f(u),f(v))$. Note that two distinct instances of a schema may share proteins and/or interactions; however, any two instances must differ in at least one protein. Network schemas are thus used to interrogate the interaction network for sets of proteins that match a given description.
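The two instance conditions translate almost directly into a checker. This is a minimal sketch rather than NetGrep's code: description sets are lists of `frozenset`s, `types` is a hypothetical dictionary holding $t_e \cup t_a$ for each ordered protein pair (undirected edges would need both orientations), and `description_set` shows one reading of how AND/OR feature combinations become $D_v$, the AND case being inferred from condition (1) rather than stated in the text.

```python
def description_set(features, mode):
    """Build D_v from individual features: 'or' gives one singleton set per feature,
    'and' gives a single set holding all of them; no features gives {∅} (a wildcard)."""
    if not features:
        return [frozenset()]
    if mode == "or":
        return [frozenset([f]) for f in features]
    return [frozenset(features)]


def is_instance(f, node_desc, pair_desc, labels, types):
    """Check conditions (1) and (2) for a candidate mapping f (schema vertex -> protein).

    node_desc: schema vertex v -> D_v (list of frozensets of features)
    pair_desc: schema pair (u, v) -> D'_{u,v} (list of frozensets of types)
    labels:    protein -> l(p)
    types:     (p, q) -> t_e(p, q) | t_a(p, q); a missing key means no interaction
    """
    if len(set(f.values())) != len(f):
        return False  # f must be one-to-one
    for v, d_v in node_desc.items():
        if not any(d <= labels.get(f[v], frozenset()) for d in d_v):
            return False  # condition (1) fails for v
    for (u, v), d_uv in pair_desc.items():
        observed = types.get((f[u], f[v]), frozenset())
        if not any(d <= observed for d in d_uv):
            return False  # condition (2) fails for (u, v)
    return True
```

Using subset tests ($\subseteq$) means a wildcard vertex, whose only description is the empty set, matches every protein.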
For each pair of proteins, we estimate the reliability of their having any interaction between them. In particular, we first partition all the observed underlying interactions in the interactome into several experimental groups. The reliability of each experimental group $i$ is then evaluated as follows. For experiments determining non-genetic interactions, the reliability is estimated based on 'functional coherence' by computing $s_i$ as the fraction of interactions in that group that are between proteins sharing a high-level GO biological process slim term  (only pairs of interacting proteins that both have GO slim annotations are considered). We note that we do not use the functional coherence measure to assess genetic interaction experiments, as these types of interactions can bridge between pathways . Instead, for these experiments, the reliability is estimated based on a '2-hop' topological measure that has been shown to be highly predictive of genetic interactions . In particular, the reliability $s_i$ for an experimental group determining genetic interactions is estimated by computing the fraction of interactions in that group that additionally have paths of length two between them in the full interactome, where either both interactions are genetic interactions or one is a genetic interaction and the other is a physical interaction. Then, for a pair of proteins $u$ and $v$, we consider all interactions $j$ found between them, and treat them as independent events. The reliability $r(u,v)$ between $u$ and $v$ is then computed as:
$$r(u,v) = 1 - \prod_{j} \left(1 - s_{g(j)}\right)$$
where $j$ ranges over all interactions linking proteins $u$ and $v$, and $g(j)$ gives the experimental group of interaction $j$. If no interactions exist between the two proteins, $r(u,v) = 0$. This noisy-or scheme is similar to the one used for reliability estimation in [56, 57].
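The reliability estimates can be sketched as follows, assuming hypothetical inputs: GO slim annotations as a protein-to-term-set dictionary, and genetic and physical adjacency dictionaries for the full interactome. `functional_coherence` and `two_hop_support` compute $s_i$ for a non-genetic and a genetic experimental group respectively, and `pair_reliability` applies the noisy-or formula. All names are illustrative.

```python
from math import prod  # Python 3.8+


def functional_coherence(interactions, go_slim):
    """s_i for a non-genetic group: fraction of its interactions whose proteins share
    a GO slim term, counting only pairs in which both proteins are annotated."""
    annotated = [(u, v) for u, v in interactions if go_slim.get(u) and go_slim.get(v)]
    if not annotated:
        return 0.0
    shared = sum(1 for u, v in annotated if go_slim[u] & go_slim[v])
    return shared / len(annotated)


def two_hop_support(interactions, genetic_adj, physical_adj):
    """s_i for a genetic group: fraction of its interactions (u, v) that also have a
    path u-w-v whose two interactions are both genetic, or one genetic and one physical."""
    if not interactions:
        return 0.0
    hits = 0
    for u, v in interactions:
        gu, gv = genetic_adj.get(u, set()), genetic_adj.get(v, set())
        pu, pv = physical_adj.get(u, set()), physical_adj.get(v, set())
        middles = (gu & gv) | (gu & pv) | (pu & gv)
        if middles - {u, v}:
            hits += 1
    return hits / len(interactions)


def pair_reliability(groups, s):
    """Noisy-or: r(u,v) = 1 - prod_j (1 - s_{g(j)}), where `groups` lists g(j) for every
    interaction j between u and v. With no interactions the product is 1, so r = 0."""
    return 1.0 - prod(1.0 - s[g] for g in groups)


print(pair_reliability(["y2h", "y2h"], {"y2h": 0.5}))  # 0.75
```

Two independent observations from a group with reliability 0.5 thus raise the pair's reliability to 0.75, while a pair with no observed interactions keeps reliability 0.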
We partition our interactions into the following experimental groups. For physical and genetic interactions, there is one group for each individual high-throughput physical and genetic interaction experiment (defined as those that discover at least 50 interactions). All small-scale physical interaction experiments (defined as those that discover fewer than 50 interactions) are considered as belonging to a single group. Similarly, small-scale genetic interaction experiments are considered a single group. Experiments are identified by the combination of 'Experimental System' and 'PubMed ID' as reported by the BioGRID . All phosphorylation interactions in  are considered in one group. In the case of interactions that are associated with continuous numerical data, such as coexpression interactions (associated with the correlation coefficient) and regulatory interactions  (associated with the p-value for the binding), we assign each interaction to one of 20 uniform bins associated with the numerical data, and consider each bin as a separate group.
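The grouping rules can be sketched as below. The record layout (`system`, `pubmed`, `kind`) is a hypothetical stand-in for the BioGRID 'Experimental System' and 'PubMed ID' columns plus a physical/genetic flag, and 'uniform bins' is read here as 20 equal-width bins over the observed range of the numerical data, which is an assumption.

```python
from collections import Counter


def assign_groups(interactions, min_high_throughput=50):
    """Map each interaction to an experimental group.

    interactions: list of dicts with keys 'system', 'pubmed' and 'kind'
    ('physical' or 'genetic'). An experiment is a (system, pubmed) pair;
    experiments with >= min_high_throughput interactions get their own group,
    and all smaller ones are pooled into one group per kind.
    """
    sizes = Counter((i["system"], i["pubmed"]) for i in interactions)
    groups = []
    for i in interactions:
        experiment = (i["system"], i["pubmed"])
        if sizes[experiment] >= min_high_throughput:
            groups.append(experiment)
        else:
            groups.append(("small-scale", i["kind"]))
    return groups


def uniform_bin(value, lo, hi, n_bins=20):
    """Index (0..n_bins-1) of the equal-width bin of [lo, hi] that contains value."""
    if hi <= lo:
        return 0
    index = int((value - lo) / (hi - lo) * n_bins)
    return min(max(index, 0), n_bins - 1)  # the top edge hi falls in the last bin
```

For coexpression, for instance, `uniform_bin(r, -1.0, 1.0)` would place a correlation coefficient `r` into one of 20 groups.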
Searching for schemas
Finding the matches for a particular schema in a network corresponds to the computationally difficult subgraph isomorphism problem. A number of sophisticated algorithmic approaches for closely related problems on biological networks have been introduced earlier (for example, utilizing color coding ). Here, we obtain fast matches in practice by utilizing a few key ideas. First, we pre-process the interactome to build fast lookup tables mapping protein and interaction type labels to the proteins associated with those labels. For each node in a schema, this allows us to quickly enumerate the set of all proteins that match the node's feature set. Second, we utilize the labeled schema nodes and schema edges to prune the search space. In particular, we constrain the proteins in each node match set by determining interaction matches along each edge in the schema. Finally, these interactions are cached for fast lookup in the last step, in which we enumerate the considerably smaller search space and construct the full list of matches. We describe these steps in more detail below.
We first pre-process the interactome to maintain two hashes that map labels to proteins associated with those labels. $\mathrm{HASH}_F$ maps protein features to sets of vertices described by those features (for example, all kinases), and $\mathrm{HASH}_T$ maps edge types to pairs of proteins connected by an edge annotated with the types (for example, all proteins with physical interactions). For directed edge types, there are two separate entries in $\mathrm{HASH}_T$, one for each direction of the edge (for example, one for all kinases and one for all substrates). These hashes are used to quickly build, for any schema, its matches edge by edge.
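A minimal sketch of the two hashes, using Python dictionaries of sets. As a simplification, this version stores, for each interaction type, the set of proteins that take part in at least one such interaction (which is what the trimming step described next needs) rather than the pairs themselves; the key layout `(type, "any")`, `(type, "source")`, `(type, "target")` is hypothetical.

```python
from collections import defaultdict


def build_hashes(features, edges, arcs):
    """features: protein -> iterable of feature labels
    edges:    iterable of (u, v, type) undirected interactions
    arcs:     iterable of (u, v, type) directed interactions (u -> v)
    Returns (hash_f, hash_t) as plain dicts of sets."""
    hash_f = defaultdict(set)
    for protein, labels in features.items():
        for label in labels:
            hash_f[label].add(protein)  # e.g. "kinase" -> all kinases
    hash_t = defaultdict(set)
    for u, v, t in edges:
        hash_t[(t, "any")].update((u, v))  # proteins with >= 1 edge of type t
    for u, v, t in arcs:
        hash_t[(t, "source")].add(u)  # e.g. all kinases for phosphorylation
        hash_t[(t, "target")].add(v)  # e.g. all substrates
    return dict(hash_f), dict(hash_t)
```

Both hashes are built in one pass over the interactome, so every later lookup by feature or interaction type is a constant-time dictionary access.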
When searching for instances of a particular schema, we associate with each node $v$ in the schema a set of node matches $\mathrm{NMATCH}_v$, which contains all of the proteins in the interaction network that are described by that particular schema node (that is, the proteins that could be a match to that schema node). Specifically, we use $\mathrm{HASH}_F$ to initialize $\mathrm{NMATCH}_v$ with all the proteins that match $v$'s feature set. When features are combined with a Boolean AND, we take the intersection of the protein sets from $\mathrm{HASH}_F$, and when they are combined with a Boolean OR, we take the union of the protein sets. For each edge $e = (u,v)$ in the schema that has a single type (that is, is not composed of a Boolean combination of types) or for which all edge types are required (that is, types are combined by a logical AND), we use $\mathrm{HASH}_T$ to trim the proteins in each node match set. For example, if schema node $v$ is connected by a physical edge, then we can remove from $\mathrm{NMATCH}_v$ all proteins that are not found in the set from $\mathrm{HASH}_T$ corresponding to all proteins in the network connected by a physical edge.
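Initializing and trimming the node match sets can be sketched as follows. The AND/OR handling follows the text; treating a node with no features as a wildcard that matches every protein, and the `(type, "any")` key layout for interaction-type lookups, are assumptions of this sketch.

```python
def init_node_matches(node_features, mode, hash_f, all_proteins):
    """NMATCH_v from HASH_F: intersection for AND, union for OR.
    A node with no features is treated as a wildcard matching every protein."""
    sets = [hash_f.get(feature, set()) for feature in node_features]
    if not sets:
        return set(all_proteins)
    if mode == "and":
        return set.intersection(*sets)  # returns a new set; hash_f is untouched
    return set.union(*sets)


def trim_by_edge_types(nmatch, required_keys, hash_t):
    """Drop proteins lacking a required interaction type (the edge's single type,
    or every type of an AND edge). OR edges are not used for trimming here."""
    for key in required_keys:
        nmatch = nmatch & hash_t.get(key, set())
    return nmatch
```

OR edges are left alone because a protein may satisfy them through any one of several types, so no single $\mathrm{HASH}_T$ entry can rule it out.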
We next prune the sets of node matches as follows, stopping early if any of them becomes empty (at which point we know that there are no matches to the query in the network). For each edge $e = (u,v)$ in the schema, we use the network interaction map to remove from $\mathrm{NMATCH}_u$ all proteins that do not interact, via $e$'s specified type, with any of the proteins in $\mathrm{NMATCH}_v$. Although we could repeat this pruning step after each edge is processed, we have found this to be unnecessary because of two additional optimizations. First, as we iterate through the edges in this step, we start with those edges whose endpoints have the smallest sets of node matches and proceed in order of increasing size; this optimization helps to reduce the size of the larger node match sets early in the process. That is, we rank schema nodes by the size of their node match sets, start with the node with the smallest node match set, and consider its edges first, beginning with the neighbor that has the smallest node match set. We then consider the node with the next smallest node match set, and so on. Second, as we iterate through the schema edges, we cache the matches for each edge so that they can be accessed quickly in the next step, where we find the actual matches. Note that this pruning step is skipped for optional nodes, because edges connected to those nodes are not required. It is also skipped for edges whose match bins are too large (>1,000).
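The edge-by-edge pruning with smallest-first ordering and per-edge caching might look like the following. It is a simplified sketch: `neighbors` is a hypothetical callback returning a protein's partners for a given interaction type, both endpoint sets are trimmed at each edge, and the skips for optional nodes and for match bins larger than 1,000 are omitted.

```python
def prune(nmatch, schema_edges, neighbors):
    """Shrink node match sets edge by edge and cache the matching protein pairs.

    nmatch:       schema node -> set of candidate proteins (modified in place)
    schema_edges: list of (u, v, edge_type) in the schema
    neighbors:    callback (protein, edge_type) -> set of interaction partners
    Returns {(u, v, edge_type): set of (p, q) pairs}, or None if a set empties.
    """
    # Rank schema nodes by the size of their match sets, smallest first.
    rank = {n: i for i, n in enumerate(sorted(nmatch, key=lambda n: len(nmatch[n])))}
    # Visit the edges of the smallest node first (smallest neighbour first), and so on.
    ordered = sorted(schema_edges, key=lambda e: tuple(sorted((rank[e[0]], rank[e[1]]))))
    cache = {}
    for u, v, t in ordered:
        pairs = {(p, q) for p in nmatch[u] for q in neighbors(p, t) if q in nmatch[v]}
        nmatch[u] = {p for p, _ in pairs}
        nmatch[v] = {q for _, q in pairs}
        if not nmatch[u] or not nmatch[v]:
            return None  # no match to the schema exists in the network
        cache[(u, v, t)] = pairs
    return cache
```

Cached pairs for early edges may still mention proteins that later edges remove; the enumeration step only follows proteins that remain in the match sets, so such stale entries are harmless.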
To find the sets of proteins that match the given schema, we iterate through each of the node match sets from smallest to largest, constructing matches as we go along. We note that this search order over the nodes provides a significant speed-up over a simpler approach that performs depth-first search from an arbitrary starting node in the schema. As we iterate through the nodes, for each protein p in a given match set representing node v in the schema, we constrain each larger match set representing node u in the schema as follows: if u and v are connected by an edge in the schema, we eliminate all proteins in u's match set that do not interact with p (using the cached matches from the pruning step above). Furthermore, we remove p from u's set if it is there (that is, we do not allow the same protein to occur in multiple positions of a match). We then set p as the matching protein at schema node v for this particular set of matches and traverse to the next largest node match set. Once a complete match to a schema is found, we backtrack and continue the search process.
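The enumeration step, together with the match-count threshold described in the next paragraph, can be sketched as a backtracking search. The inputs are hypothetical: pruned match sets, a schema adjacency dictionary, and an `interacts` callback standing in for the cached edge matches. The default threshold of 50,000 simply mirrors the upper limit mentioned earlier.

```python
def find_matches(nmatch, schema_adj, interacts, threshold=50000):
    """Enumerate schema matches by backtracking over node match sets, smallest first.

    nmatch:     schema node -> pruned set of candidate proteins
    schema_adj: schema node -> set of neighbouring schema nodes
    interacts:  callback (p, q, v, u) -> True if protein p at node v and protein q
                at node u satisfy the schema edge between v and u
    Returns at most `threshold` matches, each a dict schema node -> protein.
    """
    order = sorted(nmatch, key=lambda n: len(nmatch[n]))
    matches = []

    def extend(i, assignment, candidates):
        if i == len(order):
            matches.append(dict(assignment))
            return
        v = order[i]
        for p in candidates[v]:
            narrowed, feasible = dict(candidates), True
            for u in order[i + 1:]:  # only the larger match sets still to be filled
                c = candidates[u] - {p}  # a protein may fill only one schema node
                if u in schema_adj[v]:
                    c = {q for q in c if interacts(p, q, v, u)}
                if not c:
                    feasible = False  # no completion is possible with p at v
                    break
                narrowed[u] = c
            if feasible:
                assignment[v] = p
                extend(i + 1, assignment, narrowed)
                del assignment[v]
            if len(matches) >= threshold:
                return  # stop early once the threshold is reached

    extend(0, {}, {n: set(s) for n, s in nmatch.items()})
    return matches
```

Abandoning a branch as soon as some later match set becomes empty is a small addition to the procedure in the text; it returns the same matches, only sooner.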
If at any point the number of matches to a schema exceeds the user-defined threshold (Figure 2b), the search is terminated and NetGrep returns just those matches found up to that point. Once all matches to a schema are found, they are sorted by their interaction reliability, as described above.
When a schema displays an inherent symmetry, the same set of proteins often occurs redundantly in multiple instances. Consider, for example, the symmetric linear three-node schema A-B-A, where the edges are undirected and the first and last nodes have identical feature sets and are symmetric around the middle node. Among the matches of this schema, one might find the proteins p1-p2-p3 and p3-p2-p1. NetGrep is able to determine that a given schema is symmetric and excludes these superfluous matches from the results returned by the search. The symmetry test exploits the fact that two nodes in a schema can be symmetric only if they have exactly the same feature set and degree; for all pairs of nodes u and v in the schema for which this holds, the algorithm recursively checks all pairs of nodes connected to these two target nodes (that is, one connected to u and one connected to v) for symmetry, following any given edge just once. This is equivalent to a depth-first search over the schema. The base case of the recursion occurs when the two target nodes are connected to each other or to the same node.
If a query is determined to be symmetric, redundant matches are ignored during the search. To accomplish this, each protein in the interaction network is first assigned an arbitrary unique ID number, as is each node in the query schema. Then, for any two symmetric nodes A and B in a query schema where the ID of A is smaller than the ID of B, we require that, in any given instance, the ID of the protein matching node A be smaller than the ID of the protein matching node B. Any instance that violates this requirement for some pair of symmetric nodes is ignored.
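The ID-ordering rule can be sketched as a filter over instances. NetGrep applies it during the search, whereas this sketch filters afterwards, which yields the same result set. For brevity, `candidate_symmetric_pairs` applies only the necessary condition (same feature set and same degree) rather than the full recursive check described above, so on some schemas it could report pairs that are not truly symmetric and wrongly discard matches. The node and protein ID dictionaries are hypothetical inputs.

```python
from itertools import combinations


def candidate_symmetric_pairs(node_ids, features, degree):
    """Schema node pairs (a, b), with ID(a) < ID(b), that share a feature set and degree.
    This is only the necessary condition, not the full recursive symmetry check."""
    nodes = sorted(node_ids, key=node_ids.get)
    return [(a, b) for a, b in combinations(nodes, 2)
            if features[a] == features[b] and degree[a] == degree[b]]


def break_symmetry(instances, symmetric_pairs, protein_id):
    """Keep an instance only if, for every symmetric pair (a, b), the protein at a
    has a smaller ID than the protein at b."""
    return [inst for inst in instances
            if all(protein_id[inst[a]] < protein_id[inst[b]] for a, b in symmetric_pairs)]


# The A-B-A example: p1-p2-p3 is kept and its mirror image p3-p2-p1 is dropped.
ids = {"A1": 0, "B": 1, "A2": 2}
feats = {"A1": frozenset({"kinase"}), "B": frozenset({"TF"}), "A2": frozenset({"kinase"})}
pairs = candidate_symmetric_pairs(ids, feats, {"A1": 1, "B": 2, "A2": 1})
found = [{"A1": "p1", "B": "p2", "A2": "p3"}, {"A1": "p3", "B": "p2", "A2": "p1"}]
print(break_symmetry(found, pairs, {"p1": 1, "p2": 2, "p3": 3}))
```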
We have introduced NetGrep, a powerful Java system for searching protein interactomes for instances of user-supplied labeled subgraphs, or network schemas, and have provided fully featured data files for several organisms. NetGrep allows a wide range of queries that encompass many previously studied interaction patterns. Finally, we have described an algorithm for the labeled subgraph isomorphism problem that is fast and effective in practice on biological networks.
Availability and requirements
Project name: NetGrep
Project home page: 
Operating systems: Windows, Mac OS, Linux
Programming language: Java
Other requirements: Java 1.5 or higher
License: Open source, under the GNU General Public License
Additional data files
The following additional data are available. Additional data file 1 is a table listing the GO molecular function slim terms used in the systematic testing of our program.
MS thanks the NSF for grants MCB-0093399 and CCF-0542187, and the NIH for grants CA041086 and GM076275. EB is supported by the Quantitative and Computational Biology Program NIH grant T32 HG003284. This research has also been supported by the NIH Center of Excellence grant P50 GM071508. The authors thank the members of the Singh group for many helpful discussions.
- Zhu X, Gerstein M, Snyder M: Getting connected: analysis and principles of biological networks. Genes Dev. 2007, 21: 1010-1024. 10.1101/gad.1528707.
- Kelley BP, Sharan R, Karp RM, Sittler T, Root DE, Stockwell BR, Ideker T: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci USA. 2003, 100: 11394-11399. 10.1073/pnas.1534710100.
- Sharan R, Suthram S, Kelley RM, Kuhn T, McCuine S, Uetz P, Sittler T, Karp RM, Ideker T: Conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci USA. 2005, 102: 1974-1979. 10.1073/pnas.0409522102.
- Koyutürk M, Kim Y, Topkara U, Subramaniam S, Szpankowski W, Grama A: Pairwise alignment of protein interaction networks. J Comput Biol. 2006, 13: 182-199. 10.1089/cmb.2006.13.182.
- Flannick J, Novak A, Srinivasan B, McAdams H, Batzoglou S: Graemlin: general and robust alignment of multiple large interaction networks. Genome Res. 2006, 16: 1169-1181. 10.1101/gr.5235706.
- Singh R, Xu J, Berger B: Pairwise global alignment of protein interaction networks by matching neighborhood topology. Proceedings of the 11th International Conference on Research in Computational Molecular Biology (RECOMB): Oakland, CA, USA; 21-25 April 2007. Edited by: Speed TP, Huang H. 2007, New York: Springer, 4453: 16-31. [Lecture Notes in Computer Science]
- Shen-Orr SS, Milo R, Mangan S, Alon U: Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet. 2002, 31: 64-68. 10.1038/ng881.
- Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U: Network motifs: simple building blocks of complex networks. Science. 2002, 298: 824-827. 10.1126/science.298.5594.824.
- Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber G, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science. 2002, 298: 799-804. 10.1126/science.1075090.
- Yeger-Lotem E, Sattath S, Kashtan N, Izkovitz S, Milo R, Pinter RY, Alon U, Margalit H: Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction. Proc Natl Acad Sci USA. 2004, 101: 5934-5939. 10.1073/pnas.0306752101.
- Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA, Gerstein M: Genomic analysis of regulatory network dynamics reveals large topological changes. Nature. 2004, 431: 308-312. 10.1038/nature02782.
- Zhang LV, King OD, Wong SL, Goldberg DS, Tong AH, Lesage G, Andrews B, Bussey H, Boone C, Roth FP: Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J Biol. 2005, 4: 6. 10.1186/jbiol23.
- Ptacek J, Devgan G, Michaud G, Zhu H, Zhu X, Fasolo J, Guo H, Jona G, Breitkreutz A, Sopko R, McCartney RR, Schmidt MC, Rachidi N, Lee SJ, Mah AS, Meng L, Stark MJ, Stern DF, De Virgilio C, Tyers M, Andrews B, Gerstein M, Schweitzer B, Predki PF, Snyder M: Global analysis of protein phosphorylation in yeast. Nature. 2005, 438: 679-684. 10.1038/nature04187.
- Sprinzak E, Margalit H: Correlated sequence-signatures as markers of protein-protein interaction. J Mol Biol. 2001, 311: 681-692. 10.1006/jmbi.2001.4920.
- Gomez SM, Lo SH, Rzhetsky A: Probabilistic prediction of unknown metabolic and signal-transduction networks. Genetics. 2001, 159: 1291-1298.
- Wojcik J, Schächter V: Protein-protein interaction map inference using interacting domain profile pairs. Bioinformatics. 2001, 17 (Suppl 1): S296-S305.
- Deng M, Mehta S, Sun F, Chen T: Inferring domain-domain interactions from protein-protein interactions. Genome Res. 2002, 12: 1540-1548. 10.1101/gr.153002.
- Giot L, Bader J, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao Y, Ooi C, Godwin B, Vitols E, Vijayadamodar G, Pochart P, Machineni H, Welsh M, Kong Y, Zerhusen B, Malcolm R, Varrone Z, Collis A, Minto M, Burgess S, McDaniel L, Stimpson E, Spriggs F, Williams J, Neurath K, Ioime N, Agee M, Voss E, Furtak K, et al: A protein interaction map of Drosophila melanogaster. Science. 2003, 302: 1727-1736. 10.1126/science.1090289.
- Pagel P, Wong P, Frishman D: A domain interaction map based on phylogenetic profiling. J Mol Biol. 2004, 344: 1331-1346. 10.1016/j.jmb.2004.10.019.
- Riley R, Lee C, Sabatti C, Eisenberg D: Inferring protein domain interactions from databases of interacting proteins. Genome Biol. 2005, 6: R89. 10.1186/gb-2005-6-10-r89.
- Nye TM, Berzuini C, Gilks WR, Babu MM, Teichmann SA: Statistical analysis of domains in interacting protein pairs. Bioinformatics. 2005, 21: 993-1001. 10.1093/bioinformatics/bti086.
- Guimarães KS, Jothi R, Zotenko E, Przytycka TM: Predicting domain-domain interactions using a parsimony approach. Genome Biol. 2006, 7: R104. 10.1186/gb-2006-7-11-r104.
- Itzhaki Z, Akiva E, Altuvia Y, Margalit H: Evolutionary conservation of domain-domain interactions. Genome Biol. 2006, 7: R125. 10.1186/gb-2006-7-12-r125.
- Pinter RY, Rokhlenko O, Yeger-Lotem E, Ziv-Ukelson M: Alignment of metabolic pathways. Bioinformatics. 2005, 21: 3401-3408. 10.1093/bioinformatics/bti554.
- Lacroix V, Fernandes CG, Sagot MF: Motif search in graphs: Application to metabolic networks. IEEE/ACM Trans Comput Biol Bioinform. 2006, 3: 360-368. 10.1109/TCBB.2006.55.
- Ferro A, Giugno R, Pigola G, Pulvirenti A, Skripin D, Bader GD, Shasha D: NetMatch: a Cytoscape plugin for searching biological networks. Bioinformatics. 2007, 23: 910-912. 10.1093/bioinformatics/btm032.
- Tian Y, McEachin RC, Santos C, States DJ, Patel JM: SAGA: a subgraph matching tool for biological graphs. Bioinformatics. 2007, 23: 232-239. 10.1093/bioinformatics/btl571.
- Dost B, Shlomi T, Gupta N, Ruppin E, Bafna V, Sharan R: QNet: a tool for querying protein interaction networks. Proceedings of the 11th International Conference on Research in Computational Molecular Biology (RECOMB): Oakland, CA, USA; 21-25 April 2007. Edited by: Speed TP, Huang H. 2007, New York: Springer, 4453: 1-15. [Lecture Notes in Computer Science]
- Cheng Q, Kaur D, Harrison R, Zelikovsky A: Filling metabolic pathways. Proceedings of the RECOMB Satellite Conference on Systems Biology: University of California, San Diego, CA, USA; 30 November-1 December 2007.
- Hulo N, Sigrist CJ, Le Saux V, Langendijk-Genevaux PS, Bordoli L, Gattiker A, De Castro E, Bucher P, Bairoch A: Recent improvements to the PROSITE database. Nucleic Acids Res. 2004, 32 (Database issue): D134-D137. 10.1093/nar/gkh044.
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR: The Pfam protein families database. Nucleic Acids Res. 2004, 32 (Database issue): D138-D141. 10.1093/nar/gkh121.
- Schultz J, Milpetz F, Bork P, Ponting CP: SMART, a simple modular architecture research tool: Identification of signaling domains. Proc Natl Acad Sci USA. 1998, 95: 5857-5864. 10.1073/pnas.95.11.5857.
- Letunic I, Copley RR, Schmidt S, Ciccarelli FD, Doerks T, Schultz J, Ponting CP, Bork P: SMART 4.0: towards genomic data integration. Nucleic Acids Res. 2004, 32 (Database issue): D142-D144. 10.1093/nar/gkh088.
- Gough J, Karplus K, Hughey R, Chothia C: Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J Mol Biol. 2001, 313: 903-919. 10.1006/jmbi.2001.5080.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
- Steffen M, Petti A, Aach J, D'haeseleer P, Church G: Automated modeling of signal transduction networks. BMC Bioinformatics. 2002, 3: 34. 10.1186/1471-2105-3-34.
- Kelley BP, Yuan B, Lewitter F, Sharan R, Stockwell BR, Ideker T: PathBLAST: a tool for alignment of protein interaction networks. Nucleic Acids Res. 2004, 32 (Web Server issue): W83-W88. 10.1093/nar/gkh411.
- Pawson T, Nash P: Assembly of cell regulatory systems through protein interaction domains. Science. 2003, 300: 445-452. 10.1126/science.1083653.
- Banks E, Nabieva E, Chazelle B, Singh M: Organization of physical interactomes as uncovered by network schemas. PLoS Comput Biol.
- Kalaev M, Smoot M, Ideker T, Sharan R: NetworkBLAST: comparative analysis of protein networks. Bioinformatics. 2008, 24: 594-596. 10.1093/bioinformatics/btm630.
- Wernicke S, Rasche F: Fanmod: a tool for fast network motif detection. Bioinformatics. 2006, 22: 1152-1153. 10.1093/bioinformatics/btl038.
- Schreiber F, Schwöbbermeyer H: MAVisto: a tool for the exploration of network motifs. Bioinformatics. 2005, 21: 3572-3574. 10.1093/bioinformatics/bti556.
- Grochow J, Kellis M: Network motif discovery using subgraph enumeration and symmetry breaking. Proceedings of the 11th International Conference on Research in Computational Molecular Biology (RECOMB): Oakland, CA, USA; 21-25 April 2007. Edited by: Speed TP, Huang H. 2007, New York: Springer, 4453: 92-106. [Lecture Notes in Computer Science]
- Alon N, Dao P, Hajirasouliha I, Hormozdiari F, Sahinalp SC: Biomolecular network motif counting and discovery by color coding. Bioinformatics. 2008, 24: i241-i249. 10.1093/bioinformatics/btn163.
- Pandey J, Koyutürk M, Kim Y, Szpankowski W, Subramanian S, Grama A: Functional annotation of regulatory pathways. Bioinformatics. 2007, 23: i377-i386. 10.1093/bioinformatics/btm203.
- Giugno R, Shasha D: GraphGrep: a fast and universal method for querying graphs. Proceedings of the International Conference on Pattern Recognition (ICPR): 11-15 August 2002; Quebec, Canada. 2002, IEEE Computer Society, 2: 112-115.
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13: 2498-2504. 10.1101/gr.1239303.
- NetGrep. [http://genomics.princeton.edu/singhlab/netgrep/]
- NetGrep User's Guide. [http://genomics.princeton.edu/singhlab/netgrep/guide.html]
- Breitkreutz BJ, Stark C, Tyers M: Osprey: a network visualization system. Genome Biol. 2003, 4: R22. 10.1186/gb-2003-4-3-r22.
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, Jennings EG, Zeitlinger J, Pokholok DK, Kellis M, Rolfe PA, Takusagawa KT, Lander ES, Gifford DK, Fraenkel E, Young RA: Transcriptional regulatory code of a eukaryotic genome. Nature. 2004, 431: 99-104. 10.1038/nature02800.
- Stuart JM, Segal E, Koller D, Kim SK: A gene-coexpression network for global discovery of conserved genetic modules. Science. 2003, 302: 249-255. 10.1126/science.1087447.
- Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, et al: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, 32 (Database issue): D258-D261. 10.1093/nar/gkh066.
- Tong A, Lesage G, Bader G, Ding H, Xu H, Xin X, Young J, Berriz G, Brost R, Chang M, Chen Y, Cheng X, Chua G, Friesen H, Goldberg D, Haynes J, Humphries C, He G, Hussein S, Ke L, Krogan N, Li Z, Levinson J, Lu H, Minard P, Munyana C, Parsons A, Ryan O, Tonikian R, Roberts T, et al: Global mapping of the yeast genetic interaction network. Science. 2004, 303: 808-813. 10.1126/science.1091317.
- Wong SL, Zhang LV, Tong AH, Li Z, Goldberg DS, King OD, Lesage G, Vidal M, Andrews B, Bussey H, Boone C, Roth FP: Combining biological networks to predict genetic interactions. Proc Natl Acad Sci USA. 2004, 101: 15682-15687. 10.1073/pnas.0406614101.
- von Mering C, Huynen M, Jaeggi D, Schmidt S, Bork P, Snel B: STRING: a database of predicted functional associations between proteins. Nucleic Acids Res. 2003, 31: 258-261. 10.1093/nar/gkg034.
- Nabieva E, Jim K, Agarwal A, Chazelle B, Singh M: Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics. 2005, 21 (Suppl 1): i302-i310. 10.1093/bioinformatics/bti1054.
- Biomart. [http://www.ebi.ac.uk/biomart/martview/]
- Clusters of Orthologous Groups. [http://www.ncbi.nlm.nih.gov/COG/new/]
- Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bradley P, Bork P, Bucher P, Cerutti L, Copley R, Courcelle E, Das U, Durbin R, Fleischmann W, Gough J, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McDowall J, Mitchell A, Nikolskaya A, et al: InterPro, progress and status in 2005. Nucleic Acids Res. 2005, 33 (Database issue): D201-D205. 10.1093/nar/gki106.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.