Construction of network hierarchy
A hierarchical network is a directed network in which every node is assigned to a unique hierarchical level from 1 to L, where L is the total number of levels (L ≥ 2). In general, a hierarchical network contains three types of edges according to their directionality: downward edges (pointing from a higher-level node to a lower-level node), upward edges (pointing from a lower-level node to a higher-level node), and horizontal edges (pointing from one node to another within the same level). To infer the hierarchy of a directed network, we developed a hierarchy score maximization (HSM) algorithm, described as follows.
First, given a directed network with an assigned hierarchical structure, we define a metric called the hierarchy score (HS) as:
\( HS=\frac{N_d+{N}_h}{N_u+{N}_h} \), where N_{d}, N_{u}, and N_{h} are the numbers of downward, upward, and horizontal edges, respectively. The metric essentially measures the ratio of N_{d} to N_{u}, balanced by N_{h}. It takes a value from 0 to +∞, with a higher HS indicating more downward edges relative to upward edges in a network. In particular, when N_{u} = N_{h} = 0, the network has an HS of +∞.
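The HS computation can be sketched in a few lines of Python (an illustrative re-implementation, not the authors' code; the `edges` and `level` data structures are hypothetical names):

```python
def hierarchy_score(edges, level):
    """Compute the hierarchy score HS = (N_d + N_h) / (N_u + N_h).

    edges : iterable of (u, v) directed edges u -> v
    level : dict mapping node -> hierarchical level (1 = bottom, L = top)
    Returns float('inf') when N_u + N_h == 0 (perfect downward hierarchy).
    """
    n_d = n_u = n_h = 0
    for u, v in edges:
        if level[u] > level[v]:      # downward edge
            n_d += 1
        elif level[u] < level[v]:    # upward edge
            n_u += 1
        else:                        # horizontal edge
            n_h += 1
    if n_u + n_h == 0:
        return float('inf')
    return (n_d + n_h) / (n_u + n_h)
```

For example, with edges a → b, a → c, b → c and levels a = 2, b = c = 1, there are two downward edges and one horizontal edge, giving HS = (2 + 1)/(0 + 1) = 3.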
Second, for a directed network we employ a simulated annealing procedure [37] to infer its hierarchical structure by arranging nodes into L levels (L is a predefined parameter). This procedure is as follows:

(1)
We initialize by randomly assigning each node to a level, calculating the corresponding hierarchy score hs_{0}, and setting the initial energy E_{0} = −hs_{0}.

(2)
We adjust the hierarchy iteratively to optimize the hierarchical structure. Specifically, at iteration i, we randomly select a node, adjust the hierarchy by randomly placing it into another level, and recalculate the hierarchy score hs_{i} and energy E_{i} = −hs_{i} of the resulting hierarchy. We compute the energy change ∆E = E_{i} − E_{i−1}; if ∆E < 0, we accept the hierarchy adjustment; otherwise we accept the adjustment with probability P = exp(−∆E/CT), where C is a constant and T is a temperature, which together tune the acceptance probability P.

(3)
We repeat step (2) p times until E is minimized (that is, HS is maximized). In practice, we gradually lower the temperature T at each step to adjust the sensitivity of the annealing. This procedure results in an optimized hierarchical network with a maximized HS.
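The three steps above can be sketched as a single annealing run in Python (a minimal illustration; the starting temperature `t0`, constant `c`, and geometric `cooling` factor are assumptions, since the text does not specify a schedule):

```python
import math
import random

def infer_hierarchy(edges, nodes, L, p=10000, t0=1.0, c=1.0, cooling=0.999):
    """One simulated-annealing run that maximizes HS by minimizing E = -HS."""
    def hs(level):
        n_d = n_u = n_h = 0
        for u, v in edges:
            if level[u] > level[v]:
                n_d += 1
            elif level[u] < level[v]:
                n_u += 1
            else:
                n_h += 1
        return float('inf') if n_u + n_h == 0 else (n_d + n_h) / (n_u + n_h)

    # Step (1): random initial level assignment and initial energy
    level = {v: random.randint(1, L) for v in nodes}
    energy = -hs(level)
    best_level, best_energy = dict(level), energy
    t = t0
    # Steps (2)-(3): p iterations of single-node moves with Metropolis acceptance
    for _ in range(p):
        node = random.choice(nodes)
        old = level[node]
        level[node] = random.choice([l for l in range(1, L + 1) if l != old])
        d_e = -hs(level) - energy
        if d_e < 0 or random.random() < math.exp(-d_e / (c * t)):
            energy += d_e                    # accept the move
            if energy < best_energy:
                best_level, best_energy = dict(level), energy
        else:
            level[node] = old                # reject: revert the move
        t *= cooling                         # gradually lower the temperature
    return best_level, -best_energy
```

Tracking the best configuration seen is a common annealing convenience and does not change the optimization target.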
Third, we perform the above-described simulated annealing algorithm k = 1,000 times to obtain 1,000 inferred hierarchical networks. We do this because in many cases the optimum hierarchy is not unique. For example, some nodes are topologically identical in a directed network, and coordinately changing their level assignments does not change the overall hierarchy score. Based on these 1,000 inferred hierarchical networks, we calculate the probability that each node is assigned to each level, which results in a probability matrix over nodes and levels (Figure 1C). This matrix can be regarded as a probabilistic hierarchical network, which is more informative and describes the hierarchical structure of a directed network more precisely than any single inferred hierarchy.
Fourth, we provide the most likely hierarchical network based on the probabilistic hierarchical matrix. Specifically, we assign each node to the level to which the prior step assigns it with the highest probability. It should be noted that the confidence of the assignment may vary from node to node, depending on the value of the maximum probability. Typically, however, most nodes are assigned with high certainty (for example, the probability of the assigned level is >60%).
To determine an appropriate p (the number of steps in each simulated annealing procedure) and k (the number of simulated annealing runs), we plot the hierarchy score against p and k, respectively. For a network with more nodes and edges, a larger p should be used, as can be determined from the HS vs. p plot. When a suitable p is used, the resulting HS should be stable with respect to k for k > 100.
In practice, the HSM method can be used in conjunction with other hierarchy inference methods. For example, one may start from the hierarchical structure inferred by the VS algorithm and use the simulated annealing procedure to further optimize the hierarchy score. Namely, instead of randomly selecting nodes during the simulated annealing optimization, we can focus on adjusting the levels of ambiguous nodes in the VS output to improve efficiency. Such a strategy combines the advantages of the two hierarchy inference approaches.
Determination of the number of hierarchical levels
The HSM algorithm requires a predefined L, the number of hierarchical levels. L can be determined from prior knowledge about the directed network of interest. If no prior knowledge is available, we can specify different L values (e.g., L = 2, 3, …, 8) and choose a proper L by comparing the resulting hierarchical networks. However, the HS is not directly comparable between hierarchical networks with different numbers of levels, because networks with larger L tend to have higher HSs. We therefore define a corrected hierarchy score (CHS) as follows:
$$ CHS=\frac{O\left({N}_d\right)/E\left({N}_d\right)+O\left({N}_h\right)/E\left({N}_h\right)}{O\left({N}_u\right)/E\left({N}_u\right)+O\left({N}_h\right)/E\left({N}_h\right)}, $$
where O(N_{d}), O(N_{u}), and O(N_{h}) are the observed numbers of downward, upward, and horizontal edges, respectively, and E(N_{d}), E(N_{u}), and E(N_{h}) are the corresponding expected numbers, calculated as:
$$ E\left({N}_d\right)={\displaystyle \sum_{i>j}{S}_i{S}_j}; $$
$$ E\left({N}_u\right)={\displaystyle \sum_{i<j}{S}_i{S}_j}; $$
$$ E\left({N}_h\right)={\displaystyle \sum_{i=j}{S}_i{S}_j}, $$
where S_{i} and S_{j} are the numbers of nodes in levels i and j, respectively. The CHS is directly comparable between hierarchical networks with different L values, and can also be used to compare the degree of hierarchy between different directed networks. The CHS ranges from 1 for a random network without hierarchical structure to +∞ for a network with a perfect hierarchy (for example, a tree, as in Figure 2).
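Under these definitions, the CHS can be computed directly from a level assignment (an illustrative Python sketch; mapping a zero denominator to +∞ covers the perfect-hierarchy case):

```python
def corrected_hs(edges, level, L):
    """Corrected hierarchy score for a hierarchy with levels 1..L."""
    # Observed numbers of downward, upward, and horizontal edges
    o_d = sum(1 for u, v in edges if level[u] > level[v])
    o_u = sum(1 for u, v in edges if level[u] < level[v])
    o_h = sum(1 for u, v in edges if level[u] == level[v])
    # S_l: number of nodes assigned to level l
    s = {l: sum(1 for x in level.values() if x == l) for l in range(1, L + 1)}
    # Expected numbers: sums of S_i * S_j over ordered level pairs
    e_d = sum(s[i] * s[j] for i in range(1, L + 1) for j in range(1, L + 1) if i > j)
    e_u = sum(s[i] * s[j] for i in range(1, L + 1) for j in range(1, L + 1) if i < j)
    e_h = sum(s[i] ** 2 for i in range(1, L + 1))
    num = o_d / e_d + o_h / e_h
    den = o_u / e_u + o_h / e_h
    return float('inf') if den == 0 else num / den
```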
To determine the number of hierarchical levels L for a network, one can run the HSM algorithm across a range of L values and choose the L that yields the highest CHS. In some cases, the CHS keeps increasing with L, because larger L values leave more freedom to optimize the hierarchy. In this situation, one can plot the CHS against L and choose the L beyond which no significant CHS improvement is achieved. In addition, other information can help determine L for a directed network. For example, it is reasonable to require L to be no larger than the diameter of the network, namely, the greatest distance between any pair of connected nodes.
Calculation of probabilistic hierarchical score
To more accurately measure the hierarchical structure of a probabilistic hierarchical network, we define a new metric called the probabilistic hierarchy score (PHS). For an edge i → j in a network with L levels, the probability of this edge being downward is \( {\sum}_{L_i>{L}_j}P\left({L}_i,i\right)P\left({L}_j,j\right) \), where P(L_{i}, i) and P(L_{j}, j) are the probabilities of nodes i and j being in levels L_{i} and L_{j}, respectively. Similarly, the probability of i → j being upward is \( {\sum}_{L_i<{L}_j}P\left({L}_i,i\right)P\left({L}_j,j\right) \), and the probability of i → j being horizontal is \( {\sum}_{L_i={L}_j}P\left({L}_i,i\right)P\left({L}_j,j\right) \). Thus, taking into account all edges in the network, we define the PHS as follows:
$$ PHS=\frac{\sum_{\left(i\to j\right)\in \left\{e\right\}}\sum_{L_i>L_j}P\left({L}_i,i\right)P\left({L}_j,j\right)+\sum_{\left(i\to j\right)\in \left\{e\right\}}\sum_{L_i=L_j}P\left({L}_i,i\right)P\left({L}_j,j\right)}{\sum_{\left(i\to j\right)\in \left\{e\right\}}\sum_{L_i<L_j}P\left({L}_i,i\right)P\left({L}_j,j\right)+\sum_{\left(i\to j\right)\in \left\{e\right\}}\sum_{L_i=L_j}P\left({L}_i,i\right)P\left({L}_j,j\right)}. $$
Levels are indexed in increasing order from bottom to top; that is, level i is higher than level j in the hierarchy if i > j.
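Given the node-level probability matrix from the annealing runs, the PHS can be evaluated edge by edge (an illustrative Python sketch; `prob[node]` is an assumed representation mapping each level to its assignment probability):

```python
def probabilistic_hs(edges, prob):
    """Probabilistic hierarchy score.

    prob[node] maps each level to the probability that the node occupies
    that level (estimated from the k annealing runs). Levels are indexed
    increasingly from bottom to top.
    """
    down = up = horiz = 0.0
    for i, j in edges:
        for li, pi in prob[i].items():
            for lj, pj in prob[j].items():
                w = pi * pj
                if li > lj:      # edge points downward
                    down += w
                elif li < lj:    # edge points upward
                    up += w
                else:            # edge is horizontal
                    horiz += w
    denom = up + horiz
    return float('inf') if denom == 0 else (down + horiz) / denom
```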
Estimation of the hierarchy significance for a directed network
Given a directed network, the HSM algorithm infers its optimum hierarchical structure by maximizing the HS. Although the resulting HS measures the degree of hierarchy of a network, it does not tell us whether the network has a significantly hierarchical structure. To address this issue, we compare a directed network with random networks to evaluate the significance of its hierarchy. Here we use the Erdős–Rényi random graph model as the null model, in which each pair of nodes has an equal chance of being connected by an edge [38]. We generate 1,000 Erdős–Rényi random networks with the same numbers of nodes and edges, and calculate their HSs using the HSM algorithm. Given a specified number of levels (L), the P value of hierarchy for the network of interest is then computed as the fraction of random networks with an HS (with the same L specified) equal to or greater than that of the network of interest. Alternatively, assuming a Gaussian distribution of the HSs of the random networks, we calculate the Z-score for the network of interest: Z = (HS − μ)/σ, where μ and σ are the mean and standard deviation of the HSs of the random networks with L levels; the P value is then obtained by referring to a standard normal distribution, and a corresponding P value is calculated for each L (L = 2, 3, …, 8). The significance of hierarchy for the whole network is taken as the minimum P value adjusted by the Bonferroni multiple-testing correction. We note that the significance estimate depends on the choice of null model. To generate the random networks, other null models can be used, and specific constraints can be applied as required.
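The Z-score route can be sketched as follows (illustrative Python; sampling exactly m distinct directed edges is one way to realize the null model with fixed numbers of nodes and edges):

```python
import math
import random

def er_random_edges(n, m):
    """Sample a random directed network with exactly n nodes and m edges
    (no self-loops, no duplicate edges), as in the Erdős–Rényi null model."""
    possible = [(u, v) for u in range(n) for v in range(n) if u != v]
    return random.sample(possible, m)

def hierarchy_z_and_p(observed_hs, random_hss):
    """Z-score and one-sided normal P value of the observed HS against the
    HS distribution of random networks (Gaussian assumption)."""
    mu = sum(random_hss) / len(random_hss)
    var = sum((x - mu) ** 2 for x in random_hss) / (len(random_hss) - 1)
    z = (observed_hs - mu) / math.sqrt(var)
    p = 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail P(Z >= z)
    return z, p
```

Each sampled random network would then be passed through the same HSM optimization before its HS enters `random_hss`.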
Calculation of dyadic reciprocity and Krackhardt hierarchy score
Traditionally, dyadic reciprocity and the Krackhardt hierarchy score are often used to quantify the extent of asymmetry in directed networks [32]. Dyadic reciprocity is defined as the proportion of node pairs in a directed network that are symmetric (that is, reachable from either direction). The Krackhardt hierarchy score is the fraction of node pairs in the directed network that are reachable from only one direction. These two metrics measure the degree of asymmetry of a directed network, which differs from the hierarchy we introduce in this study: our hierarchy by nature implies a top-to-bottom orientation, whereas 'asymmetry' is non-directional. We use the R package 'sna' to calculate dyadic reciprocity and the Krackhardt hierarchy score. The global reaching centrality of networks is calculated using the method introduced by Mones et al. [33].
Directed networks used in this study
In this study, we examine eight directed networks, including five biological networks, one ecological network (a food web), one social network (political blogs), and one computer network (a P2P file-sharing network). The five biological networks are the yeast regulation network, the human regulation network, the yeast phosphorylation network, the human phosphorylation network, and the worm neural network.
The yeast regulome was downloaded from Jothi et al. [21], in which most of the TF-gene interactions were identified by ChIP-chip experiments [9,10], and the remainder were collected from other biochemical studies [39-42]. The human regulome is constructed from ChIP-seq data from the ENCODE project [43], based on which the target genes of more than 120 TFs are determined by a probabilistic model [44]. For TFs with multiple ChIP-seq datasets, the target genes represent the union of targets across all available datasets. The kinase-substrate interactions in the yeast phosphorylome are collected from the protein chip experiments by Ptacek et al. [11] and the phosphorylation site data collected by Freschi et al. from several large-scale studies [36]. The human phosphorylome is available from the PhosphoNetworks database, which is based on experimentally determined kinase-substrate relationships [38]. In our hierarchical study, we include only TF-TF interactions in the two regulomes and kinase-kinase interactions in the two phosphorylomes.
The worm neural network contains the interactions of one neuron with another via synaptic or gap junctions in the worm [45]. The food web network is from Ulanowicz et al. and contains the carbon exchanges from one species to another during the wet season in the cypress wetlands of South Florida [46]. The political blogs network contains hyperlinks between weblogs on US politics recorded in 2005 [3]. The P2P file-sharing network is one of a series of Gnutella networks created in 2002, in which nodes represent host computers in the Gnutella computer network and edges represent connections between the hosts [47].
Properties of yeast genes and proteins
The list of yeast essential genes was determined by a yeast gene deletion project and was downloaded from the Saccharomyces Genome Database (SGD) [48]. The Ka/Ks ratios of orthologous genes between S. cerevisiae and S. pombe were from Wall et al. [49]. The physical and genetic interactions of yeast genes were also downloaded from the SGD database [50,51]. Specifically, to calculate the number of physically interacting partners of yeast kinases, the protein-protein interactions between yeast kinases were obtained from Breitkreutz et al. [12]. The mRNA abundance and mRNA half-life data were obtained from previous studies [52,53]. The protein half-life data came from Belle et al. [54]. The protein abundance and protein noise data were available from Newman et al. [55]. To determine protein noise, the single-cell expression level of a protein was measured in a population of yeast cells, and the ratio of the standard deviation to the mean abundance was calculated. For a protein, the noise is represented as the difference between its noise value and the median over all proteins, termed the deviation from the median (DM). Budding or budding neck localization of yeast kinases was obtained from Huh et al. [56]. The cellular components associated with yeast kinases were annotated by SGD, which are manually curated based on previous publications.
Enrichment of interactions between different levels
To examine whether TFs/kinases are more likely to physically/genetically interact within the same level or between two levels (Figure 6A), we calculated the enrichment of interactions in all pairs of levels: TT, TM, TB, MM, MB, and BB. Using physical interactions between TFs as the example, the significance of enrichment or depletion is calculated as follows. First, given a physical interaction network with n nodes and e edges, we compute the probability for a pair of randomly selected genes to interact: p = e/[n(n − 1)/2]. Second, we assume that the number of TF-TF interactions (denoted as i) within a level or between two different levels follows a binomial distribution: \( \Pr \left(x=i\right)=f\left(i;b,p\right)={C}_b^i{p}^i{\left(1-p\right)}^{b-i} \), where b is the number of all possible TF-TF pairs. Considering self-interactions, b = m(m + 1)/2 for intra-level interactions with m TFs (that is, TT, MM, or BB), and b = m_{1}m_{2} for interactions between two levels with m_{1} and m_{2} TFs, respectively (that is, TM, TB, and MB). Finally, the P values are calculated as P(x ≥ i) for enrichment (that is, the probability of observing an equal or greater number of interactions) and P(x ≤ i) for depletion (that is, the probability of observing an equal or smaller number of interactions) of physical interactions between these TFs.
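The enrichment calculation can be reproduced with an exact binomial tail using only the standard library (an illustrative sketch; function names are hypothetical):

```python
from math import comb

def binom_tail_ge(i, b, p):
    """P(X >= i) for X ~ Binomial(b, p): the enrichment P value."""
    return sum(comb(b, k) * p**k * (1 - p)**(b - k) for k in range(i, b + 1))

def binom_tail_le(i, b, p):
    """P(X <= i) for X ~ Binomial(b, p): the depletion P value."""
    return sum(comb(b, k) * p**k * (1 - p)**(b - k) for k in range(0, i + 1))

def interaction_enrichment(i, m1, m2, p, same_level):
    """Enrichment and depletion P values for observing i interactions among
    TFs of one level (same_level=True, m1 TFs; m2 ignored) or between two
    levels (m1 and m2 TFs). Self-interactions allowed within a level."""
    b = m1 * (m1 + 1) // 2 if same_level else m1 * m2
    return binom_tail_ge(i, b, p), binom_tail_le(i, b, p)
```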
To estimate whether two kinases share a significantly large number of physical partners, genetic partners, or substrates (Figure 6B), we examine their degree of overlap and calculate its significance using Fisher's exact test (that is, the hypergeometric test).
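This shared-partner test corresponds to the upper tail of a hypergeometric distribution, sketched below (illustrative; in practice `scipy.stats.hypergeom` or R's `fisher.test` would serve equally):

```python
from math import comb

def overlap_p_value(n_universe, n_a, n_b, n_shared):
    """One-sided hypergeometric P value that two kinases share at least
    n_shared partners, given n_a and n_b partners drawn from a universe
    of n_universe possible partners."""
    total = comb(n_universe, n_b)
    p = 0.0
    for k in range(n_shared, min(n_a, n_b) + 1):
        # probability of exactly k shared partners
        p += comb(n_a, k) * comb(n_universe - n_a, n_b - k) / total
    return p
```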
Gene ontology analysis
We used the DAVID Gene Ontology (GO) annotation tool [57] to investigate the functional enrichment of kinases in the three levels of our hierarchical network for phosphorylome (Figure 3B). The whole list of the 94 kinases in the network is used as the background for enrichment analysis. A similar analysis is also used to study the functional enrichment of substrates specific to kinases from each of the three levels. In this case, we use the whole yeast gene list as the background. R code for this analysis is available from http://gersteinlab.org/proj/hinet.