Fig. 1 | Genome Biology

Fig. 1

From: AnnoPRO: a strategy for protein function annotation based on multi-scale protein representation and a hybrid deep learning of dual-path encoding

Average number of proteins (ANP) in the GO families at nine different levels (LEVEL 2 to LEVEL 10, as shown in Additional file 1: Fig. S3). The ANP clearly decreased from the top level (LEVEL 2) to the bottom level (LEVEL 10). Because the ANP of a family indicates its representativeness among all families, this figure shows that families become progressively less representative at deeper levels. The nine levels could therefore be classified into two groups based on their ANPs: the “Head Label Levels” (ANP of their GO families ≥ 2,000) and the “Tail Label Levels” (ANP of their GO families < 2,000). As shown, the total number of GO families in the “Tail Label Levels” (5,323) was more than 10 times that in the “Head Label Levels” (459), and this data distribution induced a serious ‘long-tail problem’, as described in a previous pioneering publication [18].
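The grouping rule in the caption can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `split_levels` and the per-level ANP values in `example_anp` are hypothetical placeholders (the actual per-level ANPs are shown only graphically in the figure). The 2,000 threshold and the family counts 5,323 and 459 are taken from the caption.

```python
from typing import Dict, List, Tuple

# Threshold stated in the caption: head levels have ANP >= 2,000.
HEAD_ANP_THRESHOLD = 2000


def split_levels(
    anp_by_level: Dict[int, float],
    threshold: float = HEAD_ANP_THRESHOLD,
) -> Tuple[List[int], List[int]]:
    """Split GO levels into head and tail groups by their ANP."""
    head = sorted(lvl for lvl, anp in anp_by_level.items() if anp >= threshold)
    tail = sorted(lvl for lvl, anp in anp_by_level.items() if anp < threshold)
    return head, tail


# Placeholder ANP values for illustration only; NOT read from the figure.
example_anp = {2: 5000.0, 3: 2000.0, 4: 1999.0, 5: 300.0}
head_levels, tail_levels = split_levels(example_anp)

# Family counts reported in the caption.
TAIL_FAMILIES = 5323
HEAD_FAMILIES = 459
ratio = TAIL_FAMILIES / HEAD_FAMILIES  # tail-to-head family ratio

print(head_levels, tail_levels, round(ratio, 1))
```

With the counts from the caption, the tail-to-head ratio is about 11.6, consistent with the statement that the tail group contains more than 10 times as many GO families as the head group.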
