Fig. 3 | Genome Biology

Fig. 3

From: AnnoPRO: a strategy for protein function annotation based on multi-scale protein representation and a hybrid deep learning of dual-path encoding

Schematic illustration of the procedure used in this study for sequence-based multi-scale protein representation, showing how sequences were converted into a feature similarity-based image (ProMAP) and a protein similarity-based vector (ProSIM). (a) Generation of the feature/protein distance matrices and the ‘template map’; (b) production of ProSIM (based on the PDM) and ProMAP (based on the template map) for each protein. On the one hand, an image-like protein representation (ProMAP) was constructed to capture the intrinsic correlations among protein features. As illustrated, a template map for each protein was first constructed through the consecutive steps of ‘protein representation’ using PROFEAT, ‘similarity calculation’ using cosine similarity, ‘dimensionality reduction’ using UMAP or PCA, ‘coordinate allocation’ using the Jonker-Volgenant algorithm, etc. ProMAP was then produced for each protein by mapping the intensities of all protein features to their corresponding locations in the constructed template map (illustrated on the right side of Fig. 3b). On the other hand, an approach that considers the global relevance among proteins (ProSIM) was proposed to convert an ‘independent’ feature vector into a ‘globally-relevant’ protein representation. As shown, a protein distance matrix (PDM) was first generated through the consecutive steps of ‘protein representation’ using PROFEAT and ‘similarity calculation’ using cosine similarity. Finally, ProSIM was generated for each protein by directly retrieving the corresponding row of the newly generated PDM (shown on the left side of Fig. 3b).
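The steps named in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature matrix `X` is random data standing in for PROFEAT descriptors, PCA (computed here via SVD) is used in place of UMAP, SciPy's `linear_sum_assignment` (documented as a modified Jonker-Volgenant algorithm) performs the coordinate allocation, a single template map is assumed to be shared by all proteins, and the function names, grid size, and use of squared Euclidean cost are all hypothetical choices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def cosine_distance_matrix(X):
    """Pairwise cosine distance (1 - cosine similarity) between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)  # guard against all-zero rows
    return 1.0 - Xn @ Xn.T


def pca_2d(D):
    """Project the rows of D onto their first two principal components."""
    centered = D - D.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :2] * S[:2]


def build_template_map(X, side):
    """Assign every feature (column of X) to a unique cell of a side x side grid.

    Returns an integer array of shape (n_features, 2) holding (row, col) cells.
    """
    n_features = X.shape[1]
    if side * side < n_features:
        raise ValueError("grid is too small for the number of features")
    fdm = cosine_distance_matrix(X.T)  # feature distance matrix
    emb = pca_2d(fdm)                  # 2D embedding of the features
    # Rescale the embedding so it spans the grid coordinates [0, side - 1].
    span = emb.max(axis=0) - emb.min(axis=0)
    span[span == 0] = 1.0
    emb = (emb - emb.min(axis=0)) / span * (side - 1)
    grid = np.array([(r, c) for r in range(side) for c in range(side)], dtype=float)
    # One-to-one allocation of features to grid cells with minimal total cost.
    cost = cdist(emb, grid, metric="sqeuclidean")
    _, cell_idx = linear_sum_assignment(cost)
    return grid[cell_idx].astype(int)


def promap(x, template, side):
    """Place the feature intensities of one protein onto the template map."""
    img = np.zeros((side, side))
    img[template[:, 0], template[:, 1]] = x
    return img


def prosim(X):
    """Protein distance matrix; row i is the ProSIM-style vector of protein i."""
    return cosine_distance_matrix(X)


# Toy usage: 6 hypothetical proteins described by 9 hypothetical features.
rng = np.random.default_rng(0)
X = rng.random((6, 9))
template = build_template_map(X, side=3)
image_0 = promap(X[0], template, side=3)   # image-like representation
vector_0 = prosim(X)[0]                    # similarity-based representation
```

In this sketch the template map is computed once from the whole data set, whereas `promap` and the row lookup in `prosim` are applied per protein, mirroring the split between panels (a) and (b).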
