Abstract
|
Biomedical knowledge graphs are crucial to support data-intensive applications in the life
sciences and health care. These graphs can be extended by generating a heterogeneous graph
that contains both ontology terms and biomedical entities. However, state-of-the-art approaches for Gene Ontology representation learning are constrained to homogeneous graphs
that cannot represent different node types and relations. To address this limitation, we present GoVec, which seamlessly produces representations for both ontologies and biological entities
by applying meta-path-based representation learning to the heterogeneous graph. The resulting vectors can be used in many bioinformatics applications, particularly for calculating
semantic similarity and extracting relations among biological entities. We verify the approach’s usefulness by comparing the resulting semantic similarities with
similarities produced manually by experts. Furthermore, the superiority of GoVec is shown by an extensive set of quantitative and qualitative evaluations. Two downstream tasks,
protein–protein interaction and protein family similarity, are evaluated in comparison with
many state-of-the-art approaches. Finally, as a qualitative evaluation, the separability of various protein families is examined visually: the families form visually separable groups,
which shows that GoVec representations embed functional semantics into the vectors.
|