|
Abstract
|
Protein-protein interaction (PPI) networks underpin the study of nearly all biological processes, including cell signaling, metabolism, immune response, disease development, and drug action. Accurately predicting missing or novel PPIs matters because experimental methods such as yeast two-hybrid screening and mass spectrometry are expensive, time-consuming, and often noisy or incomplete. Computational link prediction helps fill gaps in PPI maps, discover new protein functions, identify disease modules, and suggest drug targets. In recent years, graph embedding methods such as GraphSAGE and Node2Vec have shown good results on PPI tasks by learning low-dimensional representations from network structure. However, PPI networks are noisy (they contain both false positives and false negatives), sparse, and imbalanced. Purely structural embeddings often miss important biological context: whether two proteins interact depends not only on topology but also on sequence similarity, shared domains, Gene Ontology functions, and physicochemical properties. Integrating these biological features improves prediction, but simple concatenation or early fusion often fails because the views are not aligned: they live in different embedding spaces and capture different information.
Self-supervised contrastive learning has become a powerful approach to representation learning in biology (e.g., protein language models, graph contrastive methods). Contrastively aligning multiple views can push the model to learn consistent and complementary knowledge across structural and biological perspectives. This is especially valuable for PPI prediction because data from high-throughput experiments are unreliable and models must generalize across species or conditions. Our proposed method addresses these challenges by introducing contrastive multi-view alignment with biological augmentation. It produces robust embeddings that combine local and global topology with deep biological priors. This can lead to better performance on imbalanced datasets, better generalization to unseen protein
|