Research details

Title
TripletProt: Deep Representation Learning of Proteins based on Siamese Networks
Research type: Published article
Keywords
Proteins, Task analysis, Computational modeling, Training, Protein engineering, Feature extraction, Computational efficiency
Abstract
Recently, pretrained representations have gained attention in various machine learning applications. These methods involve considerable computational costs for training the model, which motivates alternative approaches to representation learning. We introduce TripletProt, a new approach for protein representation learning based on Siamese neural networks. Learning representations of biological entities that capture their essential features can alleviate most of the challenges associated with supervised learning in bioinformatics. The most important distinction of our proposed method is its reliance on the protein-protein interaction (PPI) network. The computational cost of using the generated representations in any potential application is significantly lower than that of comparable methods, since the representations are significantly shorter than those of other approaches. TripletProt, and Siamese networks in general, offer great potential for protein informatics tasks and can be widely applied to similar tasks. We evaluate TripletProt comprehensively on protein functional annotation tasks, including sub-cellular localization (14 categories) and gene ontology prediction (more than 2000 classes), both of which are challenging multi-class, multi-label classification problems. We compare the performance of TripletProt with state-of-the-art approaches, including a recurrent language model-based approach (i.e., UniRep) and a method based on the protein-protein interaction (PPI) network and sequence (i.e., DeepGO).
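The abstract names Siamese networks and the PPI network but does not spell out the training objective. As a rough illustration only, the sketch below shows a generic triplet loss over protein embeddings, with triplets drawn from a toy interaction graph. Several things here are assumptions and not taken from the paper: the sampling rule (interacting partner as positive, a non-interacting protein as negative), the squared Euclidean distance, the margin value, the function names `triplet_loss` and `sample_triplets`, and the random vectors that stand in for the output of a trained network.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on squared Euclidean distances (generic form)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)

def sample_triplets(edges, num_proteins, rng):
    """Build (anchor, positive, negative) index triplets from a toy PPI edge list.

    Assumption for illustration: the positive is an interacting partner and
    the negative is a protein with no recorded interaction with the anchor.
    """
    neighbors = {i: set() for i in range(num_proteins)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    triplets = []
    for a, b in edges:
        candidates = [k for k in range(num_proteins)
                      if k != a and k not in neighbors[a]]
        if candidates:
            triplets.append((a, b, int(rng.choice(candidates))))
    return triplets

rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (3, 4)]          # hypothetical interactions
embeddings = rng.normal(size=(5, 8))      # stand-in for learned embeddings
triplets = sample_triplets(edges, 5, rng)
a, p, n = (np.array(idx) for idx in zip(*triplets))
losses = triplet_loss(embeddings[a], embeddings[p], embeddings[n])
print(losses.mean())
```

In an actual training setup the embeddings would come from a shared-weight network and the loss would be minimized by gradient descent; the sketch only evaluates the loss once to show the shape of the computation.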
Researchers: Esmaeil Nourani (first author), Ehsaneddin Asgari (second author), Alice McHardy (third author), Mohammad Mofrad (fourth author)