Abstract
|
Pathogenic microorganisms exploit host cellular mechanisms and evade host defense mechanisms through molecular
pathogen‐host interactions (PHIs). Therefore, comprehensive analysis of these PHI networks should be an initial step for
developing effective therapeutics against infectious diseases. Computational prediction of PHI data is in increasing
demand because experimentally verified data are scarce. Prediction of protein‐protein interactions (PPIs) within PHI
systems can be formulated as a classification problem, which requires knowledge of non‐interacting protein pairs. This
is a restrictive requirement, since datasets that report non‐interacting protein pairs are lacking. In this study, we formulated
the “computational prediction of PHI data” problem as an embedding of kernelized heterogeneous data. This eliminates the
above‐mentioned requirement and enables us to predict new interactions without randomly labeling protein pairs as non‐interacting.
Domain‐domain associations are used to filter the predicted results, leading to 175 novel PHIs between 170
human proteins and 105 viral proteins. To compare our results with state‐of‐the‐art studies that use a
binary classification formulation, we modified our settings to adopt the same formulation. Detailed evaluations show
that our approach improves accuracy and AUC (area under the receiver operating characteristic curve) by more than 10 percent
compared with the state‐of‐the‐art methods.
|
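
The domain‐domain association filter mentioned above can be illustrated with a minimal sketch. The abstract does not state the exact filtering rule, so the code below assumes one plausible rule: a predicted host‐pathogen pair is kept only if at least one domain of the host protein is associated with at least one domain of the pathogen protein. The function name, the protein and domain identifiers, and the toy data are all hypothetical and serve only as an illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (host protein ID, pathogen protein ID)


def filter_by_domain_associations(
    predicted: Iterable[Pair],
    host_domains: Dict[str, Set[str]],
    pathogen_domains: Dict[str, Set[str]],
    associated_domains: Set[Tuple[str, str]],
) -> List[Pair]:
    """Keep predicted pairs supported by at least one domain-domain association.

    Assumed rule (not taken from the paper): a pair is supported when some
    host domain and some pathogen domain appear together in
    `associated_domains`, in either order.
    """
    kept: List[Pair] = []
    for host, pathogen in predicted:
        h_doms = host_domains.get(host, set())
        p_doms = pathogen_domains.get(pathogen, set())
        supported = any(
            (a, b) in associated_domains or (b, a) in associated_domains
            for a in h_doms
            for b in p_doms
        )
        if supported:
            kept.append((host, pathogen))
    return kept


# Hypothetical toy data: IDs and domain names are placeholders.
predicted_pairs = [("H1", "V1"), ("H2", "V1"), ("H3", "V2")]
host_domains = {"H1": {"D1"}, "H2": {"D2"}, "H3": {"D3"}}
pathogen_domains = {"V1": {"D4"}, "V2": {"D5"}}
associated_domains = {("D1", "D4"), ("D5", "D3")}

filtered_pairs = filter_by_domain_associations(
    predicted_pairs, host_domains, pathogen_domains, associated_domains
)
print(filtered_pairs)
```

In this toy example the pair ("H2", "V1") is dropped because no association links D2 and D4, while the other two pairs are retained. A stricter rule, such as requiring several supporting domain pairs, would only change the `supported` condition.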