Authors: GuangHui Liu, HongBin Shen, DongJun Yu
Publish Date: 2015/11/12
Volume: 249, Issue: 1-2, Pages: 141-153
Abstract
Accurately predicting protein–protein interaction sites (PPIs) is currently a hot topic because it has been demonstrated to be very useful for understanding disease mechanisms and designing drugs. Machine-learning-based computational approaches have been broadly utilized and demonstrated to be useful for PPI prediction. However, directly applying traditional machine learning algorithms, which often assume that samples in different classes are balanced, often leads to poor performance because of the severe class imbalance that exists in the PPI prediction problem. In this study, we propose a novel method for improving PPI prediction performance by relieving the severity of class imbalance using a data-cleaning procedure and reducing predicted false positives with a post-filtering procedure. First, a machine-learning-based data-cleaning procedure is applied to remove those marginal targets, which may potentially have a negative effect on training a model with a clear classification boundary, from the majority samples to relieve the severity of class imbalance in the original training dataset; then, a prediction model is trained on the cleaned dataset; finally, an effective post-filtering procedure is further used to reduce potential false positive predictions. Stringent cross-validation and independent validation tests on benchmark datasets demonstrated the efficacy of the proposed method, which exhibits highly competitive performance compared with existing state-of-the-art sequence-based PPIs predictors and should supplement existing PPI prediction methods.

This work was supported by the National Natural Science Foundation of China (No. 61373062, 61233011, and 61222306), the Jiangsu Postdoctoral Science Foundation (No. 1201027C), the Natural Science Foundation of Jiangsu (No. BK20141403), the China Postdoctoral Science Foundation (No. 2014T70526 and 2013M530260), the Fundamental Research Funds for the Central Universities (No. 30920130111010), and “The Six Top Talents” of Jiangsu Province (No. 2013XXRJ022).
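The three stages named in the abstract (cleaning the majority class, training on the cleaned data, post-filtering the predictions) can be illustrated with a minimal sketch. The abstract does not say which learner, cleaning criterion, or filtering rule the authors use, so everything below is an assumption made for illustration: a nearest-centroid rule stands in for the machine-learning model, a "marginal" negative is taken to be one that lies closer to the positive centroid than to the negative centroid, and the post-filter drops predicted interface residues that have no predicted neighbour within a hypothetical `window` along the sequence. The function names and toy feature vectors are likewise invented.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(column) for column in zip(*points)]


def clean_majority(negatives, positives):
    """Stage 1 (assumed criterion): drop negatives that sit closer to the
    positive centroid than to the negative centroid."""
    c_neg, c_pos = centroid(negatives), centroid(positives)
    return [x for x in negatives if dist(x, c_neg) <= dist(x, c_pos)]


def train(negatives, positives):
    """Stage 2 (stand-in model): store the two class centroids."""
    return centroid(negatives), centroid(positives)


def predict(model, residues):
    """Label a residue 1 (interface) if it is nearer the positive centroid."""
    c_neg, c_pos = model
    return [1 if dist(x, c_pos) < dist(x, c_neg) else 0 for x in residues]


def post_filter(labels, window=1):
    """Stage 3 (assumed rule): keep a predicted positive only if another
    predicted positive lies within `window` positions along the chain."""
    filtered = []
    for i, label in enumerate(labels):
        lo, hi = max(0, i - window), min(len(labels), i + window + 1)
        has_neighbour = any(labels[j] for j in range(lo, hi) if j != i)
        filtered.append(1 if label and has_neighbour else 0)
    return filtered


# Toy two-dimensional features; the last negative is deliberately marginal.
positives = [[0.9, 0.8], [0.8, 0.9], [1.0, 1.0]]
negatives = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.0],
             [0.1, 0.1], [0.2, 0.2], [0.85, 0.9]]

cleaned = clean_majority(negatives, positives)
model = train(cleaned, positives)

# Residues of one hypothetical chain, in sequence order.
chain = [[0.1, 0.1], [0.9, 0.9], [0.1, 0.0],
         [0.8, 0.8], [0.9, 1.0], [0.2, 0.1]]
raw = predict(model, chain)
filtered = post_filter(raw, window=1)
print(len(negatives), "->", len(cleaned), "negatives after cleaning")
print("raw:     ", raw)
print("filtered:", filtered)
```

On this toy data the cleaning step removes only the one marginal negative, and the post-filter discards the isolated positive at position 1 while keeping the adjacent pair at positions 3 and 4. A real predictor would replace each stand-in with the learner and thresholds described in the full paper.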
Keywords: