Authors: Dino Kečo, Abdulhamit Subasi, Jasmin Kevric
Publish Date: 2016/12/19
Volume: 30, Issue: 5, Pages: 1601-1610
Abstract
Cancer classification is one of the main steps in the patient healing process. This drives modern clinical researchers to use advanced bioinformatics methods for cancer classification. Cancer classification is usually performed using gene expression data obtained from microarray experiments together with advanced machine learning methods. A microarray experiment generates a huge amount of data, and processing it with machine learning methods is a big challenge. In this study, a two-step classification paradigm that merges genetic algorithm feature selection with machine learning classifiers is utilized. The genetic algorithm is built in the spirit of the MapReduce programming model, which makes it highly scalable on a Hadoop cluster. To improve its performance, the proposed algorithm is extended into a parallel algorithm that processes microarray data in a distributed manner using the Hadoop MapReduce framework. In this paper, the algorithm was tested on eleven GEMS data sets (9 tumors, 11 tumors, 14 tumors, brain tumor 1, lung cancer, brain tumor 2, leukemia 1, DLBCL, leukemia 2, SRBCT, and prostate tumor), and its accuracy reached 100% with fewer than 25 selected features. The proposed cloud computing-based MapReduce parallel genetic algorithm performed well on gene expression data. In addition, the scalability of the suggested algorithm is unlimited because of the underlying Hadoop MapReduce platform. The presented results indicate that the proposed method can be effectively implemented for real-world microarray data in a cloud environment. Furthermore, the Hadoop MapReduce framework demonstrates a substantial decrease in computation time.
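The two-step idea described in the abstract (genetic algorithm feature selection wrapped around a classifier, organized as map and reduce phases) can be illustrated with a minimal single-process sketch. This is not the authors' implementation and does not use Hadoop. The nearest-centroid classifier, the leave-one-out fitness, tournament selection, elitism, and every function name and parameter below are assumptions chosen only to keep the example small. The map phase scores each chromosome independently, which is the part a MapReduce cluster could distribute; the reduce phase builds the next generation.

```python
import random


def nearest_centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Build per-class centroids from every sample except sample i.
        sums, counts = {}, {}
        for k in range(len(X)):
            if k == i:
                continue
            acc = sums.setdefault(y[k], [0.0] * len(cols))
            for c, j in enumerate(cols):
                acc[c] += X[k][j]
            counts[y[k]] = counts.get(y[k], 0) + 1

        def dist(label):
            # Squared Euclidean distance from sample i to the class centroid.
            return sum((X[i][j] - sums[label][c] / counts[label]) ** 2
                       for c, j in enumerate(cols))

        if min(sums, key=dist) == y[i]:
            correct += 1
    return correct / len(X)


def map_phase(population, X, y):
    """Map: score every chromosome independently, emitting (fitness, chromosome) pairs."""
    return [(nearest_centroid_accuracy(X, y, chrom), chrom) for chrom in population]


def reduce_phase(scored, rng, mutation_rate=0.05):
    """Reduce: elitism, tournament selection, one-point crossover, bit-flip mutation."""
    def pick():
        a, b = rng.sample(scored, 2)
        return max(a, b, key=lambda s: s[0])[1]

    best = max(scored, key=lambda s: s[0])[1]
    children = [best[:]]  # keep the best chromosome unchanged
    while len(children) < len(scored):
        p1, p2 = pick(), pick()
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        children.append([1 - bit if rng.random() < mutation_rate else bit
                         for bit in child])
    return children


def select_features(X, y, generations=20, pop_size=12, seed=0):
    """Run the map/reduce loop and return (best_fitness, best_mask)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population = reduce_phase(map_phase(population, X, y), rng)
    return max(map_phase(population, X, y), key=lambda s: s[0])
```

Calling `select_features(X, y)` on a list of expression rows `X` and class labels `y` returns the best fitness found and a 0/1 mask over the features. In a real Hadoop deployment the scoring in `map_phase` would run across cluster nodes, and how the work is actually partitioned there is not specified by the abstract.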
Keywords: