Authors: Saharon Rosset, Claudia Perlich, Grzegorz Świrszcz, Prem Melville, Yan Liu
Publish Date: 2009/12/23
Volume: 20, Issue: 3, Pages: 439-468
Abstract
Two major data mining competitions in 2008 presented challenges in medical domains: KDD Cup 2008, which concerned cancer detection from mammography data; and INFORMS Data Mining Challenge 2008, dealing with diagnosis of pneumonia based on patient information from hospital files. Our team won both of these competitions, and in this paper we share our lessons learned and insights. We emphasize the aspects that pertain to the general practice and methodology of medical data mining, rather than to the specifics of each modeling competition. We concentrate on three topics: information leakage, its effect on competitions and proof-of-concept projects; consideration of real-life model performance measures in model construction and evaluation; and relational learning approaches to medical data mining tasks.
Keywords: