Journal Title
Title of Journal: Data Mining and Knowledge Discovery
Abbreviation: Data Min Knowl Disc
Authors: Xiaoxin Yin, Jiawei Han, Philip S. Yu
Publish Date: 2007/07/06
Volume: 15, Issue: 3, Pages: 321-348
Abstract
Most structured data in real-life applications are stored in relational databases containing multiple semantically linked relations. Unlike clustering in a single table, when clustering objects in relational databases there are usually a large number of features conveying very different semantic information, and using all features indiscriminately is unlikely to generate meaningful results. Because the user knows her goal of clustering, we propose a new approach called CrossClus, which performs multi-relational clustering under user's guidance. Unlike semi-supervised clustering, which requires the user to provide a training set, we minimize the user's effort by using a very simple form of user guidance. The user is only required to select one or a small set of features that are pertinent to the clustering goal, and CrossClus searches for other pertinent features in multiple relations. Each feature is evaluated by whether it clusters objects in a similar way with the user-specified features. We design efficient and accurate approaches for both feature selection and object clustering. Our comprehensive experiments demonstrate the effectiveness and scalability of CrossClus.
The work was supported in part by the U.S. National Science Foundation (NSF IIS0313678 and NSF BDI0515813) and an IBM Faculty Award. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect views of the funding agencies.
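The idea of evaluating a feature by whether it clusters objects in a similar way to the user-specified feature can be illustrated with a minimal sketch. This is not the CrossClus algorithm as the abstract does not give its details; it assumes, purely for illustration, that each feature is categorical, that two objects are "clustered together" by a feature when they share its value, and that two features are compared by the cosine similarity of their pair-agreement vectors. The function names, the threshold, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pair_agreement_vector(values):
    """Return 1.0 for each object pair sharing the feature value, else 0.0."""
    return [1.0 if values[i] == values[j] else 0.0
            for i, j in combinations(range(len(values)), 2)]


def feature_similarity(f, g):
    """Cosine similarity of two features' pair-agreement vectors (assumed measure)."""
    u, v = pair_agreement_vector(f), pair_agreement_vector(g)
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_pertinent(user_feature, candidates, threshold=0.5):
    """Keep candidate features that group objects like the user's feature does."""
    return [name for name, values in candidates.items()
            if feature_similarity(user_feature, values) >= threshold]


# Hypothetical toy data: four objects, one user-chosen feature, two candidates.
user_feature = ["db", "db", "ai", "ai"]
candidates = {
    "advisor_group": ["x", "x", "y", "y"],  # groups objects the same way
    "office_floor": ["1", "2", "1", "2"],   # groups objects differently
}
pertinent = select_pertinent(user_feature, candidates)
```

In this toy setting only `advisor_group` passes the threshold, because it places exactly the same object pairs together as the user-selected feature, while `office_floor` shares no co-clustered pair with it.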