Journal Title
Title of Journal: Data Mining and Knowledge Discovery
Abbreviation: Data Min Knowl Disc
Authors: Dino Ienco, Céline Robardet, Ruggero G. Pensa, Rosa Meo
Publish Date: 2012/01/15
Volume: 26, Issue: 2, Pages: 217-254
Abstract
The availability of data represented with multiple features coming from heterogeneous domains is becoming increasingly common in real-world applications. Such data represent objects of a certain type connected to other types of data (the features), so that the overall data schema forms a star structure of interrelationships. Co-clustering these data involves the specification of many parameters, such as the number of clusters for the object dimension and for all the feature domains. In this paper, we present a novel co-clustering algorithm for heterogeneous star-structured data that is parameterless: it requires neither the number of row clusters nor the number of column clusters for the given feature spaces. Our approach optimizes the Goodman–Kruskal τ, a measure of cross-association in contingency tables that evaluates the strength of the relationship between two categorical variables. We extend τ to evaluate co-clustering solutions and, in particular, apply it in a higher-dimensional setting. We propose the algorithm CoStar, which optimizes τ by a local search approach. We assess the performance of CoStar on publicly available datasets from the textual and image domains using objective external criteria. The results show that our approach outperforms state-of-the-art methods for the co-clustering of heterogeneous data while remaining computationally efficient.
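For readers unfamiliar with the measure, the following is a minimal sketch of the standard two-variable Goodman–Kruskal τ on a single contingency table. It is not the CoStar algorithm and does not include the paper's extension to co-clustering or to several feature spaces. The function name `goodman_kruskal_tau` and the list-of-rows input format are assumptions made for illustration. The value returned is the proportional reduction in the error of predicting the column category once the row category is known.

```python
def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau for predicting the column variable from the row variable.

    `table` is a contingency table given as a list of rows of non-negative counts
    (hypothetical input format chosen for this sketch).
    """
    total = sum(sum(row) for row in table)
    if total == 0:
        raise ValueError("contingency table is empty")

    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]

    # Expected prediction error using only the column marginals.
    err_marginal = 1.0 - sum((c / total) ** 2 for c in col_sums)
    if err_marginal == 0.0:
        raise ValueError("column variable is constant; tau is undefined")

    # Expected prediction error when the row category is known:
    # 1 - sum_ij p_ij^2 / p_i. (empty rows contribute nothing).
    within = sum(
        (n / total) ** 2 / (r / total)
        for row, r in zip(table, row_sums) if r > 0
        for n in row
    )
    err_conditional = 1.0 - within

    # Proportional reduction in error.
    return (err_marginal - err_conditional) / err_marginal


if __name__ == "__main__":
    print(goodman_kruskal_tau([[5, 0], [0, 5]]))  # rows fully determine columns
    print(goodman_kruskal_tau([[2, 2], [2, 2]]))  # rows carry no information
```

In a co-clustering setting, such a table would typically hold counts aggregated over row clusters and column clusters; how the paper combines several such tables is not reproduced here.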