Authors: Olof Görnerup, Daniel Gillblad, Theodore Vasiloudis
Publish Date: 2016/08/30
Volume: 51, Issue: 2, Pages: 531-560
Abstract
Appropriately defining and efficiently calculating similarities from large data sets are often essential in data mining, both for gaining understanding of data and generating processes, and for building tractable representations. Given a set of objects and their correlations, we here rely on the premise that each object is characterized by its context, i.e., its correlations to the other objects. The similarity between two objects can then be expressed in terms of the similarity between their contexts. In this way, similarity pertains to the general notion that objects are similar if they are exchangeable in the data. We propose a scalable approach for calculating all relevant similarities among objects by relating them in a correlation graph that is transformed to a similarity graph. These graphs can express rich structural properties among objects. Specifically, we show that concepts (abstractions of objects) are constituted by groups of similar objects that can be discovered by clustering the objects in the similarity graph. These principles and methods are applicable in a wide range of fields and will be demonstrated here in three domains: computational linguistics, music, and molecular biology, where the numbers of objects and correlations range from small to very large.
This work was funded by the Swedish Foundation for Strategic Research (Stiftelsen för strategisk forskning) and the Knowledge Foundation (Stiftelsen för kunskaps- och kompetensutveckling). The authors would like to thank the anonymous reviewers for their valuable comments.
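As a rough illustration of the idea summarized in the abstract, the sketch below turns a toy correlation graph into a similarity graph and groups similar objects into concepts. It is not the authors' method: cosine similarity between contexts stands in for whatever similarity measure the paper defines, connected components stand in for its clustering step, and the all-pairs loop is not the scalable procedure the paper proposes. The function names, the `threshold` parameter, and the example data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def context_similarity(corr, a, b):
    """Cosine similarity between the contexts of objects a and b.

    corr maps each object to a dict {neighbor: correlation}. The two
    objects themselves are left out of each other's context.
    (Cosine is an assumed stand-in, not the paper's measure.)
    """
    ctx_a = {k: w for k, w in corr.get(a, {}).items() if k not in (a, b)}
    ctx_b = {k: w for k, w in corr.get(b, {}).items() if k not in (a, b)}
    dot = sum(w * ctx_b[k] for k, w in ctx_a.items() if k in ctx_b)
    norm_a = sqrt(sum(w * w for w in ctx_a.values()))
    norm_b = sqrt(sum(w * w for w in ctx_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_graph(corr, threshold=0.5):
    """Return {(a, b): similarity} for pairs at or above the threshold."""
    edges = {}
    for a, b in combinations(sorted(corr), 2):  # naive O(n^2) pass
        s = context_similarity(corr, a, b)
        if s >= threshold:
            edges[(a, b)] = s
    return edges


def find_concepts(corr, threshold=0.5):
    """Group objects into connected components of the similarity graph."""
    adj = {node: set() for node in corr}
    for a, b in similarity_graph(corr, threshold):
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for node in sorted(adj):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:  # depth-first traversal of one component
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        groups.append(comp)
    return groups


# Toy, made-up correlation graph (symmetric weights).
toy_corr = {
    "cat": {"pet": 0.9, "fur": 0.8},
    "dog": {"pet": 0.8, "fur": 0.9},
    "car": {"road": 0.9, "fuel": 0.7},
    "truck": {"road": 0.8, "fuel": 0.9},
    "pet": {"cat": 0.9, "dog": 0.8},
    "fur": {"cat": 0.8, "dog": 0.9},
    "road": {"car": 0.9, "truck": 0.8},
    "fuel": {"car": 0.7, "truck": 0.9},
}

if __name__ == "__main__":
    for group in find_concepts(toy_corr):
        print(sorted(group))
```

In this toy graph, "cat" and "dog" are never directly correlated, yet they end up in the same group because they share the same context ("pet", "fur"), which is the sense of exchangeability the abstract describes.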
Keywords: