Journal Title
Title of Journal: Data Mining and Knowledge Discovery
Abbreviation: Data Min Knowl Disc
Authors: Jefrey Lijffijt, Panagiotis Papapetrou, Kai Puolamäki
Publish Date: 2014/12/09
Volume: 29, Issue: 6, Pages: 1838-1864
Abstract
In order to find patterns in data, it is often necessary to aggregate or summarise data at a higher level of granularity. Selecting the appropriate granularity is a challenging task, and often no principled solutions exist. This problem is particularly relevant in the analysis of data with sequential structure. We consider this problem for a specific type of data, namely event sequences. We introduce the problem of finding the best set of window lengths for analysis of event sequences for algorithms with real-valued output. We present suitable criteria for choosing one or multiple window lengths and show that these naturally translate into a computational optimisation problem. We show that the problem is NP-hard in general, but that it can be approximated efficiently and even analytically in certain cases. We give examples of tasks that demonstrate the applicability of the problem and present extensive experiments on both synthetic data and real data from several domains. We find that the method works well in practice, and that the optimal sets of window lengths themselves can provide new insight into the data.

We thank Heikki Mannila for useful discussions and feedback. This work was supported by the Finnish Doctoral Programme in Computational Sciences (FICS), the Finnish Centre of Excellence for Algorithmic Data Analysis Research (ALGODAN), and the Finnish Centre of Excellence in Computational Inference Research (COIN). We acknowledge the computational resources provided by the Aalto Science-IT project.