Paper Search Console


Journal Title

Title of Journal: Data Mining and Knowledge Discovery

Abbreviation: Data Min Knowl Disc

Publisher

Springer US


DOI

10.1007/bf01194762


ISSN

1573-756X


Ranking episodes using a partition model

Authors: Nikolaj Tatti
Publish Date: 2015/05/15
Volume: 29, Issue: 5, Pages: 1312-1342

Abstract

One of the biggest setbacks in traditional frequent pattern mining is that overwhelmingly many of the discovered patterns are redundant. A prototypical example of such redundancy is a freerider pattern, where the pattern contains a true pattern and some additional noise events. A technique for filtering freerider patterns that has proved to be efficient in ranking itemsets is to use a partition model: a pattern is divided into two subpatterns, and the observed support is compared to the expected support under the assumption that these two subpatterns occur independently. In this paper we develop a partition model for episodes, patterns discovered from sequential data. An episode is essentially a set of events with possible restrictions on the order of the events. Unlike with itemset mining, computing the expected support of an episode requires surprisingly sophisticated methods. In order to construct the model, we partition the episode into two subepisodes. We then model how likely the events in each subepisode are to occur close to each other. If this probability is high, which is often the case if the subepisode has a high support, then we can expect that when one event from a subepisode occurs, the remaining events also occur close by. This approach increases the expected support of the episode, and if this increase explains the observed support, then we can deem the episode uninteresting. We demonstrate in our experiments that using the partition model can effectively and efficiently reduce the redundancy in episodes.

Assume that a sequence $S = s_1 \ldots s_n$ covers an episode $G$, and write $S' = s_2 \ldots s_n$. If there is a source vertex $v$ such that $s_1 = \mathit{lab}(v)$, then $S'$ covers $G \setminus v$; otherwise $S'$ covers $G$.

If there is no source vertex in $G$ with label $s_1$, then $gr(M, S) = gr(M, S')$. Now the lemma implies that $S'$ covers $G$, and the induction assumption implies that $gr(M, S') = G$.

If there is a source vertex $v$ in $G$ such that $\mathit{lab}(v) = s_1$, then $gr(M, S) = gr(M, S', G_v)$. Note that $G_v$ and its descendants form exactly $M(H)$, where $H = G \setminus v$. That is, $gr(M, S) = G$ if and only if $gr(M_H, S') = H$. The lemma implies that $S'$ covers $H$, and the induction assumption implies that $gr(M_H, S') = H$, which proves the proposition. $\square$
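The recursive argument above suggests a simple greedy procedure for deciding coverage: scan the sequence left to right and, whenever the current event matches the label of a source vertex of the remaining episode, delete that vertex. The following is a minimal sketch, not the paper's implementation; it assumes the episode is given as a labelled DAG with distinct vertex labels, and the names `covers`, `labels`, and `edges` are illustrative only.

```python
def covers(sequence, labels, edges):
    """Greedy coverage check for an episode given as a labelled DAG.

    sequence: list of event symbols.
    labels:   dict mapping vertex id -> event label (labels assumed distinct).
    edges:    set of (u, v) pairs meaning u must occur before v.
    Returns True if the sequence covers the episode.
    """
    remaining = set(labels)          # vertices not yet matched
    for s in sequence:
        matched = set(labels) - remaining
        # a source vertex of the remaining episode: its label matches s
        # and every predecessor has already been matched
        candidates = [v for v in remaining
                      if labels[v] == s
                      and all(u in matched for (u, w) in edges if w == v)]
        if candidates:
            remaining.discard(candidates[0])
    return not remaining
```

For a serial episode a -> b -> c, the sequence b a b c covers it (the first b is skipped because a has not occurred yet), while c b a does not.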

