Journal Title
Title of Journal: The VLDB Journal
|
Abbravation: The VLDB Journal
|
Publisher
Springer Berlin Heidelberg
|
|
|
|
Authors: Rainer Gemulla Peter J Haas Wolfgang Lehner
Publish Date: 2013/02/14
Volume: 22, Issue: 6, Pages: 753-772
Abstract
A variety of schemes have been proposed in the literature to speed up query processing and analytics by incrementally maintaining a boundedsize uniform sample from a dataset in the presence of a sequence of insertion deletion and update transactions These algorithms vary according to whether the dataset is an ordinary set or a multiset and whether the transaction sequence consists only of insertions or can include deletions and updates We report on subtle nonuniformity issues that we found in a number of these prior boundedsize sampling schemes including some of our own We provide workarounds that can avoid the nonuniformity problem these workarounds are easy to implement and incur negligible additional cost We also consider the impact of nonuniformity in practice and describe simple statistical tests that can help detect nonuniformity in new algorithmsFirst note that in the first two assertions of the theorem Asubseteq R k implies that Ale k so that expressions such as kA are always nonnegative Next observe that T=l if and only if exactly M items are accepted into the sample during the first l1 steps of Bernoulli sampling and then an item is accepted at the lth step The probability of the first event is binomial genfrac00ptl1MqM1ql1M The probability of the second event is q Because successive steps of Bernoulli sampling are independent the joint probability of the two events is simply the product of the individual probabilities which yields the definition of pi k given in the theorem
Keywords:
.
|
Other Papers In This Journal:
|