Authors: Oleg Seleznjev, Bernhard Thalheim
Publish Date: 2008/07/31
Volume: 12, Issue: 1, Pages: 63-89
Abstract
In many database applications in telecommunication, environmental and health sciences, bioinformatics, physics, and econometrics, real-world data are uncertain and subjected to errors. These data are processed, transmitted, and stored in large databases. We consider stochastic modelling for databases with uncertain data and for some basic database operations (for example, join and selection) with exact and approximate matching. Approximate join is used for merging or data deduplication in large databases. Distribution and mean of the join sizes are studied for random databases. A random database is treated as a table with independent random records with a common distribution (or a set of random tables). These results can be used for integration of information from different databases, multiple join optimization, and various probabilistic algorithms for structured random data.
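As a rough illustration of the mean join size mentioned above, the following minimal sketch assumes the simplest setting: two tables of n and m independent records, each record reduced to a single discrete join attribute with distributions p and q, joined by exact matching. Under these assumptions, linearity of expectation gives a mean join size of n · m · Σ_k p_k q_k. The function names `expected_join_size` and `simulate_join_size` are hypothetical and not taken from the paper, which treats more general models, including approximate matching.

```python
import random
from collections import Counter


def expected_join_size(n, m, p, q):
    """Mean size of an exact equi-join of two random tables.

    Assumption (illustrative): each of the n*m record pairs matches
    independently of its position with probability sum_k p[k] * q[k].
    """
    return n * m * sum(pk * qk for pk, qk in zip(p, q))


def simulate_join_size(n, m, p, q, rng):
    """Draw two random tables and count the matching record pairs."""
    values = range(len(p))
    left = Counter(rng.choices(values, weights=p, k=n))
    right = Counter(rng.choices(values, weights=q, k=m))
    # A value seen c times on the left and d times on the right
    # contributes c * d rows to the join.
    return sum(count * right[value] for value, count in left.items())


if __name__ == "__main__":
    rng = random.Random(0)
    p = q = [0.5, 0.5]
    trials = 2000
    mean = sum(simulate_join_size(10, 20, p, q, rng) for _ in range(trials)) / trials
    print("theoretical mean:", expected_join_size(10, 20, p, q))
    print("simulated mean:  ", mean)
```

With a fixed seed, the simulated average can be compared against the closed-form value as a sanity check on the independence assumption.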
Keywords: