Lineage Processing on Correlated Probabilistic Databases

Bhargav Kanagal

Amol Deshpande

SIGMOD Conference, ACM SIGMOD(2010), pp. 675-686

Download Google Scholar

Abstract

In this paper, we address the problem of scalably evaluating conjunctive queries over correlated probabilistic databases containing tuple or attribute uncertainties. Like previous work, we adopt a two-phase approach where we first compute "lineages" of the output tuples, and then compute the probabilities of the lineage formulas. However unlike previous work, we allow for arbitrary and complex correlations to be present in the data, captured via a forest of "junction trees". We observe that evaluating even read-once (tree structured) lineages (e.g., those generated by "hierarchical" conjunctive queries), polynomially computable over tuple independent probabilistic databases, is \#P-complete for lightly correlated probabilistic databases like "Markov sequences". We characterize the complexity of exact computation of the probability of the lineage formula on a correlated database using a parameter called "lwidth" (analogous to the notion of "treewidth"). For lineages that result in low lwidth, we compute exact probabilities using a novel message passing algorithm, and for lineages that induce large lwidths, we develop approximate Monte Carlo algorithms to estimate the result probabilities. We scale our algorithms to very large correlated probabilistic databases using the previously proposed INDSEP data structure. To mitigate the complexity of lineage evaluation, we develop optimization techniques to process a batch of lineages by sharing computation across formulas, and to exploit any independence relationships that may exist in the data. Our experimental study illustrates the benefits of using our algorithms for processing lineage formulas over correlated probabilistic databases.

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Lineage Processing on Correlated Probabilistic Databases

Abstract

Research Areas

Meet the teams driving innovation

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Lineage Processing on Correlated Probabilistic Databases

Abstract

Research Areas

Meet the teams driving innovation

AI/ML Foundations  & Capabilities