Authors: Peng Li, TinYu Wu, XinMing Li, Hong Luo, Mohammad S. Obaidat
Publish Date: 2016/09/12
Volume: 73, Issue: 4, Pages: 1509-1531
Abstract
The inability to effectively construct data supply chains in distributed environments is becoming one of the top concerns in the big data area. To address this problem, a novel method of constructing data supply chains based on layered PROV is proposed. First, to abstractly describe the data transfer process from creation to distribution, a data provenance specification presented by the W3C is used to standardize the records of data activities within and across data platforms. Then, a distributed PROV data generation algorithm for multiple platforms is designed. Further, we propose tiered storage management of provenance based on summarization technology, which reduces the number of provenance records by compressing intermediate versions so as to realize multilevel management of PROV. In particular, we propose a hierarchical visualization technique based on a layered query mechanism, which allows users to view the data supply chain from the general to the detailed. The experimental results show that the proposed approach can effectively improve the construction performance for data supply chains.
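The idea of reducing provenance records by compressing intermediate versions can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give. It only assumes that provenance is stored as PROV-style "was derived from" edges and that an intermediate version is an entity with exactly one source and exactly one derived entity. The names `Derivation` and `summarize_chain` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """One PROV-style edge: `derived` was derived from `source` (hypothetical type)."""
    derived: str
    source: str


def summarize_chain(edges):
    """Collapse linear runs of intermediate versions into single edges.

    Assumption: an entity is "intermediate" when it has exactly one source
    and exactly one derived entity. This is an illustrative rule, not the
    paper's summarization method.
    """
    parents = {}   # entity -> list of entities it was derived from
    children = {}  # entity -> list of entities derived from it
    for e in edges:
        parents.setdefault(e.derived, []).append(e.source)
        children.setdefault(e.source, []).append(e.derived)

    def is_mid(entity):
        return (len(parents.get(entity, [])) == 1
                and len(children.get(entity, [])) == 1)

    summary = []
    for e in edges:
        if is_mid(e.derived):
            # This edge ends at an intermediate version; a later edge covers it.
            continue
        source = e.source
        while is_mid(source):
            # Skip over intermediate versions back to the nearest kept entity.
            source = parents[source][0]
        summary.append(Derivation(e.derived, source))
    return summary


# Four versions in a line: v1 -> v2 -> v3 -> v4.
chain = [Derivation("v2", "v1"), Derivation("v3", "v2"), Derivation("v4", "v3")]
summary = summarize_chain(chain)
print(summary)
```

In this example the three detailed edges reduce to one summarized edge from `v4` back to `v1`. A layered store could keep the summary at the top level and retain the full chain at a lower level for detailed queries.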