Authors: Jiajia Miao, Quanyuan Wu, Yan Jia
Publish Date: 2009/12/12
Volume: 16, Issue: 6, Pages: 976-
Abstract
When the overall situation must be mined from distributed monitoring nodes, two critical obstacles stand in the way of integrating heterogeneous data from different monitoring sensors: heterogeneous schemas and heterogeneous instances. To tackle the challenge of heterogeneous schemas, an instance-based approach for schema mapping, named the instance-based machine-learning (IML) approach, was described. To solve the problem of heterogeneous instances, a novel approach called the statistic-based clustering (SBC) approach, which utilized clustering and statistics technologies to match large-scale sources holistically, was also proposed. These two algorithms utilized machine-learning and clustering technology to improve accuracy. Experimental analysis shows that the IML approach is more precise than the SBC approach, reaching a precision of at least 81% and a recall rate of 82%. Simulation studies further show that SBC can tackle large-scale sources holistically, with an 85% recall rate when there are 38 data sources.
Keywords: