Publication Date
11-2007
Abstract
In the past, communications were much slower than computations. As a result, researchers and practitioners collected data into huge centralized databases maintained at a single location, e.g., by NASA or the US Geological Survey. At present, communications are so much faster that it is possible to keep different databases at different locations, and to automatically select, transform, and collect relevant data when necessary. The corresponding cyberinfrastructure is actively used in many applications. It drastically enhances scientists' ability to discover, reuse, and combine a large number of resources, e.g., data and services.
Given this importance, it is desirable to be able to gauge the uncertainty of the results obtained by using the cyberinfrastructure. This problem is made more urgent by the fact that the level of uncertainty associated with cyberinfrastructure resources can vary greatly -- and that scientists have much less control over the quality of different resources than in a centralized database. Thus, with the cyberinfrastructure's promise comes the need to analyze how data uncertainty propagates through this cyberinfrastructure.
When the resulting accuracy is too low, it is desirable to produce the provenance of this inaccuracy: to find out which data points contributed most to it, and how improved accuracy of these data points would improve the accuracy of the result. In this paper, we describe algorithms for propagating uncertainty and for finding the provenance of this uncertainty.
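The paper's algorithms are not reproduced in this abstract. As a rough illustration of the general idea only, here is a minimal Python sketch assuming the standard first-order (linearized) propagation bound Δy ≤ Σ |∂f/∂xᵢ| · Δxᵢ; the function name propagate_and_explain and the example f are hypothetical, not taken from the paper. The per-input contributions double as a simple provenance of the inaccuracy: the input with the largest contribution is the one whose improved accuracy would most improve the result.

    import numpy as np

    def propagate_and_explain(f, x, dx, h=1e-6):
        """Linearized uncertainty propagation for y = f(x).

        x  : nominal values of the inputs
        dx : known accuracies, i.e., bounds |actual_i - x_i| <= dx[i]
        Returns the propagated bound dy on the result and, for provenance,
        each input's contribution c[i] = |df/dx_i| * dx[i].
        """
        x = np.asarray(x, dtype=float)
        dx = np.asarray(dx, dtype=float)
        grad = np.empty_like(x)
        for i in range(len(x)):
            # central-difference estimate of the partial derivative df/dx_i
            e = np.zeros_like(x)
            e[i] = h
            grad[i] = (f(x + e) - f(x - e)) / (2 * h)
        contrib = np.abs(grad) * dx   # per-input share of the output bound
        return contrib.sum(), contrib

    # Hypothetical example: y = x0 * x1 + x2
    f = lambda x: x[0] * x[1] + x[2]
    dy, contrib = propagate_and_explain(f, x=[2.0, 3.0, 1.0], dx=[0.1, 0.05, 0.2])
    print(dy)                  # overall accuracy bound on the result
    print(np.argmax(contrib))  # input that contributes most to the inaccuracy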
Comments
Technical Report: UTEP-CS-07-56
Published in Rafi L. Muhanna and Robert L. Mullen (eds.), Proceedings of the International Workshop on Reliable Engineering Computing REC'08, Savannah, Georgia, February 20-22, 2008, pp. 199-234.