Publication Date
3-2011
Abstract
One of the main objectives of collecting data in statistical databases (medical databases, census databases) is to find important correlations between different quantities. To enable researchers to look for such correlations, we should allow them to ask queries testing different combinations of such quantities. However, when we provide answers to many such queries, we may inadvertently disclose information about individual patients, information that should be private.
One way to preserve privacy in statistical databases is to store ranges instead of the original values. For example, instead of the exact age of a patient in a medical database, we only store the information that this age is, e.g., between 60 and 70. This idea solves the privacy problem, but it makes statistical analysis more complex. Different possible values from the corresponding ranges lead, in general, to different values of the corresponding statistical characteristic; it is therefore desirable to find the range of all such values.
It is known that for mean and variance, there exist feasible algorithms for computing such ranges. In this paper, we show that similar algorithms are possible for another important statistical characteristic, covariance, whose value is important in computing correlations.
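To make the notion of "the range of a statistical characteristic" concrete, here is a minimal Python sketch; it is not the algorithm proposed in the paper. It assumes the population covariance (division by n), and the function names are hypothetical. The mean is increasing in every value, so its range follows directly from the endpoints. The covariance is linear in each individual value when the others are held fixed, so its extremes over a box of intervals are attained at endpoint combinations. The brute-force search below relies on that fact and takes exponential time, which is why a feasible algorithm is of interest.

```python
from itertools import product
from math import inf


def mean_range(intervals):
    """Exact range of the mean when each value lies in its (lo, hi) interval."""
    n = len(intervals)
    return (sum(lo for lo, _ in intervals) / n,
            sum(hi for _, hi in intervals) / n)


def covariance(xs, ys):
    """Population covariance (divides by n; an assumption of this sketch)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def covariance_range_bruteforce(x_intervals, y_intervals):
    """Range of the covariance, found by trying every endpoint combination.

    Illustration only: the cost grows as 4**n, so this is usable
    for a handful of records at most.
    """
    lo, hi = inf, -inf
    for xs in product(*x_intervals):
        for ys in product(*y_intervals):
            c = covariance(xs, ys)
            lo = min(lo, c)
            hi = max(hi, c)
    return lo, hi


# Two patients whose ages are stored only as ranges.
print(mean_range([(60, 70), (30, 40)]))  # (45.0, 55.0)
print(covariance_range_bruteforce([(60, 70), (30, 40)],
                                  [(120, 140), (110, 130)]))
```
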
Comments
Technical Report: UTEP-CS-11-11
Published in: Ronald R. Yager, Marek Z. Reformat, Shahnaz N. Shahbazova, and Sergei Ovchinnikov (eds.), Proceedings of the World Conference on Soft Computing, San Francisco, CA, May 23-26, 2011.