Technical Report: UTEP-CS-03-17

Published in: Proceedings of the Annual Conference of the North American Fuzzy Information Processing Society NAFIPS'03, Chicago, Illinois, July 24-26, 2003, pp. 365-370.


Geospatial databases generally consist of measurements related to points (or pixels, in the case of raster data), lines, and polygons. In recent years, the size and complexity of these databases have increased significantly, and they often contain duplicate records, i.e., two or more close records representing the same measurement result. In this paper, we use fuzzy measures to address the problem of detecting duplicates in a database consisting of point measurements. As a test case, we use a database of measurements of anomalies in the Earth's gravity field that we have compiled. We show that a natural duplicate deletion algorithm requires quadratic time in the worst case, and we propose a new, asymptotically optimal O(n log n) algorithm. These algorithms have been successfully applied to gravity databases. We believe that they will prove useful when dealing with many other types of point data.
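The abstract does not spell out the algorithms, but the general idea can be illustrated with a sketch. Below, a hypothetical naive detector compares all pairs of points (quadratic time), while a sort-based sweep only compares points whose x-coordinates are within the closeness tolerance `eps`; the function names, the tolerance parameter, and the Euclidean-distance notion of "close" are all illustrative assumptions, not the paper's actual method.

```python
import math

def find_duplicates_naive(points, eps):
    """Naive approach: check all pairs, O(n^2) comparisons.

    Flags any two points closer than eps as likely duplicates.
    (Illustrative sketch; not the paper's algorithm.)
    """
    dups = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < eps:
                dups.append((i, j))
    return dups

def find_duplicates_sweep(points, eps):
    """Plane-sweep sketch: sort by x, then compare each point only
    against earlier points whose x-coordinate is within eps.

    The sort costs O(n log n); the inner comparisons stay cheap when
    duplicates are sparse, though this simple version is not worst-case
    optimal like the algorithm the paper proposes.
    """
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    dups = []
    start = 0  # left edge of the sliding window over the sorted order
    for k, i in enumerate(order):
        # Drop points too far to the left to be within eps of point i.
        while points[i][0] - points[order[start]][0] >= eps:
            start += 1
        for j in order[start:k]:
            if math.dist(points[i], points[j]) < eps:
                dups.append((min(i, j), max(i, j)))
    return dups
```

On the same input, both sketches flag the same duplicate pairs; the sweep version simply avoids most of the distance computations by exploiting the sorted order.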