Publication Date
11-1-2022
Abstract
In many practical situations (e.g., when preparing examples for a machine learning algorithm), we need to label a large number of images or speech recordings. One way to do this is to pay people around the world to perform the labeling; this is known as crowdsourcing. In many cases, crowd-workers provide not only answers but also their degree of confidence that each answer is correct. Some crowd-workers cheat: they produce almost random answers without bothering to spend time analyzing the corresponding image. Algorithms have been developed to detect such cheaters. The problem is that many crowd-workers cannot describe their degree of confidence by a single number; they are more comfortable providing an interval of possible degrees. To apply anomaly-detecting algorithms to such interval data, we need to select a single number from each interval. Empirical studies have shown that the most efficient selection is the arithmetic average of the interval's endpoints. In this paper, we explain this empirical result by showing that the arithmetic average is the only selection that satisfies natural invariance requirements.
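The abstract does not list the invariance requirements explicitly. The sketch below is an illustration under an assumption: it takes them to be invariance under shifts, positive rescalings, and the reflection p -> 1 - p (measuring confidence that the answer is wrong instead of right), and treats confidence degrees as exact rationals. Shift and scale invariance alone leave any selector of the form lo + alpha * (hi - lo); reflection then forces alpha = 1/2, i.e., the arithmetic average. The names `weighted`, `midpoint`, and `is_invariant` are hypothetical and not taken from the paper.

```python
from fractions import Fraction
from typing import Callable

# A selector maps an interval [lo, hi] to one number inside it.
Selector = Callable[[Fraction, Fraction], Fraction]


def weighted(alpha: Fraction) -> Selector:
    """Hypothetical family: pick the point a fraction alpha of the way from lo to hi."""
    def select(lo: Fraction, hi: Fraction) -> Fraction:
        if lo > hi:
            raise ValueError("lo must not exceed hi")
        return lo + alpha * (hi - lo)
    return select


# The arithmetic average (lo + hi) / 2 is the member with alpha = 1/2.
midpoint = weighted(Fraction(1, 2))


def is_invariant(select: Selector, lo: Fraction, hi: Fraction,
                 scale: Fraction, shift: Fraction) -> bool:
    """Check that transforming then selecting equals selecting then transforming,
    for the affine map x -> scale * x + shift (scale != 0).
    A negative scale reverses the interval, so the endpoints are re-sorted."""
    a, b = sorted((scale * lo + shift, scale * hi + shift))
    return select(a, b) == scale * select(lo, hi) + shift


# Assumed transformations: a shift, a rescaling, and the reflection p -> 1 - p.
transforms = [(Fraction(1), Fraction(1, 10)),
              (Fraction(2), Fraction(0)),
              (Fraction(-1), Fraction(1))]
lo, hi = Fraction(1, 5), Fraction(3, 5)
for name, sel in [("midpoint", midpoint), ("one-third point", weighted(Fraction(1, 3)))]:
    print(name, [is_invariant(sel, lo, hi, s, d) for s, d in transforms])
```

Running it prints `midpoint [True, True, True]` and `one-third point [True, True, False]`: the one-third point survives shifts and rescalings but not the reflection, which is what singles out the midpoint under these assumed requirements. Exact `Fraction` arithmetic is used so the equality checks are not disturbed by floating-point rounding.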
Comments
Technical Report: UTEP-CS-22-112