Short version published in Proceedings of the Fifth International Conference on Intelligent Technologies InTech'04, Houston, Texas, December 2-4, 2004; full paper published in Journal of Advanced Computational Intelligence and Intelligent Informatics, 2006, Vol. 10, No. 3, pp. 260-264.


To check whether a new algorithm is better, researchers use traditional statistical techniques for hypothesis testing. In particular, when the results are inconclusive, they run more and more simulations (n2 > n1, n3 > n2, ..., up to nm) until the results become conclusive. In this paper, we point out that the results obtained this way may be misleading. Indeed, in the traditional approach, we select a statistic and then choose a threshold for which the probability of this statistic "accidentally" exceeding this threshold is smaller than, say, 1%. Since it is very easy to run additional simulations with ever-larger n, this testing is often repeated: the probability of error is still 1% for each individual ni, but the probability that we reach an erroneous conclusion for at least one of the values n1, ..., nm grows as m increases. In this paper, we design new statistical techniques oriented towards experiments on simulated data, techniques that guarantee that the overall probability of error stays under, say, 1% no matter how many experiments we run.
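The effect described in the abstract is easy to reproduce numerically. The sketch below is an illustration only, not the paper's own construction: it assumes a one-sided z-test (known sigma = 1, null mean 0) repeated on a growing sample at hypothetical sizes n = 100, 200, 400, 800, and it uses a standard alpha-spending correction (per-test levels 0.01/2, 0.01/4, ...) as one textbook way to keep the overall error below 1% via the union bound; the actual techniques proposed in the paper may differ. The names `significant` and `error_rate` are made up for this demo.

```python
import math
import random
from statistics import NormalDist

def significant(samples, z_crit):
    """One-sided z-test of mean > 0 with known sigma = 1."""
    n = len(samples)
    z = (sum(samples) / n) * math.sqrt(n)
    return z > z_crit

def error_rate(sizes, z_crits, trials=5000, seed=0):
    """Fraction of trials in which at least one test wrongly rejects.

    Data are drawn from N(0, 1), so the null hypothesis is TRUE and
    every "conclusive" result counts as an error.  Samples accumulate
    across the increasing sizes n1 < n2 < ... < nm, mimicking a
    researcher who keeps adding simulations until a test succeeds.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        samples = []
        for n, z_crit in zip(sizes, z_crits):
            while len(samples) < n:
                samples.append(rng.gauss(0.0, 1.0))
            if significant(samples, z_crit):
                errors += 1  # researcher stops, wrongly concluding improvement
                break
    return errors / trials

sizes = [100, 200, 400, 800]          # hypothetical, ever-larger sample sizes
z_1pct = NormalDist().inv_cdf(0.99)   # ~2.326: 1% level for a single test

# Naive approach: re-test at the 1% level after each batch of simulations.
naive = error_rate(sizes, [z_1pct] * len(sizes))

# Alpha-spending correction: split the 1% error budget across the tests
# (alpha_i = 0.01 / 2^(i+1), so sum(alpha_i) < 0.01); by the union bound,
# the overall error stays below 1% however many tests are run.
alphas = [0.01 / 2 ** (i + 1) for i in range(len(sizes))]
corrected = error_rate(sizes, [NormalDist().inv_cdf(1 - a) for a in alphas])

print(f"naive repeated tests:  {naive:.3f}")   # noticeably above 0.01
print(f"alpha-spending tests:  {corrected:.3f}")  # below 0.01
```

Running this shows the naive repeated procedure exceeding its nominal 1% error rate while the budget-splitting version stays under it, which is the guarantee the paper's techniques are designed to provide.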

tr04-29a.pdf (67 kB)
Updated version: UTEP-CS-04-29a

tr04-29.pdf (67 kB)
Original file: UTEP-CS-04-29