Publication Date

2-2018

Comments

Technical Report: UTEP-CS-18-08

Abstract

Traditionally, in machine learning, the quality of the result improves steadily with time (usually slowly but still steadily). However, as we start applying reinforcement learning techniques to solve complex tasks -- such as teaching a computer to play a complex game like Go -- we often encounter a situation in which, for a long time, there is no improvement, and then suddenly the system's efficiency jumps almost to its maximum. A similar phenomenon occurs in human learning, where it is known as the aha-moment. In this paper, we provide a possible explanation for this phenomenon, and show that this explanation leads to the need to reward students for effort as well, not only for their results.
