Technical Report: UTEP-CS-21-02

To appear in Journal of Combinatorics, Information, and System Sciences (JCISS), 2021, Vol. 45.


In a typical Numerical Methods class, students learn that gradient descent is not an efficient optimization algorithm and that more efficient algorithms exist, algorithms that are actually used in state-of-the-art numerical optimization packages. On the other hand, in optimization problems related to machine learning, and in particular in deep learning (currently the most efficient machine learning technique), gradient descent (in the form of backpropagation) is much more efficient than any of the alternatives that have been tried. How can we reconcile these two statements? In this paper, we explain that there is actually no contradiction here. In typical applications of numerical optimization, we want to attain the smallest possible value of the objective function. Thus, after a few iterations, it is necessary to switch from gradient descent, which works efficiently only when we are sufficiently far away from the actual minimum, to more sophisticated techniques. In machine learning, by contrast, as we show, attaining the actual minimum is not what we want: this would be over-fitting. We actually need to stop well before we reach the actual minimum. Since we do not need to get too close to the minimum, there is no need to switch from gradient descent to any more sophisticated (and more time-consuming) optimization technique. This explains why, contrary to what students learn in Numerical Methods, gradient descent is the most efficient optimization technique in machine learning applications.
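The contrast drawn above can be illustrated with a toy experiment. The sketch below is not taken from the report: it assumes an ill-conditioned two-dimensional quadratic (curvatures 100 and 1, chosen arbitrarily), a fixed step size of 0.015, and uses one Newton step as a stand-in for the "more sophisticated techniques" mentioned above. A loose relative tolerance plays the role of stopping early; this is only an analogy, not the report's over-fitting argument. Reaching the loose tolerance takes only a few gradient steps, while reaching a tight one takes far more, because the remaining error lies along the low-curvature direction, where gradient descent makes slow progress.

```python
import numpy as np

# Toy ill-conditioned quadratic f(x) = 0.5 * x^T A x with minimum at x = 0.
# The curvatures 100 and 1 are illustrative assumptions, not from the report.
A = np.diag([100.0, 1.0])


def f(x):
    return 0.5 * x @ A @ x


def grad(x):
    return A @ x


def gd_iterations(x0, lr, rel_tol, max_iter=100_000):
    """Run plain gradient descent until f(x) <= rel_tol * f(x0).

    Returns the number of iterations taken (or max_iter if never reached).
    """
    x = x0.copy()
    target = rel_tol * f(x0)
    for k in range(max_iter):
        if f(x) <= target:
            return k
        x = x - lr * grad(x)
    return max_iter


def newton_step(x):
    # One Newton step x - H^{-1} grad f(x); for this quadratic the Hessian H is A.
    return x - np.linalg.solve(A, grad(x))


x0 = np.array([1.0, 1.0])
lr = 0.015  # must stay below 2 / (largest curvature) = 0.02 for GD to converge

coarse = gd_iterations(x0, lr, rel_tol=1e-2)  # loose target, analogous to stopping early
fine = gd_iterations(x0, lr, rel_tol=1e-8)    # tight target, close to the true minimum
x_newton = newton_step(x0)                    # exact minimizer of a quadratic in one step

print(f"GD iterations to reach 1e-2 * f(x0): {coarse}")
print(f"GD iterations to reach 1e-8 * f(x0): {fine}")
print(f"f after one Newton step: {f(x_newton)}")
```

Far from the minimum, the high-curvature component dominates the objective and shrinks quickly, so gradient descent looks efficient; near the minimum, only the slowly shrinking low-curvature component remains, which is where a second-order method pays off.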