At present, the most efficient machine learning technique is deep learning, in which non-linearity is attained by using rectified linear activation functions s(x) = max(0, x). Empirically, these functions work better than any other nonlinear functions that have been tried. In this paper, we provide a possible theoretical explanation for this empirical fact. The explanation is based on the observation that one of the main applications of neural networks is decision making, in which we want to find an optimal solution. We show that the need to adequately deal with situations in which the corresponding optimization problem is feasible -- i.e., in which the objective function is convex -- uniquely selects the rectified linear activation functions.
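As a minimal illustration of the activation function the abstract refers to (the sample input values below are arbitrary, not from the paper), the rectified linear function s(x) = max(0, x) simply zeroes out negative inputs and passes positive inputs through unchanged:

```python
def relu(x):
    """Rectified linear function s(x) = max(0, x)."""
    return max(0.0, x)

# Applied elementwise to a layer's pre-activations:
values = [-2.0, -0.5, 0.0, 1.5, 3.0]
print([relu(v) for v in values])  # negatives clipped to 0, positives kept
```

It is this piecewise-linear shape -- linear on each side of 0 -- that the paper argues is uniquely suited to optimization-based decision making.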