At present, the most successful machine learning technique is deep learning, which uses the rectified linear unit (ReLU) activation function s(x) = max(x, 0) as its non-linear data processing unit. While this choice was guided by general ideas (which were often imprecise), the choice itself was still largely empirical. This leads to a natural question: is this choice indeed the best, or are there even better alternatives? A possible way to answer this question would be to provide a theoretical explanation of why this choice is -- in some reasonable sense -- the best. This paper provides a possible theoretical explanation of ReLU's empirical success.
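As an illustration of the activation function discussed above, here is a minimal sketch of ReLU and how it acts on a layer's pre-activations (the function name and the toy inputs are illustrative, not from the paper):

```python
def relu(x: float) -> float:
    """Rectified linear unit: s(x) = max(x, 0)."""
    return max(x, 0.0)

# ReLU passes positive inputs through unchanged and zeroes out negatives,
# which is the non-linearity applied element-wise in deep networks.
pre_activations = [-2.0, -0.5, 0.0, 0.5, 2.0]
activations = [relu(x) for x in pre_activations]
print(activations)  # [0.0, 0.0, 0.0, 0.5, 2.0]
```

Applied element-wise, this piecewise-linear function is what makes the composed layers of a deep network non-linear.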