One of the main objectives of science and engineering is to predict the future state of the world -- and to come up with devices and strategies that would make this future state better. In some practical situations, we know how the state changes with time -- e.g., in meteorology, we know the partial differential equations that describe the atmospheric processes. In such situations, prediction becomes a purely computational problem. In many other situations, however, we do not know the equations describing the system's dynamics. In such situations, we need to learn the dynamics from data. At present, the most efficient way to do this is deep learning -- training a neural network with a large number of layers. To make this idea truly efficient, several heuristics were discovered by trial and error, such as the use of rectified linear neurons, softmax, etc. In this chapter, we show that the empirical success of many of these heuristics can be explained by optimization-under-uncertainty techniques.