In traditional neural networks, the outputs of each layer serve as inputs only to the next layer. It is known that, in many cases, it is beneficial to also allow the outputs of earlier layers (the layer before the previous one, and so on) as inputs. Such networks are known as residual. In this paper, we provide a possible theoretical explanation for the empirical success of residual neural networks.
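
To make the distinction concrete, the following minimal sketch contrasts a traditional forward pass with one common residual variant, in which each layer's input is added to its output and passed on. This is an illustration under stated assumptions, not the construction analyzed in the paper: the function names (`dense`, `plain_forward`, `residual_forward`), the tanh activation, the additive form of the skip connection, and the requirement that all layers have equal width are hypothetical choices made for this example.

```python
import math


def dense(weights, bias, x):
    """One fully connected layer with a tanh activation (illustrative choice)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def plain_forward(layers, x):
    """Traditional network: each layer sees only the previous layer's output."""
    for weights, bias in layers:
        x = dense(weights, bias, x)
    return x


def residual_forward(layers, x):
    """Residual variant (assumed additive form): each layer's input is added
    to its output, so the next layer also receives signals from earlier layers.
    Assumes every layer maps a vector to a vector of the same length."""
    for weights, bias in layers:
        y = dense(weights, bias, x)
        x = [xi + yi for xi, yi in zip(x, y)]
    return x


# Two layers with all-zero weights and biases: each layer alone outputs zeros.
zero_layer = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
layers = [zero_layer, zero_layer]
x = [1.0, -2.0]

print(plain_forward(layers, x))     # [0.0, 0.0]
print(residual_forward(layers, x))  # [1.0, -2.0]
```

With all-zero weights the traditional network loses its input entirely, while the residual variant passes it through unchanged, which shows how the extra connections carry earlier outputs forward.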