One of the main motivations for using artificial neural networks was to speed up computations. From this viewpoint, the ideal configuration is a network with a single nonlinear layer: this configuration is computationally the fastest, and it already has the desired universal approximation property. However, the last decades have shown that for many problems, deep neural networks, with several nonlinear layers, are much more effective. How can we explain this puzzling fact? In this paper, we provide a possible explanation for this phenomenon: the universal approximation property holds only in the idealized setting, when we assume that all computations are exact. In reality, computations are never absolutely exact. It turns out that if we take this non-exactness into account, then one-nonlinear-layer networks no longer have the universal approximation property: several nonlinear layers are needed, and several layers are exactly what deep networks are about.
Technical Report: UTEP-CS-22-43