Abstract

We develop a theory that allows to characterize the fundamental limits of learning in deep neural networks. Concretely, we consider Kolmogorov-optimal approximation through deep neural networks with the guiding theme being a relation between the epsilon-entropy of the hypothesis class to be learned and the complexity of the approximating network in terms of connectivity and memory requirements for storing the network topology and the quantized weights and biases. The theory we develop educes remarkable universality properties of deep networks. Specifically, deep networks can Kolmogorov-optimally learn essentially any hypothesis class. In addition, we find that deep networks provide exponential approximation accuracy—i.e., the approximation error decays exponentially in the number of non-zero weights in the network—of widely different functions including the multiplication operation, polynomials, sinusoidal functions, general smooth functions, and even one-dimensional oscillatory textures and fractal functions such as the Weierstrass function, both of which do not have any known methods achieving exponential approximation accuracy. We also show that in the approximation of sufficiently smooth functions finite-width deep networks require strictly smaller connectivity than finite-depth wide networks. We conclude with an outlook on the further role our theory could play.