An Information-Theoretic View of Generalization via Wasserstein Distance

Citation:

Hao Wang, Mario Diaz, José Cândido S. Santos Filho, and Flavio P. Calmon. 2019. “An Information-Theoretic View of Generalization via Wasserstein Distance.” In Proc. IEEE Int. Symp. on Inf. Theory (ISIT).
isit19generalization.pdf (246 KB)

Abstract:

We capitalize on the Wasserstein distance to obtain two information-theoretic bounds on the generalization error of learning algorithms. First, we specialize the Wasserstein distance into total variation, by using the discrete metric. In this case we derive a generalization bound and, from a strong data-processing inequality, show how to narrow the bound by adding Gaussian noise to the output hypothesis. Second, we consider the Wasserstein distance under a generic metric. In this case we derive a generalization bound by exploiting the geometric nature of the Kantorovich-Rubinstein duality theorem. We illustrate the use of these bounds with examples. Our bounds can handle certain cases in which existing bounds via mutual information fail.
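For context, both bounds rest on standard properties of the Wasserstein distance; the sketch below uses generic notation (P, Q, d, f), not the paper's, and states only textbook facts. For distributions P and Q on a metric space (X, d), the order-1 Wasserstein distance and its Kantorovich-Rubinstein dual form are
\[
W_1(P, Q) \;=\; \inf_{\pi \in \Pi(P, Q)} \mathbb{E}_{(X, Y) \sim \pi}\big[ d(X, Y) \big]
\;=\; \sup_{\mathrm{Lip}(f) \le 1} \Big( \mathbb{E}_P[f] - \mathbb{E}_Q[f] \Big),
\]
where \(\Pi(P, Q)\) denotes the set of couplings of P and Q. Under the discrete metric \(d(x, y) = \mathbf{1}\{x \neq y\}\), \(W_1(P, Q)\) reduces to the total variation distance, which is the specialization behind the first bound; the dual (Lipschitz) form is the geometric structure exploited for the generic-metric bound.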