Shahab Asoodeh, Mario Diaz, Fady Alajaji, and Tamas Linder. 2019. “Estimation Efficiency Under Privacy Constraints.” IEEE Transactions on Information Theory, 65(3), pp. 1512–1534.
Abstract: We investigate the problem of estimating a random variable Y under a privacy constraint dictated by another correlated random variable X. When X and Y are discrete, we express the underlying privacy-utility tradeoff in terms of the privacy-constrained guessing probability h(PXY, epsilon), defined as the maximum probability Pc(Y|Z) of correctly guessing Y given an auxiliary random variable Z, where the maximization is taken over all PZ|Y ensuring that Pc(X|Z) ≤ epsilon for a given privacy threshold epsilon ≥ 0. We prove that h(PXY, epsilon) is concave and piecewise linear in epsilon, which allows us to derive its expression in closed form for any epsilon when X and Y are binary. In the non-binary case, we derive h(PXY, epsilon) in the high-utility regime (i.e., for sufficiently large, but nontrivial, values of epsilon) under the assumption that Y and Z have the same alphabet. We also analyze the privacy-constrained guessing probability for two scenarios in which X, Y, and Z are binary vectors. When X and Y are continuous random variables, we formulate the corresponding privacy-utility tradeoff in terms of sENSR(PXY, epsilon), the smallest normalized minimum mean squared error (MMSE) incurred in estimating Y from a Gaussian perturbation Z. Here, the minimization is taken over a family of Gaussian perturbations Z for which the MMSE of f(X) given Z is within a factor 1 - epsilon of the variance of f(X) for any non-constant real-valued function f. We derive tight upper and lower bounds for sENSR when Y is Gaussian. For general absolutely continuous random variables, we obtain a tight lower bound for sENSR(PXY, epsilon) in the high-privacy regime, i.e., for small epsilon.
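The discrete tradeoff described in the abstract can be written compactly. A sketch of the definitions, reconstructed from the abstract's own wording (the explicit sum formula for Pc is a standard expression for the probability of correct guessing and is an assumption here, not quoted from the paper):

```latex
% Maximum probability of correctly guessing Y from Z (MAP guessing):
P_c(Y \mid Z) = \sum_{z} \max_{y} P_{YZ}(y, z),
% and the privacy-constrained guessing probability, where the supremum
% ranges over privacy mechanisms (channels) P_{Z|Y}:
h(P_{XY}, \varepsilon) = \sup_{P_{Z|Y} \,:\, P_c(X \mid Z) \le \varepsilon} P_c(Y \mid Z).
```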
t-it19.pdf

Hsiang Hsu, Shahab Asoodeh, and Flavio Calmon. 2019. “Information-Theoretic Privacy Watchdogs.” ISIT 2019, Paris, France.
Abstract: Given a dataset comprising individual-level data, we consider the problem of identifying samples that may be disclosed without incurring a privacy risk. We address this challenge by designing a mapping that assigns a "privacy-risk score" to each sample. This mapping, called the privacy watchdog, is based on a sample-wise information leakage measure called the information density, referred to here as lift privacy. We show that lift privacy is closely related to well-known information-theoretic privacy metrics. Moreover, we demonstrate how the privacy watchdog can be implemented using the Donsker-Varadhan representation of the KL divergence. Finally, we illustrate this approach on a real-world dataset.
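A minimal sketch of the sample-wise idea for finite alphabets, assuming the information density i(x; y) = log p(x, y) / (p(x) p(y)) as the leakage measure; the function names and the symmetric threshold rule are illustrative assumptions, not the paper's implementation (which uses the Donsker-Varadhan representation to estimate the density from data):

```python
import numpy as np

def information_density(p_xy):
    """Pointwise information density i(x; y) = log p(x, y) / (p(x) p(y))
    for a joint pmf given as a 2-D array over finite alphabets."""
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of X (column vector)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of Y (row vector)
    return np.log(p_xy / (p_x * p_y))

def privacy_risk_scores(p_xy, epsilon):
    """Watchdog-style rule (illustrative): flag pairs (x, y) whose absolute
    information density exceeds the privacy threshold epsilon."""
    return np.abs(information_density(p_xy)) > epsilon
```

Under this rule, samples with near-zero information density leak little about X and could be disclosed; in particular, if X and Y are independent the density is identically zero and nothing is flagged.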
isit19.pdf

Tingran Gao, Shahab Asoodeh, Yi Huang, and James Evans. 2019. “Wasserstein Soft Label Propagation on Hypergraphs: Algorithm and Generalization Error Bounds.” AAAI 2019, 33(1), Honolulu, Hawaii.
Abstract: Inspired by recent interest in developing machine learning and data mining algorithms on hypergraphs, we investigate in this paper the semi-supervised learning algorithm of propagating "soft labels" (e.g., probability distributions, class membership scores) over hypergraphs by means of optimal transportation. Borrowing insights from Wasserstein propagation on graphs [Solomon et al. 2014], we reformulate the label propagation procedure as a message-passing algorithm, which renders itself naturally to a generalization applicable to hypergraphs through Wasserstein barycenters. Furthermore, in a PAC learning framework, we provide generalization error bounds for propagating one-dimensional distributions on graphs and hypergraphs using the 2-Wasserstein distance, by establishing the algorithmic stability of the proposed semi-supervised learning algorithm. These theoretical results also shed new light on a deeper understanding of Wasserstein propagation on graphs.
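For the one-dimensional case treated in the bounds, the 2-Wasserstein barycenter has a closed form: average the quantile (inverse-CDF) functions of the input distributions. A minimal sketch of that building block, with function name, grid size, and sample-based input as my own illustrative choices rather than the paper's algorithm:

```python
import numpy as np

def w2_barycenter_1d(samples_list, weights=None, n_quantiles=101):
    """2-Wasserstein barycenter of one-dimensional distributions given as samples.

    On the real line, the W2 barycenter is the distribution whose quantile
    function is the (weighted) average of the inputs' quantile functions.
    Returns the barycenter's quantile values on a common probability grid.
    """
    k = len(samples_list)
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    qs = np.linspace(0.0, 1.0, n_quantiles)
    # Empirical quantile function of each input distribution, shape (k, n_quantiles).
    quantiles = np.stack([np.quantile(s, qs) for s in samples_list])
    return quantiles.T @ w  # weighted average of the quantile functions
```

A message-passing step in the spirit of the abstract would then update each unlabeled vertex's soft label to the barycenter of the labels carried by its incident hyperedges, iterating until convergence.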
aaai19.pdf