In this paper, we consider the problem of responding to a count query (or any other integer-valued query) evaluated on a dataset containing sensitive attributes. To protect the privacy of individuals in the dataset, a standard practice is to add continuous noise to the true count. We design a differentially private mechanism that adds integer-valued noise, allowing the released output to remain an integer. As a trade-off between utility and privacy, we derive the privacy parameters $\epsilon$ and $\delta$ in terms of the probability of releasing an erroneous count, under the assumption that the true count is no smaller than half the support size of the noise. We then numerically demonstrate that our mechanism provides a higher privacy guarantee than the discrete Gaussian mechanism recently proposed in the literature.
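A standard way to add integer-valued noise to a count is the two-sided geometric (discrete Laplace) mechanism; the sketch below is a minimal illustration of that classical construction, not the specific mechanism proposed in the abstract, and the function name is my own.

```python
import math
import random

def discrete_laplace_count(true_count, eps, rng=random):
    """Release an integer count with two-sided geometric (discrete Laplace) noise.

    The noise is the difference of two i.i.d. Geometric(1 - e^{-eps}) variables
    (support {0, 1, 2, ...}), which gives eps-DP for a sensitivity-1 count query
    while keeping the released value an integer.
    """
    p = 1.0 - math.exp(-eps)
    # Inverse-transform sampling: floor(log(1-U) / log(1-p)) ~ Geometric(p)
    g1 = int(math.floor(math.log(1.0 - rng.random()) / math.log(1.0 - p)))
    g2 = int(math.floor(math.log(1.0 - rng.random()) / math.log(1.0 - p)))
    return true_count + g1 - g2
```

For large `eps` the noise concentrates at zero, so the true count is released almost surely; smaller `eps` spreads the noise and strengthens privacy.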
We propose an information-theoretic technique for analyzing privacy guarantees of general online algorithms. Specifically, we demonstrate that differential privacy guarantees of iterative algorithms can be determined by a direct application of contraction coefficients derived from strong data processing inequalities for f-divergences. Our technique relies on generalizing Dobrushin's contraction coefficient for total variation distance to an f-divergence known as E_\gamma-divergence, which underlies differential privacy. As an example, we apply our technique to derive the differential privacy parameters of the online gradient descent algorithm. Moreover, we show that this framework can be tailored to batch learning algorithms that can be implemented with one pass over the training dataset.
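Dobrushin's contraction coefficient for total variation, which the abstract generalizes, has a simple finite form: for a row-stochastic kernel it is the largest TV distance between any two rows. A minimal sketch (illustrative helper name, not the paper's code):

```python
def dobrushin_coefficient(K):
    """Dobrushin's contraction coefficient of a row-stochastic kernel K:
    the maximum total-variation distance between any pair of rows.

    It satisfies TV(PK, QK) <= coef * TV(P, Q) for all input distributions P, Q
    (the strong data processing inequality for total variation).
    """
    n = len(K)
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            tv = 0.5 * sum(abs(a - b) for a, b in zip(K[i], K[j]))
            best = max(best, tv)
    return best
```

A coefficient strictly below 1 means every pass through the kernel shrinks the distinguishability of two inputs, which is the mechanism behind the iterative privacy analysis described above.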
Graph-based semi-supervised learning (SSL) assigns labels to initially unlabelled vertices in a graph. Graph neural networks (GNNs), especially graph convolutional networks (GCNs), have inspired the current state-of-the-art models for graph-based SSL problems. GCNs inherently assume that the labels of interest are numerical or categorical variables. However, in many real-world applications such as co-authorship networks and recommendation networks, vertex labels can be naturally represented by probability distributions or histograms. Moreover, real-world network datasets exhibit complex relationships going beyond pairwise associations. These relationships can be modelled naturally and flexibly by hypergraphs. In this paper, we explore GNNs for graph-based SSL of histograms. Motivated by the complex relationships (those going beyond pairwise) in real-world networks, we propose a novel method for directed hypergraphs. Our work builds upon existing work on graph-based SSL of histograms derived from the theory of optimal transportation. A key contribution of this paper is to establish generalisation error bounds for a one-layer GNN within the framework of algorithmic stability. We also demonstrate our proposed methods' effectiveness through detailed experimentation on real-world data.
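For readers unfamiliar with the GCN building block the abstract refers to, a single graph-convolution layer applies a symmetrically normalized adjacency to the node features before a nonlinearity. This is a generic sketch of the standard GCN layer (Kipf–Welling style), not the directed-hypergraph method proposed in the paper:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN layer: ReLU( D^{-1/2} (A + I) D^{-1/2} X W ).

    A: (n, n) adjacency matrix, X: (n, d) node features, W: (d, k) weights.
    Self-loops are added and degrees renormalized, as in the standard GCN.
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)                   # degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)
```

The one-layer case is exactly the setting for which the abstract establishes generalisation error bounds via algorithmic stability.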
We consider three different variants of differential privacy (DP), namely approximate DP, R\'enyi DP (RDP), and hypothesis test differential privacy. In the first part, we develop a machinery for optimally relating approximate DP to RDP based on the joint range of two $f$-divergences that underlie the approximate DP and RDP. In particular, this enables us to derive the optimal DP parameters of a mechanism that satisfies a given level of RDP. As an application, we apply our result to the moments accountant framework for characterizing privacy guarantees of noisy stochastic gradient descent (SGD). When compared to the state-of-the-art, our bounds may lead to about 100 more stochastic gradient descent iterations for training deep learning models for the same privacy budget. In the second part, we establish a relationship between RDP and hypothesis test differential privacy which allows us to translate the RDP constraint into a tradeoff between the type I and type II error probabilities of a certain hypothesis testing problem. We then demonstrate that for noisy SGD our result leads to tighter privacy guarantees compared to the recently proposed $f$-DP framework for some range of parameters.
Identifying features that leak information about sensitive attributes is a key challenge in the design of information obfuscation mechanisms. In this paper, we propose a framework for identifying information-leaking features via information density estimation. Here, features whose information densities exceed a pre-defined threshold are deemed information-leaking features. Once these features are identified, we sequentially pass them through a targeted obfuscation mechanism with a provable leakage guarantee in terms of E_\gamma-divergence. The core of this mechanism relies on a data-driven estimate of the trimmed information density, for which we propose a novel estimator, named the trimmed information density estimator (TIDE). We then use TIDE to implement our mechanism on three real-world datasets. Our approach can be used as a data-driven pipeline for designing obfuscation mechanisms targeting specific features.
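The thresholding step above rests on the information density, $i(x;y) = \log \frac{P(x,y)}{P(x)P(y)}$. The sketch below computes it exactly from a known discrete joint distribution; this is only the population quantity, not the data-driven TIDE estimator proposed in the paper, and the function name is illustrative.

```python
import math

def information_density(joint):
    """Information density i(x;y) = log( P(x,y) / (P(x) P(y)) ) for a
    discrete joint pmf given as a matrix (rows index x, columns index y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return [[math.log(joint[i][j] / (px[i] * py[j])) if joint[i][j] > 0
             else float("-inf")
             for j in range(len(py))]
            for i in range(len(px))]
```

Entries exceeding a chosen threshold `tau` would be flagged as information-leaking, mirroring the identification step described in the abstract.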
We derive the optimal differential privacy (DP) parameters of a mechanism that satisfies a given level of R\'enyi differential privacy (RDP). Our result is based on the joint range of two f-divergences that underlie the approximate and the R\'enyi variants of differential privacy. We apply our result to the moments accountant framework for characterizing privacy guarantees of stochastic gradient descent. When compared to the state-of-the-art, our bounds may lead to about 100 more stochastic gradient descent iterations for training deep learning models for the same privacy budget.
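For context, the standard (suboptimal) RDP-to-approximate-DP conversion that such results improve upon is the closed form $\epsilon = \epsilon_{\mathrm{RDP}} + \frac{\log(1/\delta)}{\alpha - 1}$. The sketch below implements only this classical baseline, not the optimal conversion derived in the paper:

```python
import math

def rdp_to_dp_standard(alpha, eps_rdp, delta):
    """Classical conversion: if a mechanism is (alpha, eps_rdp)-RDP, then it is
    (eps, delta)-DP with eps = eps_rdp + log(1/delta) / (alpha - 1)."""
    return eps_rdp + math.log(1.0 / delta) / (alpha - 1.0)
```

In a moments-accountant pipeline one would minimize this bound over the orders `alpha` at which RDP was tracked; tightening the conversion itself is what buys the extra training iterations mentioned above.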
In this work, we propose a new technique for deriving the differential privacy parameters in the context of federated learning when only the last update is publicly released. In this approach, we interpret each iteration as a Markov kernel and quantify its impact on the privacy parameters via the contraction coefficient of a certain f-divergence that underlies differential privacy. To do so, we generalize Dobrushin's well-known ergodicity coefficient, originally defined in terms of total variation distance, to a family of f-divergences.
We study the problem of finding the element within a convex set of conditional distributions with the smallest f-divergence to a reference distribution. Motivated by applications in machine learning, we refer to this problem as model projection since any probabilistic classification model can be viewed as a conditional distribution. We provide conditions under which the existence and uniqueness of the optimal model can be guaranteed and establish strong duality results. Strong duality, in turn, allows the model projection problem to be reduced to a tractable finite-dimensional optimization. Our application of interest is fair machine learning: the model projection formulation can be directly used to design fair models according to different group fairness metrics. Moreover, this information-theoretic formulation generalizes existing approaches within the fair machine learning literature. We give explicit formulas for the optimal fair model and a systematic procedure for computing it.
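In the classical KL special case, projecting a distribution onto a set defined by a moment constraint has a closed form: the optimum is an exponential tilting of the reference distribution, with the tilt parameter chosen to meet the constraint. The sketch below illustrates this textbook I-projection for a finite pmf; it is a simplified, unconditional special case of the model projection problem, and the function name is my own.

```python
import math

def i_projection_mean(p, f, target, lo=-50.0, hi=50.0, tol=1e-12):
    """KL-projection of pmf p onto {q : E_q[f] = target}.

    The minimizer has the exponential-family form q(x) ∝ p(x) exp(lam * f(x));
    bisection on lam exploits that the tilted mean is increasing in lam.
    """
    def tilted_mean(lam):
        w = [pi * math.exp(lam * fi) for pi, fi in zip(p, f)]
        z = sum(w)
        return sum(wi * fi for wi, fi in zip(w, f)) / z

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = [pi * math.exp(lam * fi) for pi, fi in zip(p, f)]
    z = sum(w)
    return [wi / z for wi in w]
```

In the fair-learning setting described above, the reference distribution is a trained classifier and the constraint set encodes a group fairness metric; strong duality is what reduces that conditional, general-f-divergence problem to a finite-dimensional search like this one.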
We investigate the framework of privacy amplification by iteration, recently proposed by Feldman et al., from an information-theoretic lens. We demonstrate that differential privacy guarantees of iterative mappings can be determined by a direct application of contraction coefficients derived from strong data processing inequalities for f-divergences. In particular, by generalizing Dobrushin's contraction coefficient for total variation distance to an f-divergence known as E_\gamma-divergence, we derive tighter bounds on the differential privacy parameters of the projected noisy stochastic gradient descent algorithm with hidden intermediate updates.
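The E_\gamma-divergence at the heart of this analysis has a simple finite-alphabet form, $E_\gamma(P\|Q) = \sum_x \max(P(x) - \gamma Q(x), 0)$, reducing to total variation at $\gamma = 1$. A minimal sketch (illustrative helper name):

```python
def e_gamma(p, q, gamma):
    """E_gamma divergence between finite pmfs:
    E_gamma(p || q) = sum_x max( p(x) - gamma * q(x), 0 ).

    gamma = 1 recovers the total variation distance, so Dobrushin's
    coefficient is the gamma = 1 member of this family."""
    return sum(max(pi - gamma * qi, 0.0) for pi, qi in zip(p, q))
```

The link to differential privacy is the standard characterization that a mechanism is (eps, delta)-DP exactly when E_gamma with gamma = e^eps between its output distributions on neighboring datasets is at most delta.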
We investigate the problem of estimating a random variable $Y$ under a privacy constraint dictated by another correlated random variable $X$. When $X$ and $Y$ are discrete, we express the underlying privacy-utility tradeoff in terms of the privacy-constrained guessing probability $h(P_{XY}, \epsilon)$, defined as the maximum probability $P_c(Y|Z)$ of correctly guessing $Y$ given an auxiliary random variable $Z$, where the maximization is taken over all $P_{Z|Y}$ ensuring that $P_c(X|Z) \leq \epsilon$ for a given privacy threshold $\epsilon \geq 0$. We prove that $h(P_{XY}, \epsilon)$ is concave and piecewise linear in $\epsilon$, which allows us to derive its expression in closed form for any $\epsilon$ when $X$ and $Y$ are binary. In the non-binary case, we derive $h(P_{XY}, \epsilon)$ in the high-utility regime (i.e., for sufficiently large, but nontrivial, values of $\epsilon$) under the assumption that $Y$ and $Z$ have the same alphabets. We also analyze the privacy-constrained guessing probability for two scenarios in which $X$, $Y$, and $Z$ are binary vectors. When $X$ and $Y$ are continuous random variables, we formulate the corresponding privacy-utility tradeoff in terms of $\mathrm{sENSR}(P_{XY}, \epsilon)$, the smallest normalized minimum mean squared error (MMSE) incurred in estimating $Y$ from a Gaussian perturbation $Z$. Here, the minimization is taken over a family of Gaussian perturbations $Z$ for which the MMSE of $f(X)$ given $Z$ is within a factor $1-\epsilon$ of the variance of $f(X)$ for any non-constant real-valued function $f$. We derive tight upper and lower bounds for $\mathrm{sENSR}$ when $Y$ is Gaussian. For general absolutely continuous random variables, we obtain a tight lower bound for $\mathrm{sENSR}(P_{XY}, \epsilon)$ in the high-privacy regime, i.e., for small $\epsilon$.
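The building block of the discrete tradeoff above is the correct-guessing probability: for a joint pmf of $Y$ and $Z$, MAP decoding succeeds with probability $P_c(Y|Z) = \sum_z \max_y P(y, z)$. A minimal sketch of this standard quantity (not the full constrained optimization defining $h$):

```python
def correct_guessing_prob(joint_yz):
    """P_c(Y|Z) = sum_z max_y P(Y=y, Z=z): the success probability of the
    MAP guess of Y after observing Z. joint_yz: rows index y, columns index z."""
    return sum(max(col) for col in zip(*joint_yz))
```

Evaluating $h(P_{XY}, \epsilon)$ amounts to maximizing this quantity over channels $P_{Z|Y}$ subject to the analogous guessing probability for $X$ staying below $\epsilon$.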
Given a dataset comprising individual-level data, we consider the problem of identifying samples that may be disclosed without incurring a privacy risk. We address this challenge by designing a mapping that assigns a "privacy-risk score" to each sample. This mapping, called the privacy watchdog, is based on a sample-wise information leakage measure called the information density, termed here lift privacy. We show that lift privacy is closely related to well-known information-theoretic privacy metrics. Moreover, we demonstrate how the privacy watchdog can be implemented using the Donsker-Varadhan representation of KL-divergence. Finally, we illustrate this approach on a real-world dataset.
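The Donsker-Varadhan representation mentioned above states that $D(P\|Q) = \sup_T \mathbb{E}_P[T] - \log \mathbb{E}_Q[e^T]$, with the supremum attained at $T = \log(dP/dQ)$ (up to a constant). The sketch below verifies this numerically on finite pmfs; it shows the variational identity itself, not the neural estimator one would train in practice:

```python
import math

def kl(p, q):
    """KL divergence D(p || q) for finite pmfs."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dv_objective(p, q, T):
    """Donsker-Varadhan objective E_p[T] - log E_q[exp(T)].

    For any test function T this lower-bounds D(p || q); the bound is tight
    at the log-likelihood ratio T(x) = log( p(x) / q(x) )."""
    return (sum(pi * ti for pi, ti in zip(p, T))
            - math.log(sum(qi * math.exp(ti) for qi, ti in zip(q, T))))
```

In the watchdog setting, a learned critic $T$ plays the role of the log-likelihood ratio, giving a data-driven handle on the per-sample information density.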
Inspired by recent interest in developing machine learning and data mining algorithms on hypergraphs, we investigate in this paper the semi-supervised learning problem of propagating "soft labels" (e.g., probability distributions or class membership scores) over hypergraphs by means of optimal transportation. Borrowing insights from Wasserstein propagation on graphs [Solomon et al. 2014], we reformulate the label propagation procedure as a message-passing algorithm, which lends itself naturally to a generalization applicable to hypergraphs through Wasserstein barycenters. Furthermore, in a PAC learning framework, we provide generalization error bounds for propagating one-dimensional distributions on graphs and hypergraphs using the 2-Wasserstein distance, by establishing the algorithmic stability of the proposed semi-supervised learning algorithm. These theoretical results also shed new light on a deeper understanding of Wasserstein propagation on graphs.
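In one dimension, the 2-Wasserstein machinery above becomes explicit: the optimal coupling between two equal-size empirical samples matches them in sorted order, and the W2 barycenter averages matching order statistics (equivalently, averages quantile functions). A minimal sketch of these two facts (helper names are my own):

```python
def w2_1d(xs, ys):
    """2-Wasserstein distance between two equal-size 1-D empirical samples.
    In 1-D the optimal transport plan pairs points in sorted order."""
    xs, ys = sorted(xs), sorted(ys)
    return (sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)) ** 0.5

def w2_barycenter_1d(samples_list):
    """W2 barycenter of several equal-size 1-D samples: average the
    matching order statistics (i.e., average the quantile functions)."""
    sorted_all = [sorted(s) for s in samples_list]
    return [sum(vals) / len(vals) for vals in zip(*sorted_all)]
```

Replacing the arithmetic averaging of neighbor labels in classical label propagation with such barycenters is the step that extends the message-passing scheme from graphs to hypergraphs.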