cronokirby

Which Privacy Blanket is Optimal in the Shuffle Model

2026-04-06

Abstract

In recent years, the shuffle model has emerged as a prevalent paradigm in privacy-preserving data analysis, centered on the principle of \textit{privacy amplification via shuffling}: an individual user's report is obscured by the ``background noise'' of other users' messages, a phenomenon intuitively known as the \textit{privacy blanket}. This paper initiates a foundational and systematic study of this mechanism from an information-theoretic perspective. We investigate the following optimal noise-design problem: given that a target user's message $Y_1$ follows a distribution $P$, what is the optimal blanket noise distribution $Q$ that maximizes privacy? Specifically, when $Y_1 \sim P$ and the remaining messages $Y_2, \dots, Y_n \stackrel{\text{i.i.d.}}{\sim} Q$ are shuffled to produce an output $\mathbf{Z} = (Y_{\sigma(i)})_{i=1}^n$, we seek the $Q$ that affords the strongest protection for $Y_1$ under various metrics, including mutual information $I(Y_1; \mathbf{Z})$, total-variation information $I_{\mathrm{TV}}(Y_1; \mathbf{Z})$, message-recovery advantage, and expected posterior variance.
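The mechanism described above is simple to state operationally. The following minimal sketch (names and the binary example are my own, not from the paper) simulates one invocation: the target message is mixed with $n-1$ i.i.d. blanket draws from $Q$, and a uniformly random permutation hides which entry belonged to the target user.

```python
import random

def shuffle_mechanism(y1, q_sampler, n, rng=random):
    """Shuffle model: mix the target message y1 with n-1 i.i.d. blanket
    draws from Q, then release all n messages in a random order."""
    messages = [y1] + [q_sampler() for _ in range(n - 1)]
    rng.shuffle(messages)  # the random permutation sigma hides which entry is y1
    return messages

# Illustrative binary example: Y1 ~ P = Bernoulli(0.9), blanket noise
# Q = Bernoulli(0.5), the (near-)uniform choice the paper identifies
# as optimal on binary alphabets.
rng = random.Random(0)
y1 = 1 if rng.random() < 0.9 else 0
z = shuffle_mechanism(y1, lambda: rng.randint(0, 1), n=10, rng=rng)
print(sorted(z))  # an observer effectively only learns the histogram of messages
```

Since the output order is uniformly random, the observable information is just the multiset (histogram) of messages, which is why the analysis can focus on how well $Q$'s draws drown out $Y_1$'s contribution to that histogram.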

Our analysis reveals a series of non-intuitive results that challenge the conventional heuristic of setting $Q = P$. First, we prove that the optimal noise distribution $Q$ generally \textit{deviates} from the target distribution $P$. For binary alphabets, we show that the (near-)uniform distribution is optimal in a strong sense. For general finite alphabets, we derive an explicit analytical form $Q \propto \sqrt{P(1-P)}$ that achieves asymptotic optimality for mutual information. Furthermore, we demonstrate that our analytical framework transcends the shuffle model, yielding new security insights into broader cryptographic primitives such as the \textit{ideal cipher model} and \textit{honey encryption}.
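The closed form $Q \propto \sqrt{P(1-P)}$ is concrete enough to compute directly. The sketch below (function name and the example distribution are mine) normalizes the componentwise weights $\sqrt{p_i(1-p_i)}$ into a distribution; note how the result is flatter than $P$ but not uniform, consistent with the claim that the optimum deviates from both $Q = P$ and the uniform heuristic.

```python
import math

def blanket_noise(p):
    """Asymptotically MI-optimal blanket distribution on a finite alphabet,
    per the abstract's formula Q proportional to sqrt(P(1-P))."""
    weights = [math.sqrt(pi * (1 - pi)) for pi in p]
    total = sum(weights)
    return [w / total for w in weights]

p = [0.7, 0.2, 0.1]  # a skewed target distribution P
q = blanket_noise(p)
print([round(qi, 3) for qi in q])  # → [0.396, 0.345, 0.259]
```

The blanket puts noticeably more mass than $P$ does on the rare symbols, which is intuitive: rare symbols are exactly the ones whose appearance in the shuffled output would otherwise betray $Y_1$.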

Finally, we extend our results to the shuffle-DP paradigm, where messages $Y_i = \mathcal{R}(X_i)$ are outputs of $\varepsilon_0$-locally differentially private mechanisms. We establish a new, tight information-theoretic upper bound $I(X_1; \mathbf{Z} \mid (X_i)_{i=2}^n) \le \frac{(e^{\varepsilon_0/2}-1)^2}{2n} + \mathcal{O}(n^{-3/2})$. This result provides a sharp characterization that matches the optimal privacy-amplification parameters known in the literature, while offering a novel interpretation of the shuffling gain.
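The leading term of this bound is easy to evaluate numerically. A small sketch (function name mine; the $\mathcal{O}(n^{-3/2})$ remainder is deliberately omitted) shows the $1/n$ decay of the leakage about $X_1$ as the number of participants grows:

```python
import math

def shuffle_mi_bound(eps0, n):
    """Leading term of the abstract's bound on I(X1; Z | X_2..X_n):
    (e^{eps0/2} - 1)^2 / (2n). The O(n^{-3/2}) remainder is omitted."""
    return (math.exp(eps0 / 2) - 1) ** 2 / (2 * n)

# With eps0 = 1, the per-user leakage shrinks linearly in n.
for n in (100, 1000, 10000):
    print(n, shuffle_mi_bound(1.0, n))
```

This makes the ``shuffling gain'' tangible: for fixed local privacy $\varepsilon_0$, each extra participant dilutes the information the shuffled output carries about any single input by a further factor of $1/n$.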