cronokirby

(2026-02) Scaling Sparse Matrix Computation for Secure Outsourced Computing

2026-02-18

Abstract

Sparse General Matrix-Matrix Multiplication (SpGEMM) is a fundamental but computationally intensive operation that underpins many scientific workloads, including numerous AI applications. With the increasing demands for data security, privacy-preserving computation techniques such as Fully Homomorphic Encryption (FHE) have gained significant attention for their ability to process sensitive data without decryption. Nonetheless, executing SpGEMM within the framework of FHE presents significant challenges. The most effective SpGEMM algorithms exploit matrix sparsity to minimize computational costs; however, FHE obscures both the data values and the sparsity structures. Prior FHE-based privacy-preserving computation frameworks either ignore the inherent sparsity of matrices and rely on dense General Matrix-Matrix Multiplication (GEMM), incurring substantial overhead from redundant homomorphic multiplications, or they attempt to exploit sparsity by encrypting only the non-zero values, which inadvertently exposes sensitive positional information. To address this gap and achieve a better balance between efficiency and privacy, we propose an efficient FHE-compatible SpGEMM framework that enables oblivious data and position processing under a Single Instruction Multiple Data (SIMD) scheme. Moreover, we extend our method to support an arbitrary number of sparse matrices (FHE-SpGEMCM). The efficiency analysis shows that our method achieves an average homomorphic computation cost of $(n_A n_B)^2 / (n^2 N)$, where $n_A$ and $n_B$ represent the number of nonzero elements in $A$ and $B$, respectively, $n$ is the shared inner dimension of the multiplication, and $N$ denotes the batch size used in FHE. Experimental results demonstrate that for square matrices of scale $2^9$, our scheme achieves an average speedup of $439.25\times$ and a $10.68\times$ reduction in memory consumption compared to state-of-the-art baselines that ignore sparsity.
Furthermore, when the scale increases to $2^{13}$, our method yields up to a $1201.77\times$ speedup over baselines that only exploit the sparsity of a single matrix.
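To get a feel for where the savings come from, here is a back-of-the-envelope sketch comparing the abstract's quoted average cost, $(n_A n_B)^2 / (n^2 N)$, against a naive dense-GEMM estimate of roughly $n^3 / N$ ciphertext multiplications under SIMD batching. The dense estimate and the example density and batch size are my own assumptions for illustration, not figures from the paper.

```python
def sparse_cost(n_a: int, n_b: int, n: int, batch: int) -> float:
    """Average homomorphic cost (n_A * n_B)^2 / (n^2 * N), as quoted
    in the abstract, with N the FHE batch (slot) size."""
    return (n_a * n_b) ** 2 / (n ** 2 * batch)


def dense_cost(n: int, batch: int) -> float:
    """Naive dense GEMM under SIMD batching: roughly n^3 / N ciphertext
    multiplications (an assumed estimate, used here only for comparison)."""
    return n ** 3 / batch


# Hypothetical example: 2^9 x 2^9 matrices at 1% density, batch size N = 2^13.
n = 2 ** 9
nnz = int(0.01 * n * n)  # nonzeros per matrix, assuming equal density
N = 2 ** 13

print("sparse-aware cost:", sparse_cost(nnz, nnz, n, N))
print("dense GEMM cost:  ", dense_cost(n, N))
```

The ratio of the two estimates is what drives the reported speedups: the sparser the inputs (smaller $n_A$, $n_B$ relative to $n^2$), the fewer homomorphic multiplications the sparsity-aware approach pays for.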