cronokirby

(2026-02) MPSpeed; Implementing and Optimizing MPC-in-the-Head Digital Signatures in Hardware

2026-02-09

Abstract

The Multi-Party Computation (MPC)-in-the-Head (MPCitH) framework enables the construction of post-quantum Digital Signature Algorithms (DSAs) with competitive public key sizes. However, this comes at the cost of high computational complexity, which results in long signature generation and verification times.

In this work, we propose a compact and efficient hardware accelerator for Mirath, an MPCitH-based DSA and candidate in the ongoing NIST PQC standardization effort. We propose a series of algorithmic and hardware-level optimizations, focusing on Mirath's most critical operations: GGM tree-based polynomial commitments and MPC arithmetic. First, we observe that Mirath relies heavily on symmetric primitives (SHA3 and AES) during the GGM tree expansion and typically requires a large amount of memory to store the derived tree nodes. We propose an on-the-fly schedule for generating and computing the GGM tree, such that a minimal number of GGM tree nodes is stored in memory and their computations can be performed in parallel. Our methodology temporarily stores a minimal (and configurable) set of parent nodes in local buffers, from which the low-level tree nodes can be derived efficiently instead of being recomputed repeatedly from the root seed. This is achieved through a novel, hardware-friendly tree node indexing scheme, which enables efficient traversal through GGM tree nodes using only left and right shifts to find their closest previously computed ancestor. Second, we analyze the MPC arithmetic in Mirath and propose massively parallel yet area-efficient arithmetic units, capable of exploiting algorithm-level parallelism in the MPCitH operations. This is achieved by analyzing Mirath's proposed parameter sets and identifying the most hardware-friendly parameters, for which we design highly fine-tuned modules. Finally, we implement our unified design, which supports all Mirath operations, on FPGA and compare its performance against state-of-the-art PQC DSA hardware implementations. Compared to an implementation of the MPCitH-based SDitH scheme (TCHES 2024), we reduce on-chip BRAM by up to 81.6% and improve the area-time product by a factor of 52.7× to 64.8×.
Overall, we demonstrate that modern MPCitH constructions can be significantly accelerated in hardware through a combination of algorithmic, architectural and low-level hardware optimizations, in line with real-world performance requirements.
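
To make the idea of shift-based traversal concrete, here is a minimal software sketch. It is not the indexing scheme from the paper, whose details the abstract does not give. It assumes the standard heap-style numbering (root at index 1, children of node i at 2i and 2i+1), where a right shift yields the parent and a left shift yields a child. The `expand` function is a hypothetical stand-in for the length-doubling PRG and uses SHAKE-128 purely for illustration; Mirath's actual expansion is not reproduced here. The names `closest_cached_ancestor` and `derive_node` are likewise illustrative.

```python
import hashlib


def expand(seed: bytes) -> tuple[bytes, bytes]:
    """Hypothetical length-doubling PRG: one seed in, two child seeds out."""
    out = hashlib.shake_128(seed).digest(2 * len(seed))
    return out[:len(seed)], out[len(seed):]


def closest_cached_ancestor(index: int, cache: dict[int, bytes]) -> int:
    """Right-shift a heap-style index (root = 1) until a cached node is hit."""
    while index not in cache:
        if index <= 1:
            raise KeyError("root seed is not cached")
        index >>= 1  # parent of node i is i >> 1
    return index


def derive_node(index: int, cache: dict[int, bytes]) -> bytes:
    """Derive the seed of `index` from its closest cached ancestor."""
    ancestor = closest_cached_ancestor(index, cache)
    seed = cache[ancestor]
    # Number of levels between the ancestor and the target node.
    depth_gap = index.bit_length() - ancestor.bit_length()
    # The low `depth_gap` bits of `index` spell the path: 0 = left, 1 = right.
    for shift in range(depth_gap - 1, -1, -1):
        left, right = expand(seed)
        seed = right if (index >> shift) & 1 else left
    return seed
```

With a cache such as `{1: root_seed, 2: seed_of_node_2}`, calling `derive_node(5, cache)` starts from node 2 rather than the root, so only one expansion is needed instead of two. The same trade-off between buffer size and recomputation is what the configurable set of stored parent nodes controls in the abstract's description.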