cronokirby

(2026-03) Accelerating FAEST Signatures on ARM; NEON SIMD AES and Parallel VOLE Optimization

2026-03-11

Abstract

FAEST is a post-quantum digital signature candidate whose performance is dominated by repeated AES-CTR-based PRG calls in the VOLE-in-the-Head phase, yet its reference implementation provides no FAEST-specialized ARM NEON acceleration path. We present an ARM-oriented optimization that accelerates this bottleneck using general-purpose NEON SIMD instructions without relying on ARMv8 Crypto Extensions. The proposed implementation combines a register-resident 256-byte S-box with TBL/TBX-based four-stage SubBytes, 4-way and 8-way parallel AES block processing, a fixed-size PRG path specialized for the FAEST tree structure, and pthread-based batch-level parallelization of independent VOLE tasks. Evaluated on all 12 parameter sets of FAEST v2 on Raspberry Pi 4 and Apple M2, the combined optimization achieves speedups of up to 136.9×136.9\times and 330.1×330.1\times, respectively, over the pure-C reference. On RPi4, the single-thread NEON implementation outperforms OpenSSL's software AES, and on M2, the full NEON-plus-pthread configuration outperforms the best available reference configuration, including hardware-accelerated OpenSSL, across all tested parameters.