cronokirby

(2026-04) VeriRAG; Efficient Zero-Knowledge Proofs for Verifiable Retrieval-Augmented Generation

2026-04-01

Abstract

Retrieval-Augmented Generation (RAG) is widely used to enhance Large Language Models (LLMs), yet the "hallucination" characteristic allows malicious providers to bypass retrieval or claim non-existent data quality. To address these challenges, we present VeriRAG, a framework that leverages Zero-Knowledge Proofs (ZKP) to provide efficient integrity guarantees for RAG systems without compromising dataset privacy. Leveraging the robustness of AI inference, our framework supports Approximate Nearest Neighbor Search (ANNS)-based retrieval to avoid exhaustive searches. For the verification of top-kk sorting, we propose an innovative protocol that bypasses the intricate verification of sorting processes. To further enhance performance, we introduce a joint optimization leveraging vector lookup and chunk-merging strategies, which collectively drive down verification overhead while maintaining high generation accuracy. Experimental results demonstrate that VeriRAG scales efficiently to a 37GB dataset, achieving a prover time of 96s and a verifier time of 3s.