Benchmarking SLH-DSA Aggregation with STARKs

2026-04-12

Code: github.com/remix7531/slh-dsa-stark-bench

Background

In April 2025, Ethan Heilman posted “Post Quantum Signatures and Scaling Bitcoin” to the bitcoin-dev mailing list, proposing Non-Interactive Transaction Compression (NTC) using STARK proofs. The idea: once Bitcoin adopts post-quantum signatures, miners would aggregate all PQ signatures in a block into a single proof, replacing thousands of large signatures with one constant-size proof. This addresses the main downside of PQ signatures, namely their size, and could increase Bitcoin’s transaction throughput.

I find this approach compelling because both SLH-DSA and STARKs derive their security from the collision resistance of cryptographic hash functions. No discrete logarithm assumption, no lattice assumption, no pairings, and no trusted setup. Using a NIST-standardized algorithm, SLH-DSA (FIPS 205), also means there are already many highly vetted implementations, with a clear path to hardware security module support and dedicated hardware acceleration.

There is also ongoing discussion around post-quantum ZK-STARK recovery proofs, where a user proves ownership of transaction outputs from a seed without revealing the seed. Both efforts could share infrastructure: the same STARK prover could handle signature aggregation and recovery proofs, and both could be combined into a single per-block proof.

A key concern Heilman raises: if proof generation is too expensive, it could give large miners an unfair advantage. This benchmark measures exactly that: how expensive is it to prove SLH-DSA verifications?

I used RISC Zero’s zkVM to prototype this. I chose it for its Rust guest program support and its built-in SHA-256 compression function accelerator. The guest program verifies N SLH-DSA-SHA2-128s signatures.

The proving times below are wall-clock times for the prove command. In code, that is the call to prove_with_opts(..., ProverOpts::succinct()), so it includes succinct proof generation and compression. All runs use freshly generated keypairs and different messages. For the B200 runs, I set RISC0_SEGMENT_PO2=22.

Results

| N | NVIDIA RTX 5090 | NVIDIA B200 | AMD Ryzen 5 8640u (CPU) | Proof size |
|---|---|---|---|---|
| 1 | 4.1 s | 4.2 s | 14 min 17 s | 218 KiB |
| 2 | 7.7 s | 6.5 s | 23 min 5 s | 219 KiB |
| 4 | 14.7 s | 10.7 s | 39 min 8 s | 220 KiB |
| 8 | 28.9 s | 19.5 s | 1 h 14 min | 222 KiB |
| 16 | 53.7 s | 42.9 s | 2 h 24 min | 225 KiB |
| 32 | 1 min 51 s | 1 min 16 s | 4 h 50 min | 232 KiB |
| 64 | 3 min 31 s | 2 min 33 s | not run | 247 KiB |
| 128 | 7 min 12 s | 5 min 1 s | not run | 276 KiB |
| 256 | 17 min 1 s | 10 min 2 s | not run | 336 KiB |
| 512 | 26 min 28 s | 20 min 3 s | not run | 454 KiB |
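
To put the proof sizes in context, the sketch below compares them with the raw size of N unaggregated signatures. The 7,856-byte signature size for SLH-DSA-SHA2-128s is taken from FIPS 205 and is not stated elsewhere in this post; the proof sizes are copied from the table above. It only looks at signature bytes and ignores anything else a block would have to carry.

```python
# Signature size of SLH-DSA-SHA2-128s in bytes, as specified in FIPS 205.
SIG_BYTES = 7856

# Proof sizes in KiB, copied from the results table (N -> proof size).
PROOF_KIB = {1: 218, 2: 219, 4: 220, 8: 222, 16: 225, 32: 232,
             64: 247, 128: 276, 256: 336, 512: 454}


def raw_signatures_kib(n: int) -> float:
    """Total size of n unaggregated signatures, in KiB."""
    return n * SIG_BYTES / 1024


def compression_ratio(n: int) -> float:
    """How many times smaller the proof is than the n raw signatures."""
    return raw_signatures_kib(n) / PROOF_KIB[n]


for n in sorted(PROOF_KIB):
    print(f"N={n:>3}  raw={raw_signatures_kib(n):8.1f} KiB  "
          f"proof={PROOF_KIB[n]:3d} KiB  ratio={compression_ratio(n):5.2f}x")
```

In these measurements the proof only becomes smaller than the raw signatures somewhere between N = 16 and N = 32, and at N = 512 it is about 8.7x smaller (3,928 KiB of signatures versus a 454 KiB proof).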

The B200 is only about 1.3x faster than the RTX 5090 despite being a much more powerful GPU. It would likely benefit from larger proving segments that use more VRAM, but RISC Zero currently limits the maximum segment size, RISC0_SEGMENT_PO2, to 22. Early testing with CPU proving showed that RISC Zero parallelizes very well across cores. RISC Zero also has experimental support for multi-GPU proving.
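
The 1.3x figure and the per-signature cost can be reproduced from the table with a few lines of arithmetic. The times below are the table's GPU columns converted to seconds; the helper names are only illustrative.

```python
# Wall-clock proving times in seconds, transcribed from the results table.
# N -> (RTX 5090, B200)
TIMES_S = {
    1: (4.1, 4.2),
    2: (7.7, 6.5),
    4: (14.7, 10.7),
    8: (28.9, 19.5),
    16: (53.7, 42.9),
    32: (111, 76),       # 1 min 51 s, 1 min 16 s
    64: (211, 153),      # 3 min 31 s, 2 min 33 s
    128: (432, 301),     # 7 min 12 s, 5 min 1 s
    256: (1021, 602),    # 17 min 1 s, 10 min 2 s
    512: (1588, 1203),   # 26 min 28 s, 20 min 3 s
}


def per_signature(n: int) -> tuple[float, float]:
    """Seconds per signature on (RTX 5090, B200) for a batch of n."""
    rtx, b200 = TIMES_S[n]
    return rtx / n, b200 / n


def speedup(n: int) -> float:
    """How many times faster the B200 run was than the RTX 5090 run."""
    rtx, b200 = TIMES_S[n]
    return rtx / b200


for n in sorted(TIMES_S):
    rtx_ps, b200_ps = per_signature(n)
    print(f"N={n:>3}  5090={rtx_ps:5.2f} s/sig  B200={b200_ps:5.2f} s/sig  "
          f"speedup={speedup(n):4.2f}x")
```

At N = 512 this gives about 3.1 s per signature on the RTX 5090 and a B200 speedup of about 1.32x. At N = 1 the two GPUs are essentially tied.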

[Figure: Benchmark results]

Outlook

For Bitcoin, proving all signatures in a typical block at roughly 3.1 s per signature (the N = 512 rate on a single RTX 5090) would take far too long. Still, there are several ways to bring that down.
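
As a rough illustration of the scale of the problem, the sketch below extrapolates the 3.1 s/sig rate linearly to a few block sizes. The signature counts are hypothetical placeholders, not measurements of real blocks, and linear scaling beyond N = 512 is an assumption. The 600 s value is Bitcoin's 10-minute target block interval.

```python
SECONDS_PER_SIG = 3.1     # N = 512 rate on a single RTX 5090, from the table
BLOCK_INTERVAL_S = 600    # Bitcoin's 10-minute target block interval


def proving_time_s(num_sigs: int, s_per_sig: float = SECONDS_PER_SIG) -> float:
    """Estimated single-GPU proving time, assuming cost scales linearly."""
    return num_sigs * s_per_sig


def block_intervals(num_sigs: int) -> float:
    """The same estimate expressed in multiples of the block interval."""
    return proving_time_s(num_sigs) / BLOCK_INTERVAL_S


# Hypothetical signature counts per block, chosen only for illustration.
for sigs in (1_000, 2_000, 4_000):
    t = proving_time_s(sigs)
    print(f"{sigs:>5} sigs: {t / 60:6.1f} min "
          f"({block_intervals(sigs):4.1f} block intervals)")
```

Under these assumptions, even 1,000 signatures would take about 52 minutes on one GPU, several times longer than the interval between blocks.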

A dedicated STARK prover built for SLH-DSA verification, rather than a general-purpose zkVM, could yield large improvements. S-two’s benchmarks show their prover running SHA-256 chains up to 85x faster at scale than RISC Zero’s SHA-256 precompile on CPU. SLH-DSA verification also has overhead beyond SHA-256 compression calls that is not accelerated, so the real-world speedup for full signature verification will need benchmarking.

The benchmarks above assume proving starts only after a block is found. Transactions could be preprocessed as they enter the mempool. This would shift much of the proving work to before the block is mined, leaving only a final aggregation step. This requires clever algorithms for deciding which transactions to batch, probably by grouping signatures with similar fee levels.
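
One very simple way to group by fee level is sketched below: sort the mempool by fee rate and greedily fill batches up to a signature budget. This is only an illustration of the idea, not a proposal from the original post; the `MempoolTx` type, its fields, and the batch limit are all hypothetical, and a real policy would also have to deal with transaction dependencies, replacements, and evictions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MempoolTx:
    """Hypothetical mempool entry; field names are illustrative only."""
    txid: str
    fee_rate: float   # e.g. sat/vB
    num_sigs: int     # PQ signatures that would need proving


def batch_by_fee_rate(txs, max_sigs_per_batch):
    """Greedily pack transactions into batches in descending fee-rate order.

    Transactions with similar fee rates end up in the same batch, so a
    pre-proved batch is likely to be included in a block as a whole.
    A transaction with more signatures than the limit gets its own batch.
    """
    batches, current, current_sigs = [], [], 0
    for tx in sorted(txs, key=lambda t: t.fee_rate, reverse=True):
        if current and current_sigs + tx.num_sigs > max_sigs_per_batch:
            batches.append(current)
            current, current_sigs = [], 0
        current.append(tx)
        current_sigs += tx.num_sigs
    if current:
        batches.append(current)
    return batches


mempool = [
    MempoolTx("a", 50.0, 2), MempoolTx("b", 10.0, 1), MempoolTx("c", 48.0, 3),
    MempoolTx("d", 12.0, 2), MempoolTx("e", 9.0, 1),
]
for batch in batch_by_fee_rate(mempool, max_sigs_per_batch=5):
    print([tx.txid for tx in batch])
```

Each batch could then be proved ahead of time, with only the batches that make it into the block going into the final aggregation step.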

STARK segment proving is embarrassingly parallel and could be distributed across multiple GPUs with little coordination overhead. RISC Zero already has experimental multi-GPU support.
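
To get a feel for what that parallelism would have to deliver, the sketch below estimates how many GPUs are needed to prove a block within a time budget. The signature count, the 60 s budget, and the scaling efficiency are hypothetical inputs; the model assumes work divides evenly across GPUs and ignores the final aggregation step.

```python
import math


def gpus_needed(num_sigs: int, s_per_sig: float, budget_s: float,
                efficiency: float = 1.0) -> int:
    """Smallest GPU count that finishes within budget_s.

    efficiency is the assumed fraction of ideal linear scaling (1.0 means
    perfect scaling); it is a modelling assumption, not a measured value.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    total_gpu_seconds = num_sigs * s_per_sig
    return math.ceil(total_gpu_seconds / (budget_s * efficiency))


# Hypothetical block of 4,000 signatures at the measured 3.1 s/sig rate.
print(gpus_needed(4_000, 3.1, budget_s=60))                  # ideal scaling
print(gpus_needed(4_000, 3.1, budget_s=60, efficiency=0.8))  # 80% efficiency
```

With these placeholder numbers, the answer is in the low hundreds of RTX 5090-class GPUs, which is why the other speedups discussed here matter.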

Mikhail Kudinov and Jonas Nick have also proposed Bitcoin-optimized SPHINCS+ variants with smaller signatures and faster verification, using fewer SHA-256 compression calls, but those yield only roughly a 3x speedup. With signature aggregation, the smaller signatures matter less since they are replaced by a single proof anyway. I would rather see miners run a larger GPU cluster than give up NIST standardization. At current rates, a 3x improvement is likely to be overtaken within a few years of GPU development.