Benchmarking SLH-DSA Aggregation with STARKs

Code: github.com/remix7531/slh-dsa-stark-bench

Background

In April 2025, Ethan Heilman posted "Post Quantum Signatures and Scaling Bitcoin" to the bitcoin-dev mailing list, proposing Non-Interactive Transaction Compression (NTC) using STARK proofs. The idea: once Bitcoin adopts post-quantum signatures, miners would aggregate all PQ signatures in a block into a single proof, replacing thousands of large signatures with one compact proof. This addresses the main downside of PQ signatures, namely their size, and could increase Bitcoin’s transaction throughput.

I find this approach compelling because both SLH-DSA and STARKs derive their security from the collision resistance of cryptographic hash functions. No discrete logarithm assumption, no lattice assumption, no pairings, and no trusted setup. Using a NIST-standardized algorithm, SLH-DSA (FIPS 205), also means there are already many highly vetted implementations, with a clear path to hardware security module support and dedicated hardware acceleration.

There is also ongoing discussion around post-quantum ZK-STARK recovery proofs, where a user proves ownership of transaction outputs from a seed without revealing the seed. Both efforts could share infrastructure: the same STARK prover could handle signature aggregation and recovery proofs, and both could be combined into a single per-block proof.

A key concern Heilman raises: if proof generation is too expensive, it could give large miners an unfair advantage. This benchmark measures exactly that: how expensive is it to prove SLH-DSA verifications?

I used RISC Zero’s zkVM to prototype this. I chose it for its Rust guest program support and its built-in SHA-256 compression function accelerator. The guest program verifies N SLH-DSA-SHA2-128s signatures.

The proving times below are wall-clock times for the prove command. In code, that is the call to prove_with_opts(..., ProverOpts::succinct()), so it includes succinct proof generation and compression. All runs use freshly generated keypairs and different messages. For the B200 runs, I set RISC0_SEGMENT_PO2=22.

Results

Proving scales roughly linearly with N.
Average per-signature proving time on an RTX 5090 falls from 4.1 seconds at N=1 to about 3.1 seconds at N=512.
Proof size grows slowly, by roughly 0.46 KiB per additional signature, from 218 KiB at N=1 to 454 KiB at N=512, compared to 3.8 MiB of raw signatures at N=512.
Verification stays roughly constant at 12 to 15 ms regardless of N.
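The size comparison can be reproduced with a few lines of arithmetic. This is a minimal sketch, not code from the benchmark repository: it assumes the 7,856-byte signature size that FIPS 205 specifies for SLH-DSA-SHA2-128s and uses the proof sizes from the table below. The function names are made up for illustration.

```python
# Signature size of SLH-DSA-SHA2-128s in bytes (FIPS 205).
SIG_BYTES = 7856
KIB = 1024
MIB = 1024 * 1024

# Proof sizes in KiB, copied from the benchmark table.
PROOF_KIB = {1: 218, 2: 219, 4: 220, 8: 222, 16: 225,
             32: 232, 64: 247, 128: 276, 256: 336, 512: 454}


def raw_signature_bytes(n: int) -> int:
    """Total size of n unaggregated signatures."""
    return n * SIG_BYTES


def size_ratio(n: int) -> float:
    """Raw signature bytes divided by proof bytes (above 1 means the proof is smaller)."""
    return raw_signature_bytes(n) / (PROOF_KIB[n] * KIB)


def first_n_where_proof_is_smaller() -> int:
    """Smallest benchmarked N at which the proof beats the raw signatures."""
    return min(n for n in PROOF_KIB if size_ratio(n) > 1)


if __name__ == "__main__":
    for n in sorted(PROOF_KIB):
        raw_mib = raw_signature_bytes(n) / MIB
        print(f"N={n:>3}: raw {raw_mib:.2f} MiB, proof {PROOF_KIB[n]} KiB, "
              f"ratio {size_ratio(n):.2f}x")
```

With these numbers the proof is larger than the raw signatures for small N and only becomes smaller from N=32 onward; at N=512 it is between 8x and 9x smaller.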

N	NVIDIA RTX 5090	NVIDIA B200	AMD Ryzen 5 8640U (CPU)	Proof size
1	4.1 s	4.2 s	14 min 17 s	218 KiB
2	7.7 s	6.5 s	23 min 5 s	219 KiB
4	14.7 s	10.7 s	39 min 8 s	220 KiB
8	28.9 s	19.5 s	1 h 14 min	222 KiB
16	53.7 s	42.9 s	2 h 24 min	225 KiB
32	1 min 51 s	1 min 16 s	4 h 50 min	232 KiB
64	3 min 31 s	2 min 33 s	not run	247 KiB
128	7 min 12 s	5 min 1 s	not run	276 KiB
256	17 min 1 s	10 min 2 s	not run	336 KiB
512	26 min 28 s	20 min 3 s	not run	454 KiB
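To check the per-signature figure against the table, the RTX 5090 column can be converted to seconds and divided by N. The snippet below is a small sketch for that purpose; the parser and function names are my own and not part of the benchmark code.

```python
import re

# RTX 5090 column of the table above, copied verbatim.
RTX_5090 = {
    1: "4.1 s", 2: "7.7 s", 4: "14.7 s", 8: "28.9 s", 16: "53.7 s",
    32: "1 min 51 s", 64: "3 min 31 s", 128: "7 min 12 s",
    256: "17 min 1 s", 512: "26 min 28 s",
}

UNIT_SECONDS = {"h": 3600, "min": 60, "s": 1}


def to_seconds(text: str) -> float:
    """Convert a table entry such as '1 h 14 min' or '4.1 s' to seconds."""
    parts = re.findall(r"([\d.]+)\s*(h|min|s)\b", text)
    return sum(float(value) * UNIT_SECONDS[unit] for value, unit in parts)


def per_signature_seconds(column: dict) -> dict:
    """Wall-clock proving time divided by the number of signatures."""
    return {n: to_seconds(entry) / n for n, entry in column.items()}


if __name__ == "__main__":
    for n, secs in per_signature_seconds(RTX_5090).items():
        print(f"N={n:>3}: {secs:.2f} s per signature")
```

The per-signature cost drops from 4.1 s at N=1 to about 3.1 s at N=512. The N=256 run is an outlier at roughly 4.0 s per signature.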

The B200 is only about 1.3x faster than the RTX 5090 despite being a much more powerful GPU. It would likely benefit from larger proving segments that use more VRAM, but RISC Zero currently limits the maximum segment size, RISC0_SEGMENT_PO2, to 22. Early testing with CPU proving showed that RISC Zero parallelizes very well across cores. RISC Zero also has experimental support for multi-GPU proving.
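The B200 advantage can be read off the table for each N. A short sketch, with the proving times converted to seconds by hand and a helper name chosen for illustration:

```python
# Wall-clock proving times in seconds, converted from the table above.
RTX_5090 = {1: 4.1, 2: 7.7, 4: 14.7, 8: 28.9, 16: 53.7,
            32: 111, 64: 211, 128: 432, 256: 1021, 512: 1588}
B200 = {1: 4.2, 2: 6.5, 4: 10.7, 8: 19.5, 16: 42.9,
        32: 76, 64: 153, 128: 301, 256: 602, 512: 1203}


def speedup(baseline: dict, candidate: dict) -> dict:
    """Ratio baseline / candidate per N (above 1 means the candidate is faster)."""
    return {n: baseline[n] / candidate[n] for n in baseline}


if __name__ == "__main__":
    for n, ratio in speedup(RTX_5090, B200).items():
        print(f"N={n:>3}: B200 is {ratio:.2f}x the speed of the RTX 5090")
```

In these runs the ratio ranges from just under 1x at N=1 to about 1.7x at N=256, and is about 1.32x at N=512.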

Figure: Benchmark results

Outlook

For Bitcoin, proving all signatures in a typical block at 3.1 s/sig on a single RTX 5090 would take far too long. Still, there are several ways to bring that down.
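To put a rough number on "far too long", here is a back-of-the-envelope sketch. The signature count of 4,000 per block is a hypothetical placeholder, not a measured value, and the model simply extrapolates the 3.1 s/sig figure linearly. The 600-second target is Bitcoin's 10-minute block interval.

```python
SECONDS_PER_SIG = 3.1        # measured on an RTX 5090 at N=512
BLOCK_INTERVAL_SECONDS = 600  # Bitcoin's 10-minute target block interval


def estimated_proving_seconds(num_signatures: int,
                              seconds_per_sig: float = SECONDS_PER_SIG) -> float:
    """Linear extrapolation of single-GPU proving time."""
    return num_signatures * seconds_per_sig


def required_speedup(num_signatures: int,
                     budget_seconds: float = BLOCK_INTERVAL_SECONDS) -> float:
    """Factor by which proving must get faster to fit within the budget."""
    return estimated_proving_seconds(num_signatures) / budget_seconds


if __name__ == "__main__":
    hypothetical_sigs = 4000  # assumption for illustration only
    hours = estimated_proving_seconds(hypothetical_sigs) / 3600
    print(f"{hypothetical_sigs} signatures: about {hours:.1f} h on one GPU, "
          f"needs about {required_speedup(hypothetical_sigs):.0f}x speedup "
          f"to fit in one block interval")
```

Under these assumptions a single GPU would need roughly 3.4 hours per block, about 21x more than the full block interval, and in practice the budget would have to be a small fraction of that interval.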

A dedicated STARK prover built for SLH-DSA verification, rather than a general-purpose zkVM, could yield large improvements. S-two’s benchmarks show their prover running SHA-256 chains up to 85x faster at scale than RISC Zero’s SHA-256 precompile on CPU. SLH-DSA verification also has overhead beyond SHA-256 compression calls that is not accelerated, so the real-world speedup for full signature verification will need benchmarking.

The benchmarks above assume proving starts only after a block is found. Transactions could be preprocessed as they enter the mempool. This would shift much of the proving work to before the block is mined, leaving only a final aggregation step. This requires clever algorithms for deciding which transactions to batch, probably by grouping signatures with similar fee levels.
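One very simple way to group by fee level is to sort the mempool by feerate and cut it into fixed-size batches, so that transactions likely to be mined together are proven together. The sketch below illustrates only that idea; the `MempoolTx` type, its fields, and the greedy packing rule are assumptions for illustration, and a real policy would also have to handle replacements, dependencies, and batches that end up only partially included in a block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MempoolTx:
    """Hypothetical mempool entry; field names are illustrative."""
    txid: str
    feerate: float        # e.g. sat/vB
    num_signatures: int


def batch_by_feerate(txs, max_sigs_per_batch: int):
    """Greedily pack transactions, highest feerate first, into proving batches.

    A batch is closed as soon as adding the next transaction would exceed
    max_sigs_per_batch. A single transaction with more signatures than the
    limit still gets a batch of its own.
    """
    batches, current, count = [], [], 0
    for tx in sorted(txs, key=lambda t: t.feerate, reverse=True):
        if current and count + tx.num_signatures > max_sigs_per_batch:
            batches.append(current)
            current, count = [], 0
        current.append(tx)
        count += tx.num_signatures
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    mempool = [
        MempoolTx("a", 5.0, 2),
        MempoolTx("b", 50.0, 1),
        MempoolTx("c", 20.0, 2),
        MempoolTx("d", 1.0, 1),
    ]
    for i, batch in enumerate(batch_by_feerate(mempool, max_sigs_per_batch=3)):
        print(i, [tx.txid for tx in batch])
```

Each batch could then be proven ahead of time, with the per-batch proofs combined in the final aggregation step once the block contents are known.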

STARK segment proving is embarrassingly parallel and could be distributed across multiple GPUs with little coordination overhead. RISC Zero already has experimental multi-GPU support.
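A toy model makes the scaling argument concrete. It assumes every segment takes the same time to prove, that segments are handed out to GPUs in rounds, and that combining the segment proofs adds a fixed overhead. All three are simplifications, and the numbers in the example are hypothetical rather than measured.

```python
import math


def parallel_proving_seconds(num_segments: int,
                             num_gpus: int,
                             seconds_per_segment: float,
                             aggregation_seconds: float = 0.0) -> float:
    """Idealised wall-clock time when segments are spread evenly over GPUs."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    rounds = math.ceil(num_segments / num_gpus)
    return rounds * seconds_per_segment + aggregation_seconds


if __name__ == "__main__":
    # Hypothetical workload: 10 segments at 2 s each, 1 s to combine proofs.
    for gpus in (1, 2, 4, 8, 16):
        total = parallel_proving_seconds(10, gpus, 2.0, aggregation_seconds=1.0)
        print(f"{gpus:>2} GPUs: {total:.1f} s")
```

In this model, adding GPUs stops helping once there are as many GPUs as segments; from then on the fixed aggregation cost and the time for a single segment set the floor.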

There are also Bitcoin-optimized SPHINCS+ variants from Mikhail Kudinov and Jonas Nick, designed for faster verification and smaller signatures. They need fewer SHA-256 compression calls, but yield only roughly a 3x speedup. With signature aggregation, the smaller signatures matter less, since they are replaced by a single proof anyway. I would rather see miners run a larger GPU cluster than give up NIST standardization. At current rates, a 3x improvement is likely overtaken within a few years of GPU development.
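How many years "a few" is depends on the rate at which GPU proving performance improves, which this post does not measure. As a sketch under an assumed constant annual improvement rate, the time to make up a given factor follows from compound growth:

```python
import math


def years_to_match(target_speedup: float, annual_improvement: float) -> float:
    """Years of compounding hardware gains needed to reach target_speedup.

    annual_improvement is a fraction, e.g. 0.4 for 40 % faster each year.
    Solves (1 + annual_improvement) ** years == target_speedup.
    """
    if target_speedup <= 0 or annual_improvement <= 0:
        raise ValueError("both arguments must be positive")
    return math.log(target_speedup) / math.log(1.0 + annual_improvement)


if __name__ == "__main__":
    # The improvement rates below are hypothetical, not measured.
    for rate in (0.25, 0.4, 1.0):
        print(f"{rate:.0%} per year: {years_to_match(3.0, rate):.1f} years to reach 3x")
```

For example, if performance doubled every year, a 3x gap would close in about 1.6 years; at an assumed 40 % per year it would take a little over three.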


2026-04-12
