9.9 KiB
RVF Quantum-Resistant Cryptography
1. Threat Model
RVF files may contain high-value intelligence (medical genomics, proprietary embeddings, classified networks). The cryptographic design must:
- Authenticate: Prove a segment was written by an authorized producer
- Integrity: Detect any modification to segment payloads
- Quantum resistance: Survive attacks by future quantum computers
- Performance: Not bottleneck streaming ingest or query paths
- Compactness: Signatures must fit in segment footers without bloating
Harvest-Now, Decrypt-Later
Adversaries may archive RVF files today and break classical signatures later with quantum computers. Post-quantum signatures protect against this from day one.
2. Algorithm Selection
NIST Post-Quantum Standards (FIPS 204, 205, 206)
| Algorithm | Standard | Type | Sig Size | PK Size | SK Size | Sign/s | Verify/s | Level |
|---|---|---|---|---|---|---|---|---|
| ML-DSA-44 | FIPS 204 | Lattice | 2,420 B | 1,312 B | 2,560 B | ~9,000 | ~42,000 | 2 |
| ML-DSA-65 | FIPS 204 | Lattice | 3,309 B | 1,952 B | 4,032 B | ~4,500 | ~17,000 | 3 |
| ML-DSA-87 | FIPS 204 | Lattice | 4,627 B | 2,592 B | 4,896 B | ~2,800 | ~10,000 | 5 |
| SLH-DSA-128s | FIPS 205 | Hash | 7,856 B | 32 B | 64 B | ~350 | ~15,000 | 1 |
| SLH-DSA-128f | FIPS 205 | Hash | 17,088 B | 32 B | 64 B | ~3,000 | ~90,000 | 1 |
| FN-DSA-512 | FIPS 206 | Lattice | 666 B | 897 B | ~1.3 KB | ~5,000 | ~25,000 | 1 |
RVF Default: ML-DSA-65
Why ML-DSA-65:
- NIST Level 3 security (128-bit post-quantum)
- 3,309 byte signatures (manageable in segment footer)
- ~4,500 sign/s (sufficient for streaming ingest at segment level)
- ~17,000 verify/s (fast enough for progressive load verification)
- Well-studied lattice assumption (Module-LWE)
Alternative for size-constrained environments (Core Profile): FN-DSA-512 with 666 byte signatures — but FIPS 206 is newer and less deployed.
Alternative for maximum conservatism: SLH-DSA-128s (hash-based, stateless, minimal assumptions) — 7,856 byte signatures but the smallest keys and strongest theoretical foundation.
3. Signature Scheme
What Gets Signed
Each signed segment's signature covers:
signed_data = segment_header[0:40] # Header minus content_hash and padding
|| content_hash # The payload hash
|| segment_id_bytes # Prevent replay
|| context_string # Domain separation
The signature does NOT cover the raw payload directly — it covers the payload's hash. This means:
- Signing is O(1) regardless of payload size
- The hash is computed during write anyway (required for integrity)
- Verification requires only the header + hash, not the full payload
Context String
context = "RVF-v1-" || seg_type_name || "-" || profile_name
Examples:
"RVF-v1-VEC_SEG-rvdna""RVF-v1-MANIFEST_SEG-generic"
Domain separation prevents cross-type signature confusion.
Key Management
Keys are stored in CRYPTO_SEG segments:
CRYPTO_SEG Payload:
key_type: u8
0 = signing public key
1 = verification certificate chain
2 = encryption public key (for ENCRYPTED segments)
3 = key rotation record
algorithm: u8
0 = Ed25519 (classical)
1 = ML-DSA-65 (post-quantum)
2 = SLH-DSA-128s (hash-based PQ)
3 = X25519 (classical KEM)
4 = ML-KEM-768 (post-quantum KEM)
key_id: [u8; 16] Unique key identifier (hash of public key)
key_data: [u8; var] The actual key material
valid_from: u64 Timestamp (ns) when key becomes valid
valid_until: u64 Timestamp (ns) when key expires (0 = no expiry)
Key Rotation
New keys are introduced by writing a new CRYPTO_SEG with key_type=3
(rotation record) that references both old and new key IDs. Segments
signed with either key are valid during the transition period.
CRYPTO_SEG (rotation):
old_key_id: [u8; 16]
new_key_id: [u8; 16]
rotation_timestamp: u64
cross_signature: [u8; var] New key signed by old key
4. Hash Functions
SHAKE-256 (Primary)
SHAKE-256 from the SHA-3 family is used for:
- Content hashes in segment headers (128-bit truncation for compactness)
- Min-cut witness hashes (256-bit for cryptographic binding)
- Key derivation
- Domain separation
Why SHAKE-256:
- Post-quantum safe (Keccak is not vulnerable to Grover's algorithm at 256-bit output)
- Extendable output function (XOF) — can produce any hash length
- No length extension attacks
- ~1 GB/s in software, faster with hardware SHA-3 extensions
XXH3-128 (Fast Path)
XXH3 is used for non-cryptographic content hashing where speed matters more than collision resistance:
- Segment content hashes when crypto verification is not required
- Block-level integrity checks in combination with CRC32C
Performance: ~50 GB/s with AVX2. This means hash computation is never the bottleneck during streaming ingest.
CRC32C (Block Level)
CRC32C is used for per-block integrity within segments:
- Detects random bit flips and truncation
- Hardware accelerated on x86 (SSE4.2) and ARM (CRC32 extension)
- ~3 GB/s throughput
Hash Selection by Context
| Context | Algorithm | Output Size | Why |
|---|---|---|---|
| Block integrity | CRC32C | 4 B | Fastest, HW accel |
| Segment content hash (fast) | XXH3-128 | 16 B | Very fast, good distribution |
| Segment content hash (crypto) | SHAKE-256 | 16 B | Post-quantum, collision resistant |
| Witness / proof hashes | SHAKE-256 | 32 B | Full crypto strength |
| Key derivation | SHAKE-256 | 32+ B | XOF flexibility |
5. Encryption (Optional)
For ENCRYPTED segments, RVF uses hybrid encryption:
Key Encapsulation
Classical: X25519 ECDH
Post-Quantum: ML-KEM-768 (CRYSTALS-Kyber, NIST Level 3)
Hybrid: X25519 || ML-KEM-768 (concatenated shared secrets)
Payload Encryption
Algorithm: AES-256-GCM (AEAD)
Key: SHAKE-256(X25519_shared || ML-KEM_shared || context)
Nonce: First 12 bytes of SHAKE-256(segment_id || timestamp)
AAD: segment_header[0:40] (authenticated but not encrypted)
Encrypted Segment Layout
Segment Header (64B, plaintext)
flags: ENCRYPTED set
content_hash: hash of PLAINTEXT payload (for integrity after decrypt)
Encapsulated Keys
x25519_ephemeral_pk: [u8; 32]
ml_kem_ciphertext: [u8; 1088]
key_id_recipient: [u8; 16]
Encrypted Payload
AES-256-GCM ciphertext (same size as plaintext + 16B auth tag)
Signature Footer (if also SIGNED)
Signature covers header + encapsulated keys + encrypted payload
6. Capability Manifests (WITNESS_SEG)
WITNESS_SEGs provide cryptographic proof of provenance and computation:
Witness Types
0x01 PROVENANCE Who created this file and when
0x02 COMPUTATION Proof that an index was correctly built
0x03 DELEGATION Authorization chain for data access
0x04 AUDIT Record of queries executed against this file
0x05 ATTESTATION Hardware attestation (for Cognitum tiles)
Provenance Witness
creator_key_id: [u8; 16]
creation_time: u64
tool_name: [u8; 64]
tool_version: [u8; 16]
input_hashes: [(hash256, description)] Hashes of source data
transform_description: [u8; var] What was done to create vectors
signature: [u8; var] Creator's signature over all above
Computation Witness
computation_type: u8
0 = HNSW construction
1 = Quantization training
2 = Temperature compaction
3 = Overlay rebalance
4 = Index merge
input_segments: [segment_id]
output_segments: [segment_id]
parameters: [(key, value)]
result_hash: hash256
duration_ns: u64
signature: [u8; var]
This allows any reader to verify that the index was built from the declared vectors using the declared parameters — without re-running the computation.
7. Signing Performance Budget
For streaming ingest at 100K vectors/second with 1024-vector blocks:
Segment write rate: ~100 segments/second (1024 vectors per VEC_SEG)
Manifest writes: ~1/second (batched)
ML-DSA-65 signing: ~4,500/second
Signing budget: 100 segment sigs + 1 manifest sig = 101/second
Utilization: 101 / 4,500 = 2.2%
Signing is not a bottleneck. Even at 10x the ingest rate, ML-DSA-65 has headroom.
For verification during progressive load (reading 1000 segments):
ML-DSA-65 verify: ~17,000/second
Verification budget: 1000 segments / 17,000 = 59 ms
All segments verified in under 60 ms. This runs concurrently with data loading, so it adds minimal latency to the progressive boot sequence.
8. Core Profile Crypto
For the Core Profile (8 KB code budget), full ML-DSA-65 verification is too large (~15 KB of code). Options:
-
Hub verifies, tile trusts: Hub checks all signatures before sending blocks to tiles. Tile only needs CRC32C for transport integrity.
-
Truncated verification: Tile verifies only the CRC32C of received blocks. Hub provides a signed attestation that the source segments were verified.
-
FN-DSA-512: Smaller verification code (~3 KB), 666 byte signatures. Fits in tile code budget but is less mature.
Recommended: Option 1 (hub verifies, tile trusts) for the initial release. The hub is a trusted component in the Cognitum architecture, and the tile-hub channel is physically secure (on-chip mesh).
9. Algorithm Agility
The sig_algo and checksum_algo fields in segment headers and footers
allow algorithm migration without format changes:
Today: ML-DSA-65 signatures, SHAKE-256 hashes
Future: May migrate to ML-DSA-87 or newer NIST standards
Transition: Write new segments with new algo, old segments remain valid
Verification: Reader tries algo from header field, no guessing needed
New algorithms are introduced by:
- Assigning a new enum value
- Writing a CRYPTO_SEG with the new key type
- Signing new segments with the new algorithm
- Old segments with old signatures remain verifiable
No file rewrite needed. No flag day. Gradual migration through the append-only segment model.