Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'
vendor/ruvector/docs/research/rvf/crypto/quantum-signatures.md
# RVF Quantum-Resistant Cryptography

## 1. Threat Model

RVF files may contain high-value intelligence (medical genomics, proprietary
embeddings, classified networks). The cryptographic design must:

1. **Authenticate**: Prove a segment was written by an authorized producer
2. **Integrity**: Detect any modification to segment payloads
3. **Quantum resistance**: Survive attacks by future quantum computers
4. **Performance**: Not bottleneck streaming ingest or query paths
5. **Compactness**: Signatures must fit in segment footers without bloating

### Harvest-Now, Decrypt-Later

Adversaries may archive RVF files today and attack them later with quantum
computers: decrypting captured ciphertext, or recovering classical signing
keys from public keys and forging segments that still verify. Using
post-quantum algorithms from day one protects against both.

## 2. Algorithm Selection

### NIST Post-Quantum Standards (FIPS 204, 205, 206)

| Algorithm | Standard | Type | Sig Size | PK Size | SK Size | Sign/s | Verify/s | NIST Level |
|-----------|----------|------|----------|---------|---------|--------|----------|------------|
| ML-DSA-44 | FIPS 204 | Lattice | 2,420 B | 1,312 B | 2,560 B | ~9,000 | ~42,000 | 2 |
| ML-DSA-65 | FIPS 204 | Lattice | 3,309 B | 1,952 B | 4,032 B | ~4,500 | ~17,000 | 3 |
| ML-DSA-87 | FIPS 204 | Lattice | 4,627 B | 2,592 B | 4,896 B | ~2,800 | ~10,000 | 5 |
| SLH-DSA-128s | FIPS 205 | Hash | 7,856 B | 32 B | 64 B | ~350 | ~15,000 | 1 |
| SLH-DSA-128f | FIPS 205 | Hash | 17,088 B | 32 B | 64 B | ~3,000 | ~90,000 | 1 |
| FN-DSA-512 | FIPS 206 | Lattice | 666 B | 897 B | ~1.3 KB | ~5,000 | ~25,000 | 1 |

### RVF Default: ML-DSA-65

**Why ML-DSA-65**:
- NIST Level 3 security (comparable to an exhaustive key search on AES-192)
- 3,309 byte signatures (manageable in segment footer)
- ~4,500 sign/s (sufficient for streaming ingest at segment level)
- ~17,000 verify/s (fast enough for progressive load verification)
- Well-studied lattice assumption (Module-LWE)

**Alternative for size-constrained environments (Core Profile)**:
FN-DSA-512 with 666 byte signatures, but FIPS 206 is newer and less deployed.

**Alternative for maximum conservatism**:
SLH-DSA-128s (hash-based, stateless, minimal assumptions): 7,856 byte
signatures, but the smallest keys and a security argument that rests only on
the underlying hash function.

## 3. Signature Scheme

### What Gets Signed

Each signed segment's signature covers:

```
signed_data = segment_header[0:40]    # Header minus content_hash and padding
           || content_hash            # The payload hash
           || segment_id_bytes        # Prevent replay
           || context_string          # Domain separation
```

The signature does NOT cover the raw payload directly; it covers the payload's
hash. This means:
- Signing is O(1) regardless of payload size
- The hash is computed during write anyway (required for integrity)
- Signature verification requires only the header + hash, not the full payload
  (checking the payload against the hash is a separate step)

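The concatenation above can be sketched in a few lines of Python. This is only an illustration: the 8-byte little-endian encoding of `segment_id`, the ASCII encoding of the context string, and the helper name `build_signed_data` are assumptions, not part of the RVF specification.

```python
import hashlib

HEADER_SIGNED_PREFIX = 40  # bytes of the header covered by the signature


def build_signed_data(header: bytes, content_hash: bytes,
                      segment_id: int, context: str) -> bytes:
    """Concatenate the signed fields in the documented order."""
    if len(header) < HEADER_SIGNED_PREFIX:
        raise ValueError("segment header is shorter than 40 bytes")
    # Assumption: segment_id is serialized as an 8-byte little-endian integer.
    segment_id_bytes = segment_id.to_bytes(8, "little")
    # Assumption: the context string is plain ASCII.
    return (header[:HEADER_SIGNED_PREFIX]
            + content_hash
            + segment_id_bytes
            + context.encode("ascii"))


header = bytes(64)  # placeholder 64-byte segment header
content_hash = hashlib.shake_256(b"example payload").digest(16)
signed = build_signed_data(header, content_hash, 7, "RVF-v1-VEC_SEG-rvdna")
print(len(signed))  # 40 + 16 + 8 + 20 = 84 bytes
```

The input to the signature algorithm stays the same size no matter how large the payload is, which is the O(1) property described above.
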
### Context String

```
context = "RVF-v1-" || seg_type_name || "-" || profile_name
```

Examples:
- `"RVF-v1-VEC_SEG-rvdna"`
- `"RVF-v1-MANIFEST_SEG-generic"`

Domain separation prevents a signature made for one segment type or profile
from being accepted for another.

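A minimal sketch of building the context string follows. The function name `context_string` and the rule that rejects `-` inside the segment type name (so the string splits unambiguously) are illustrative assumptions.

```python
def context_string(seg_type_name: str, profile_name: str) -> bytes:
    """Build the "RVF-v1-<seg_type>-<profile>" domain-separation string."""
    for part in (seg_type_name, profile_name):
        if not part or not part.isascii():
            raise ValueError("context components must be non-empty ASCII")
    # Assumption: the segment type name never contains '-', which keeps the
    # boundary between type and profile unambiguous.
    if "-" in seg_type_name:
        raise ValueError("segment type name must not contain '-'")
    return f"RVF-v1-{seg_type_name}-{profile_name}".encode("ascii")


print(context_string("VEC_SEG", "rvdna"))
print(context_string("MANIFEST_SEG", "generic"))
```
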
### Key Management

Keys are stored in CRYPTO_SEG segments:

```
CRYPTO_SEG Payload:
  key_type: u8
    0 = signing public key
    1 = verification certificate chain
    2 = encryption public key (for ENCRYPTED segments)
    3 = key rotation record

  algorithm: u8
    0 = Ed25519 (classical)
    1 = ML-DSA-65 (post-quantum)
    2 = SLH-DSA-128s (hash-based PQ)
    3 = X25519 (classical KEM)
    4 = ML-KEM-768 (post-quantum KEM)

  key_id: [u8; 16]        Unique key identifier (hash of public key)
  key_data: [u8; var]     The actual key material
  valid_from: u64         Timestamp (ns) when key becomes valid
  valid_until: u64        Timestamp (ns) when key expires (0 = no expiry)
```

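The payload above could be serialized roughly as follows. The field order matches the listing, but the document does not say how the variable-length `key_data` is delimited; the `u32` length prefix, the little-endian byte order, and the class name `CryptoSegPayload` are assumptions made for this sketch.

```python
import struct
from dataclasses import dataclass

_HEAD = struct.Struct("<BB16sI")  # key_type, algorithm, key_id, key_len (assumed prefix)
_TAIL = struct.Struct("<QQ")      # valid_from, valid_until (ns)


@dataclass(frozen=True)
class CryptoSegPayload:
    key_type: int
    algorithm: int
    key_id: bytes
    key_data: bytes
    valid_from: int
    valid_until: int  # 0 = no expiry

    def pack(self) -> bytes:
        if len(self.key_id) != 16:
            raise ValueError("key_id must be exactly 16 bytes")
        head = _HEAD.pack(self.key_type, self.algorithm,
                          self.key_id, len(self.key_data))
        return head + self.key_data + _TAIL.pack(self.valid_from, self.valid_until)

    @classmethod
    def unpack(cls, buf: bytes) -> "CryptoSegPayload":
        key_type, algorithm, key_id, key_len = _HEAD.unpack_from(buf, 0)
        start = _HEAD.size
        end = start + key_len
        if len(buf) != end + _TAIL.size:
            raise ValueError("payload length does not match key_len")
        valid_from, valid_until = _TAIL.unpack_from(buf, end)
        return cls(key_type, algorithm, key_id, bytes(buf[start:end]),
                   valid_from, valid_until)

    def is_valid_at(self, now_ns: int) -> bool:
        """True if now_ns falls inside [valid_from, valid_until)."""
        not_expired = self.valid_until == 0 or now_ns < self.valid_until
        return self.valid_from <= now_ns and not_expired


# An ML-DSA-65 signing public key (1,952 bytes) with no expiry.
record = CryptoSegPayload(0, 1, bytes(16), bytes(1952), 1_000, 0)
assert CryptoSegPayload.unpack(record.pack()) == record
```
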
### Key Rotation

New keys are introduced by writing a new CRYPTO_SEG with `key_type=3`
(rotation record) that references both old and new key IDs. Segments
signed with either key are valid during the transition period.

```
CRYPTO_SEG (rotation):
  old_key_id: [u8; 16]
  new_key_id: [u8; 16]
  rotation_timestamp: u64
  cross_signature: [u8; var]    New key signed by old key
```

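One way a reader might use rotation records is to walk them from a trusted root key and accept a new key only when its cross-signature verifies under an already trusted key. The sketch below assumes that the cross-signature covers the new public key bytes and takes the signature check as a caller-supplied `verify` callback; both are assumptions, as are the names `RotationRecord` and `trusted_key_ids`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set

# verify(public_key, message, signature) -> bool, supplied by the caller
Verifier = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class RotationRecord:
    old_key_id: bytes
    new_key_id: bytes
    rotation_timestamp: int
    cross_signature: bytes  # new key signed by old key


def trusted_key_ids(root_key_id: bytes,
                    keys: Dict[bytes, bytes],
                    rotations: Iterable[RotationRecord],
                    verify: Verifier) -> Set[bytes]:
    """Return every key ID reachable from the root through valid rotations."""
    trusted = {root_key_id}
    # Process records in time order so chained rotations resolve correctly.
    for rec in sorted(rotations, key=lambda r: r.rotation_timestamp):
        if rec.old_key_id not in trusted:
            continue  # signer of this rotation is not trusted
        old_pk = keys.get(rec.old_key_id)
        new_pk = keys.get(rec.new_key_id)
        if old_pk is None or new_pk is None:
            continue  # key material missing from the file
        # Assumption: the cross-signature message is the new public key.
        if verify(old_pk, new_pk, rec.cross_signature):
            trusted.add(rec.new_key_id)
    return trusted
```

Segments signed by any key in the returned set would be accepted during the transition period; retiring the old key is a separate policy decision that this sketch does not model.
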
## 4. Hash Functions

### SHAKE-256 (Primary)

SHAKE-256 from the SHA-3 family is used for:
- Content hashes in segment headers (128-bit truncation for compactness; a
  128-bit output offers roughly 64 bits of collision resistance)
- Min-cut witness hashes (256-bit for cryptographic binding)
- Key derivation
- Domain separation

**Why SHAKE-256**:
- Post-quantum safe (Grover's algorithm at most halves preimage security, so a 256-bit output retains about 128 bits)
- Extendable output function (XOF): can produce any hash length
- No length extension attacks
- ~1 GB/s in software, faster with hardware SHA-3 extensions

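The two output lengths can be illustrated with Python's standard `hashlib`, which exposes SHAKE-256 as an XOF. The helper names `content_hash` and `witness_hash` are illustrative only.

```python
import hashlib


def content_hash(payload: bytes) -> bytes:
    """128-bit SHAKE-256 output, as used for segment content hashes."""
    return hashlib.shake_256(payload).digest(16)


def witness_hash(data: bytes) -> bytes:
    """256-bit SHAKE-256 output, as used for witness and proof hashes."""
    return hashlib.shake_256(data).digest(32)


payload = b"example segment payload"
short = content_hash(payload)
full = witness_hash(payload)

# An XOF produces one output stream, so the 16-byte digest is simply the
# first half of the 32-byte digest of the same input.
assert full[:16] == short
print(short.hex())
print(full.hex())
```

Because the short hash is a prefix of the long one, two uses that hash identical input would produce related outputs. If that matters, a distinct label can be mixed into the input for each use.
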
### XXH3-128 (Fast Path)

XXH3 is used for non-cryptographic content hashing where speed matters more
than collision resistance:
- Segment content hashes when crypto verification is not required
- Block-level integrity checks in combination with CRC32C

**Performance**: ~50 GB/s with AVX2. This means fast-path hash computation is
never the bottleneck during streaming ingest.

### CRC32C (Block Level)

CRC32C is used for per-block integrity within segments:
- Detects random bit flips and truncation
- Hardware accelerated on x86 (SSE4.2) and ARM (CRC32 extension)
- ~3 GB/s throughput

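For reference, CRC32C uses the Castagnoli polynomial (reflected form `0x82F63B78`) rather than the polynomial behind `zlib.crc32`. The table-driven version below is a portable software sketch with none of the hardware acceleration mentioned above; the helper `verify_block` and how the expected CRC is stored alongside a block are assumptions.

```python
# Precompute the 256-entry lookup table for the reflected Castagnoli polynomial.
_POLY = 0x82F63B78
_TABLE = []
for i in range(256):
    c = i
    for _ in range(8):
        c = (c >> 1) ^ (_POLY if c & 1 else 0)
    _TABLE.append(c)


def crc32c(data: bytes, crc: int = 0) -> int:
    """Compute CRC32C; pass a previous result as crc to continue a stream."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def verify_block(block: bytes, expected_crc: int) -> bool:
    """Check a block against its stored CRC32C value."""
    return crc32c(block) == expected_crc


# Standard check value for CRC32C over the ASCII digits "123456789".
assert crc32c(b"123456789") == 0xE3069283
```
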
### Hash Selection by Context

| Context | Algorithm | Output Size | Why |
|---------|-----------|-------------|-----|
| Block integrity | CRC32C | 4 B | Fastest, HW accel |
| Segment content hash (fast) | XXH3-128 | 16 B | Very fast, good distribution |
| Segment content hash (crypto) | SHAKE-256 | 16 B | Post-quantum, cryptographic (~64-bit collision resistance at 16 B) |
| Witness / proof hashes | SHAKE-256 | 32 B | Full crypto strength |
| Key derivation | SHAKE-256 | 32+ B | XOF flexibility |

## 5. Encryption (Optional)

For ENCRYPTED segments, RVF uses hybrid encryption:

### Key Encapsulation

```
Classical:     X25519 ECDH
Post-Quantum:  ML-KEM-768 (FIPS 203, derived from CRYSTALS-Kyber, NIST Level 3)
Hybrid:        X25519 || ML-KEM-768 (concatenated shared secrets)
```

### Payload Encryption

```
Algorithm:  AES-256-GCM (AEAD)
Key:        SHAKE-256(X25519_shared || ML-KEM_shared || context)
Nonce:      First 12 bytes of SHAKE-256(segment_id || timestamp)
AAD:        segment_header[0:40] (authenticated but not encrypted)
```

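The key and nonce derivations can be sketched with `hashlib` alone; the AES-256-GCM step is omitted because it needs a third-party library. The 8-byte little-endian encodings of `segment_id` and `timestamp`, and the function names, are assumptions for illustration.

```python
import hashlib


def derive_payload_key(x25519_shared: bytes, ml_kem_shared: bytes,
                       context: bytes) -> bytes:
    """32-byte AES-256 key from the concatenated hybrid shared secrets."""
    return hashlib.shake_256(x25519_shared + ml_kem_shared + context).digest(32)


def derive_nonce(segment_id: int, timestamp_ns: int) -> bytes:
    """12-byte GCM nonce: first 12 bytes of SHAKE-256(segment_id || timestamp)."""
    # Assumption: both values are encoded as 8-byte little-endian integers.
    material = segment_id.to_bytes(8, "little") + timestamp_ns.to_bytes(8, "little")
    return hashlib.shake_256(material).digest(12)


key = derive_payload_key(bytes(32), bytes(32), b"RVF-v1-VEC_SEG-rvdna")
nonce = derive_nonce(segment_id=7, timestamp_ns=1_700_000_000_000_000_000)
print(len(key), len(nonce))  # 32 12
```

GCM requires that a (key, nonce) pair never repeat, so with a deterministic nonce the `(segment_id, timestamp)` pair must be unique for every segment encrypted under the same key.
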
### Encrypted Segment Layout

```
Segment Header (64B, plaintext)
  flags: ENCRYPTED set
  content_hash: hash of PLAINTEXT payload (for integrity after decrypt)

Encapsulated Keys
  x25519_ephemeral_pk: [u8; 32]
  ml_kem_ciphertext: [u8; 1088]
  key_id_recipient: [u8; 16]

Encrypted Payload
  AES-256-GCM ciphertext (same size as plaintext + 16B auth tag)

Signature Footer (if also SIGNED)
  Signature covers header + encapsulated keys + encrypted payload
```

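As an illustration of the layout, the sketch below splits an unsigned encrypted segment into its three parts. It assumes the fields are packed back to back with no padding and that no signature footer is present; the names `EncapsulatedKeys` and `split_encrypted_segment` are hypothetical.

```python
from typing import NamedTuple, Tuple

HEADER_LEN = 64
X25519_PK_LEN = 32
ML_KEM_768_CT_LEN = 1088
KEY_ID_LEN = 16
KEYS_LEN = X25519_PK_LEN + ML_KEM_768_CT_LEN + KEY_ID_LEN  # 1,136 bytes
GCM_TAG_LEN = 16


class EncapsulatedKeys(NamedTuple):
    x25519_ephemeral_pk: bytes
    ml_kem_ciphertext: bytes
    key_id_recipient: bytes


def split_encrypted_segment(segment: bytes) -> Tuple[bytes, EncapsulatedKeys, bytes]:
    """Return (header, encapsulated keys, ciphertext including the GCM tag)."""
    if len(segment) < HEADER_LEN + KEYS_LEN + GCM_TAG_LEN:
        raise ValueError("segment too short for an encrypted layout")
    header = segment[:HEADER_LEN]
    pos = HEADER_LEN
    pk = segment[pos:pos + X25519_PK_LEN]
    pos += X25519_PK_LEN
    ct = segment[pos:pos + ML_KEM_768_CT_LEN]
    pos += ML_KEM_768_CT_LEN
    key_id = segment[pos:pos + KEY_ID_LEN]
    pos += KEY_ID_LEN
    # Assumption: everything after the key block is ciphertext plus tag.
    return header, EncapsulatedKeys(pk, ct, key_id), segment[pos:]
```

The fixed per-segment overhead under these assumptions is 1,136 bytes of encapsulated keys plus the 16-byte GCM tag.
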
## 6. Capability Manifests (WITNESS_SEG)

WITNESS_SEGs provide cryptographic proof of provenance and computation:

### Witness Types

```
0x01  PROVENANCE       Who created this file and when
0x02  COMPUTATION      Proof that an index was correctly built
0x03  DELEGATION       Authorization chain for data access
0x04  AUDIT            Record of queries executed against this file
0x05  ATTESTATION      Hardware attestation (for Cognitum tiles)
```

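A reader could model the witness type tags as an enumeration and reject values it does not recognise. The class and function names here are illustrative; where the tag sits inside a WITNESS_SEG payload is not specified above and is not assumed.

```python
from enum import IntEnum


class WitnessType(IntEnum):
    PROVENANCE = 0x01   # who created this file and when
    COMPUTATION = 0x02  # proof that an index was correctly built
    DELEGATION = 0x03   # authorization chain for data access
    AUDIT = 0x04        # record of queries executed against this file
    ATTESTATION = 0x05  # hardware attestation


def parse_witness_type(tag: int) -> WitnessType:
    """Map a raw tag to a WitnessType, failing loudly on unknown values."""
    try:
        return WitnessType(tag)
    except ValueError:
        raise ValueError(f"unknown witness type 0x{tag:02x}") from None


print(parse_witness_type(0x02).name)  # COMPUTATION
```
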
### Provenance Witness

```
creator_key_id: [u8; 16]
creation_time: u64
tool_name: [u8; 64]
tool_version: [u8; 16]
input_hashes: [(hash256, description)]    Hashes of source data
transform_description: [u8; var]          What was done to create vectors
signature: [u8; var]                      Creator's signature over all above
```

### Computation Witness

```
computation_type: u8
  0 = HNSW construction
  1 = Quantization training
  2 = Temperature compaction
  3 = Overlay rebalance
  4 = Index merge

input_segments: [segment_id]
output_segments: [segment_id]
parameters: [(key, value)]
result_hash: hash256
duration_ns: u64
signature: [u8; var]
```

This allows any reader who trusts the signer to confirm that the index was
built from the declared vectors using the declared parameters, without
re-running the computation.

## 7. Signing Performance Budget

For streaming ingest at 100K vectors/second with 1024-vector blocks:

```
Segment write rate:    ~100 segments/second (1024 vectors per VEC_SEG)
Manifest writes:       ~1/second (batched)

ML-DSA-65 signing:     ~4,500/second
Signing budget:        100 segment sigs + 1 manifest sig = 101/second
Utilization:           101 / 4,500 = 2.2%
```

Signing is not a bottleneck. Even at 10x the ingest rate (about 22%
utilization), ML-DSA-65 has headroom.

For verification during progressive load (reading 1000 segments):

```
ML-DSA-65 verify:      ~17,000/second
Verification budget:   1000 segments / 17,000 = 59 ms
```

All segments are verified in under 60 ms. This runs concurrently with data
loading, so it adds minimal latency to the progressive boot sequence.

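The budget arithmetic above can be reproduced with a short script, which also makes it easy to plug in other ingest rates or algorithms. The function names are illustrative, and the throughput figures are the approximate values from the table in Section 2.

```python
def signing_utilization(vectors_per_s: float, vectors_per_segment: int,
                        manifests_per_s: float, sign_rate: float) -> float:
    """Fraction of one signer's capacity consumed by streaming ingest."""
    segments_per_s = vectors_per_s / vectors_per_segment
    return (segments_per_s + manifests_per_s) / sign_rate


def verification_ms(segments: int, verify_rate: float) -> float:
    """Time in milliseconds to verify a batch of segment signatures."""
    return 1000.0 * segments / verify_rate


base = signing_utilization(100_000, 1024, 1, 4_500)
burst = signing_utilization(1_000_000, 1024, 1, 4_500)
load = verification_ms(1000, 17_000)

print(f"signing utilization: {base:.1%}")   # 2.2%
print(f"at 10x ingest:       {burst:.1%}")  # 21.7%
print(f"verify 1000 segs:    {load:.0f} ms")  # 59 ms
```

The script uses the exact segment rate (100,000 / 1024, about 97.7 per second) rather than the rounded 100, so the 10x figure comes out slightly below the rounded estimate.
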
## 8. Core Profile Crypto

For the Core Profile (8 KB code budget), full ML-DSA-65 verification is
too large (~15 KB of code). Options:

1. **Hub verifies, tile trusts**: Hub checks all signatures before sending
   blocks to tiles. Tile only needs CRC32C for transport integrity.

2. **Truncated verification**: Tile verifies only the CRC32C of received
   blocks. Hub provides a signed attestation that the source segments
   were verified.

3. **FN-DSA-512**: Smaller verification code (~3 KB), 666 byte signatures.
   Fits in tile code budget but is less mature.

Recommended: Option 1 (hub verifies, tile trusts) for the initial release.
The hub is a trusted component in the Cognitum architecture, and the
tile-hub channel is physically secure (on-chip mesh).

## 9. Algorithm Agility

The `sig_algo` and `checksum_algo` fields in segment headers and footers
allow algorithm migration without format changes:

```
Today:        ML-DSA-65 signatures, SHAKE-256 hashes
Future:       May migrate to ML-DSA-87 or newer NIST standards
Transition:   Write new segments with new algo, old segments remain valid
Verification: Reader uses the algo named in the header field, no guessing needed
```

New algorithms are introduced by:
1. Assigning a new enum value
2. Writing a CRYPTO_SEG carrying a key for the new algorithm
3. Signing new segments with the new algorithm
4. Old segments with old signatures remain verifiable

No file rewrite needed. No flag day. Gradual migration through the
append-only segment model.
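
Header-driven dispatch of this kind can be sketched as a registry keyed by the `sig_algo` value. The registry functions and the exception name are hypothetical, the verifiers are caller-supplied callbacks, and reusing the CRYPTO_SEG `algorithm` numbering for `sig_algo` is an assumption.

```python
from typing import Callable, Dict

# verify(public_key, signed_data, signature) -> bool
Verifier = Callable[[bytes, bytes, bytes], bool]


class UnsupportedAlgorithm(Exception):
    """Raised when a segment names a sig_algo this reader cannot verify."""


_VERIFIERS: Dict[int, Verifier] = {}


def register_verifier(sig_algo: int, verifier: Verifier) -> None:
    """Add support for an algorithm without touching existing entries."""
    _VERIFIERS[sig_algo] = verifier


def verify_segment(sig_algo: int, public_key: bytes,
                   signed_data: bytes, signature: bytes) -> bool:
    """Pick the verifier named by the header field; never guess."""
    try:
        verifier = _VERIFIERS[sig_algo]
    except KeyError:
        raise UnsupportedAlgorithm(f"sig_algo {sig_algo} is not registered") from None
    return verifier(public_key, signed_data, signature)
```

A reader built this way keeps verifying old segments after a new algorithm is registered, and it fails explicitly, rather than silently, on segments whose algorithm it does not know.
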