Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

This commit is contained in:
ruv
2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions


@@ -0,0 +1,566 @@
# EXO-AI 2025 Security Architecture
## Executive Summary
EXO-AI 2025 implements a **post-quantum secure** cognitive substrate with multi-layered defense-in-depth security. This document outlines the threat model, cryptographic choices, current implementation status, and known limitations.
**Current Status**: 🟡 **Development Phase** - Cryptographic libraries have been selected and added, but several modules still use insecure placeholders (see [Known Limitations](#known-limitations)); network layer and key management pending.
---
## Table of Contents
1. [Threat Model](#threat-model)
2. [Security Architecture](#security-architecture)
3. [Cryptographic Choices](#cryptographic-choices)
4. [Implementation Status](#implementation-status)
5. [Known Limitations](#known-limitations)
6. [Security Best Practices](#security-best-practices)
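7. [Crypto Implementation Roadmap](#crypto-implementation-roadmap)
8. [Incident Response](#incident-response)
9. [Audit History](#audit-history)
10. [Appendix: Cryptographic Parameter Summary](#appendix-cryptographic-parameter-summary)
11. [References](#references)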
---
## Threat Model
### Adversary Capabilities
We design against the following threat actors:
| Threat Actor | Capabilities | Likelihood | Impact |
|-------------|--------------|------------|--------|
| **Quantum Adversary** | Large-scale quantum computer (Shor's algorithm) | Medium (5-15 years) | CRITICAL |
| **Network Adversary** | Passive eavesdropping, active MITM | High | HIGH |
| **Byzantine Nodes** | Up to f = ⌊(n-1)/3⌋ malicious nodes in a federation of n nodes | Medium | HIGH |
| **Timing Attack** | Precise timing measurements of crypto operations | Medium | MEDIUM |
| **Memory Disclosure** | Memory dumps, cold boot attacks | Low | HIGH |
| **Supply Chain** | Compromised dependencies | Low | CRITICAL |
### Assets to Protect
1. **Cryptographic Keys**: Post-quantum keypairs, session keys, shared secrets
2. **Agent Memory**: Temporal knowledge graphs, learned patterns
3. **Federation Data**: Inter-node communications, consensus state
4. **Query Privacy**: User queries must not leak to federation observers
5. **Substrate Integrity**: Cognitive state must be tamper-evident
### Attack Surfaces
```
┌─────────────────────────────────────────────────────┐
│ ATTACK SURFACES │
├─────────────────────────────────────────────────────┤
│ │
│ 1. Network Layer │
│ • Federation handshake protocol │
│ • Onion routing implementation │
│ • Consensus message passing │
│ │
│ 2. Cryptographic Layer │
│ • Key generation (RNG quality) │
│ • Key exchange (KEM encapsulation) │
│ • Encryption (AEAD implementation) │
│ • Signature verification │
│ │
│ 3. Application Layer │
│ • Input validation (query sizes, node counts) │
│ • Deserialization (JSON parsing) │
│ • Memory management (key zeroization) │
│ │
│ 4. Physical Layer │
│ • Side-channel leakage (timing, cache) │
│ • Memory disclosure (cold boot) │
│ │
└─────────────────────────────────────────────────────┘
```
---
## Security Architecture
### Defense-in-Depth Layers
```
┌──────────────────────────────────────────────────────┐
│ Layer 1: Post-Quantum Cryptography │
│ • CRYSTALS-Kyber-1024 (KEM) │
│ • 256-bit post-quantum security level │
└──────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────┐
│ Layer 2: Authenticated Encryption │
│ • ChaCha20-Poly1305 (AEAD) │
│ • Per-session key derivation (HKDF-SHA256) │
└──────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────┐
│ Layer 3: Privacy-Preserving Routing │
│ • Onion routing (multi-hop encryption) │
│ • Traffic analysis resistance │
└──────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────┐
│ Layer 4: Byzantine Fault Tolerance │
│ • PBFT consensus (2f+1 threshold) │
│ • Cryptographic commit proofs │
└──────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────┐
│ Layer 5: Memory Safety │
│ • Rust's ownership system (no use-after-free) │
│ • Secure zeroization (zeroize crate) │
│ • Constant-time operations (subtle crate) │
└──────────────────────────────────────────────────────┘
```
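The Byzantine bound from the threat model and the 2f+1 threshold in Layer 4 reduce to integer arithmetic. A minimal sketch follows; the helper names `max_faulty` and `quorum` are hypothetical and not taken from the EXO-AI codebase.

```rust
/// Largest number of Byzantine nodes a federation of `n` nodes can tolerate
/// under PBFT-style assumptions: f = floor((n - 1) / 3).
fn max_faulty(n: usize) -> usize {
    n.saturating_sub(1) / 3
}

/// Number of matching votes needed to commit: 2f + 1.
fn quorum(n: usize) -> usize {
    2 * max_faulty(n) + 1
}
```

With n = 4 this tolerates one faulty node and requires three matching votes; with n = 7 it tolerates two and requires five.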
### Trust Boundaries
```
┌─────────────────────────────────────────────┐
│ TRUSTED COMPUTING BASE │
│ • Rust standard library │
│ • Cryptographic libraries (audited) │
│ • Local substrate instance │
└─────────────────────────────────────────────┘
Trust Boundary (cryptographic handshake)
┌─────────────────────────────────────────────┐
│ SEMI-TRUSTED ZONE │
│ • Direct federation peers │
│ • Verified with post-quantum signatures │
│ • Subject to Byzantine consensus │
└─────────────────────────────────────────────┘
Trust Boundary (onion routing)
┌─────────────────────────────────────────────┐
│ UNTRUSTED ZONE │
│ • Multi-hop relay nodes │
│ • Global federation queries │
│ • Assume adversarial behavior │
└─────────────────────────────────────────────┘
```
---
## Cryptographic Choices
### 1. Post-Quantum Key Encapsulation Mechanism (KEM)
**Choice**: CRYSTALS-Kyber-1024
**Rationale**:
- **NIST PQC Standardization**: Basis for ML-KEM, standardized as NIST FIPS 203 (2024)
- **Security Level**: NIST security Level 5 (comparable to an exhaustive key search on AES-256)
- **Performance**: Among the fastest of the NIST PQC KEM candidates, with far smaller keys than code-based alternatives
- **Key Sizes**: Public key: 1568 bytes, Secret key: 3168 bytes, Ciphertext: 1568 bytes
- **Research Pedigree**: Based on the Module-LWE problem, heavily analyzed
**Alternative Considered**:
- Classic McEliece (rejected: 1MB+ key sizes impractical)
- NTRU Prime (rejected: less standardization progress)
**Implementation**: `pqcrypto-kyber` v0.8 (Rust bindings to reference C implementation)
**Security Assumptions**:
- Hardness of Module Learning-With-Errors (MLWE) problem
- IND-CCA2 security in the QROM (Quantum Random Oracle Model)
### 2. Authenticated Encryption with Associated Data (AEAD)
**Choice**: ChaCha20-Poly1305
**Rationale**:
- **IETF Standard**: RFC 8439 (2018)
- **Software Performance**: Roughly 3-4x faster than software AES-GCM on platforms without AES-NI
- **Side-Channel Resistance**: Constant-time by design (no lookup tables)
- **Nonce Handling**: 96-bit nonces must never repeat under one key; the construction is not nonce-misuse resistant, so nonces should be counter-based rather than purely random
- **Quantum Resistance**: Symmetric crypto is only affected by Grover's algorithm (256-bit key = 128-bit quantum security)
**Implementation**: `chacha20poly1305` v0.10
**Usage Pattern**:
```rust
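// Sketch: assumes the `hkdf` and `sha2` crates in addition to `chacha20poly1305` v0.10.
use chacha20poly1305::{aead::{Aead, KeyInit, Payload}, ChaCha20Poly1305, Key, Nonce};
use hkdf::Hkdf;
use sha2::Sha256;

fn seal(kyber_shared_secret: &[u8], salt: &[u8], info: &[u8], nonce: &[u8; 12],
        message: &[u8], channel_metadata: &[u8]) -> Result<Vec<u8>, chacha20poly1305::aead::Error> {
    // Derive session key from Kyber shared secret
    let mut session_key = [0u8; 32];
    Hkdf::<Sha256>::new(Some(salt), kyber_shared_secret)
        .expand(info, &mut session_key)
        .expect("32 bytes is a valid HKDF-SHA-256 output length");
    // Encrypt message with unique nonce (counter || random, never reused under this key)
    ChaCha20Poly1305::new(Key::from_slice(&session_key))
        .encrypt(Nonce::from_slice(nonce), Payload { msg: message, aad: channel_metadata })
}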
```
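The `counter || random` nonce layout in the usage pattern can be sketched with the standard library alone. The split used here (a 64-bit big-endian counter followed by 4 random bytes fixed per session) and the `NonceSequence` type are assumptions for illustration; the source does not specify the exact layout.

```rust
/// Assumed 96-bit nonce layout: 64-bit big-endian message counter followed by
/// 4 random bytes chosen once per session.
struct NonceSequence {
    suffix: [u8; 4],
    counter: u64,
}

impl NonceSequence {
    /// `suffix` should come from a CSPRNG when the session key is derived.
    fn new(suffix: [u8; 4]) -> Self {
        Self { suffix, counter: 0 }
    }

    /// Returns the next unique nonce, or `None` once the counter is exhausted,
    /// at which point the session must be rekeyed instead of wrapping around.
    fn next_nonce(&mut self) -> Option<[u8; 12]> {
        let current = self.counter;
        self.counter = current.checked_add(1)?;
        let mut nonce = [0u8; 12];
        nonce[..8].copy_from_slice(&current.to_be_bytes());
        nonce[8..].copy_from_slice(&self.suffix);
        Some(nonce)
    }
}
```

Refusing to wrap the counter matters because a repeated nonce under the same key breaks both confidentiality and authenticity of ChaCha20-Poly1305.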
### 3. Key Derivation Function (KDF)
**Choice**: HKDF-SHA-256
**Rationale**:
- **RFC 5869 Standard**: Extract-then-Expand construction
- **Post-Quantum Safe**: SHA-256 provides 128-bit quantum security (Grover)
- **Domain Separation**: Supports multiple derived keys from one shared secret
**Derived Keys**:
```
shared_secret (from Kyber KEM)
HKDF-Extract(salt, shared_secret) → PRK
HKDF-Expand(PRK, "encryption") → encryption_key (256-bit)
HKDF-Expand(PRK, "authentication") → mac_key (256-bit)
HKDF-Expand(PRK, "channel-id") → channel_identifier
```
### 4. Hash Function
**Choice**: SHA-256
**Rationale**:
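- **NIST Standard**: FIPS 180-4
- **Quantum Resistance**: 128-bit preimage security against Grover's algorithm
- **Collision Resistance**: 2^128 classical (birthday bound); the Brassard-Høyer-Tapp quantum algorithm lowers this to roughly 2^85 queries in theory, though its memory cost makes it impractical
- **Widespread**: Audited implementations, hardware acceleration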
**Usage**:
- Peer ID generation
- State update digests (consensus)
- Commitment schemes
**Upgrade Path**: SHA-3 (Keccak) considered for future quantum hedging.
### 5. Message Authentication Code (MAC)
**Choice**: HMAC-SHA-256
**Rationale**:
- **FIPS 198-1 Standard**
- **PRF Security**: Pseudo-random function as long as the underlying compression function is one
- **Quantum Resistance**: 128-bit quantum security
- **Timing-Safe Comparison**: Via `subtle::ConstantTimeEq`
**Note**: ChaCha20-Poly1305 includes Poly1305 MAC, so standalone HMAC only used for non-AEAD cases.
### 6. Random Number Generation (RNG)
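**Choice**: `rand::thread_rng()` (userspace CSPRNG seeded from the OS)
**Rationale**:
- **OS-provided entropy**: Seeded from the OS CSPRNG (`getrandom` or /dev/urandom on Linux, BCryptGenRandom on Windows)
- **ChaCha-based CSPRNG**: Deterministic expansion of the seed (ChaCha12 in `rand` 0.8), with periodic reseeding
- **Thread-local**: Reduces contention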
**Critical Requirement**: Must be properly seeded by OS. If OS entropy is weak, all cryptography fails.
---
## Implementation Status
### ✅ Implemented (Secure)
| Component | Library | Status | Notes |
|-----------|---------|--------|-------|
| **Post-Quantum KEM** | `pqcrypto-kyber` v0.8 | ✅ Ready | Kyber-1024 library is IND-CCA2 secure; the key exchange path does not call it yet (see Key Exchange below) |
| **AEAD Encryption** | `chacha20poly1305` v0.10 | ⚠️ Partial | Library added, integration pending |
| **HMAC** | `hmac` v0.12 + `sha2` | ⚠️ Partial | Library added, integration pending |
| **Constant-Time Ops** | `subtle` v2.5 | ⚠️ Partial | Library added, usage pending |
| **Secure Zeroization** | `zeroize` v1.7 | ⚠️ Partial | Library added, derive macros pending |
| **Memory Safety** | Rust ownership | ✅ Ready | No `unsafe` in project code; `pqcrypto-kyber` wraps a C implementation |
### ⚠️ Partially Implemented (Insecure Placeholders)
| Component | Current State | Security Impact | Fix Required |
|-----------|---------------|-----------------|--------------|
| **Symmetric Encryption** | XOR cipher | **CRITICAL** | Replace with ChaCha20-Poly1305 |
| **Key Exchange** | Random bytes | **CRITICAL** | Integrate `pqcrypto-kyber::kyber1024` |
| **MAC Verification** | Custom hash | **HIGH** | Use HMAC-SHA-256 with constant-time compare |
| **Onion Routing** | Predictable keys | **HIGH** | Use ephemeral Kyber per hop |
| **Signature Verification** | Hash-based | **HIGH** | Implement proper post-quantum signatures |
### ❌ Not Implemented
| Component | Priority | Quantum Threat | Notes |
|-----------|----------|----------------|-------|
| **Key Rotation** | HIGH | No | Static keys are compromise-amplifying |
| **Forward Secrecy** | HIGH | No | Session keys must be ephemeral |
| **Certificate System** | MEDIUM | Yes | Need post-quantum certificate chain |
| **Rate Limiting** | MEDIUM | No | DoS protection for consensus |
| **Audit Logging** | LOW | No | For incident response |
---
## Known Limitations
### 1. Placeholder Cryptography (CRITICAL)
**Issue**: Several modules use insecure placeholder implementations:
```rust
// ❌ INSECURE: XOR cipher in crypto.rs (line 149-155)
let ciphertext: Vec<u8> = plaintext.iter()
    .zip(self.encrypt_key.iter().cycle())
    .map(|(p, k)| p ^ k)
    .collect();
// ✅ SECURE: Should be
// `encrypt` comes from the `Aead` trait; the key must be 32 bytes and the
// nonce 12 bytes, unique for every message under the same key.
use chacha20poly1305::{aead::{Aead, KeyInit}, ChaCha20Poly1305, Key, Nonce};
let cipher = ChaCha20Poly1305::new(Key::from_slice(&self.encrypt_key));
let ciphertext = cipher.encrypt(Nonce::from_slice(&nonce), plaintext.as_ref())?;
```
**Impact**: Complete confidentiality break. Attackers can trivially decrypt.
**Mitigation**: See [Crypto Implementation Roadmap](#crypto-implementation-roadmap) below.
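To see why the XOR placeholder amounts to a complete confidentiality break, consider a known-plaintext attack. The sketch below is illustrative only: `xor_cipher` mirrors the placeholder logic, while `recover_key` and the assumption that the attacker knows the key length are hypothetical (in practice the length can be guessed or found by trial).

```rust
/// The placeholder scheme: XOR each byte with a repeating key.
fn xor_cipher(data: &[u8], key: &[u8]) -> Vec<u8> {
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}

/// Known-plaintext attack: plaintext XOR ciphertext yields the keystream,
/// whose first `key_len` bytes are the key itself.
fn recover_key(known_plaintext: &[u8], ciphertext: &[u8], key_len: usize) -> Vec<u8> {
    known_plaintext
        .iter()
        .zip(ciphertext)
        .map(|(p, c)| p ^ c)
        .take(key_len)
        .collect()
}
```

One observed plaintext/ciphertext pair at least as long as the key is enough to decrypt every other message protected by that key, since applying `xor_cipher` with the recovered key inverts the encryption.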
### 2. Timing Side-Channels (HIGH)
**Issue**: Non-constant-time operations leak information:
```rust
// ❌ VULNERABLE: Variable-time comparison (crypto.rs:175)
expected.as_slice() == signature // Timing leak!
// ✅ SECURE: Constant-time comparison
use subtle::ConstantTimeEq;
bool::from(expected.as_slice().ct_eq(signature))
```
**Impact**: Attackers can forge valid MACs byte by byte via a timing oracle, without ever learning the key.
**Mitigation**:
- Use `subtle::ConstantTimeEq` for all signature/MAC comparisons
- Audit all crypto code for timing-sensitive operations
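The idea behind a constant-time comparison can be shown without any dependencies. This is a sketch of the technique, not a replacement for `subtle`: plain Rust gives no guarantee that the optimizer preserves the constant-time behavior, which is exactly why the crate exists. The function name `ct_eq_sketch` is hypothetical.

```rust
/// Compares two byte strings without an early exit on the first mismatch.
/// Illustration only: use `subtle::ConstantTimeEq` in real code.
fn ct_eq_sketch(a: &[u8], b: &[u8]) -> bool {
    // Lengths are treated as public, as they are for fixed-size MAC tags.
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair; the result is 0 only if all match.
    let diff = a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y));
    diff == 0
}
```

Because every byte pair is always processed, the running time does not depend on the position of the first differing byte, which is the signal a timing oracle exploits.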
### 3. No Key Zeroization (HIGH)
**Issue**: Secret keys not cleared from memory after use.
```rust
// ❌ INSECURE: Keys linger in memory
pub struct PostQuantumKeypair {
    pub public: Vec<u8>,
    secret: Vec<u8>, // Not zeroized on drop!
}
// ✅ SECURE: Automatic zeroization (requires the `derive` feature of `zeroize`)
use zeroize::{Zeroize, ZeroizeOnDrop};
#[derive(Zeroize, ZeroizeOnDrop)]
pub struct PostQuantumKeypair {
    pub public: Vec<u8>,
    secret: Vec<u8>, // Auto-zeroized on drop
}
```
**Impact**: Memory disclosure attacks (cold boot, process dumps) leak keys.
**Mitigation**: Add `#[derive(Zeroize, ZeroizeOnDrop)]` to all key types.
### 4. JSON Deserialization Without Size Limits (MEDIUM)
**Issue**: No bounds on deserialized message sizes.
```rust
// ❌ VULNERABLE: Unbounded allocation (onion.rs:185)
serde_json::from_slice(data) // Can allocate GBs!
// ✅ SECURE: Bounded deserialization
if data.len() > MAX_MESSAGE_SIZE {
return Err(FederationError::MessageTooLarge);
}
serde_json::from_slice(data)
```
**Impact**: Denial-of-service via memory exhaustion.
**Mitigation**: Add size checks before all deserialization.
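One way to make the size check hard to forget is to route all inbound bytes through a single guard before any parser sees them. The sketch below uses only the standard library; the 64 KiB value of `MAX_MESSAGE_SIZE` and the helper `check_message_size` are hypothetical, and the error enum is a stand-in for the project's own `FederationError`.

```rust
/// Hypothetical limit; the real value should reflect the largest legitimate message.
const MAX_MESSAGE_SIZE: usize = 64 * 1024;

/// Stand-in for the project's error type.
#[derive(Debug, PartialEq)]
enum FederationError {
    MessageTooLarge,
}

/// Rejects oversized input before any parser allocates memory for it.
fn check_message_size(data: &[u8]) -> Result<&[u8], FederationError> {
    if data.len() > MAX_MESSAGE_SIZE {
        return Err(FederationError::MessageTooLarge);
    }
    Ok(data)
}
```

A caller would then pass the returned slice to the deserializer, so that unchecked bytes never reach `serde_json::from_slice` directly.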
### 5. No Signature Scheme (HIGH)
**Issue**: Consensus and federation use hashes instead of signatures.
**Impact**: Cannot prove message authenticity. Byzantine nodes can forge messages.
**Mitigation**: Implement post-quantum signatures:
- **Option 1**: CRYSTALS-Dilithium (ML-DSA, NIST FIPS 204) - Fast, moderate signature sizes
- **Option 2**: SPHINCS+ (SLH-DSA, NIST FIPS 205) - Hash-based, conservative, larger and slower signatures
- **Recommendation**: Dilithium-5 (NIST security Level 5), matching the Kyber-1024 security level
### 6. Single-Point Entropy Source (MEDIUM)
**Issue**: Relies solely on OS RNG without health checks.
**Impact**: If OS RNG fails (embedded systems, VMs), all crypto fails silently.
**Mitigation**:
- Add entropy health checks at startup
- Consider supplementary entropy sources (hardware RNG, userspace entropy)
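A startup health check could begin with something as simple as the sketch below, which inspects two samples drawn from the RNG under test. The function names and the 16-byte minimum are assumptions. A check like this catches only catastrophic failures such as stuck or repeating output; passing it says nothing about how much entropy the source actually provides.

```rust
/// True if every byte in the sample is identical (a "stuck" generator).
fn is_constant(sample: &[u8]) -> bool {
    sample.windows(2).all(|w| w[0] == w[1])
}

/// Startup sanity check on two samples drawn from the RNG under test.
/// Fails if either sample is too short or constant, or if both are equal.
fn rng_looks_alive(first: &[u8], second: &[u8]) -> bool {
    first.len() >= 16
        && second.len() >= 16
        && first != second
        && !is_constant(first)
        && !is_constant(second)
}
```

If the check fails, the node should refuse to generate keys rather than continue silently.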
---
## Security Best Practices
### For Developers
1. **Never Use `unsafe`** without security review
- Current status: ✅ No unsafe blocks in codebase
2. **Always Validate Input Sizes**
```rust
if input.len() > MAX_SIZE {
return Err(Error::InputTooLarge);
}
```
3. **Use Constant-Time Comparisons**
```rust
use subtle::ConstantTimeEq;
if secret1.ct_eq(&secret2).unwrap_u8() != 1 {
return Err(Error::AuthenticationFailed);
}
```
4. **Zeroize Sensitive Data**
```rust
use zeroize::{Zeroize, ZeroizeOnDrop};
#[derive(Zeroize, ZeroizeOnDrop)]
struct SecretKey(Vec<u8>);
```
5. **Never Log Secrets**
```rust
// ❌ BAD
eprintln!("Secret key: {:?}", secret);
// ✅ GOOD
eprintln!("Secret key: [REDACTED]");
```
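The "never log secrets" rule can also be enforced by the type itself rather than by reviewer discipline. The sketch below assumes a key wrapper with a hand-written `Debug` implementation; the exact output format is an illustrative choice.

```rust
use std::fmt;

/// Wrapper whose `Debug` output never contains the key bytes, so an
/// accidental `{:?}` in a log statement cannot leak them.
struct SecretKey(Vec<u8>);

impl fmt::Debug for SecretKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "SecretKey([REDACTED; {} bytes])", self.0.len())
    }
}
```

Leaving `Display` unimplemented and avoiding `#[derive(Debug)]` on key types means the compiler rejects most accidental attempts to print the raw bytes.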
### For Operators
1. **Key Management**
- Generate keys on hardware with good entropy (avoid VMs if possible)
- Store keys in encrypted volumes
- Rotate federation keys every 90 days
- Back up keys to offline storage
2. **Network Security**
- Use TLS 1.3 for transport (in addition to EXO-AI crypto)
- Implement rate limiting (100 requests/sec per peer)
- Firewall federation ports (default: 7777)
3. **Monitoring**
- Alert on consensus failures (Byzantine activity)
- Monitor CPU/memory (DoS detection)
- Log federation join/leave events
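The per-peer limit of 100 requests/sec mentioned above could be enforced with a token bucket. This is a minimal sketch using only the standard library; the `PeerRateLimiter` type is hypothetical, and time is passed in explicitly (as elapsed time since an arbitrary start) so the logic stays deterministic and easy to test.

```rust
use std::time::Duration;

/// Token bucket for a single peer: holds up to `capacity` tokens and
/// refills at `refill_per_sec` tokens per second.
struct PeerRateLimiter {
    capacity: f64,
    refill_per_sec: f64,
    tokens: f64,
    last: Duration,
}

impl PeerRateLimiter {
    /// Starts with a full bucket, allowing a burst of `requests_per_sec`.
    fn new(requests_per_sec: u32) -> Self {
        let rate = f64::from(requests_per_sec);
        Self { capacity: rate, refill_per_sec: rate, tokens: rate, last: Duration::ZERO }
    }

    /// Returns true if a request arriving at time `now` is allowed.
    fn allow(&mut self, now: Duration) -> bool {
        // Refill according to elapsed time; out-of-order timestamps add nothing.
        let elapsed = now.saturating_sub(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = self.last.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A federation node would keep one limiter per peer ID and drop or delay requests for which `allow` returns false.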
---
## Crypto Implementation Roadmap
### Phase 1: Fix Critical Vulnerabilities (Sprint 1)
**Priority**: 🔴 CRITICAL
- [ ] Replace XOR cipher with ChaCha20-Poly1305 in `crypto.rs`
- [ ] Integrate `pqcrypto-kyber` for real KEM in `crypto.rs`
- [ ] Add constant-time MAC verification
- [ ] Add `#[derive(Zeroize, ZeroizeOnDrop)]` to all key types
- [ ] Add input size validation to all deserialization
**Success Criteria**: No CRITICAL vulnerabilities remain.
### Phase 2: Improve Crypto Robustness (Sprint 2)
**Priority**: 🟡 HIGH
- [ ] Implement proper HKDF key derivation
- [ ] Add post-quantum signatures (Dilithium-5)
- [ ] Fix onion routing to use ephemeral keys
- [ ] Add entropy health checks
- [ ] Implement key rotation system
**Success Criteria**: All HIGH vulnerabilities mitigated.
### Phase 3: Advanced Security Features (Sprint 3+)
**Priority**: 🟢 MEDIUM
- [ ] Forward secrecy for all sessions
- [ ] Post-quantum certificate infrastructure
- [ ] Hardware RNG integration (optional)
- [ ] Formal verification of consensus protocol
- [ ] Third-party security audit
**Success Criteria**: Production-ready security posture.
---
## Incident Response
### Security Contact
**Email**: security@exo-ai.example.com (placeholder)
**PGP Key**: [Publish post-quantum resistant key when available]
**Disclosure Policy**: Coordinated disclosure, 90-day embargo
### Vulnerability Reporting
1. **DO NOT** open public GitHub issues for security bugs
2. Email security contact with:
- Description of vulnerability
- Proof-of-concept (if available)
- Impact assessment
- Suggested fix (optional)
3. Expect acknowledgment within 48 hours
4. Receive CVE assignment for accepted vulnerabilities
### Known CVEs
**None at this time** (pre-production software).
---
## Audit History
| Date | Auditor | Scope | Findings | Status |
|------|---------|-------|----------|--------|
| 2025-11-29 | Internal (Security Agent) | Full codebase | 5 CRITICAL, 3 HIGH, 2 MEDIUM | **This Document** |
---
## Appendix: Cryptographic Parameter Summary
| Primitive | Algorithm | Parameter Set | Security Level (bits) | Quantum Security (bits) |
|-----------|-----------|---------------|----------------------|------------------------|
| KEM | CRYSTALS-Kyber | Kyber-1024 | ~256 (NIST Level 5) | NIST Level 5 (comparable to AES-256 key search) |
| AEAD | ChaCha20-Poly1305 | 256-bit key | 256 (classical) | 128 (quantum, Grover) |
| KDF | HKDF-SHA-256 | 256-bit output | 256 (classical) | 128 (quantum, Grover) |
| Hash | SHA-256 | 256-bit digest | 128 (collision) | ~85 (collision, BHT, theoretical); 128 (preimage, Grover) |
| MAC | HMAC-SHA-256 | 256-bit key | 256 (classical) | 128 (quantum, Grover) |
**Minimum Quantum Security**: 128 bits against key-search and preimage attacks (comparable to NIST Level 1)
**Recommended Upgrade Timeline**:
- 2030: Migrate to Kyber-1024 + Dilithium-5 (if not already)
- 2035: Re-evaluate post-quantum standards (NIST PQC Round 4+)
- 2040: Assume large-scale quantum computers exist, full PQC migration mandatory
---
## References
1. [NIST FIPS 203](https://csrc.nist.gov/pubs/fips/203/final) - Module-Lattice-Based Key-Encapsulation Mechanism Standard
2. [RFC 8439](https://www.rfc-editor.org/rfc/rfc8439) - ChaCha20 and Poly1305
3. [RFC 5869](https://www.rfc-editor.org/rfc/rfc5869) - HKDF
4. [NIST PQC Project](https://csrc.nist.gov/projects/post-quantum-cryptography)
5. [Remote Timing Attacks Are Practical](https://crypto.stanford.edu/~dabo/papers/ssl-timing.pdf) - Brumley and Boneh, 2003
---
**Document Version**: 1.0
**Last Updated**: 2025-11-29
**Next Review**: Upon Phase 1 completion or 2025-12-31, whichever is sooner