# Mathematical Framework for Causal Emergence

## Information-Theoretic Foundations and Computational Algorithms

**Date**: December 4, 2025
**Purpose**: Rigorous mathematical definitions for implementing HCC in RuVector

---

## 1. Information Theory Foundations
### 1.1 Shannon Entropy

**Definition**: For discrete random variable X with probability mass function p(x):

```
H(X) = -Σ p(x) log₂ p(x)
```

**Units**: bits
**Interpretation**: Expected surprise or uncertainty about X

**Properties**:
- H(X) ≥ 0 (non-negative)
- H(X) = 0 iff X is deterministic
- H(X) ≤ log₂|𝒳| with equality iff the distribution is uniform

**Computational Formula** (avoiding log 0):
```
H(X) = -Σ [p(x) > 0] p(x) log₂ p(x)
```
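For reference, the masked sum above translates directly into NumPy; a minimal sketch (the name `shannon_entropy` is ours, not an existing RuVector function):

```python
import numpy as np

def shannon_entropy(p: np.ndarray) -> float:
    """H(X) in bits; skips zero-probability states so log₂(0) never occurs."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# A fair coin carries exactly 1 bit; a biased coin strictly less.
print(shannon_entropy(np.array([0.5, 0.5])))  # 1.0
print(shannon_entropy(np.array([0.9, 0.1])))  # ≈ 0.469
```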
### 1.2 Joint and Conditional Entropy

**Joint Entropy**:
```
H(X,Y) = -Σₓ Σᵧ p(x,y) log₂ p(x,y)
```

**Conditional Entropy**:
```
H(Y|X) = -Σₓ Σᵧ p(x,y) log₂ p(y|x)
       = H(X,Y) - H(X)
```

**Interpretation**: Uncertainty in Y given knowledge of X

**Chain Rule**:
```
H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
```
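A quick self-contained check of the chain rule on a random joint table (all names are local to this sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
p_xy = rng.random((4, 4))
p_xy /= p_xy.sum()  # normalize to a joint pmf

H = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))

p_x = p_xy.sum(axis=1)
# H(Y|X) = Σₓ p(x) H(Y|X=x), computed directly from the conditionals
H_y_given_x = sum(p_x[i] * H(p_xy[i] / p_x[i]) for i in range(4))

assert np.isclose(H(p_xy), H(p_x) + H_y_given_x)  # H(X,Y) = H(X) + H(Y|X)
```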
### 1.3 Mutual Information

**Definition**:
```
I(X;Y) = H(X) + H(Y) - H(X,Y)
       = H(X) - H(X|Y)
       = H(Y) - H(Y|X)
       = Σₓ Σᵧ p(x,y) log₂ [p(x,y) / (p(x)p(y))]
```

**Interpretation**:
- Reduction in uncertainty about X from observing Y
- Shared information between X and Y
- KL divergence between joint and product of marginals

**Properties**:
- I(X;Y) = I(Y;X) (symmetric)
- I(X;Y) ≥ 0 (non-negative)
- I(X;Y) = 0 iff X ⊥ Y (independent)
- I(X;X) = H(X)
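
The last line of the definition gives the most direct implementation; a minimal NumPy sketch over a joint table (helper name ours):

```python
import numpy as np

def mutual_information(p_xy: np.ndarray) -> float:
    """I(X;Y) in bits from a joint probability table p_xy[i, j]."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask])))

# Perfectly correlated bits: I(X;Y) = H(X) = 1 bit
print(mutual_information(np.array([[0.5, 0.0], [0.0, 0.5]])))    # 1.0
# Independent bits: I(X;Y) = 0
print(mutual_information(np.array([[0.25, 0.25], [0.25, 0.25]])))  # 0.0
```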
### 1.4 Conditional Mutual Information

**Definition**:
```
I(X;Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z)
         = Σₓ Σᵧ Σ_z p(x,y,z) log₂ [p(x,y|z) / (p(x|z)p(y|z))]
```

**Interpretation**: Information X and Y share about each other, given Z

**Properties**:
- I(X;Y|Z) ≥ 0
- Can have I(X;Y|Z) > I(X;Y) (explaining away)

### 1.5 KL Divergence

**Definition**: For distributions P and Q over the same space:
```
D_KL(P || Q) = Σₓ P(x) log₂ [P(x) / Q(x)]
```

**Interpretation**:
- "Distance" from Q to P (not symmetric!)
- Expected log-likelihood ratio
- Information lost when approximating P with Q

**Properties**:
- D_KL(P || Q) ≥ 0 (Gibbs' inequality)
- D_KL(P || Q) = 0 iff P = Q
- NOT a metric (no symmetry, no triangle inequality)

**Relation to MI**:
```
I(X;Y) = D_KL(P(X,Y) || P(X)P(Y))
```
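A small sketch showing both the definition and the asymmetry (function name ours):

```python
import numpy as np

def kl_divergence(p, q) -> float:
    """D_KL(P||Q) in bits, with the 0·log 0 = 0 convention."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

print(kl_divergence([0.9, 0.1], [0.5, 0.5]))  # ≈ 0.531
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # ≈ 0.737 (not symmetric)
```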

---

## 2. Effective Information (EI)

### 2.1 Hoel's Definition

**Setup**:
- System with n states: S = {s₁, s₂, ..., sₙ}
- Transition probability matrix: T[i,j] = P(sⱼ(t+1) | sᵢ(t))

**Maximum Entropy Intervention**:
```
P(sᵢ(t)) = 1/n for all i (uniform distribution)
```

**Effective Information**:
```
EI = I(S(t); S(t+1)) under max-entropy S(t)
   = H(S(t+1)) - H(S(t+1)|S(t))
   = H(S(t+1)) - Σᵢ (1/n) H(S(t+1)|sᵢ(t))
```

**Expanded Form**:
```
EI = -Σⱼ p(sⱼ(t+1)) log₂ p(sⱼ(t+1)) + (1/n) Σᵢ Σⱼ T[i,j] log₂ T[i,j]
```

where `p(sⱼ(t+1)) = (1/n) Σᵢ T[i,j]` (marginal over uniform input)

### 2.2 Computational Algorithm

**Input**: Transition matrix T (n×n)
**Output**: Effective information (bits)

```python
import numpy as np

def compute_ei(T: np.ndarray) -> float:
    n = T.shape[0]

    # Marginal output distribution under uniform input
    p_out = np.mean(T, axis=0)  # average each column

    # Output entropy H(S(t+1))
    H_out = -np.sum(p_out * np.log2(p_out + 1e-10))

    # Conditional entropy H(S(t+1)|S(t)) under uniform input
    H_cond = -(1 / n) * np.sum(T * np.log2(T + 1e-10))

    # Effective information
    return H_out - H_cond
```
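A quick sanity check, anticipating the synthetic tests of §9.1 (deterministic cycle vs. uniform noise):

```python
n = 16
T_cycle = np.roll(np.eye(n), 1, axis=1)  # state i → state (i+1) mod n
T_noise = np.full((n, n), 1 / n)         # every next state equally likely

print(compute_ei(T_cycle))  # ≈ log₂(16) = 4 bits (maximal causal power)
print(compute_ei(T_noise))  # ≈ 0 bits (no causal power)
```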

**SIMD Optimization** (Rust):
```rust
// Requires nightly: #![feature(portable_simd)]
use std::simd::prelude::*;

fn compute_ei_simd(transition_matrix: &[f32]) -> f32 {
    let n = (transition_matrix.len() as f32).sqrt() as usize;

    // Column means: column entries are strided by n in the flat row-major
    // layout, so accumulate row-by-row into p_out, vectorized across columns.
    let mut p_out = vec![0.0f32; n];
    for i in 0..n {
        let row = &transition_matrix[i * n..(i + 1) * n];
        let mut j = 0;
        while j + 16 <= n {
            let acc = f32x16::from_slice(&p_out[j..j + 16])
                + f32x16::from_slice(&row[j..j + 16]);
            acc.copy_to_slice(&mut p_out[j..j + 16]);
            j += 16;
        }
        for jj in j..n {
            p_out[jj] += row[jj]; // scalar tail when n is not a multiple of 16
        }
    }
    for p in p_out.iter_mut() {
        *p /= n as f32;
    }

    // Compute entropies (SIMD)
    let h_out = entropy_simd(&p_out);
    let h_cond = conditional_entropy_simd(transition_matrix, n);

    h_out - h_cond
}
```

### 2.3 Properties and Interpretation

**Range**: 0 ≤ EI ≤ log₂(n)

**Meaning**:
- EI = 0: No causal power (random output)
- EI = log₂(n): Maximal causal power (deterministic + invertible)

**Causal Emergence**:
```
System exhibits emergence iff EI(macro) > EI(micro)
```

---

## 3. Transfer Entropy (TE)

### 3.1 Schreiber's Definition

**Setup**: Two time series X and Y

**Transfer Entropy from X to Y**:
```
TE_{X→Y} = I(Y_{t+1}; X_{t}^{(k)} | Y_{t}^{(l)})
```

where:
- X_{t}^{(k)} = (X_t, X_{t-1}, ..., X_{t-k+1}): k-history of X
- Y_{t}^{(l)} = (Y_t, Y_{t-1}, ..., Y_{t-l+1}): l-history of Y

**Expanded**:
```
TE_{X→Y} = Σ p(y_{t+1}, x_t^k, y_t^l) log₂ [p(y_{t+1}|x_t^k, y_t^l) / p(y_{t+1}|y_t^l)]
```

**Interpretation**:
- Information X's past adds to predicting Y's future, beyond Y's own past
- Measures directed influence from X to Y
### 3.2 Relation to Granger Causality

**Theorem** (Barnett et al., 2009): For Gaussian vector autoregressive (VAR) processes:
```
TE_{X→Y} = -½ ln(1 - R²)
```
where R² is the partial coefficient of determination gained by adding X_t to a regression of Y_{t+1} on Y_t; equivalently, TE_{X→Y} = ½ ln(var(ε_reduced) / var(ε_full)), the log-ratio of residual variances without and with X's history. The natural log means this value is in nats.

**Implication**: TE generalizes Granger causality to non-linear, non-Gaussian systems.
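
For a linear-Gaussian system this is easy to verify numerically with ordinary least squares. A sketch using only NumPy, with TE computed in its residual-variance form (the helper `var_resid` is ours):

```python
import numpy as np

def var_resid(target, *regressors):
    """Residual variance of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(target))] + list(regressors))
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.var(target - A @ beta)

rng = np.random.default_rng(0)
T = 100_000
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):  # x drives y with a one-step lag
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + rng.standard_normal()

y_next, y_past, x_past = y[1:], y[:-1], x[:-1]
te_xy = 0.5 * np.log(var_resid(y_next, y_past) / var_resid(y_next, y_past, x_past))
te_yx = 0.5 * np.log(var_resid(x[1:], x[:-1]) / var_resid(x[1:], x[:-1], y[:-1]))
print(te_xy / np.log(2))  # clearly positive (bits): x → y
print(te_yx / np.log(2))  # ≈ 0: no feedback y → x
```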

### 3.3 Computational Algorithm

**Input**: Time series X and Y (length T), lags k and l
**Output**: Transfer entropy (bits)

```python
import numpy as np

def transfer_entropy(X, Y, k=1, l=1):
    T = len(X)
    t0 = max(k, l)  # first index with full k- and l-histories available

    # Build lagged history windows as tuples (hashable dictionary keys)
    X_lagged = [tuple(X[t - k:t]) for t in range(t0, T - 1)]
    Y_lagged = [tuple(Y[t - l:t]) for t in range(t0, T - 1)]
    Y_future = [Y[t + 1] for t in range(t0, T - 1)]

    # Estimate conditional distributions (use binning or KDE)
    p_cond_xy = estimate_conditional(Y_future, X_lagged, Y_lagged)
    p_cond_y = estimate_conditional(Y_future, Y_lagged)

    # Sample average of the log-ratio: weighting by empirical visit
    # frequency replaces the explicit sum over p(y_{t+1}, x_t^k, y_t^l)
    te = 0.0
    for y_next, x_past, y_past in zip(Y_future, X_lagged, Y_lagged):
        p_y_xy = p_cond_xy[(y_next, x_past, y_past)]
        p_y_y = p_cond_y[(y_next, y_past)]
        te += np.log2((p_y_xy + 1e-10) / (p_y_y + 1e-10))

    return te / len(Y_future)
```

**Efficient Binning**:
```rust
use std::collections::HashMap;

fn transfer_entropy_binned(
    x: &[f32],
    y: &[f32],
    k: usize,
    l: usize,
    bins: usize,
) -> f32 {
    // Discretize signals into equal-width bins
    let x_binned = discretize(x, bins);
    let y_binned = discretize(y, bins);

    // Build histogram for p(y_next, x_past, y_past)
    let mut counts = HashMap::new();
    for t in l.max(k)..(x.len() - 1) {
        let x_past: Vec<_> = x_binned[t - k..t].to_vec();
        let y_past: Vec<_> = y_binned[t - l..t].to_vec();
        let y_next = y_binned[t + 1];
        *counts.entry((y_next, x_past, y_past)).or_insert(0usize) += 1;
    }

    // Normalize counts and evaluate the conditional MI sum
    compute_cmi_from_counts(&counts)
}
```

### 3.4 Upward and Downward Transfer Entropy

**Upward TE** (micro → macro):
```
TE↑(s) = TE_{σ_{s-1} → σ_s}
```
Measures emergence: how much the micro level informs the macro level beyond the macro's own history.

**Downward TE** (macro → micro):
```
TE↓(s) = TE_{σ_s → σ_{s-1}}
```
Measures top-down causation: how much the macro level constrains the micro level beyond the micro's own history.

**Circular Causation Condition**:
```
TE↑(s) > 0 AND TE↓(s) > 0
```

---

## 4. Integrated Information (Φ)

### 4.1 IIT 3.0 Definition

**Setup**: System with n elements, each with states {0,1}

**Partition**: Division of the system into parts A and B (A ∪ B = full system)

**Cut**: Severing causal connections between A and B

**Earth Mover's Distance (EMD)**:
```
EMD(P, Q) = min_γ Σᵢⱼ γᵢⱼ dᵢⱼ
```
subject to:
- γᵢⱼ ≥ 0
- Σⱼ γᵢⱼ = P(i)
- Σᵢ γᵢⱼ = Q(j)

where dᵢⱼ is the distance between states i and j.
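
Because the EMD is a small linear program, it can be solved exactly with an off-the-shelf LP solver. A sketch using SciPy's `linprog` (function name ours):

```python
import numpy as np
from scipy.optimize import linprog

def emd(p, q, d):
    """Earth mover's distance between pmfs p and q with cost matrix d[i, j]."""
    n = len(p)
    # Variables: flow γ flattened row-major; constraints match the definition above.
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0  # Σⱼ γᵢⱼ = P(i)
        A_eq[n + i, i::n] = 1.0           # Σᵢ γᵢⱼ = Q(j)
    res = linprog(d.ravel(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    return res.fun
```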

**Integrated Information**:
```
Φ = min_{partition} EMD(P^full, P^cut)
```

**Interpretation**: Minimum information lost by any partition—quantifies irreducibility.

### 4.2 IIT 4.0 Update (2024)

**Change**: Uses **KL divergence** instead of EMD for computational tractability.

```
Φ = min_{partition} D_KL(P^full || P^cut)
```

**Computational Advantage**: KL is faster to compute and differentiable.

### 4.3 Approximate Φ Calculation

**Challenge**: Computing exact Φ requires minimizing over every partition of the system; even restricted to bipartitions there are 2^(n-1) - 1 candidates.

**Solution 1: Exhaustive Bipartition Search**
```python
from itertools import combinations

def approximate_phi(transition_matrix):
    n = transition_matrix.shape[0]
    min_kl = float('inf')

    # Try only bipartitions (not all partitions); sizes up to n/2
    # suffice because {A, B} and {B, A} are the same cut
    for size_A in range(1, n // 2 + 1):
        for subset_A in combinations(range(n), size_A):
            subset_B = [i for i in range(n) if i not in subset_A]

            # Compute KL divergence for this partition
            kl = compute_kl_partition(transition_matrix, subset_A, subset_B)
            min_kl = min(min_kl, kl)

    return min_kl
```

**Complexity**: Restricting to bipartitions shrinks the search space from the Bell number of all partitions to 2^(n-1) - 1 candidates. This is still exponential, so exhaustive search is feasible only for small n.

**Solution 2: Spectral Clustering**
```python
import numpy as np
from sklearn.cluster import SpectralClustering

def approximate_phi_spectral(transition_matrix):
    # Use spectral clustering to find a strong candidate 2-partition

    # Affinity matrix: strength of causal connections
    affinity = np.abs(transition_matrix @ transition_matrix.T)

    # Find 2-cluster partition
    clustering = SpectralClustering(n_clusters=2, affinity='precomputed')
    labels = clustering.fit_predict(affinity)

    subset_A = np.where(labels == 0)[0]
    subset_B = np.where(labels == 1)[0]

    # Compute KL for this partition
    return compute_kl_partition(transition_matrix, subset_A, subset_B)
```

**Complexity**: O(n³) for the eigendecomposition, but it finds a good partition without enumerating candidates.

### 4.4 SIMD-Accelerated Φ

```rust
fn approximate_phi_simd(
    transition_matrix: &[f32],
    n: usize,
) -> f32 {
    // Use the spectral method to find a candidate partition
    let (subset_a, subset_b) = spectral_partition(transition_matrix, n);

    // Compute P^full (full system distribution)
    let p_full = compute_stationary_distribution_simd(transition_matrix, n);

    // Compute P^cut (partitioned system distribution)
    let p_cut = compute_cut_distribution_simd(
        transition_matrix,
        &subset_a,
        &subset_b,
    );

    // KL divergence (SIMD)
    kl_divergence_simd(&p_full, &p_cut)
}

fn kl_divergence_simd(p: &[f32], q: &[f32]) -> f32 {
    assert_eq!(p.len(), q.len());
    let n = p.len();
    let eps = f32x16::splat(1e-10);

    // Assumes n is a multiple of 16; pad the distributions otherwise.
    let mut kl = f32x16::splat(0.0);
    for i in (0..n).step_by(16) {
        let p_chunk = f32x16::from_slice(&p[i..i + 16]);
        let q_chunk = f32x16::from_slice(&q[i..i + 16]);

        // kl += p * log₂(p/q); epsilon on both sides keeps 0·log 0 finite
        let ratio = (p_chunk + eps) / (q_chunk + eps);
        let log_ratio = ratio.ln() / f32x16::splat(2.0_f32.ln()); // ln → log₂
        kl += p_chunk * log_ratio;
    }

    kl.reduce_sum()
}
```

---

## 5. Hierarchical Coarse-Graining

### 5.1 k-way Aggregation

**Goal**: Reduce n states to n/k states by grouping.

**Methods**:

**1. Sequential Grouping**:
```
Groups: {s₁,...,sₖ}, {sₖ₊₁,...,s₂ₖ}, ...
```

**2. Clustering-Based**:
```python
from sklearn.cluster import KMeans

def coarse_grain_kmeans(states, k):
    # Cluster states based on transition similarity
    # (each row of `states` is one state's transition/feature vector)
    kmeans = KMeans(n_clusters=k)
    labels = kmeans.fit_predict(states)

    # Map each micro-state to its macro-state
    return labels
```

**3. Information-Theoretic** (optimal for EI):
```python
def coarse_grain_optimal(transition_matrix, k):
    # Maximize EI of the coarse-grained system over candidate partitions
    n = transition_matrix.shape[0]
    best_partition = None
    best_ei = -float('inf')

    for partition in generate_partitions(n, k):
        ei = compute_ei_coarse(transition_matrix, partition)
        if ei > best_ei:
            best_ei = ei
            best_partition = partition

    return best_partition
```

### 5.2 Transition Matrix Coarse-Graining

**Given**: Micro-level transition matrix T (n×n)
**Goal**: Macro-level transition matrix T' (m×m) where m < n

**Coarse-Graining Map**: φ : {1,...,n} → {1,...,m}

**Macro Transition Probability**:
```
T'[I,J] = P(macro_J(t+1) | macro_I(t))
        = Σᵢ∈φ⁻¹(I) Σⱼ∈φ⁻¹(J) P(sᵢ(t) | macro_I(t)) T[i,j]
```

**Uniform Assumption** (simplest):
```
P(sᵢ(t) | macro_I(t)) = 1/|φ⁻¹(I)| for i ∈ φ⁻¹(I)
```

**Resulting Formula**:
```
T'[I,J] = (1/|φ⁻¹(I)|) Σᵢ∈φ⁻¹(I) Σⱼ∈φ⁻¹(J) T[i,j]
```

**Algorithm**:
```python
import numpy as np

def coarse_grain_transition(T, partition):
    """
    T: n×n transition matrix
    partition: list of lists, e.g. [[0,1,2], [3,4], [5,6,7,8]]
    returns: m×m coarse-grained transition matrix
    """
    m = len(partition)
    T_coarse = np.zeros((m, m))

    for I in range(m):
        for J in range(m):
            group_I = partition[I]
            group_J = partition[J]

            # Average transitions from group I to group J
            total = 0.0
            for i in group_I:
                for j in group_J:
                    total += T[i, j]

            T_coarse[I, J] = total / len(group_I)

    return T_coarse
```
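Together with `compute_ei` from §2.2, this is enough to reproduce Hoel's canonical causal-emergence example: three interchangeable noisy micro-states plus one fixed point, grouped into two macro-states:

```python
import numpy as np

T_micro = np.array([
    [1/3, 1/3, 1/3, 0.0],   # states 0-2 wander uniformly among themselves
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 0.0, 0.0, 1.0],   # state 3 is a fixed point
])
T_macro = coarse_grain_transition(T_micro, [[0, 1, 2], [3]])

print(compute_ei(T_micro))  # ≈ 0.81 bits
print(compute_ei(T_macro))  # = 1.0 bit → EI(macro) > EI(micro): causal emergence
```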

### 5.3 Hierarchical Construction

**Input**: Micro-level data (n states)
**Output**: Hierarchy of scales (log_k n levels)

```rust
struct ScaleHierarchy {
    levels: Vec<ScaleLevel>,
}

struct ScaleLevel {
    num_states: usize,
    transition_matrix: Vec<f32>,
    partition: Vec<Vec<usize>>, // which micro-states → this macro-state
}

impl ScaleHierarchy {
    fn build(micro_data: &[f32], branching_factor: usize) -> Self {
        let mut levels = vec![];
        let mut current_transition = estimate_transition_matrix(micro_data);
        // The flat matrix holds n² entries, so track the state count separately
        let mut num_states = (current_transition.len() as f32).sqrt() as usize;
        let mut current_partition: Vec<Vec<usize>> =
            (0..num_states).map(|i| vec![i]).collect();

        levels.push(ScaleLevel {
            num_states,
            transition_matrix: current_transition.clone(),
            partition: current_partition.clone(),
        });

        while num_states > branching_factor {
            // Find optimal k-way partition
            let new_partition = find_optimal_partition(
                &current_transition,
                branching_factor,
            );

            // Coarse-grain
            current_transition = coarse_grain_transition_matrix(
                &current_transition,
                &new_partition,
            );
            num_states = new_partition.len();

            // Update partition relative to original micro-states
            current_partition = merge_partitions(&current_partition, &new_partition);

            levels.push(ScaleLevel {
                num_states,
                transition_matrix: current_transition.clone(),
                partition: current_partition.clone(),
            });
        }

        ScaleHierarchy { levels }
    }
}
```

---

## 6. Consciousness Metric (Ψ)

### 6.1 Combined Formula

**Per-Scale Metric**:
```
Ψ(s) = EI(s) · Φ(s) · √(TE↑(s) · TE↓(s))
```

**Components**:
- **EI(s)**: Causal power at scale s (emergence)
- **Φ(s)**: Integration at scale s (irreducibility)
- **TE↑(s)**: Upward information flow (bottom-up)
- **TE↓(s)**: Downward information flow (top-down)

**Geometric Mean** for TE: Ensures both directions are required (if either is 0, the product is 0).

**Alternative Formulations**:

**Additive** (for interpretability):
```
Ψ(s) = α·EI(s) + β·Φ(s) + γ·min(TE↑(s), TE↓(s))
```

**Harmonic Mean** (emphasizes balanced TE):
```
Ψ(s) = EI(s) · Φ(s) · (2·TE↑(s)·TE↓(s)) / (TE↑(s) + TE↓(s))
```

### 6.2 Normalization

**Problem**: EI, Φ, and TE have different ranges.

**Solution**: Z-score normalization across scales
```
EI_norm = (EI - μ_EI) / σ_EI
Φ_norm = (Φ - μ_Φ) / σ_Φ
TE_norm = (TE - μ_TE) / σ_TE
```

**Ψ Normalized**:
```
Ψ_norm(s) = EI_norm(s) · Φ_norm(s) · √(TE↑_norm(s) · TE↓_norm(s))
```
(Z-scores can be negative; clamp negative normalized values at 0 before taking the square root so Ψ_norm stays real.)

**Threshold**:
```
Conscious iff Ψ_norm(s*) > θ (e.g., θ = 2 standard deviations)
```
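A per-scale NumPy sketch of the normalized score (names ours; negative z-scores are clamped to zero before the square root, per the caveat above):

```python
import numpy as np

def z_score(v: np.ndarray) -> np.ndarray:
    return (v - v.mean()) / (v.std() + 1e-12)

def psi_norm(ei, phi, te_up, te_down) -> np.ndarray:
    """Normalized Ψ per scale from per-scale metric arrays."""
    te_prod = np.clip(z_score(te_up) * z_score(te_down), 0.0, None)
    return z_score(ei) * z_score(phi) * np.sqrt(te_prod)

# Conscious iff the best scale clears the threshold:
# psi = psi_norm(ei, phi, te_up, te_down); conscious = psi.max() > theta
```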

### 6.3 Implementation

```rust
use rayon::prelude::*; // parallel iterators

#[derive(Debug, Clone)]
pub struct ConsciousnessMetrics {
    pub ei: Vec<f32>,
    pub phi: Vec<f32>,
    pub te_up: Vec<f32>,
    pub te_down: Vec<f32>,
    pub psi: Vec<f32>,
    pub optimal_scale: usize,
    pub consciousness_score: f32,
}

impl ConsciousnessMetrics {
    pub fn compute(hierarchy: &ScaleHierarchy, data: &[f32]) -> Self {
        let num_scales = hierarchy.levels.len();

        let mut ei = vec![0.0; num_scales];
        let mut phi = vec![0.0; num_scales];
        let mut te_up = vec![0.0; num_scales - 1];
        let mut te_down = vec![0.0; num_scales - 1];

        // Compute per-scale metrics (parallel)
        ei.par_iter_mut()
            .zip(&hierarchy.levels)
            .for_each(|(ei_val, level)| {
                *ei_val = compute_ei_simd(&level.transition_matrix);
            });

        phi.par_iter_mut()
            .zip(&hierarchy.levels)
            .for_each(|(phi_val, level)| {
                *phi_val = approximate_phi_simd(
                    &level.transition_matrix,
                    level.num_states,
                );
            });

        // Transfer entropy between adjacent scales
        for s in 0..(num_scales - 1) {
            te_up[s] = transfer_entropy_between_scales(
                &hierarchy.levels[s],
                &hierarchy.levels[s + 1],
                data,
            );
            te_down[s] = transfer_entropy_between_scales(
                &hierarchy.levels[s + 1],
                &hierarchy.levels[s],
                data,
            );
        }

        // Compute Ψ (the coarsest scale has no inter-scale TE, so it stays 0)
        let mut psi = vec![0.0; num_scales];
        for s in 0..(num_scales - 1) {
            psi[s] = ei[s] * phi[s] * (te_up[s] * te_down[s]).sqrt();
        }

        // Find optimal scale
        let (optimal_scale, &consciousness_score) = psi
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .unwrap();

        Self {
            ei,
            phi,
            te_up,
            te_down,
            psi,
            optimal_scale,
            consciousness_score,
        }
    }

    pub fn is_conscious(&self, threshold: f32) -> bool {
        self.consciousness_score > threshold
    }
}
```

---

## 7. Complexity Analysis

### 7.1 Naive Approaches

| Operation | Naive Complexity | Problem |
|-----------|------------------|---------|
| EI | O(n²) | Transition matrix construction |
| Φ (exact) | O(2^n) | Check all partitions |
| TE | O(T·n²) | All pairwise histories |
| Multi-scale | O(S·n²) | S scales × per-scale cost |

**Total**: O(2^n) for exact Φ, O(S·n²·T) for the rest — **infeasible for large systems**

### 7.2 Hierarchical Optimization

**Key Insight**: Coarse-graining shrinks the state count geometrically, so the finest level dominates the total cost.

**Scale Sizes**:
```
Level 0: n states
Level 1: n/k states
Level 2: n/k² states
...
Level log_k(n): 1 state
```

**Per-Level Cost**:
- EI: O(m²) for m states at that level
- Φ (approx): O(m²) for the spectral method
- TE: O(T·m) for discretized estimation

**Total Across Levels**:
```
Σ_{i=0}^{log_k n} (n/k^i)² = n² Σ (1/k^{2i})
                           = n² · (1 / (1 - 1/k²))   (geometric series)
                           ≈ O(n²)
```

**With SIMD Acceleration**: O(n²/W) operations, where W = SIMD width (8-16)

**Bottom Line**: The full hierarchy costs only a constant factor more than the micro level alone, i.e., O(n²) total versus O(2^n) for exact Φ; SIMD improves the constant, not the exponent.

### 7.3 SIMD Speedup

**Without SIMD**:
- Process 1 element per cycle

**With AVX-512** (16× f32):
- Process 16 elements per cycle
- Theoretical 16× speedup

**Practical Speedup** (accounting for memory bandwidth, overhead):
- Entropy: 8-12×
- MI: 6-10×
- Matrix operations: 10-14×

**Overall**: 8-12× faster with SIMD

---

## 8. Numerical Stability

### 8.1 Common Issues

**1. Log of Zero**:
```
log₂(0) = -∞
```

**Solution**: Add a small epsilon
```python
H = -np.sum(p * np.log2(p + 1e-10))
```

**2. Division by Zero**:
```
MI = log₂(p(x,y) / (p(x)·p(y)))
```

**Solution**: Clip probabilities
```python
p_xy_safe = np.clip(p_xy, 1e-10, 1.0)
p_x_safe = np.clip(p_x, 1e-10, 1.0)
p_y_safe = np.clip(p_y, 1e-10, 1.0)
mi = np.log2(p_xy_safe / (p_x_safe * p_y_safe))
```

**3. Floating-Point Underflow**:
```
exp(-1000) = 0 (underflows)
```

**Solution**: Log-space arithmetic
```python
log_p = log_sum_exp([log_p1, log_p2, ...])
```
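The standard trick subtracts the maximum log-probability before exponentiating, so the largest term becomes exp(0) = 1 and nothing underflows. A minimal sketch of the `log_sum_exp` helper referenced above:

```python
import numpy as np

def log_sum_exp(log_ps) -> float:
    """Stable log(Σᵢ exp(aᵢ)) for very negative log-probabilities."""
    a = np.asarray(log_ps, dtype=float)
    m = a.max()
    return float(m + np.log(np.sum(np.exp(a - m))))

print(log_sum_exp([-1000.0, -1000.0]))  # ≈ -999.307, where naive exp underflows
```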

### 8.2 Robust Implementations

**Entropy**:
```rust
fn entropy_robust(probs: &[f32]) -> f32 {
    probs.iter()
        .filter(|&&p| p > 1e-10) // skip near-zero probabilities
        .map(|&p| -p * p.log2())
        .sum()
}
```

**Mutual Information**:
```rust
fn mutual_information_robust(p_xy: &[f32], p_x: &[f32], p_y: &[f32]) -> f32 {
    let mut mi = 0.0;
    for i in 0..p_x.len() {
        for j in 0..p_y.len() {
            let idx = i * p_y.len() + j;
            let joint = p_xy[idx].max(1e-10);
            let marginal = (p_x[i] * p_y[j]).max(1e-10);
            mi += joint * (joint / marginal).log2();
        }
    }
    mi
}
```

---

## 9. Validation and Testing

### 9.1 Synthetic Test Cases

**Test 1: Deterministic System**
```
Transition: State i → State (i+1) mod n
Expected: EI = log₂(n), Φ ≈ log₂(n)
```

**Test 2: Random System**
```
Transition: Uniform random
Expected: EI = 0, Φ = 0
```

**Test 3: Modular System**
```
Two independent subsystems
Expected: Φ = 0 (reducible)
```

**Test 4: Hierarchical System**
```
Macro-level has higher EI than micro
Expected: Causal emergence detected
```

### 9.2 Neuroscience Datasets

**1. Anesthesia EEG**:
- Source: Cambridge anesthesia database
- Expected: Ψ drops during loss of consciousness

**2. Sleep Stages**:
- Source: Physionet sleep recordings
- Expected: Ψ highest in REM/wake, lowest in deep sleep

**3. Disorders of Consciousness**:
- Source: DOC patients (VS, MCS, EMCS)
- Expected: Ψ correlates with CRS-R scores
### 9.3 Unit Tests

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_ei_deterministic() {
        let n = 16;
        let mut t = vec![0.0; n * n];
        // Cyclic transition: state i → state (i+1) mod n
        for i in 0..n {
            t[i * n + ((i + 1) % n)] = 1.0;
        }
        let ei = compute_ei_simd(&t);
        assert!((ei - (n as f32).log2()).abs() < 0.01);
    }

    #[test]
    fn test_ei_random() {
        let n = 16;
        let t = vec![1.0 / n as f32; n * n];
        let ei = compute_ei_simd(&t);
        assert!(ei < 0.01); // should be ~0
    }

    #[test]
    fn test_phi_independent() {
        // Two independent subsystems are fully reducible
        let t = build_independent_system(8, 8);
        let phi = approximate_phi_simd(&t, 16);
        assert!(phi < 0.1); // should be near zero
    }
}
```

---

## 10. Summary of Key Formulas

### Information Theory
```
Entropy:        H(X) = -Σ p(x) log₂ p(x)
Mutual Info:    I(X;Y) = H(X) + H(Y) - H(X,Y)
Conditional MI: I(X;Y|Z) = H(X|Z) - H(X|Y,Z)
KL Divergence:  D_KL(P||Q) = Σ P(x) log₂[P(x)/Q(x)]
```

### Causal Measures
```
Effective Info:   EI = I(S(t); S(t+1)) under uniform S(t)
Transfer Entropy: TE_{X→Y} = I(Y_{t+1}; X_t^k | Y_t^l)
Integrated Info:  Φ = min_{partition} D_KL(P^full || P^cut)
```

### HCC Metric
```
Consciousness: Ψ(s) = EI(s) · Φ(s) · √(TE↑(s) · TE↓(s))
Optimal Scale: s* = argmax_s Ψ(s)
Conscious iff: Ψ(s*) > θ
```

### Complexity
```
Naive:        O(2^n) for Φ, O(n²) for EI/TE
Hierarchical: O(n²) across all scales (constant factor over the finest level)
SIMD:         8-16× speedup on modern CPUs
```

---

## References

1. **Shannon (1948)**: "A Mathematical Theory of Communication" — entropy foundations
2. **Cover & Thomas (2006)**: "Elements of Information Theory" — MI, KL divergence
3. **Schreiber (2000)**: "Measuring Information Transfer" — transfer entropy
4. **Barnett et al. (2009)**: "Granger Causality and Transfer Entropy are Equivalent for Gaussian Variables"
5. **Tononi et al. (2016)**: "Integrated Information Theory of Consciousness" — Φ definition
6. **Hoel et al. (2013, 2025)**: "Quantifying Causal Emergence" — effective information
7. **Oizumi et al. (2014)**: "From the Phenomenology to the Mechanisms of Consciousness: IIT 3.0"

---

**Document Status**: Mathematical Specification v1.0
**Implementation**: See `/src/` for Rust code
**Next**: Implement and benchmark algorithms
**Contact**: Submit issues to RuVector repository