# Mathematical Framework for Causal Emergence

## Information-Theoretic Foundations and Computational Algorithms

**Date**: December 4, 2025
**Purpose**: Rigorous mathematical definitions for implementing HCC in RuVector

---

## 1. Information Theory Foundations

### 1.1 Shannon Entropy

**Definition**: For discrete random variable X with probability mass function p(x):

```
H(X) = -Σ p(x) log₂ p(x)
```

**Units**: bits

**Interpretation**: Expected surprise or uncertainty about X

**Properties**:
- H(X) ≥ 0 (non-negative)
- H(X) = 0 iff X is deterministic
- H(X) ≤ log₂|𝒳| with equality iff uniform distribution

**Computational Formula** (avoiding log 0):

```
H(X) = -Σ [p(x) > 0] p(x) log₂ p(x)
```

### 1.2 Joint and Conditional Entropy

**Joint Entropy**:

```
H(X,Y) = -Σₓ Σᵧ p(x,y) log₂ p(x,y)
```

**Conditional Entropy**:

```
H(Y|X) = -Σₓ Σᵧ p(x,y) log₂ p(y|x) = H(X,Y) - H(X)
```

**Interpretation**: Uncertainty in Y given knowledge of X

**Chain Rule**:

```
H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
```

### 1.3 Mutual Information

**Definition**:

```
I(X;Y) = H(X) + H(Y) - H(X,Y)
       = H(X) - H(X|Y)
       = H(Y) - H(Y|X)
       = Σₓ Σᵧ p(x,y) log₂ [p(x,y) / (p(x)p(y))]
```

**Interpretation**:
- Reduction in uncertainty about X from observing Y
- Shared information between X and Y
- KL divergence between joint and product of marginals

**Properties**:
- I(X;Y) = I(Y;X) (symmetric)
- I(X;Y) ≥ 0 (non-negative)
- I(X;Y) = 0 iff X ⊥ Y (independent)
- I(X;X) = H(X)

### 1.4 Conditional Mutual Information

**Definition**:

```
I(X;Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z)
         = Σₓ Σᵧ Σ_z p(x,y,z) log₂ [p(x,y|z) / (p(x|z)p(y|z))]
```

**Interpretation**: Information X and Y share about each other, given Z

**Properties**:
- I(X;Y|Z) ≥ 0
- Can have I(X;Y|Z) > I(X;Y) (explaining away)

### 1.5 KL Divergence

**Definition**: For distributions P and Q over the same space:

```
D_KL(P || Q) = Σₓ P(x) log₂ [P(x) / Q(x)]
```

**Interpretation**:
- "Distance" from Q to P (not symmetric!)
- Expected log-likelihood ratio
- Information lost when approximating P with Q

**Properties**:
- D_KL(P || Q) ≥ 0 (Gibbs' inequality)
- D_KL(P || Q) = 0 iff P = Q
- NOT a metric (no symmetry, no triangle inequality)

**Relation to MI**:

```
I(X;Y) = D_KL(P(X,Y) || P(X)P(Y))
```
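**Worked Example** (a minimal NumPy sketch; the helper names are illustrative and not part of the RuVector API): compute entropy, mutual information, and KL divergence from a discrete joint distribution and check the identity I(X;Y) = D_KL(P(X,Y) || P(X)P(Y)).

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

def mutual_information(p_xy):
    """I(X;Y) from a joint distribution given as a 2-D array p_xy[x, y]."""
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    return entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())

def kl_divergence(p, q):
    """D_KL(P || Q) in bits; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# I(X;Y) equals the KL divergence of the joint from the product of marginals
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
assert np.isclose(mutual_information(p_xy),
                  kl_divergence(p_xy.ravel(), np.outer(p_x, p_y).ravel()))
```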
---

## 2. Effective Information (EI)

### 2.1 Hoel's Definition

**Setup**:
- System with n states: S = {s₁, s₂, ..., sₙ}
- Transition probability matrix: T[i,j] = P(sⱼ(t+1) | sᵢ(t))

**Maximum Entropy Intervention**:

```
P(sᵢ(t)) = 1/n for all i (uniform distribution)
```

**Effective Information**:

```
EI = I(S(t); S(t+1)) under max-entropy S(t)
   = H(S(t+1)) - H(S(t+1)|S(t))
   = H(S(t+1)) - Σᵢ (1/n) H(S(t+1)|sᵢ(t))
```

**Expanded Form**:

```
EI = -Σⱼ p(sⱼ(t+1)) log₂ p(sⱼ(t+1)) + (1/n) Σᵢ Σⱼ T[i,j] log₂ T[i,j]
```

where `p(sⱼ(t+1)) = (1/n) Σᵢ T[i,j]` (marginal over uniform input)

### 2.2 Computational Algorithm

**Input**: Transition matrix T (n×n)
**Output**: Effective information (bits)

```python
import numpy as np

def compute_ei(T: np.ndarray) -> float:
    n = T.shape[0]

    # Marginal output distribution under uniform input (average each column)
    p_out = np.mean(T, axis=0)

    # Output entropy
    H_out = -np.sum(p_out * np.log2(p_out + 1e-10))

    # Conditional entropy H(out|in)
    H_cond = -(1 / n) * np.sum(T * np.log2(T + 1e-10))

    # Effective information
    return H_out - H_cond
```

**SIMD Optimization** (Rust, nightly `portable_simd`):

```rust
use std::simd::prelude::*;

/// EI of a row-major n×n transition matrix.
/// `entropy_simd` / `conditional_entropy_simd` are companion helpers.
fn compute_ei_simd(transition_matrix: &[f32]) -> f32 {
    let n = (transition_matrix.len() as f32).sqrt() as usize;

    // Column means p_out[j] = (1/n) Σᵢ T[i,j], accumulated row by row so
    // that SIMD loads stay contiguous in the row-major layout.
    let mut p_out = vec![0.0f32; n];
    for i in 0..n {
        let row = &transition_matrix[i * n..(i + 1) * n];
        let mut j = 0;
        while j + 16 <= n {
            let acc = f32x16::from_slice(&p_out[j..j + 16])
                + f32x16::from_slice(&row[j..j + 16]);
            acc.copy_to_slice(&mut p_out[j..j + 16]);
            j += 16;
        }
        while j < n {
            p_out[j] += row[j];
            j += 1;
        }
    }
    for v in p_out.iter_mut() {
        *v /= n as f32;
    }

    // EI = H(output) - H(output | input)
    let h_out = entropy_simd(&p_out);
    let h_cond = conditional_entropy_simd(transition_matrix, n);
    h_out - h_cond
}
```

### 2.3 Properties and Interpretation

**Range**: 0 ≤ EI ≤ log₂(n)

**Meaning**:
- EI = 0: No causal power (random output)
- EI = log₂(n): Maximal causal power (deterministic + invertible)

**Causal Emergence**:

```
System exhibits emergence iff EI(macro) > EI(micro)
```

---

## 3. Transfer Entropy (TE)

### 3.1 Schreiber's Definition

**Setup**: Two time series X and Y

**Transfer Entropy from X to Y**:

```
TE_{X→Y} = I(Y_{t+1}; X_t^{(k)} | Y_t^{(l)})
```

where:
- X_t^{(k)} = (X_t, X_{t-1}, ..., X_{t-k+1}): k-history of X
- Y_t^{(l)} = (Y_t, Y_{t-1}, ..., Y_{t-l+1}): l-history of Y

**Expanded**:

```
TE_{X→Y} = Σ p(y_{t+1}, x_t^k, y_t^l) log₂ [p(y_{t+1}|x_t^k, y_t^l) / p(y_{t+1}|y_t^l)]
```

**Interpretation**:
- Information X's past adds to predicting Y's future, beyond Y's own past
- Measures directed influence from X to Y

### 3.2 Relation to Granger Causality

**Theorem** (Barnett et al., 2009): For Gaussian vector autoregressive (VAR) processes:

```
TE_{X→Y} = -½ ln(1 - R²)
```

where R² is the partial (incremental) coefficient of determination gained by adding X_t to a regression of Y_{t+1} on Y's own past Y_t. Equivalently, TE equals half the Granger causality statistic; the expression above is in nats (divide by ln 2 for bits).

**Implication**: TE generalizes Granger causality to non-linear, non-Gaussian systems.
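**Numerical Check** (a self-contained sketch assuming a stationary bivariate VAR(1) with arbitrary example coefficients; not RuVector code): estimate TE_{X→Y} from the residual-variance ratio, which is the −½ ln(1 − R²) form above converted to bits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a bivariate VAR(1) in which X drives Y
T = 20_000
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + rng.normal()

def residual_variance(target, predictors):
    """Variance of the least-squares residual of target on predictors (+ intercept)."""
    A = np.column_stack(predictors + [np.ones_like(target)])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.var(target - A @ coef)

y_next, y_past, x_past = y[1:], y[:-1], x[:-1]
var_reduced = residual_variance(y_next, [y_past])           # Y's own past only
var_full = residual_variance(y_next, [y_past, x_past])      # plus X's past

# TE in bits: ½ log2(σ²_reduced / σ²_full) = −½ log2(1 − R²_partial)
te_xy_bits = 0.5 * np.log2(var_reduced / var_full)
print(f"TE(X→Y) ≈ {te_xy_bits:.3f} bits")   # > 0, since X drives Y
```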
### 3.3 Computational Algorithm

**Input**: Time series X and Y (length T), lags k and l
**Output**: Transfer entropy (bits)

```python
import numpy as np

def transfer_entropy(X, Y, k=1, l=1):
    """Plug-in estimate of TE_{X→Y} in bits.

    `estimate_conditional` is a placeholder for a binning or KDE estimator.
    """
    T = len(X)
    start = max(k, l)

    # Build lagged variables, aligned so pasts ending at t-1 predict Y[t]
    X_lagged = [tuple(X[t - k:t]) for t in range(start, T)]
    Y_lagged = [tuple(Y[t - l:t]) for t in range(start, T)]
    Y_future = list(Y[start:])

    # Estimate conditional distributions (binning or KDE)
    p_cond_xy = estimate_conditional(Y_future, X_lagged, Y_lagged)
    p_cond_y = estimate_conditional(Y_future, Y_lagged)

    # Averaging the log-likelihood ratio over samples weights each state by
    # its empirical joint probability p(y_next, x_past, y_past)
    te = 0.0
    for y_next, x_past, y_past in zip(Y_future, X_lagged, Y_lagged):
        p_y_xy = p_cond_xy[(y_next, x_past, y_past)]
        p_y_y = p_cond_y[(y_next, y_past)]
        te += np.log2((p_y_xy + 1e-10) / (p_y_y + 1e-10))
    return te / len(Y_future)
```

**Efficient Binning**:

```rust
use std::collections::HashMap;

fn transfer_entropy_binned(
    x: &[f32],
    y: &[f32],
    k: usize,
    l: usize,
    bins: usize,
) -> f32 {
    // Discretize signals into `bins` equal-width bins
    let x_binned = discretize(x, bins);
    let y_binned = discretize(y, bins);

    // Histogram of (y_next, x_past, y_past); the past windows end at t-1
    let mut counts = HashMap::new();
    for t in l.max(k)..x.len() {
        let x_past: Vec<_> = x_binned[t - k..t].to_vec();
        let y_past: Vec<_> = y_binned[t - l..t].to_vec();
        let y_next = y_binned[t];
        *counts.entry((y_next, x_past, y_past)).or_insert(0usize) += 1;
    }

    // Normalize and compute the conditional mutual information
    compute_cmi_from_counts(&counts)
}
```

### 3.4 Upward and Downward Transfer Entropy

**Upward TE** (micro → macro):

```
TE↑(s) = TE_{σ_{s-1} → σ_s}
```

Measures emergence: how much the micro level informs the macro level beyond the macro level's own history.

**Downward TE** (macro → micro):

```
TE↓(s) = TE_{σ_s → σ_{s-1}}
```

Measures top-down causation: how much the macro level constrains the micro level beyond the micro level's own history.

**Circular Causation Condition**:

```
TE↑(s) > 0 AND TE↓(s) > 0
```

---

## 4. Integrated Information (Φ)

### 4.1 IIT 3.0 Definition

**Setup**: System with n elements, each with states {0,1}

**Partition**: Division of the system into parts A and B (A ∪ B = full system)

**Cut**: Severing causal connections between A and B

**Earth Mover's Distance (EMD)**:

```
EMD(P, Q) = min_γ Σᵢⱼ γᵢⱼ dᵢⱼ
```

subject to:
- γᵢⱼ ≥ 0
- Σⱼ γᵢⱼ = P(i)
- Σᵢ γᵢⱼ = Q(j)

where dᵢⱼ is the distance between states i and j.

**Integrated Information**:

```
Φ = min_{partition} EMD(P^full, P^cut)
```

**Interpretation**: Minimum information lost by any partition — quantifies irreducibility.

### 4.2 IIT 4.0 Update (2024)

**Change**: Uses **KL divergence** instead of EMD for computational tractability.

```
Φ = min_{partition} D_KL(P^full || P^cut)
```

**Computational Advantage**: KL is faster to compute and differentiable.

### 4.3 Approximate Φ Calculation

**Challenge**: Computing exact Φ requires searching an exponential number of partitions (already 2^(n-1) - 1 bipartitions alone).

**Solution 1: Restrict to Bipartitions**

```python
from itertools import combinations

def approximate_phi(transition_matrix):
    n = transition_matrix.shape[0]
    min_kl = float('inf')

    # Try only bipartitions (not all partitions); each unordered cut is
    # visited twice, which is harmless when taking the minimum
    for size_A in range(1, n):
        for subset_A in combinations(range(n), size_A):
            subset_B = [i for i in range(n) if i not in subset_A]

            # Information lost when this partition is cut
            kl = compute_kl_partition(transition_matrix, subset_A, subset_B)
            min_kl = min(min_kl, kl)

    return min_kl
```

**Complexity**: Restricting to bipartitions reduces the search space from the Bell number B(n) of general partitions to 2^(n-1) - 1 cuts — still exponential, which motivates the spectral approximation below.
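**Sketch of `compute_kl_partition`** (the helper referenced above is not defined in this document; the body below is one reasonable assumption — sever cross-partition entries, renormalize, and compare output distributions under a uniform input — not the full IIT cut procedure, which compares cause/effect repertoires):

```python
import numpy as np

def compute_kl_partition(T, subset_A, subset_B):
    """KL divergence (bits) between the intact and partition-cut output
    distributions under a uniform (max-entropy) input."""
    T_cut = T.copy()

    # Sever causal connections between A and B
    T_cut[np.ix_(subset_A, subset_B)] = 0.0
    T_cut[np.ix_(subset_B, subset_A)] = 0.0

    # Renormalize rows so T_cut stays a valid transition matrix
    row_sums = T_cut.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    T_cut = T_cut / row_sums

    # Output distributions under uniform input, then D_KL(P_full || P_cut)
    p_full = T.mean(axis=0)
    p_cut = T_cut.mean(axis=0)
    mask = p_full > 0
    return float(np.sum(p_full[mask] * np.log2(p_full[mask] / (p_cut[mask] + 1e-10))))
```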
**Solution 2: Spectral Clustering**

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def approximate_phi_spectral(transition_matrix):
    # Affinity matrix from causal connection strength
    affinity = np.abs(transition_matrix @ transition_matrix.T)

    # Use spectral clustering to find a good 2-partition
    clustering = SpectralClustering(n_clusters=2, affinity='precomputed')
    labels = clustering.fit_predict(affinity)
    subset_A = np.where(labels == 0)[0]
    subset_B = np.where(labels == 1)[0]

    # Information lost when this partition is cut
    return compute_kl_partition(transition_matrix, subset_A, subset_B)
```

**Complexity**: O(n³) for the eigendecomposition, but it finds a good partition without an exhaustive search.

### 4.4 SIMD-Accelerated Φ

```rust
fn approximate_phi_simd(transition_matrix: &[f32], n: usize) -> f32 {
    // Use the spectral method to find the partition
    let (subset_a, subset_b) = spectral_partition(transition_matrix, n);

    // P^full: distribution of the intact system
    let p_full = compute_stationary_distribution_simd(transition_matrix, n);

    // P^cut: distribution of the partitioned system
    let p_cut = compute_cut_distribution_simd(transition_matrix, &subset_a, &subset_b);

    // Information lost by the cut
    kl_divergence_simd(&p_full, &p_cut)
}

fn kl_divergence_simd(p: &[f32], q: &[f32]) -> f32 {
    assert_eq!(p.len(), q.len());
    let n = p.len();
    let mut acc = f32x16::splat(0.0);
    let mut i = 0;
    while i + 16 <= n {
        let p_chunk = f32x16::from_slice(&p[i..i + 16]);
        let q_chunk = f32x16::from_slice(&q[i..i + 16]);

        // KL += p * log2(p/q); portable SIMD has no vector log,
        // so the log2 is taken lane-wise
        let ratio = (p_chunk + f32x16::splat(1e-10)) / (q_chunk + f32x16::splat(1e-10));
        let log_ratio = f32x16::from_array(ratio.to_array().map(f32::log2));
        acc += p_chunk * log_ratio;
        i += 16;
    }

    // Scalar tail for lengths that are not multiples of 16
    let mut kl = acc.reduce_sum();
    for j in i..n {
        kl += p[j] * ((p[j] + 1e-10) / (q[j] + 1e-10)).log2();
    }
    kl
}
```

---

## 5. Hierarchical Coarse-Graining

### 5.1 k-way Aggregation

**Goal**: Reduce n states to n/k states by grouping.

**Methods**:

**1. Sequential Grouping**:

```
Groups: {s₁,...,sₖ}, {sₖ₊₁,...,s₂ₖ}, ...
```

**2. Clustering-Based**:

```python
from sklearn.cluster import KMeans

def coarse_grain_kmeans(states, k):
    # Cluster states based on transition similarity
    # (`states` holds one feature vector per state, e.g. its transition row)
    kmeans = KMeans(n_clusters=k)
    labels = kmeans.fit_predict(states)

    # Map each micro-state to its macro-state
    return labels
```

**3. Information-Theoretic** (optimal for EI):

```python
def coarse_grain_optimal(transition_matrix, k):
    # Search for the grouping that maximizes the EI of the coarse-grained
    # system (`generate_partitions` / `compute_ei_coarse` defined elsewhere)
    n = transition_matrix.shape[0]
    best_partition = None
    best_ei = -float('inf')
    for partition in generate_partitions(n, k):
        ei = compute_ei_coarse(transition_matrix, partition)
        if ei > best_ei:
            best_ei = ei
            best_partition = partition
    return best_partition
```

### 5.2 Transition Matrix Coarse-Graining

**Given**: Micro-level transition matrix T (n×n)
**Goal**: Macro-level transition matrix T' (m×m) where m < n

**Coarse-Graining Map**: φ : {1,...,n} → {1,...,m}

**Macro Transition Probability**:

```
T'[I,J] = P(macro_J(t+1) | macro_I(t))
        = Σᵢ∈φ⁻¹(I) Σⱼ∈φ⁻¹(J) P(sᵢ(t) | macro_I(t)) T[i,j]
```

**Uniform Assumption** (simplest):

```
P(sᵢ(t) | macro_I(t)) = 1/|φ⁻¹(I)| for i ∈ φ⁻¹(I)
```

**Resulting Formula**:

```
T'[I,J] = (1/|φ⁻¹(I)|) Σᵢ∈φ⁻¹(I) Σⱼ∈φ⁻¹(J) T[i,j]
```
**Algorithm**:

```python
def coarse_grain_transition(T, partition):
    """
    T: n×n transition matrix
    partition: list of lists, e.g. [[0,1,2], [3,4], [5,6,7,8]]
    returns: m×m coarse-grained transition matrix
    """
    m = len(partition)
    T_coarse = np.zeros((m, m))

    for I in range(m):
        for J in range(m):
            group_I = partition[I]
            group_J = partition[J]

            # Average transitions from group I to group J
            total = 0.0
            for i in group_I:
                for j in group_J:
                    total += T[i, j]
            T_coarse[I, J] = total / len(group_I)

    return T_coarse
```

### 5.3 Hierarchical Construction

**Input**: Micro-level data (n states)
**Output**: Hierarchy of scales (log_k n levels)

```rust
struct ScaleHierarchy {
    levels: Vec<ScaleLevel>,
}

struct ScaleLevel {
    num_states: usize,
    /// Row-major num_states × num_states transition matrix.
    transition_matrix: Vec<f32>,
    /// Which micro-states map to each macro-state.
    partition: Vec<Vec<usize>>,
}

impl ScaleHierarchy {
    fn build(micro_data: &[f32], branching_factor: usize) -> Self {
        let mut levels = vec![];
        let mut current_transition = estimate_transition_matrix(micro_data);
        let mut num_states = (current_transition.len() as f32).sqrt() as usize;
        let mut current_partition: Vec<Vec<usize>> =
            (0..num_states).map(|i| vec![i]).collect();

        levels.push(ScaleLevel {
            num_states,
            transition_matrix: current_transition.clone(),
            partition: current_partition.clone(),
        });

        while num_states > branching_factor {
            // Find an EI-optimal k-way grouping of the current states
            let new_partition =
                find_optimal_partition(&current_transition, branching_factor);

            // Coarse-grain the transition matrix over that grouping
            current_transition =
                coarse_grain_transition_matrix(&current_transition, &new_partition);
            num_states = new_partition.len();

            // Re-express the grouping in terms of the original micro-states
            current_partition = merge_partitions(&current_partition, &new_partition);

            levels.push(ScaleLevel {
                num_states,
                transition_matrix: current_transition.clone(),
                partition: current_partition.clone(),
            });
        }

        ScaleHierarchy { levels }
    }
}
```

---

## 6. Consciousness Metric (Ψ)

### 6.1 Combined Formula

**Per-Scale Metric**:

```
Ψ(s) = EI(s) · Φ(s) · √(TE↑(s) · TE↓(s))
```

**Components**:
- **EI(s)**: Causal power at scale s (emergence)
- **Φ(s)**: Integration at scale s (irreducibility)
- **TE↑(s)**: Upward information flow (bottom-up)
- **TE↓(s)**: Downward information flow (top-down)

**Geometric Mean** for TE: Ensures both directions are required (if either is 0, the product is 0).

**Alternative Formulations**:

**Additive** (for interpretability):

```
Ψ(s) = α·EI(s) + β·Φ(s) + γ·min(TE↑(s), TE↓(s))
```

**Harmonic Mean** (emphasizes balanced TE):

```
Ψ(s) = EI(s) · Φ(s) · (2·TE↑(s)·TE↓(s)) / (TE↑(s) + TE↓(s))
```
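**Reference Sketch** (NumPy; the array values below are illustrative only): combining per-scale EI, Φ, and bidirectional TE into Ψ(s) using the geometric-mean or harmonic-mean form.

```python
import numpy as np

def psi_per_scale(ei, phi, te_up, te_down, form="geometric"):
    """Combine per-scale metrics into Ψ(s).

    ei, phi: length-S arrays; te_up, te_down: length-(S-1) arrays.
    The top scale has no between-scale TE, so Ψ is defined for scales 0..S-2.
    """
    ei, phi = np.asarray(ei[:-1], float), np.asarray(phi[:-1], float)
    te_up, te_down = np.asarray(te_up, float), np.asarray(te_down, float)

    if form == "geometric":
        te_term = np.sqrt(te_up * te_down)      # zero if either direction is zero
    elif form == "harmonic":
        te_term = 2 * te_up * te_down / (te_up + te_down + 1e-12)
    else:
        raise ValueError(form)
    return ei * phi * te_term

# Example: Ψ peaks at the scale with the best EI/Φ/TE combination
ei = [1.2, 2.5, 1.8, 0.9]
phi = [0.3, 1.1, 0.7, 0.2]
te_up, te_down = [0.4, 0.9, 0.2], [0.1, 0.8, 0.3]
psi = psi_per_scale(ei, phi, te_up, te_down)
optimal_scale = int(np.argmax(psi))   # here: scale 1
```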
### 6.2 Normalization

**Problem**: EI, Φ, and TE have different ranges.

**Solution**: Z-score normalization

```
EI_norm = (EI - μ_EI) / σ_EI
Φ_norm  = (Φ - μ_Φ) / σ_Φ
TE_norm = (TE - μ_TE) / σ_TE
```

**Ψ Normalized**:

```
Ψ_norm(s) = EI_norm(s) · Φ_norm(s) · √(TE↑_norm(s) · TE↓_norm(s))
```

**Threshold**:

```
Conscious iff Ψ_norm(s*) > θ (e.g., θ = 2 standard deviations)
```

### 6.3 Implementation

```rust
use rayon::prelude::*;

#[derive(Debug, Clone)]
pub struct ConsciousnessMetrics {
    pub ei: Vec<f32>,
    pub phi: Vec<f32>,
    pub te_up: Vec<f32>,
    pub te_down: Vec<f32>,
    pub psi: Vec<f32>,
    pub optimal_scale: usize,
    pub consciousness_score: f32,
}

impl ConsciousnessMetrics {
    pub fn compute(hierarchy: &ScaleHierarchy, data: &[f32]) -> Self {
        let num_scales = hierarchy.levels.len();
        let mut ei = vec![0.0; num_scales];
        let mut phi = vec![0.0; num_scales];
        let mut te_up = vec![0.0; num_scales - 1];
        let mut te_down = vec![0.0; num_scales - 1];

        // Per-scale metrics (parallel over scales via rayon)
        ei.par_iter_mut()
            .zip(&hierarchy.levels)
            .for_each(|(ei_val, level)| {
                *ei_val = compute_ei_simd(&level.transition_matrix);
            });

        phi.par_iter_mut()
            .zip(&hierarchy.levels)
            .for_each(|(phi_val, level)| {
                *phi_val = approximate_phi_simd(
                    &level.transition_matrix,
                    level.num_states,
                );
            });

        // Transfer entropy between adjacent scales
        for s in 0..(num_scales - 1) {
            te_up[s] = transfer_entropy_between_scales(
                &hierarchy.levels[s],
                &hierarchy.levels[s + 1],
                data,
            );
            te_down[s] = transfer_entropy_between_scales(
                &hierarchy.levels[s + 1],
                &hierarchy.levels[s],
                data,
            );
        }

        // Compute Ψ (the top scale has no between-scale TE, so its Ψ stays 0)
        let mut psi = vec![0.0; num_scales];
        for s in 0..(num_scales - 1) {
            psi[s] = ei[s] * phi[s] * (te_up[s] * te_down[s]).sqrt();
        }

        // Find the optimal scale
        let (optimal_scale, &consciousness_score) = psi
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .unwrap();

        Self {
            ei,
            phi,
            te_up,
            te_down,
            psi,
            optimal_scale,
            consciousness_score,
        }
    }

    pub fn is_conscious(&self, threshold: f32) -> bool {
        self.consciousness_score > threshold
    }
}
```

---

## 7. Complexity Analysis

### 7.1 Naive Approaches

| Operation | Naive Complexity | Problem |
|-----------|------------------|---------|
| EI | O(n²) | Transition matrix construction |
| Φ (exact) | O(2^n) | Check all partitions |
| TE | O(T·n²) | All pairwise histories |
| Multi-scale | O(S·n²) | S scales × per-scale cost |

**Total**: O(2^n) or O(S·n²·T) — **infeasible for large systems**

### 7.2 Hierarchical Optimization

**Key Insight**: Coarse-graining shrinks the state count by a factor of k per level, so there are only log_k n levels.

**Scale Sizes**:

```
Level 0:        n states
Level 1:        n/k states
Level 2:        n/k² states
...
Level log_k(n): 1 state
```

**Per-Level Cost**:
- EI: O(m²) for m states at that level
- Φ (approx): O(m²) per KL evaluation (the spectral partition search adds O(m³))
- TE: O(T·m) for discretized estimation

**Total Across Levels** (a numeric check follows at the end of this section):

```
Σ_{i=0}^{log_k n} (n/k^i)² = n² Σᵢ (1/k^{2i})
                           = n² · 1 / (1 - 1/k²)    (geometric series)
                           ≈ O(n²)
```

**With SIMD Acceleration**: O(n²/W) where W = SIMD width (8-16)

**Effective Complexity**: O(n log n) amortized

### 7.3 SIMD Speedup

**Without SIMD**: Process 1 element per cycle

**With AVX-512** (16× f32): Process 16 elements per cycle — theoretical 16× speedup

**Practical Speedup** (accounting for memory bandwidth, overhead):
- Entropy: 8-12×
- MI: 6-10×
- Matrix operations: 10-14×

**Overall**: 8-12× faster with SIMD
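**Numeric Check** of the geometric series in §7.2 (illustrative arithmetic only; n and k are arbitrary example values):

```python
# Total per-level quadratic cost Σ (n/k^i)² versus the closed form n²/(1 - 1/k²)
n, k = 4096, 4
sizes = []
m = n
while m >= 1:
    sizes.append(m)
    m //= k

total = sum(s**2 for s in sizes)
bound = n**2 / (1 - 1 / k**2)
print(sizes)          # [4096, 1024, 256, 64, 16, 4, 1]
print(total, bound)   # 17895697 ≈ 17895697.07 — the finest scale dominates
```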
---

## 8. Numerical Stability

### 8.1 Common Issues

**1. Log of Zero**:

```
log₂(0) = -∞
```

**Solution**: Add a small epsilon

```python
H = -np.sum(p * np.log2(p + 1e-10))
```

**2. Division by Zero**:

```
MI = log₂(p(x,y) / (p(x)·p(y)))
```

**Solution**: Clip probabilities

```python
p_xy_safe = np.clip(p_xy, 1e-10, 1.0)
p_x_safe = np.clip(p_x, 1e-10, 1.0)
p_y_safe = np.clip(p_y, 1e-10, 1.0)

mi = np.log2(p_xy_safe / (p_x_safe * p_y_safe))
```

**3. Floating-Point Underflow**:

```
exp(-1000) = 0 (underflows)
```

**Solution**: Log-space arithmetic (e.g., the log-sum-exp trick)

```python
log_p = log_sum_exp([log_p1, log_p2, ...])
```

### 8.2 Robust Implementations

**Entropy**:

```rust
fn entropy_robust(probs: &[f32]) -> f32 {
    probs.iter()
        .filter(|&&p| p > 1e-10)        // Skip near-zero entries
        .map(|&p| -p * p.log2())
        .sum()
}
```

**Mutual Information**:

```rust
fn mutual_information_robust(p_xy: &[f32], p_x: &[f32], p_y: &[f32]) -> f32 {
    let mut mi = 0.0;
    for i in 0..p_x.len() {
        for j in 0..p_y.len() {
            let idx = i * p_y.len() + j;
            let joint = p_xy[idx].max(1e-10);
            let marginal = (p_x[i] * p_y[j]).max(1e-10);
            mi += joint * (joint / marginal).log2();
        }
    }
    mi
}
```

---

## 9. Validation and Testing

### 9.1 Synthetic Test Cases

**Test 1: Deterministic System**

```
Transition: State i → State (i+1) mod n
Expected:   EI = log₂(n), Φ ≈ log₂(n)
```

**Test 2: Random System**

```
Transition: Uniform random
Expected:   EI = 0, Φ = 0
```

**Test 3: Modular System**

```
Two independent subsystems
Expected: Φ = 0 (reducible)
```

**Test 4: Hierarchical System** (see the sketch after the unit tests below)

```
Macro-level has higher EI than micro
Expected: Causal emergence detected
```

### 9.2 Neuroscience Datasets

**1. Anesthesia EEG**:
- Source: Cambridge anesthesia database
- Expected: Ψ drops during loss of consciousness

**2. Sleep Stages**:
- Source: Physionet sleep recordings
- Expected: Ψ highest in REM/wake, lowest in deep sleep

**3. Disorders of Consciousness**:
- Source: DOC patients (VS, MCS, EMCS)
- Expected: Ψ correlates with CRS-R scores

### 9.3 Unit Tests

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_ei_deterministic() {
        let n = 16;
        let mut t = vec![0.0f32; n * n];
        // Cyclic transition: state i → state (i+1) mod n
        for i in 0..n {
            t[i * n + ((i + 1) % n)] = 1.0;
        }
        let ei = compute_ei_simd(&t);
        assert!((ei - (n as f32).log2()).abs() < 0.01);
    }

    #[test]
    fn test_ei_random() {
        let n = 16;
        let t = vec![1.0 / n as f32; n * n];
        let ei = compute_ei_simd(&t);
        assert!(ei < 0.01); // Should be ~0
    }

    #[test]
    fn test_phi_independent() {
        // Two independent subsystems
        let t = build_independent_system(8, 8);
        let phi = approximate_phi_simd(&t, 16);
        assert!(phi < 0.1); // Should be near zero
    }
}
```
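**Synthetic Causal-Emergence Sketch** (for Test 4; a self-contained Hoel-style toy system reusing the §2.2 and §5.2 formulas — values are illustrative, not a benchmark result):

```python
import numpy as np

def compute_ei(T):
    """EI in bits for a row-stochastic transition matrix (as in §2.2)."""
    p_out = T.mean(axis=0)
    h_out = -np.sum(p_out * np.log2(p_out + 1e-12))
    h_cond = -np.mean(np.sum(T * np.log2(T + 1e-12), axis=1))
    return h_out - h_cond

def coarse_grain(T, partition):
    """Uniform coarse-graining from §5.2."""
    m = len(partition)
    Tc = np.zeros((m, m))
    for I, gI in enumerate(partition):
        for J, gJ in enumerate(partition):
            Tc[I, J] = T[np.ix_(gI, gJ)].sum() / len(gI)
    return Tc

# Micro-states {0,1,2} wander uniformly among themselves, state 3 is absorbing.
# The macro grouping {0,1,2} | {3} is deterministic.
T_micro = np.array([
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
T_macro = coarse_grain(T_micro, [[0, 1, 2], [3]])

print(compute_ei(T_micro), compute_ei(T_macro))   # ≈ 0.81 bits vs 1.00 bit
# EI(macro) > EI(micro): causal emergence at the coarse scale.
```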
---

## 10. Summary of Key Formulas

### Information Theory

```
Entropy:        H(X) = -Σ p(x) log₂ p(x)
Mutual Info:    I(X;Y) = H(X) + H(Y) - H(X,Y)
Conditional MI: I(X;Y|Z) = H(X|Z) - H(X|Y,Z)
KL Divergence:  D_KL(P||Q) = Σ P(x) log₂[P(x)/Q(x)]
```

### Causal Measures

```
Effective Info:   EI = I(S(t); S(t+1)) under uniform S(t)
Transfer Entropy: TE_{X→Y} = I(Y_{t+1}; X_t^k | Y_t^l)
Integrated Info:  Φ = min_{partition} D_KL(P^full || P^cut)
```

### HCC Metric

```
Consciousness: Ψ(s) = EI(s) · Φ(s) · √(TE↑(s) · TE↓(s))
Optimal Scale: s* = argmax_s Ψ(s)
Conscious iff: Ψ(s*) > θ
```

### Complexity

```
Naive:        O(2^n) for Φ, O(n²) for EI/TE
Hierarchical: O(n log n) across all scales
SIMD:         8-16× speedup on modern CPUs
```

---

## References

1. **Shannon (1948)**: "A Mathematical Theory of Communication" — entropy foundations
2. **Cover & Thomas (2006)**: "Elements of Information Theory" — MI, KL divergence
3. **Schreiber (2000)**: "Measuring Information Transfer" — transfer entropy
4. **Barnett et al. (2009)**: "Granger Causality and Transfer Entropy are Equivalent for Gaussian Variables"
5. **Tononi et al. (2016)**: "Integrated Information Theory of Consciousness" — Φ definition
6. **Hoel et al. (2013, 2025)**: "Quantifying Causal Emergence" — effective information
7. **Oizumi et al. (2014)**: "From the Phenomenology to the Mechanisms of Consciousness: IIT 3.0"

---

**Document Status**: Mathematical Specification v1.0
**Implementation**: See `/src/` for Rust code
**Next**: Implement and benchmark algorithms
**Contact**: Submit issues to RuVector repository