wifi-densepose/examples/exo-ai-2025/research/07-causal-emergence/mathematical_framework.md

Mathematical Framework for Causal Emergence

Information-Theoretic Foundations and Computational Algorithms

Date: December 4, 2025
Purpose: Rigorous mathematical definitions for implementing HCC in RuVector


1. Information Theory Foundations

1.1 Shannon Entropy

Definition: For discrete random variable X with probability mass function p(x):

H(X) = -Σ p(x) log₂ p(x)

Units: bits
Interpretation: Expected surprise or uncertainty about X

Properties:

  • H(X) ≥ 0 (non-negative)
  • H(X) = 0 iff X is deterministic
  • H(X) ≤ log₂|𝒳| with equality iff uniform distribution

Computational Formula (avoiding log 0):

H(X) = -Σ [p(x) > 0] p(x) log₂ p(x)
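This masked sum translates directly into NumPy (a minimal sketch; the `entropy` helper name is ours):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; the [p(x) > 0] mask avoids log2(0)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

# A fair coin carries exactly 1 bit; a uniform 4-way choice carries 2.
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([0.25] * 4))   # 2.0
```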

1.2 Joint and Conditional Entropy

Joint Entropy:

H(X,Y) = -Σₓ Σᵧ p(x,y) log₂ p(x,y)

Conditional Entropy:

H(Y|X) = -Σₓ Σᵧ p(x,y) log₂ p(y|x)
      = H(X,Y) - H(X)

Interpretation: Uncertainty in Y given knowledge of X

Chain Rule:

H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
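The chain rule can be checked numerically on a small joint table (the 2×2 numbers are illustrative; `H` is a local helper):

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

# Illustrative joint distribution p(x, y) over two binary variables
p_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])
p_x = p_xy.sum(axis=1)

# H(Y|X) computed directly from the conditional p(y|x) ...
p_y_given_x = p_xy / p_x[:, None]
H_cond = float(-np.sum(p_xy * np.log2(p_y_given_x)))

# ... agrees with the identity H(Y|X) = H(X,Y) - H(X)
assert abs(H_cond - (H(p_xy) - H(p_x))) < 1e-12
```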

1.3 Mutual Information

Definition:

I(X;Y) = H(X) + H(Y) - H(X,Y)
       = H(X) - H(X|Y)
       = H(Y) - H(Y|X)
       = Σₓ Σᵧ p(x,y) log₂ [p(x,y) / (p(x)p(y))]

Interpretation:

  • Reduction in uncertainty about X from observing Y
  • Shared information between X and Y
  • KL divergence between joint and product of marginals

Properties:

  • I(X;Y) = I(Y;X) (symmetric)
  • I(X;Y) ≥ 0 (non-negative)
  • I(X;Y) = 0 iff X ⊥ Y (independent)
  • I(X;X) = H(X)
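Non-negativity and the independence case are easy to verify on the same kind of 2×2 joint table (illustrative numbers; `H` is a local helper):

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

p_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

mi = H(p_x) + H(p_y) - H(p_xy)
assert mi > 0  # correlated variables share information

# Independence: joint equals product of marginals, so MI vanishes
p_ind = np.outer(p_x, p_y)
mi_ind = H(p_x) + H(p_y) - H(p_ind)
assert abs(mi_ind) < 1e-12
```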

1.4 Conditional Mutual Information

Definition:

I(X;Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z)
         = Σₓ Σᵧ Σ_z p(x,y,z) log₂ [p(x,y|z) / (p(x|z)p(y|z))]

Interpretation: Information X and Y share about each other, given Z

Properties:

  • I(X;Y|Z) ≥ 0
  • Can have I(X;Y|Z) > I(X;Y) (explaining away)

1.5 KL Divergence

Definition: For distributions P and Q over same space:

D_KL(P || Q) = Σₓ P(x) log₂ [P(x) / Q(x)]

Interpretation:

  • "Distance" from Q to P (not symmetric!)
  • Expected log-likelihood ratio
  • Information lost when approximating P with Q

Properties:

  • D_KL(P || Q) ≥ 0 (Gibbs' inequality)
  • D_KL(P || Q) = 0 iff P = Q
  • NOT a metric (no symmetry, no triangle inequality)

Relation to MI:

I(X;Y) = D_KL(P(X,Y) || P(X)P(Y))
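This identity can be confirmed numerically: the entropy form and the KL form of MI give the same value (helper names `H` and `kl` are ours):

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

def kl(p, q):
    """D_KL(P || Q) in bits; terms with P(x) = 0 contribute nothing."""
    p, q = np.ravel(p), np.ravel(q)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

p_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

mi_entropies = H(p_x) + H(p_y) - H(p_xy)
mi_kl = kl(p_xy, np.outer(p_x, p_y))
assert abs(mi_entropies - mi_kl) < 1e-12
```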

2. Effective Information (EI)

2.1 Hoel's Definition

Setup:

  • System with n states: S = {s₁, s₂, ..., sₙ}
  • Transition probability matrix: T[i,j] = P(sⱼ(t+1) | sᵢ(t))

Maximum Entropy Intervention:

P(sᵢ(t)) = 1/n  for all i  (uniform distribution)

Effective Information:

EI = I(S(t); S(t+1))  under max-entropy S(t)
   = H(S(t+1)) - H(S(t+1)|S(t))
   = H(S(t+1)) - Σᵢ (1/n) H(S(t+1)|sᵢ(t))

Expanded Form:

EI = -Σⱼ p(sⱼ(t+1)) log₂ p(sⱼ(t+1)) + (1/n) Σᵢ Σⱼ T[i,j] log₂ T[i,j]

where p(sⱼ(t+1)) = (1/n) Σᵢ T[i,j] (marginal over uniform input)

2.2 Computational Algorithm

Input: Transition matrix T (n×n)
Output: Effective information (bits)

import numpy as np

def compute_ei(T: np.ndarray) -> float:
    n = T.shape[0]

    # Marginal output distribution under uniform input
    p_out = np.mean(T, axis=0)  # Average each column

    # Output entropy
    H_out = -np.sum(p_out * np.log2(p_out + 1e-10))

    # Conditional entropy H(out|in)
    H_cond = -(1/n) * np.sum(T * np.log2(T + 1e-10))

    # Effective information
    ei = H_out - H_cond

    return ei

SIMD Optimization (Rust):

use std::simd::*;

fn compute_ei_simd(transition_matrix: &[f32]) -> f32 {
    let n = (transition_matrix.len() as f32).sqrt() as usize;

    // Column means: accumulate whole rows into p_out, since rows are
    // contiguous in memory and need no strided loads
    let mut p_out = vec![0.0f32; n];
    for i in 0..n {
        let row = &transition_matrix[i * n..(i + 1) * n];
        let mut j = 0;
        while j + 16 <= n {
            let acc = f32x16::from_slice(&p_out[j..j + 16])
                + f32x16::from_slice(&row[j..j + 16]);
            acc.copy_to_slice(&mut p_out[j..j + 16]);
            j += 16;
        }
        while j < n {
            p_out[j] += row[j];
            j += 1;
        }
    }
    for v in p_out.iter_mut() {
        *v /= n as f32;
    }

    // Compute entropies (SIMD)
    let h_out = entropy_simd(&p_out);
    let h_cond = conditional_entropy_simd(transition_matrix, n);

    h_out - h_cond
}

2.3 Properties and Interpretation

Range: 0 ≤ EI ≤ log₂(n)

Meaning:

  • EI = 0: No causal power (random output)
  • EI = log₂(n): Maximal causal power (deterministic + invertible)

Causal Emergence:

System exhibits emergence iff EI(macro) > EI(micro)
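A minimal sketch of this condition on Hoel-style toy dynamics: three micro-states that scramble uniformly among themselves plus one fixed point, coarse-grained into a deterministic two-state macro system. The `effective_information` helper mirrors the algorithm of section 2.2 (without the epsilon trick):

```python
import numpy as np

def effective_information(T):
    """EI = H(output under uniform input) - mean row entropy, in bits."""
    def H(p):
        nz = p[p > 0]
        return float(-np.sum(nz * np.log2(nz)))
    return H(T.mean(axis=0)) - sum(H(row) for row in T) / T.shape[0]

# Micro: states {0,1,2} scramble uniformly among themselves; state 3 is fixed
T_micro = np.array([
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
# Macro: grouping {0,1,2} → A and {3} → B yields deterministic dynamics
T_macro = np.eye(2)

ei_micro = effective_information(T_micro)  # 2 - (3/4)·log₂3 ≈ 0.81 bits
ei_macro = effective_information(T_macro)  # 1.0 bit
assert ei_macro > ei_micro  # causal emergence at the macro scale
```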

3. Transfer Entropy (TE)

3.1 Schreiber's Definition

Setup: Two time series X and Y

Transfer Entropy from X to Y:

TE_{X→Y} = I(Y_{t+1}; X_{t}^{(k)} | Y_{t}^{(l)})

where:

  • X_{t}^{(k)} = (X_t, X_{t-1}, ..., X_{t-k+1}): k-history of X
  • Y_{t}^{(l)} = (Y_t, Y_{t-1}, ..., Y_{t-l+1}): l-history of Y

Expanded:

TE_{X→Y} = Σ p(y_{t+1}, x_t^k, y_t^l) log₂ [p(y_{t+1}|x_t^k, y_t^l) / p(y_{t+1}|y_t^l)]

Interpretation:

  • Information X's past adds to predicting Y's future, beyond Y's own past
  • Measures directed influence from X to Y

3.2 Relation to Granger Causality

Theorem (Barnett et al., 2009): For Gaussian vector autoregressive (VAR) processes:

TE_{X→Y} = -½ ln(1 - R²)

where R² is the coefficient of determination in regression of Y_{t+1} on X_t and Y_t.

Implication: TE generalizes Granger causality to non-linear, non-Gaussian systems.
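A sanity check of this correspondence on simulated data, estimating the Gaussian TE from the residual variances of restricted and full regressions (the VAR coefficients and the `resid_var` helper are illustrative, not from the reference):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20_000
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(T - 1):
    y[t + 1] = 0.5 * y[t] + 0.8 * x[t] + 0.1 * rng.normal()

def resid_var(target, *regressors):
    """Residual variance of an ordinary least-squares fit."""
    A = np.column_stack(regressors)
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return float(np.var(target - A @ coef))

# Restricted model: Y's own past only; full model adds X's past
v_restricted = resid_var(y[1:], y[:-1])
v_full = resid_var(y[1:], y[:-1], x[:-1])

# Gaussian TE in nats: ½ ln(var_restricted / var_full) = -½ ln(1 - R²)
te_nats = 0.5 * np.log(v_restricted / v_full)

# Reverse direction: X is driven by nothing, so TE_{Y→X} ≈ 0
te_rev = 0.5 * np.log(resid_var(x[1:], x[:-1]) / resid_var(x[1:], x[:-1], y[:-1]))
```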

3.3 Computational Algorithm

Input: Time series X and Y (length T), lags k and l
Output: Transfer entropy (bits)

import numpy as np
from collections import Counter

def transfer_entropy(X, Y, k=1, l=1):
    # Assumes discrete (or pre-binned) values; use binning or KDE for
    # continuous data. Histories end at time t and predict Y at t+1.
    start = max(k, l) - 1
    samples = [
        (Y[t + 1], tuple(X[t - k + 1:t + 1]), tuple(Y[t - l + 1:t + 1]))
        for t in range(start, len(X) - 1)
    ]
    N = len(samples)

    # Plug-in (histogram) estimates of the required distributions
    c_xyz = Counter(samples)
    c_xy = Counter((x, y) for _, x, y in samples)    # (x_past, y_past)
    c_yz = Counter((yn, y) for yn, _, y in samples)  # (y_next, y_past)
    c_y = Counter(y for _, _, y in samples)          # y_past

    # TE = Σ p(y_next, x_past, y_past) log₂ [p(y_next|x,y) / p(y_next|y)]
    te = 0.0
    for (yn, x, y), count in c_xyz.items():
        p_xyz = count / N
        p_cond_xy = count / c_xy[(x, y)]
        p_cond_y = c_yz[(yn, y)] / c_y[y]
        te += p_xyz * np.log2(p_cond_xy / p_cond_y)

    return te

Efficient Binning:

fn transfer_entropy_binned(
    x: &[f32],
    y: &[f32],
    k: usize,
    l: usize,
    bins: usize
) -> f32 {
    // Discretize signals into bins
    let x_binned = discretize(x, bins);
    let y_binned = discretize(y, bins);

    // Build histogram for p(y_next, x_past, y_past); histories end at t
    let mut counts = std::collections::HashMap::new();
    for t in (l.max(k) - 1)..(x.len() - 1) {
        let x_past: Vec<_> = x_binned[t + 1 - k..=t].to_vec();
        let y_past: Vec<_> = y_binned[t + 1 - l..=t].to_vec();
        let y_next = y_binned[t + 1];
        *counts.entry((y_next, x_past, y_past)).or_insert(0) += 1;
    }

    // Normalize and compute MI
    compute_cmi_from_counts(&counts)
}

3.4 Upward and Downward Transfer Entropy

Upward TE (micro → macro):

TE↑(s) = TE_{σ_{s-1} → σ_s}

Measures emergence: how much micro-level informs macro-level beyond macro's own history.

Downward TE (macro → micro):

TE↓(s) = TE_{σ_s → σ_{s-1}}

Measures top-down causation: how much macro-level constrains micro-level beyond micro's own history.

Circular Causation Condition:

TE↑(s) > 0  AND  TE↓(s) > 0

4. Integrated Information (Φ)

4.1 IIT 3.0 Definition

Setup: System with n elements, each with states {0,1}

Partition: Division of system into parts A and B (A ∪ B = full system)

Cut: Severing causal connections between A and B

Earth Mover's Distance (EMD):

EMD(P, Q) = min_γ Σᵢⱼ γᵢⱼ dᵢⱼ

subject to:

  • γᵢⱼ ≥ 0
  • Σⱼ γᵢⱼ = P(i)
  • Σᵢ γᵢⱼ = Q(j)

where dᵢⱼ is distance between states i and j.

Integrated Information:

Φ = min_{partition} EMD(P^full, P^cut)

Interpretation: Minimum information lost by any partition—quantifies irreducibility.

4.2 IIT 4.0 Update (2024)

Change: Uses KL divergence instead of EMD for computational tractability.

Φ = min_{partition} D_KL(P^full || P^cut)

Computational Advantage: KL is faster to compute and differentiable.

4.3 Approximate Φ Calculation

Challenge: Computing exact Φ requires minimizing over all partitions, whose number grows exponentially with n.

Solution 1: Exhaustive Bipartition Search

from itertools import combinations

def approximate_phi(transition_matrix):
    n = transition_matrix.shape[0]
    min_kl = float('inf')

    # Restrict the search to bipartitions (not all partitions)
    for size_A in range(1, n):
        for subset_A in combinations(range(n), size_A):
            subset_B = [i for i in range(n) if i not in subset_A]

            # Compute KL divergence for this partition
            kl = compute_kl_partition(transition_matrix, subset_A, subset_B)
            min_kl = min(min_kl, kl)

    return min_kl

Complexity: restricting to bipartitions reduces the search from all partitions to O(2ⁿ⁻¹) subsets; still exponential, so the exhaustive search is feasible only for small n.

Solution 2: Spectral Clustering

import numpy as np
from sklearn.cluster import SpectralClustering

def approximate_phi_spectral(transition_matrix):
    # Use spectral clustering to find a good 2-partition

    # Affinity matrix: strength of causal connections
    affinity = np.abs(transition_matrix @ transition_matrix.T)

    # Find 2-cluster partition
    clustering = SpectralClustering(n_clusters=2, affinity='precomputed')
    labels = clustering.fit_predict(affinity)

    subset_A = np.where(labels == 0)[0]
    subset_B = np.where(labels == 1)[0]

    # Compute KL for this partition
    return compute_kl_partition(transition_matrix, subset_A, subset_B)

Complexity: O(n³) for eigendecomposition, but finds good partition efficiently.
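`compute_kl_partition` is referenced twice above but never defined. One plausible sketch, under the simplifying assumption that the cut replaces the system's output distribution (uniform input) with a renormalized combination of each part's internally-renormalized dynamics, could look like:

```python
import numpy as np

def compute_kl_partition(T, subset_A, subset_B):
    """Sketch, not a full IIT repertoire computation: compare the full
    system's output distribution against that of the cut system."""
    n = T.shape[0]
    p_full = T.mean(axis=0)  # output distribution under uniform input

    p_cut = np.zeros(n)
    for part in (np.asarray(subset_A), np.asarray(subset_B)):
        # Restrict dynamics to the part and renormalize its rows
        sub = T[np.ix_(part, part)]
        sub = sub / np.maximum(sub.sum(axis=1, keepdims=True), 1e-12)
        p_cut[part] = sub.mean(axis=0) * (len(part) / n)
    p_cut /= p_cut.sum()

    mask = p_full > 0
    return float(np.sum(p_full[mask]
                        * np.log2(p_full[mask] / np.maximum(p_cut[mask], 1e-12))))

# Two independent subsystems: the cut between them loses nothing
T_ind = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.8],
    [0.0, 0.0, 0.7, 0.3],
])
assert compute_kl_partition(T_ind, [0, 1], [2, 3]) < 1e-9
```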

4.4 SIMD-Accelerated Φ

fn approximate_phi_simd(
    transition_matrix: &[f32],
    n: usize
) -> f32 {
    // Use spectral method to find partition
    let (subset_a, subset_b) = spectral_partition(transition_matrix, n);

    // Compute P^full (full system distribution)
    let p_full = compute_stationary_distribution_simd(transition_matrix, n);

    // Compute P^cut (partitioned system distribution)
    let p_cut = compute_cut_distribution_simd(
        transition_matrix,
        &subset_a,
        &subset_b
    );

    // KL divergence (SIMD)
    kl_divergence_simd(&p_full, &p_cut)
}

fn kl_divergence_simd(p: &[f32], q: &[f32]) -> f32 {
    assert_eq!(p.len(), q.len());
    let n = p.len();

    // Assumes n is a multiple of the SIMD width (pad p and q otherwise);
    // per-lane ln() is not in core::simd and needs the nightly StdFloat
    // trait or a scalar fallback
    let mut kl = f32x16::splat(0.0);
    for i in (0..n).step_by(16) {
        let p_chunk = f32x16::from_slice(&p[i..i+16]);
        let q_chunk = f32x16::from_slice(&q[i..i+16]);

        // KL += p * log(p/q)
        let ratio = p_chunk / (q_chunk + f32x16::splat(1e-10));
        let log_ratio = ratio.ln() / f32x16::splat(2.0_f32.ln()); // log2
        kl += p_chunk * log_ratio;
    }

    kl.reduce_sum()
}

5. Hierarchical Coarse-Graining

5.1 k-way Aggregation

Goal: Reduce n states to n/k states by grouping.

Methods:

1. Sequential Grouping:

Groups: {s₁,...,sₖ}, {sₖ₊₁,...,s₂ₖ}, ...

2. Clustering-Based:

def coarse_grain_kmeans(states, k):
    from sklearn.cluster import KMeans

    # Cluster states based on transition similarity
    kmeans = KMeans(n_clusters=k)
    labels = kmeans.fit_predict(states)

    # Map each micro-state to its macro-state
    return labels

3. Information-Theoretic (optimal for EI):

def coarse_grain_optimal(transition_matrix, k):
    # Maximize EI of the coarse-grained system over candidate partitions
    n = transition_matrix.shape[0]
    best_partition = None
    best_ei = -float('inf')

    for partition in generate_partitions(n, k):
        ei = compute_ei_coarse(transition_matrix, partition)
        if ei > best_ei:
            best_ei = ei
            best_partition = partition

    return best_partition

5.2 Transition Matrix Coarse-Graining

Given: Micro-level transition matrix T (n×n)
Goal: Macro-level transition matrix T' (m×m) where m < n

Coarse-Graining Map: φ : {1,...,n} → {1,...,m}

Macro Transition Probability:

T'[I,J] = P(macro_J(t+1) | macro_I(t))
        = Σᵢ∈φ⁻¹(I) Σⱼ∈φ⁻¹(J) P(sᵢ(t) | macro_I(t)) T[i,j]

Uniform Assumption (simplest):

P(sᵢ(t) | macro_I(t)) = 1/|φ⁻¹(I)|  for i ∈ φ⁻¹(I)

Resulting Formula:

T'[I,J] = (1/|φ⁻¹(I)|) Σᵢ∈φ⁻¹(I) Σⱼ∈φ⁻¹(J) T[i,j]

Algorithm:

import numpy as np

def coarse_grain_transition(T, partition):
    """
    T: n×n transition matrix
    partition: list of lists, e.g. [[0,1,2], [3,4], [5,6,7,8]]
    returns: m×m coarse-grained transition matrix
    """
    m = len(partition)
    T_coarse = np.zeros((m, m))

    for I in range(m):
        for J in range(m):
            group_I = partition[I]
            group_J = partition[J]

            # Average transitions from group I to group J
            total = 0.0
            for i in group_I:
                for j in group_J:
                    total += T[i, j]

            T_coarse[I, J] = total / len(group_I)

    return T_coarse
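One property worth checking: under the uniform assumption, coarse-graining preserves row-stochasticity, because each macro row averages complete micro rows. A compact NumPy restatement of the algorithm above makes the check easy to run:

```python
import numpy as np

def coarse_grain_transition(T, partition):
    """Compact restatement of the algorithm above using np.ix_."""
    m = len(partition)
    Tc = np.zeros((m, m))
    for I, group_I in enumerate(partition):
        for J, group_J in enumerate(partition):
            Tc[I, J] = T[np.ix_(group_I, group_J)].sum() / len(group_I)
    return Tc

# Any row-stochastic micro matrix stays row-stochastic after coarse-graining
rng = np.random.default_rng(0)
T = rng.random((6, 6))
T /= T.sum(axis=1, keepdims=True)
Tc = coarse_grain_transition(T, [[0, 1, 2], [3, 4], [5]])
assert np.allclose(Tc.sum(axis=1), 1.0)
```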

5.3 Hierarchical Construction

Input: Micro-level data (n states)
Output: Hierarchy of scales (log_k n levels)

struct ScaleHierarchy {
    levels: Vec<ScaleLevel>,
}

struct ScaleLevel {
    num_states: usize,
    transition_matrix: Vec<f32>,
    partition: Vec<Vec<usize>>, // Which micro-states → this macro-state
}

impl ScaleHierarchy {
    fn build(micro_data: &[f32], branching_factor: usize) -> Self {
        let mut levels = vec![];
        let mut current_transition = estimate_transition_matrix(micro_data);
        // The matrix is stored flat (n² entries), so track the state
        // count separately rather than reusing the vector length
        let mut num_states = (current_transition.len() as f32).sqrt() as usize;
        let mut current_partition: Vec<Vec<usize>> =
            (0..num_states).map(|i| vec![i]).collect();

        levels.push(ScaleLevel {
            num_states,
            transition_matrix: current_transition.clone(),
            partition: current_partition.clone(),
        });

        while num_states > branching_factor {
            // Find optimal k-way partition
            let new_partition = find_optimal_partition(
                &current_transition,
                branching_factor
            );
            num_states = new_partition.len();

            // Coarse-grain
            current_transition = coarse_grain_transition_matrix(
                &current_transition,
                &new_partition
            );

            // Update partition relative to original micro-states
            current_partition = merge_partitions(&current_partition, &new_partition);

            levels.push(ScaleLevel {
                num_states,
                transition_matrix: current_transition.clone(),
                partition: current_partition.clone(),
            });
        }

        ScaleHierarchy { levels }
    }
}

6. Consciousness Metric (Ψ)

6.1 Combined Formula

Per-Scale Metric:

Ψ(s) = EI(s) · Φ(s) · √(TE↑(s) · TE↓(s))

Components:

  • EI(s): Causal power at scale s (emergence)
  • Φ(s): Integration at scale s (irreducibility)
  • TE↑(s): Upward information flow (bottom-up)
  • TE↓(s): Downward information flow (top-down)

Geometric Mean for TE: Ensures both directions required (if either is 0, product is 0).

Alternative Formulations:

Additive (for interpretability):

Ψ(s) = α·EI(s) + β·Φ(s) + γ·min(TE↑(s), TE↓(s))

Harmonic Mean (emphasizes balanced TE):

Ψ(s) = EI(s) · Φ(s) · (2·TE↑(s)·TE↓(s)) / (TE↑(s) + TE↓(s))

6.2 Normalization

Problem: EI, Φ, and TE have different ranges.

Solution: Z-score normalization

EI_norm = (EI - μ_EI) / σ_EI
Φ_norm = (Φ - μ_Φ) / σ_Φ
TE_norm = (TE - μ_TE) / σ_TE

Ψ Normalized:

Ψ_norm(s) = EI_norm(s) · Φ_norm(s) · √(TE↑_norm(s) · TE↓_norm(s))

Threshold:

Conscious iff Ψ_norm(s*) > θ  (e.g., θ = 2 standard deviations)
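A minimal sketch of the normalization and thresholding step. The per-scale metric values below are purely illustrative, and because z-scores can be negative, the TE product is clipped before taking the square root (our guard, not part of the formula above):

```python
import numpy as np

def z_normalize(values):
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / max(float(v.std()), 1e-12)  # guard against σ = 0

# Hypothetical per-scale metric values (illustrative only)
ei      = z_normalize([0.8, 1.4, 2.1, 1.0])
phi     = z_normalize([0.2, 0.9, 1.5, 0.4])
te_up   = z_normalize([0.1, 0.5, 0.8, 0.2])
te_down = z_normalize([0.1, 0.4, 0.9, 0.3])

# Geometric-mean TE term, clipped to zero where the flows disagree in sign
te_term = np.sqrt(np.clip(te_up * te_down, 0.0, None))
psi = ei * phi * te_term

optimal = int(np.argmax(psi))
conscious = psi[optimal] > 2.0  # θ = 2 "standard deviations"
```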

6.3 Implementation

#[derive(Debug, Clone)]
pub struct ConsciousnessMetrics {
    pub ei: Vec<f32>,
    pub phi: Vec<f32>,
    pub te_up: Vec<f32>,
    pub te_down: Vec<f32>,
    pub psi: Vec<f32>,
    pub optimal_scale: usize,
    pub consciousness_score: f32,
}

impl ConsciousnessMetrics {
    pub fn compute(hierarchy: &ScaleHierarchy, data: &[f32]) -> Self {
        let num_scales = hierarchy.levels.len();

        let mut ei = vec![0.0; num_scales];
        let mut phi = vec![0.0; num_scales];
        let mut te_up = vec![0.0; num_scales - 1];
        let mut te_down = vec![0.0; num_scales - 1];

        // Compute per-scale metrics in parallel (par_iter_mut from rayon::prelude)
        ei.par_iter_mut()
            .zip(&hierarchy.levels)
            .for_each(|(ei_val, level)| {
                *ei_val = compute_ei_simd(&level.transition_matrix);
            });

        phi.par_iter_mut()
            .zip(&hierarchy.levels)
            .for_each(|(phi_val, level)| {
                *phi_val = approximate_phi_simd(
                    &level.transition_matrix,
                    level.num_states
                );
            });

        // Transfer entropy between scales
        for s in 0..(num_scales - 1) {
            te_up[s] = transfer_entropy_between_scales(
                &hierarchy.levels[s],
                &hierarchy.levels[s + 1],
                data
            );
            te_down[s] = transfer_entropy_between_scales(
                &hierarchy.levels[s + 1],
                &hierarchy.levels[s],
                data
            );
        }

        // Compute Ψ
        let mut psi = vec![0.0; num_scales];
        for s in 0..(num_scales - 1) {
            psi[s] = ei[s] * phi[s] * (te_up[s] * te_down[s]).sqrt();
        }

        // Find optimal scale
        let (optimal_scale, &consciousness_score) = psi.iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .unwrap();

        Self {
            ei,
            phi,
            te_up,
            te_down,
            psi,
            optimal_scale,
            consciousness_score,
        }
    }

    pub fn is_conscious(&self, threshold: f32) -> bool {
        self.consciousness_score > threshold
    }
}

7. Complexity Analysis

7.1 Naive Approaches

Operation   | Naive Complexity | Problem
EI          | O(n²)            | Transition matrix construction
Φ (exact)   | O(2^n)           | Check all partitions
TE          | O(T·n²)          | All pairwise histories
Multi-scale | O(S·n²)          | S scales × per-scale cost

Total: O(2^n) or O(S·n²·T) — infeasible for large systems

7.2 Hierarchical Optimization

Key Insight: Coarse-graining reduces states logarithmically.

Scale Sizes:

Level 0: n states
Level 1: n/k states
Level 2: n/k² states
...
Level log_k(n): 1 state

Per-Level Cost:

  • EI: O(m²) for m states at that level
  • Φ (approx): O(m²) for spectral method
  • TE: O(T·m) for discretized estimation

Total Across Levels:

Σ_{i=0}^{log_k n} (n/k^i)² = n² Σ (1/k^{2i})
                             = n² · (1 / (1 - 1/k²))  (geometric series)
                             ≈ O(n²)

With SIMD Acceleration: O(n²/W) where W = SIMD width (8-16)

Effective Complexity: O(n²) total, dominated by the finest scale; the geometric series shows the full hierarchy costs only a constant factor more than single-scale analysis

7.3 SIMD Speedup

Without SIMD:

  • Process 1 element per cycle

With AVX-512 (16× f32):

  • Process 16 elements per cycle
  • Theoretical 16× speedup

Practical Speedup (accounting for memory bandwidth, overhead):

  • Entropy: 8-12×
  • MI: 6-10×
  • Matrix operations: 10-14×

Overall: 8-12× faster with SIMD


8. Numerical Stability

8.1 Common Issues

1. Log of Zero:

log₂(0) = -∞

Solution: Add small epsilon

H = -np.sum(p * np.log2(p + 1e-10))

2. Division by Zero:

MI = log₂(p(x,y) / (p(x)·p(y)))

Solution: Clip probabilities

p_xy_safe = np.clip(p_xy, 1e-10, 1.0)
p_x_safe = np.clip(p_x, 1e-10, 1.0)
p_y_safe = np.clip(p_y, 1e-10, 1.0)
mi = np.log2(p_xy_safe / (p_x_safe * p_y_safe))

3. Floating-Point Underflow:

exp(-1000) = 0  (underflows)

Solution: Log-space arithmetic

log_p = log_sum_exp([log_p1, log_p2, ...])
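A standard stable construction factors out the largest log term before exponentiating (`log_sum_exp` below is the usual textbook helper, not code from this repository):

```python
import numpy as np

def log_sum_exp(log_ps):
    """log(Σᵢ exp(aᵢ)) computed stably by factoring out the largest term."""
    a = np.asarray(log_ps, dtype=float)
    m = a.max()
    return float(m + np.log(np.sum(np.exp(a - m))))

logs = np.array([-1000.0, -1000.0])
with np.errstate(divide='ignore'):
    naive = np.log(np.sum(np.exp(logs)))  # exp underflows to 0, giving -inf
stable = log_sum_exp(logs)                # -1000 + ln 2, as expected
```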

8.2 Robust Implementations

Entropy:

fn entropy_robust(probs: &[f32]) -> f32 {
    probs.iter()
        .filter(|&&p| p > 1e-10)  // Skip near-zero
        .map(|&p| -p * p.log2())
        .sum()
}

Mutual Information:

fn mutual_information_robust(p_xy: &[f32], p_x: &[f32], p_y: &[f32]) -> f32 {
    let mut mi = 0.0;
    for i in 0..p_x.len() {
        for j in 0..p_y.len() {
            let idx = i * p_y.len() + j;
            let joint = p_xy[idx].max(1e-10);
            let marginal = (p_x[i] * p_y[j]).max(1e-10);
            mi += joint * (joint / marginal).log2();
        }
    }
    mi
}

9. Validation and Testing

9.1 Synthetic Test Cases

Test 1: Deterministic System

Transition: State i → State (i+1) mod n
Expected: EI = log₂(n), Φ ≈ log₂(n)

Test 2: Random System

Transition: Uniform random
Expected: EI = 0, Φ = 0

Test 3: Modular System

Two independent subsystems
Expected: Φ = 0 (reducible)

Test 4: Hierarchical System

Macro-level has higher EI than micro
Expected: Causal emergence detected

9.2 Neuroscience Datasets

1. Anesthesia EEG:

  • Source: Cambridge anesthesia database
  • Expected: Ψ drops during loss of consciousness

2. Sleep Stages:

  • Source: Physionet sleep recordings
  • Expected: Ψ highest in REM/wake, lowest in deep sleep

3. Disorders of Consciousness:

  • Source: DOC patients (VS, MCS, EMCS)
  • Expected: Ψ correlates with CRS-R scores

9.3 Unit Tests

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_ei_deterministic() {
        let n = 16;
        let mut t = vec![0.0; n * n];
        // Cyclic transition
        for i in 0..n {
            t[i * n + ((i + 1) % n)] = 1.0;
        }
        let ei = compute_ei_simd(&t);
        assert!((ei - (n as f32).log2()).abs() < 0.01);
    }

    #[test]
    fn test_ei_random() {
        let n = 16;
        let t = vec![1.0 / n as f32; n * n];
        let ei = compute_ei_simd(&t);
        assert!(ei < 0.01);  // Should be ~0
    }

    #[test]
    fn test_phi_independent() {
        // Two independent subsystems
        let t = build_independent_system(8, 8);
        let phi = approximate_phi_simd(&t, 16);
        assert!(phi < 0.1);  // Should be near-zero
    }
}

10. Summary of Key Formulas

Information Theory

Entropy:            H(X) = -Σ p(x) log₂ p(x)
Mutual Info:        I(X;Y) = H(X) + H(Y) - H(X,Y)
Conditional MI:     I(X;Y|Z) = H(X|Z) - H(X|Y,Z)
KL Divergence:      D_KL(P||Q) = Σ P(x) log₂[P(x)/Q(x)]

Causal Measures

Effective Info:     EI = I(S(t); S(t+1)) under uniform S(t)
Transfer Entropy:   TE_{X→Y} = I(Y_{t+1}; X_t^k | Y_t^l)
Integrated Info:    Φ = min_{partition} D_KL(P^full || P^cut)

HCC Metric

Consciousness:      Ψ(s) = EI(s) · Φ(s) · √(TE↑(s) · TE↓(s))
Optimal Scale:      s* = argmax_s Ψ(s)
Conscious iff:      Ψ(s*) > θ

Complexity

Naive:              O(2^n) for Φ, O(n²) for EI/TE
Hierarchical:       O(n²) total across all scales (geometric series)
SIMD:               8-12× speedup on modern CPUs

References

  1. Shannon (1948): "A Mathematical Theory of Communication" — entropy foundations
  2. Cover & Thomas (2006): "Elements of Information Theory" — MI, KL divergence
  3. Schreiber (2000): "Measuring Information Transfer" — transfer entropy
  4. Barnett et al. (2009): "Granger Causality and Transfer Entropy are Equivalent for Gaussian Variables"
  5. Tononi et al. (2016): "Integrated Information Theory of Consciousness" — Φ definition
  6. Hoel et al. (2013, 2025): "Quantifying Causal Emergence" — effective information
  7. Oizumi et al. (2014): "From the Phenomenology to the Mechanisms of Consciousness: IIT 3.0"

Document Status: Mathematical Specification v1.0
Implementation: See /src/ for Rust code
Next: Implement and benchmark algorithms
Contact: Submit issues to RuVector repository