wifi-densepose/docs/research/gnn-v2/25-self-organizing-graph-transformers.md

# Feature 25: Self-Organizing Graph Transformers

## Overview

### Problem Statement

Current graph transformers operate on fixed, manually designed topologies. The graph structure is either given as input (e.g., molecule graphs, social networks) or constructed once via nearest-neighbor heuristics (e.g., HNSW). In either case, the topology is static during inference and training: it does not grow, differentiate, or reorganize in response to the data distribution. This rigidity creates three fundamental bottlenecks:

1. **Topology-data mismatch**: A graph constructed for one data distribution becomes suboptimal as the distribution shifts.
2. **No specialization**: Every node and edge in the graph plays the same generic role -- there is no mechanism for nodes to develop distinct functional identities.
3. **No self-repair**: When parts of the graph become corrupted or irrelevant, there is no process for replacing or regenerating damaged regions.

Biology solved these problems billions of years ago. Morphogenesis builds complex structures from simple rules. Embryonic development differentiates a single cell into hundreds of specialized types. Autopoiesis maintains living systems by continuously rebuilding their own components. These principles have been largely ignored in graph neural network design.

### Proposed Solution

Self-Organizing Graph Transformers (SOGTs) are graph attention networks that grow, differentiate, and maintain their own topology through biologically-inspired developmental programs. The approach has three pillars:

1. **Morphogenetic Graph Networks**: Turing pattern formation on graphs drives reaction-diffusion attention, creating spatially structured activation patterns that guide message passing and edge formation.
2. **Developmental Graph Programs**: Graph grammars encode growth rules as L-system productions. Generic seed nodes differentiate into specialized types (hub nodes, boundary nodes, relay nodes) through a developmental program conditioned on local graph statistics.
3. **Autopoietic Graph Transformers**: The network continuously rebuilds its own topology -- pruning dead edges, spawning new nodes, and adjusting attention weights -- to maintain a target coherence level, analogous to homeostasis in living systems.

### Expected Benefits

- **Adaptive Topology**: 30-50% improvement in retrieval quality on distribution-shifting workloads
- **Self-Specialization**: Nodes develop distinct roles (hub, boundary, relay) reducing routing overhead by 40-60%
- **Self-Repair**: Automatic recovery from node/edge corruption with <5% transient degradation
- **Architecture Search**: Morphogenetic NAS discovers attention patterns 10x faster than random search
- **Emergent Computation**: Local attention rules give rise to global computational patterns (sorting, clustering, routing)

### Novelty Claim

**Unique Contribution**: First graph transformer architecture that grows its own topology through morphogenetic, developmental, and autopoietic processes. Unlike neural architecture search (which optimizes a fixed search space), SOGTs develop continuously through biologically-grounded growth rules that operate at runtime.

**Differentiators**:
1. Reaction-diffusion attention creates Turing patterns on graphs for structured activation
2. L-system graph grammars encode developmental programs for node specialization
3. Autopoietic maintenance loop continuously rebuilds topology to maintain coherence
4. Cellular automata attention rules produce emergent global computation from local rules
5. Morphogenetic NAS discovers novel attention architectures through growth processes

---

## Biological Foundations

### Morphogenesis and Turing Patterns

Alan Turing's 1952 paper "The Chemical Basis of Morphogenesis" demonstrated that two diffusing chemicals (an activator and an inhibitor) with different diffusion rates can spontaneously form stable spatial patterns: spots, stripes, and spirals. These reaction-diffusion systems explain leopard spots, zebrafish stripes, and fingerprint ridges.

On a graph, the Turing instability generalizes naturally. Each node holds concentrations of an activator `a` and inhibitor `h`. The dynamics follow the graph Laplacian:

```
da/dt = f(a, h) + D_a * L * a
dh/dt = g(a, h) + D_h * L * h
```

where `L` is the graph Laplacian, `D_h >> D_a` (inhibitor diffuses faster), and `f`, `g` encode local reaction kinetics. The key insight is that **Turing patterns on graphs create natural attention masks**: regions of high activator concentration attend to each other, while inhibitor barriers create boundaries between attention clusters.

### Embryonic Development and Differentiation

A single fertilized cell becomes a human body with 200+ cell types through a developmental program. Key principles:

- **Positional information**: Cells read chemical gradients to determine their position and fate.
- **Inductive signaling**: Cells signal neighbors to change type.
- **Competence windows**: Cells can only respond to certain signals during specific developmental stages.
- **Canalization**: Development is robust to perturbations -- the same endpoint is reached from varied starting conditions.

For graph transformers, these principles translate to: nodes read local graph statistics (degree, centrality, neighborhood composition) to determine their functional role; they signal neighbors through message passing to coordinate specialization; and developmental stages gate which transformations are available at each growth step.

### Autopoiesis and Self-Maintenance

Autopoiesis (Maturana and Varela, 1972) describes systems that continuously produce and replace their own components. A living cell is autopoietic: it synthesizes the membrane that bounds it, the enzymes that catalyze reactions, and the DNA that encodes those enzymes. The system maintains itself through circular causality.

For graph transformers, autopoiesis means: the attention mechanism produces the topology that shapes the attention mechanism. Dead edges are pruned. Overloaded nodes are split. Missing connections are grown. The graph maintains a target coherence level (measurable via `ruvector-coherence`) through continuous self-modification.

---

## Technical Design

### Architecture Diagram

```
                      Data Distribution
                             |
                    +--------v--------+
                    |   Seed Graph    |
                    |  (initial K     |
                    |   nodes)        |
                    +--------+--------+
                             |
              +--------------+--------------+
              |              |              |
     +--------v-------+ +---v----+ +-------v--------+
     | Morphogenetic  | | Devel- | | Autopoietic    |
     | Field Engine   | | opment | | Maintenance    |
     |                | | Program| | Loop           |
     | Turing pattern | | L-sys  | | Coherence-     |
     | on graph       | | grammar| | gated rebuild  |
     +--------+-------+ +---+----+ +-------+--------+
              |              |              |
              +------+-------+------+-------+
                     |              |
              +------v------+ +----v-------+
              | Topology    | | Node Type  |
              | Growth      | | Specialize |
              | (new edges/ | | (hub/relay/|
              |  nodes)     | |  boundary) |
              +------+------+ +----+-------+
                     |              |
                     +------+-------+
                            |
                   +--------v--------+
                   | Self-Organizing |
                   | Graph Attention |
                   | Layer           |
                   +--------+--------+
                            |
                   +--------v--------+
                   | Query / Embed   |
                   | / Route         |
                   +-----------------+


Morphogenetic Field Detail:

  Node Activator (a)     Node Inhibitor (h)
  +---+---+---+---+     +---+---+---+---+
  |0.9|0.1|0.8|0.2|     |0.1|0.8|0.2|0.9|
  +---+---+---+---+     +---+---+---+---+
  |0.2|0.7|0.1|0.9|     |0.7|0.2|0.8|0.1|
  +---+---+---+---+     +---+---+---+---+

  Attention Mask = sigma(a - threshold)
  High-a nodes form attention clusters
  High-h boundaries separate clusters
```

### Core Data Structures

```rust
/// Configuration for Self-Organizing Graph Transformer
#[derive(Debug, Clone)]
pub struct SelfOrganizingConfig {
    /// Initial seed graph size
    pub seed_nodes: usize,

    /// Maximum graph size (growth limit)
    pub max_nodes: usize,

    /// Embedding dimension
    pub embed_dim: usize,

    /// Morphogenetic field parameters
    pub morpho: MorphogeneticConfig,

    /// Developmental program parameters
    pub development: DevelopmentalConfig,

    /// Autopoietic maintenance parameters
    pub autopoiesis: AutopoieticConfig,

    /// Growth phase schedule
    pub phases: Vec<GrowthPhase>,
}

/// Morphogenetic field configuration (Turing patterns on graphs)
#[derive(Debug, Clone)]
pub struct MorphogeneticConfig {
    /// Activator diffusion rate
    pub d_activator: f32,

    /// Inhibitor diffusion rate (must be > d_activator)
    pub d_inhibitor: f32,

    /// Reaction kinetics: activator self-enhancement rate
    pub rho_a: f32,

    /// Reaction kinetics: inhibitor production rate
    pub rho_h: f32,

    /// Activator decay rate
    pub mu_a: f32,

    /// Inhibitor decay rate
    pub mu_h: f32,

    /// Number of reaction-diffusion steps per forward pass
    pub rd_steps: usize,

    /// Threshold for activator-based attention gating
    pub attention_threshold: f32,
}

impl Default for MorphogeneticConfig {
    fn default() -> Self {
        Self {
            d_activator: 0.01,
            d_inhibitor: 0.1, // 10x faster diffusion
            rho_a: 0.08,
            rho_h: 0.12,
            mu_a: 0.03,
            mu_h: 0.06,
            rd_steps: 10,
            attention_threshold: 0.5,
        }
    }
}

/// Node functional types arising from developmental specialization
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum NodeType {
    /// Undifferentiated seed node
    Stem,
    /// High-degree hub node (routes between clusters)
    Hub,
    /// Cluster boundary node (separates attention groups)
    Boundary,
    /// Internal relay node (local message passing)
    Relay,
    /// Sensory node (interfaces with external data)
    Sensory,
    /// Memory node (long-term information storage)
    Memory,
}

/// Developmental program configuration
#[derive(Debug, Clone)]
pub struct DevelopmentalConfig {
    /// L-system axiom (initial production)
    pub axiom: Vec<NodeType>,

    /// Production rules: (predecessor, condition, successor pattern)
    pub rules: Vec<ProductionRule>,

    /// Maximum developmental steps
    pub max_steps: usize,

    /// Competence window: (min_step, max_step) per rule
    pub competence_windows: Vec<(usize, usize)>,
}

/// A production rule in the developmental graph grammar
#[derive(Debug, Clone)]
pub struct ProductionRule {
    /// Node type that this rule applies to
    pub predecessor: NodeType,

    /// Condition: local graph statistic threshold
    pub condition: DevelopmentalCondition,

    /// Successor: what the node becomes + new nodes spawned
    pub successor: Vec<NodeType>,

    /// Edge pattern for newly created nodes
    pub edge_pattern: EdgePattern,
}

/// Conditions for developmental rule activation
#[derive(Debug, Clone)]
pub enum DevelopmentalCondition {
    /// Node degree exceeds threshold
    DegreeAbove(usize),
    /// Node betweenness centrality exceeds threshold
    CentralityAbove(f32),
    /// Activator concentration exceeds threshold
    ActivatorAbove(f32),
    /// Inhibitor concentration exceeds threshold
    InhibitorAbove(f32),
    /// Neighbor composition: fraction of type T exceeds threshold
    NeighborFraction(NodeType, f32),
    /// Always applies
    Always,
}

/// Edge creation patterns for developmental rules
#[derive(Debug, Clone)]
pub enum EdgePattern {
    /// Connect to parent only
    ParentOnly,
    /// Connect to parent and all parent neighbors
    InheritNeighborhood,
    /// Connect to k nearest nodes by embedding distance
    KNearest(usize),
    /// Connect to nodes with matching activator pattern
    MorphogeneticAffinity,
}

/// Autopoietic maintenance configuration
#[derive(Debug, Clone)]
pub struct AutopoieticConfig {
    /// Target coherence level (from ruvector-coherence)
    pub target_coherence: f32,

    /// Coherence tolerance band (maintain within +/- tolerance)
    pub coherence_tolerance: f32,

    /// Edge pruning threshold: remove edges with attention < threshold
    pub prune_threshold: f32,

    /// Node splitting threshold: split nodes with degree > threshold
    pub split_degree_threshold: usize,

    /// Edge growth rate: max new edges per maintenance cycle
    pub max_new_edges_per_cycle: usize,

    /// Maintenance cycle interval (every N forward passes)
    pub cycle_interval: usize,
}

/// Growth phase in the developmental schedule
#[derive(Debug, Clone)]
pub struct GrowthPhase {
    /// Phase name
    pub name: String,

    /// Duration in forward passes
    pub duration: usize,

    /// Which subsystems are active
    pub morpho_active: bool,
    pub development_active: bool,
    pub autopoiesis_active: bool,

    /// Growth rate multiplier
    pub growth_rate: f32,
}
```

### Key Algorithms

#### 1. Morphogenetic Field Update (Reaction-Diffusion on Graph)

```rust
/// Morphogenetic field state for the graph
pub struct MorphogeneticField {
    /// Activator concentration per node
    activator: Vec<f32>,
    /// Inhibitor concentration per node
    inhibitor: Vec<f32>,
    /// Graph Laplacian (sparse)
    laplacian: Vec<(usize, usize, f32)>,
    /// Configuration
    config: MorphogeneticConfig,
}

impl MorphogeneticField {
    /// Run one step of reaction-diffusion on the graph.
    ///
    /// Uses the Gierer-Meinhardt model:
    ///   da/dt = rho_a * (a^2 / h) - mu_a * a + D_a * L * a
    ///   dh/dt = rho_h * a^2 - mu_h * h + D_h * L * h
    fn step(&mut self, dt: f32) {
        let n = self.activator.len();
        let mut da = vec![0.0f32; n];
        let mut dh = vec![0.0f32; n];

        // Reaction kinetics (Gierer-Meinhardt)
        for i in 0..n {
            let a = self.activator[i];
            let h = self.inhibitor[i].max(1e-6); // avoid division by zero
            da[i] += self.config.rho_a * (a * a / h) - self.config.mu_a * a;
            dh[i] += self.config.rho_h * a * a - self.config.mu_h * h;
        }

        // Diffusion via graph Laplacian
        for &(src, dst, weight) in &self.laplacian {
            let diff_a = self.activator[dst] - self.activator[src];
            let diff_h = self.inhibitor[dst] - self.inhibitor[src];
            da[src] += self.config.d_activator * weight * diff_a;
            dh[src] += self.config.d_inhibitor * weight * diff_h;
        }

        // Euler integration
        for i in 0..n {
            self.activator[i] = (self.activator[i] + dt * da[i]).max(0.0);
            self.inhibitor[i] = (self.inhibitor[i] + dt * dh[i]).max(0.0);
        }
    }

    /// Compute attention mask from activator field.
    /// Nodes with activator above threshold attend to each other.
    fn attention_mask(&self) -> Vec<bool> {
        self.activator.iter()
            .map(|&a| a > self.config.attention_threshold)
            .collect()
    }

    /// Compute morphogenetic affinity between two nodes.
    /// Nodes with similar activator/inhibitor ratios have high affinity.
    fn affinity(&self, i: usize, j: usize) -> f32 {
        let ratio_i = self.activator[i] / self.inhibitor[i].max(1e-6);
        let ratio_j = self.activator[j] / self.inhibitor[j].max(1e-6);
        let diff = (ratio_i - ratio_j).abs();
        (-diff * diff).exp() // Gaussian affinity
    }
}
```

#### 2. Developmental Program (L-System Graph Grammar)

```rust
/// Developmental program executor
pub struct DevelopmentalProgram {
    /// Current developmental step
    step: usize,
    /// Production rules
    rules: Vec<ProductionRule>,
    /// Competence windows per rule
    competence: Vec<(usize, usize)>,
    /// Node type assignments
    node_types: Vec<NodeType>,
    /// Graph adjacency (mutable during development)
    adjacency: Vec<Vec<usize>>,
    /// Node embeddings
    embeddings: Vec<Vec<f32>>,
}

impl DevelopmentalProgram {
    /// Execute one developmental step.
    ///
    /// For each node, check if any production rule applies:
    /// 1. The node type matches the rule predecessor
    /// 2. The condition is satisfied
    /// 3. The current step is within the competence window
    ///
    /// If so, apply the rule: change node type and/or spawn new nodes.
    fn develop_step(
        &mut self,
        field: &MorphogeneticField,
        max_nodes: usize,
    ) -> Vec<DevelopmentalEvent> {
        let mut events = Vec::new();
        let current_n = self.node_types.len();

        // Collect applicable rules (avoid borrow conflicts)
        let mut applications: Vec<(usize, usize)> = Vec::new(); // (node_idx, rule_idx)

        for node_idx in 0..current_n {
            for (rule_idx, rule) in self.rules.iter().enumerate() {
                // Check competence window
                let (min_step, max_step) = self.competence[rule_idx];
                if self.step < min_step || self.step > max_step {
                    continue;
                }

                // Check predecessor type
                if self.node_types[node_idx] != rule.predecessor {
                    continue;
                }

                // Check condition
                if self.check_condition(node_idx, &rule.condition, field) {
                    applications.push((node_idx, rule_idx));
                    break; // one rule per node per step
                }
            }
        }

        // Apply rules
        for (node_idx, rule_idx) in applications {
            if self.node_types.len() >= max_nodes {
                break;
            }

            let rule = &self.rules[rule_idx];

            // First element of successor replaces the node's type
            if let Some(&new_type) = rule.successor.first() {
                let old_type = self.node_types[node_idx];
                self.node_types[node_idx] = new_type;
                events.push(DevelopmentalEvent::Differentiate {
                    node: node_idx,
                    from: old_type,
                    to: new_type,
                });
            }

            // Remaining elements spawn new nodes
            for &spawn_type in rule.successor.iter().skip(1) {
                let new_idx = self.node_types.len();
                if new_idx >= max_nodes { break; }

                self.node_types.push(spawn_type);

                // Create embedding as perturbation of parent
                let parent_emb = self.embeddings[node_idx].clone();
                let new_emb = perturb_embedding(&parent_emb, 0.01);
                self.embeddings.push(new_emb);

                // Create edges based on pattern
                let new_edges = match &rule.edge_pattern {
                    EdgePattern::ParentOnly => vec![node_idx],
                    EdgePattern::InheritNeighborhood => {
                        let mut edges = vec![node_idx];
                        edges.extend_from_slice(&self.adjacency[node_idx]);
                        edges
                    }
                    EdgePattern::KNearest(k) => {
                        self.k_nearest(new_idx, *k)
                    }
                    EdgePattern::MorphogeneticAffinity => {
                        self.morpho_nearest(new_idx, field, 4)
                    }
                };

                self.adjacency.push(new_edges.clone());
                for &neighbor in &new_edges {
                    if neighbor < self.adjacency.len() {
                        self.adjacency[neighbor].push(new_idx);
                    }
                }

                events.push(DevelopmentalEvent::Spawn {
                    parent: node_idx,
                    child: new_idx,
                    child_type: spawn_type,
                });
            }
        }

        self.step += 1;
        events
    }

    /// Check whether a developmental condition is satisfied for a node.
    fn check_condition(
        &self,
        node_idx: usize,
        condition: &DevelopmentalCondition,
        field: &MorphogeneticField,
    ) -> bool {
        match condition {
            DevelopmentalCondition::DegreeAbove(threshold) => {
                self.adjacency[node_idx].len() > *threshold
            }
            DevelopmentalCondition::ActivatorAbove(threshold) => {
                field.activator[node_idx] > *threshold
            }
            DevelopmentalCondition::InhibitorAbove(threshold) => {
                field.inhibitor[node_idx] > *threshold
            }
            DevelopmentalCondition::NeighborFraction(target_type, threshold) => {
                let neighbors = &self.adjacency[node_idx];
                if neighbors.is_empty() { return false; }
                let count = neighbors.iter()
                    .filter(|&&n| self.node_types[n] == *target_type)
                    .count();
                (count as f32 / neighbors.len() as f32) > *threshold
            }
            DevelopmentalCondition::CentralityAbove(_threshold) => {
                // Approximated via degree centrality for efficiency
                let degree = self.adjacency[node_idx].len() as f32;
                let max_degree = self.adjacency.iter()
                    .map(|adj| adj.len())
                    .max()
                    .unwrap_or(1) as f32;
                (degree / max_degree) > 0.5
            }
            DevelopmentalCondition::Always => true,
        }
    }
}

/// Events produced by the developmental program
#[derive(Debug, Clone)]
pub enum DevelopmentalEvent {
    /// A node changed its functional type
    Differentiate { node: usize, from: NodeType, to: NodeType },
    /// A new node was spawned
    Spawn { parent: usize, child: usize, child_type: NodeType },
    /// An edge was pruned
    Prune { src: usize, dst: usize },
}

/// Perturb an embedding with small Gaussian noise
fn perturb_embedding(emb: &[f32], scale: f32) -> Vec<f32> {
    emb.iter().enumerate()
        .map(|(i, &v)| {
            // Deterministic pseudo-noise based on index
            let noise = ((i as f32 * 0.618033988) % 1.0 - 0.5) * 2.0 * scale;
            v + noise
        })
        .collect()
}
```

#### 3. Autopoietic Maintenance Loop

```rust
/// Autopoietic maintenance system
pub struct AutopoieticMaintainer {
    config: AutopoieticConfig,
    /// Forward pass counter
    pass_count: usize,
    /// Running coherence history
    coherence_history: Vec<f32>,
}

impl AutopoieticMaintainer {
    /// Execute one maintenance cycle if due.
    ///
    /// Measures current coherence (via ruvector-coherence metrics),
    /// then adjusts topology to stay within the target band.
    fn maybe_maintain(
        &mut self,
        adjacency: &mut Vec<Vec<usize>>,
        node_types: &mut Vec<NodeType>,
        attention_weights: &[Vec<(usize, f32)>],
        embeddings: &[Vec<f32>],
    ) -> Vec<MaintenanceAction> {
        self.pass_count += 1;
        if self.pass_count % self.config.cycle_interval != 0 {
            return Vec::new();
        }

        let mut actions = Vec::new();
        let coherence = self.measure_coherence(attention_weights);
        self.coherence_history.push(coherence);

        let target = self.config.target_coherence;
        let tol = self.config.coherence_tolerance;

        if coherence < target - tol {
            // Coherence too low: grow edges to increase connectivity
            let new_edges = self.grow_edges(adjacency, embeddings);
            actions.extend(new_edges);
        } else if coherence > target + tol {
            // Coherence too high: prune weak edges
            let pruned = self.prune_edges(adjacency, attention_weights);
            actions.extend(pruned);
        }

        // Always check for overloaded nodes
        let splits = self.split_overloaded(adjacency, node_types, embeddings);
        actions.extend(splits);

        actions
    }

    /// Measure coherence as mean attention weight across active edges.
    fn measure_coherence(&self, attention_weights: &[Vec<(usize, f32)>]) -> f32 {
        let mut total_weight = 0.0f32;
        let mut edge_count = 0usize;

        for node_weights in attention_weights {
            for &(_neighbor, weight) in node_weights {
                total_weight += weight;
                edge_count += 1;
            }
        }

        if edge_count == 0 { return 0.0; }
        total_weight / edge_count as f32
    }

    /// Prune edges with attention weight below threshold.
    fn prune_edges(
        &self,
        adjacency: &mut Vec<Vec<usize>>,
        attention_weights: &[Vec<(usize, f32)>],
    ) -> Vec<MaintenanceAction> {
        let mut actions = Vec::new();

        for (src, node_weights) in attention_weights.iter().enumerate() {
            let to_prune: Vec<usize> = node_weights.iter()
                .filter(|&&(_, w)| w < self.config.prune_threshold)
                .map(|&(dst, _)| dst)
                .collect();

            for dst in to_prune {
                adjacency[src].retain(|&n| n != dst);
                actions.push(MaintenanceAction::PruneEdge { src, dst });
            }
        }

        actions
    }

    /// Split nodes whose degree exceeds the threshold.
    fn split_overloaded(
        &self,
        adjacency: &mut Vec<Vec<usize>>,
        node_types: &mut Vec<NodeType>,
        embeddings: &[Vec<f32>],
    ) -> Vec<MaintenanceAction> {
        let mut actions = Vec::new();
        let n = adjacency.len();

        for i in 0..n {
            if adjacency[i].len() > self.config.split_degree_threshold {
                // Split: new node takes half the edges
                let mid = adjacency[i].len() / 2;
                let split_edges: Vec<usize> = adjacency[i].drain(mid..).collect();

                let new_idx = adjacency.len();
                adjacency.push(split_edges.clone());
                node_types.push(node_types[i]);

                // Reconnect transferred edges
                for &neighbor in &split_edges {
                    if neighbor < adjacency.len() {
                        // Replace old -> new in neighbor lists
                        if let Some(pos) = adjacency[neighbor].iter().position(|&n| n == i) {
                            adjacency[neighbor][pos] = new_idx;
                        }
                    }
                }

                // Connect the two halves
                adjacency[i].push(new_idx);
                adjacency[new_idx].push(i);

                actions.push(MaintenanceAction::SplitNode {
                    original: i,
                    new_node: new_idx,
                    edges_transferred: split_edges.len(),
                });
            }
        }

        actions
    }

    /// Grow new edges to increase coherence.
    fn grow_edges(
        &self,
        adjacency: &mut Vec<Vec<usize>>,
        embeddings: &[Vec<f32>],
    ) -> Vec<MaintenanceAction> {
        let mut actions = Vec::new();
        let mut added = 0;

        // Find pairs with high embedding similarity but no edge
        for i in 0..adjacency.len() {
            if added >= self.config.max_new_edges_per_cycle { break; }

            for j in (i + 1)..adjacency.len() {
                if added >= self.config.max_new_edges_per_cycle { break; }
                if adjacency[i].contains(&j) { continue; }

                let sim = cosine_similarity(&embeddings[i], &embeddings[j]);
                if sim > 0.8 {
                    adjacency[i].push(j);
                    adjacency[j].push(i);
                    added += 1;
                    actions.push(MaintenanceAction::GrowEdge { src: i, dst: j, similarity: sim });
                }
            }
        }

        actions
    }
}

/// Actions taken by the autopoietic maintainer
#[derive(Debug, Clone)]
pub enum MaintenanceAction {
    PruneEdge { src: usize, dst: usize },
    GrowEdge { src: usize, dst: usize, similarity: f32 },
    SplitNode { original: usize, new_node: usize, edges_transferred: usize },
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a < 1e-8 || norm_b < 1e-8 { return 0.0; }
    dot / (norm_a * norm_b)
}
```

#### 4. Cellular Automata Attention Rules

```rust
/// Cellular automaton rule for graph attention updates.
///
/// Each node updates its attention state based on the attention states
/// of its neighbors, analogous to Conway's Game of Life on a graph.
pub struct CellularAttentionRule {
    /// Birth threshold: node activates if >= birth neighbors are active
    pub birth_threshold: usize,
    /// Survival range: node stays active if neighbors in [lo, hi]
    pub survival_lo: usize,
    pub survival_hi: usize,
    /// Refractory period: steps before reactivation after deactivation
    pub refractory: usize,
}

impl CellularAttentionRule {
    /// Update attention states for all nodes.
    fn update(
        &self,
        states: &mut Vec<CellState>,
        adjacency: &[Vec<usize>],
    ) {
        let n = states.len();
        let old_states: Vec<CellState> = states.clone();

        for i in 0..n {
            let active_neighbors = adjacency[i].iter()
                .filter(|&&j| old_states[j].active)
                .count();

            match &mut states[i] {
                s if s.active => {
                    // Survival check
                    if active_neighbors < self.survival_lo
                        || active_neighbors > self.survival_hi
                    {
                        s.active = false;
                        s.refractory_remaining = self.refractory;
                    }
                }
                s if s.refractory_remaining > 0 => {
                    s.refractory_remaining -= 1;
                }
                s => {
                    // Birth check
                    if active_neighbors >= self.birth_threshold {
                        s.active = true;
                    }
                }
            }
        }
    }
}

#[derive(Debug, Clone)]
pub struct CellState {
    pub active: bool,
    pub refractory_remaining: usize,
}
```

---

## RuVector Integration Points

### Affected Crates/Modules

1. **`ruvector-domain-expansion`**: The `DomainExpansionEngine` already implements cross-domain transfer with `MetaThompsonEngine`. Morphogenetic fields extend this with spatial structure over the domain graph -- each domain node carries activator/inhibitor concentrations that influence the transfer policy selection. The `PolicyKernel` population search can be guided by developmental programs that specialize kernels into domain-specific roles.

2. **`ruvector-attention`**: The existing 18+ attention mechanisms (morphological, topology, sheaf, PDE, transport, curvature, sparse, flash, hyperbolic, MoE) serve as the building blocks that the self-organizing system selects and composes. The `topology/` module's gated attention maps directly to morphogenetic field gating. The `sheaf/` module's restriction maps provide the mathematical framework for boundary-creating attention between differentiated node types.

3. **`ruvector-coherence`**: The coherence engine (`spectral.rs`, `quality.rs`, `metrics.rs`) provides the feedback signal for the autopoietic loop. The target coherence from `AutopoieticConfig` corresponds directly to the spectral coherence thresholds used in the mincut-gated-transformer. Coherence measurements drive the grow/prune/split decisions.

4. **`ruvector-mincut`**: Topology optimization via mincut provides the theoretical foundation for the pruning phase of autopoiesis. The mincut-gated-transformer's `GateController` (energy gates, early exit) directly corresponds to morphogenetic field gating -- both decide which computation paths are active based on a learned signal.

5. **`ruvector-nervous-system`**: The dendritic coincidence detection (`Dendrite`, `DendriticTree`, `PlateauPotential`) maps directly to the developmental differentiation model. Neurons differentiate based on their dendritic input patterns, just as graph nodes differentiate based on local topology. The `plasticity/eprop` module's e-prop learning rule can guide morphogenetic field parameter adaptation. The `GlobalWorkspace` and `OscillatoryRouter` provide the coordination substrate for cellular automata attention.

6. **`ruvector-gnn`**: The core GNN layer (`layer.rs`), training loop (`training.rs`), and elastic weight consolidation (`ewc.rs`) provide the foundation. EWC is essential for developmental programs: when a node differentiates, the weights associated with its old type must be protected via Fisher-information-weighted regularization, preventing catastrophic forgetting of learned representations.

### New Modules to Create

```
ruvector-gnn/src/self_organizing/
  mod.rs
  morphogenetic.rs     # Reaction-diffusion field on graph
  developmental.rs     # L-system graph grammar executor
  autopoietic.rs       # Self-maintenance loop
  cellular_automata.rs # CA-based attention rules
  growth_phase.rs      # Phase scheduling
  metrics.rs           # Growth statistics and visualization
```

---

## Future Roadmap

### 2030: Self-Growing Graph Architectures

By 2030, the developmental program becomes a learned object rather than a hand-designed grammar. The production rules themselves are parameterized by neural networks trained via reinforcement learning on downstream task performance. Key milestones:

- **Learned Growth Rules**: A meta-network predicts which production rule to apply at each developmental step, conditioned on global graph statistics and task loss gradients.
- **Topology-Aware Data Distribution Matching**: The morphogenetic field parameters are optimized so that the resulting attention cluster structure matches the data distribution's intrinsic geometry (e.g., manifold structure, cluster hierarchy).
- **Federated Self-Organization**: Multiple SOGT instances running on different data partitions exchange developmental signals (activator/inhibitor concentrations) to coordinate topology across distributed deployments.
- **Morphogenetic Architecture Search**: Instead of NAS over a fixed search space, the search space itself grows through morphogenetic processes. Novel attention mechanisms emerge as stable Turing patterns on the architecture search graph.

### 2036: Autonomous Graph Systems

By 2036, the self-organizing graph transformer becomes a fully autonomous system that evolves new attention mechanisms through its developmental program:

- **Open-Ended Evolution**: The graph system exhibits open-ended evolution -- it continuously produces novel structures that are not repetitions of previous states. New node types, edge types, and attention mechanisms emerge without human intervention.
- **Developmental Canalization**: The system develops robust developmental trajectories that reliably produce high-performing topologies despite environmental perturbation, analogous to biological canalization.
- **Morphogenetic Memory**: Growth histories are stored as compressed developmental programs (analogous to DNA) that can be replayed, mutated, and recombined for evolutionary search over architectures.
- **Autopoietic Resilience at Scale**: Production graph systems with millions of nodes self-repair within milliseconds of node failure, maintaining 99.999% coherence through continuous autopoietic maintenance without human intervention.

---

## Implementation Phases

### Phase 1: Morphogenetic Fields (3 weeks)
- Implement reaction-diffusion on graph using graph Laplacian
- Integrate Turing pattern attention masking with existing ruvector-attention
- Validate pattern formation on synthetic graphs
- Unit tests for stability and convergence

### Phase 2: Developmental Programs (4 weeks)
- Implement L-system graph grammar with production rules
- Add competence windows and node differentiation
- Integrate with morphogenetic fields for condition checking
- Test developmental trajectories on benchmark graphs

### Phase 3: Autopoietic Maintenance (3 weeks)
- Implement coherence-gated topology maintenance using ruvector-coherence
- Add edge pruning, node splitting, and edge growth
- Integrate with existing HNSW index maintenance
- Stress tests for self-repair under node deletion

### Phase 4: Integration and Evaluation (2 weeks)
- Combine all three subsystems into unified SOGT layer
- Benchmark against static graph transformers on distribution-shifting workloads
- Measure self-repair latency and coherence maintenance
- Document growth phase scheduling heuristics

---

## Success Metrics

| Metric | Target |
|--------|--------|
| Topology Adaptation Speed | <100ms to respond to distribution shift |
| Node Specialization Accuracy | >85% correct functional type assignment |
| Self-Repair Recovery Time | <50ms to recover from 10% node deletion |
| Coherence Maintenance | Within +/-5% of target coherence |
| Retrieval Quality (shifting workload) | 30-50% improvement over static topology |
| Growth Overhead | <15% additional computation per forward pass |
| Morphogenetic Pattern Stability | Converge within 50 reaction-diffusion steps |

---

## Risks and Mitigations

1. **Risk: Uncontrolled Growth**
   - Mitigation: Hard `max_nodes` cap, growth rate limits per phase, energy-based cost for node creation

2. **Risk: Developmental Instability**
   - Mitigation: Canalization through competence windows, EWC-protected weight consolidation during differentiation

3. **Risk: Morphogenetic Pattern Collapse**
   - Mitigation: Validated Turing parameter regimes (D_h/D_a > 5), stochastic perturbation to break symmetry

4. **Risk: Autopoietic Oscillation**
   - Mitigation: Hysteresis in coherence thresholds (different thresholds for grow vs. prune), exponential moving average smoothing

5. **Risk: Performance Overhead**
   - Mitigation: Amortize maintenance over many forward passes, sparse Laplacian operations, early-exit from growth phases when targets are met