
Axis 5: Self-Organizing Morphogenetic Networks

Document: 25 of 30
Series: Graph Transformers: 2026-2036 and Beyond
Last Updated: 2026-02-25
Status: Research Prospectus


1. Problem Statement

Current graph transformers have fixed architectures: the number of nodes, edges, layers, and attention heads is determined before training and remains constant during inference. Biological neural systems, by contrast, grow, prune, specialize, and reorganize throughout their lifetime. The human brain grows from a single fertilized cell to roughly 86 billion neurons, following a developmental program encoded in DNA.

The self-organizing axis asks: can graph transformers grow their own architecture?

1.1 The Architecture Search Problem

Standard neural architecture search (NAS) is external: a controller searches over a space of architectures, trains each candidate, and selects the best. This approach is:

  • Expensive: Training thousands of candidate architectures
  • Brittle: The search space is hand-designed
  • Static: The architecture cannot adapt after deployment
  • Unbiological: No biological system uses external architecture search

Morphogenetic graph transformers aim to address these limitations by making architecture growth intrinsic to the computation.

1.2 RuVector Baseline

  • ruvector-nervous-system: Competitive learning (compete/), plasticity (plasticity/), routing (routing/), Hopfield nets (hopfield/)
  • ruvector-graph: Dynamic graph operations (add/remove nodes, edges), property graph with hyperedges
  • ruvector-gnn: Continual learning via EWC (ewc.rs), replay buffers (replay.rs)
  • ruvector-domain-expansion: Domain expansion mechanisms (a form of self-organization)

2. Morphogenetic Graph Transformers

2.1 The Biological Analogy

Biological development proceeds through:

  1. Cell division: One cell becomes two (node splitting)
  2. Differentiation: Cells specialize based on local signals (attention specialization)
  3. Migration: Cells move to their functional position (graph rewiring)
  4. Apoptosis: Programmed cell death removes unnecessary cells (node pruning)
  5. Synaptogenesis: Neurons form connections based on activity (edge creation)
  6. Synaptic pruning: Unused connections are removed (edge deletion)

We map each biological process to a graph transformer operation.

2.2 Node Division (Mitosis)

When a node v becomes "overloaded" (high information throughput, high gradient magnitude, or high attention diversity), it divides into two daughter nodes v1, v2:

MITOSIS(v):
  1. Create daughter nodes v1, v2
  2. Split features: h_{v1} = h_v + epsilon_1, h_{v2} = h_v + epsilon_2
     (small perturbation to break symmetry)
  3. Distribute edges:
     - Edges to v: assign to v1 or v2 based on attention similarity
     - Edge (u, v): assign to argmax_{i in {1,2}} alpha_{u, vi}
  4. Create sibling edge: (v1, v2) with high initial weight
  5. Remove original node v

Trigger condition:
  divide(v) if:
    information_throughput(v) > theta_divide
    OR gradient_magnitude(v) > theta_grad
    OR attention_entropy(v) > theta_entropy

Complexity per division: O(degree(v) * d) -- proportional to the number of edges being reassigned.
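The following is a minimal Rust sketch of MITOSIS(v) on a toy in-memory graph. The `Graph` struct, the `mitosis` method, and the alternating-sign perturbation are hypothetical illustrations, not RuVector APIs. The sketch assumes that attention from a neighbor u is a softmax over dot-product scores that includes both daughters. Under that assumption, comparing the raw scores h_u . h_vi is enough to pick argmax_i alpha_{u, vi}.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal dynamic graph (not the ruvector-graph API).
pub struct Graph {
    pub features: Vec<Vec<f32>>,              // h_v for every node id ever created
    pub alive: Vec<bool>,                     // false once a node has been removed
    pub edges: BTreeMap<(usize, usize), f32>, // undirected edges, key = (min id, max id)
}

fn key(a: usize, b: usize) -> (usize, usize) {
    if a < b { (a, b) } else { (b, a) }
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

impl Graph {
    /// Neighbors of v with edge weights (linear scan for brevity; real code would keep adjacency lists).
    pub fn neighbors(&self, v: usize) -> Vec<(usize, f32)> {
        self.edges
            .iter()
            .filter_map(|(&(a, b), &w)| if a == v { Some((b, w)) } else if b == v { Some((a, w)) } else { None })
            .collect()
    }

    fn add_node(&mut self, h: Vec<f32>) -> usize {
        self.features.push(h);
        self.alive.push(true);
        self.features.len() - 1
    }

    /// MITOSIS(v): split v into daughters (v1, v2) and return their ids.
    pub fn mitosis(&mut self, v: usize, eps: f32, sibling_weight: f32) -> (usize, usize) {
        let h = self.features[v].clone();
        // Steps 1-2: daughters get h_v +/- eps along an alternating-sign direction to break symmetry.
        let dir: Vec<f32> = (0..h.len()).map(|i| if i % 2 == 0 { eps } else { -eps }).collect();
        let v1 = self.add_node(h.iter().zip(&dir).map(|(x, e)| x + e).collect());
        let v2 = self.add_node(h.iter().zip(&dir).map(|(x, e)| x - e).collect());
        // Step 3: move each edge (u, v) to the daughter with the higher score h_u . h_vi.
        for (u, w) in self.neighbors(v) {
            self.edges.remove(&key(u, v));
            let hu = &self.features[u];
            let s1 = dot(hu, &self.features[v1]);
            let s2 = dot(hu, &self.features[v2]);
            let target = if s1 >= s2 { v1 } else { v2 };
            self.edges.insert(key(u, target), w);
        }
        // Step 4: sibling edge with a high initial weight.
        self.edges.insert(key(v1, v2), sibling_weight);
        // Step 5: retire the original node.
        self.alive[v] = false;
        (v1, v2)
    }
}
```

Consider a node with features [1.0, 1.0] whose neighbors point along [1.0, 0.0] and [0.0, 1.0]. It splits so that each neighbor attaches to a different daughter, and a sibling edge joins the two daughters.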

2.3 Node Differentiation

After division, daughter nodes differentiate by specializing their attention patterns:

DIFFERENTIATE(v1, v2):
  // Over T time steps, v1 and v2 develop different attention profiles

  For t = 1 to T:
    For each neighbor u shared by v1 and v2:
      // Competitive Hebbian learning between siblings
      if alpha_{u, v1} > alpha_{u, v2}:
        w_{u, v1} += eta * alpha_{u, v1}
        w_{u, v2} -= eta * alpha_{u, v2}   // Competitive inhibition
      else if alpha_{u, v2} > alpha_{u, v1}:
        w_{u, v2} += eta * alpha_{u, v2}
        w_{u, v1} -= eta * alpha_{u, v1}   // Symmetric case

  // v1 becomes specialist for one set of neighbors
  // v2 becomes specialist for the complementary set

RuVector connection: This would directly extend the competitive learning mechanisms in ruvector-nervous-system/src/compete/.
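Below is a sketch of the sibling competition. It covers both the case where v1 wins a neighbor and the symmetric case where v2 wins. Two hypothetical assumptions are made: alpha_{u, vi} is a softmax over the two sibling weights w_{u, v1} and w_{u, v2}, and inhibited weights are floored at zero. `differentiate` is an illustrative name, not part of ruvector-nervous-system.

```rust
/// Competitive Hebbian differentiation between sibling nodes v1 and v2.
/// w1[u], w2[u] are the edge weights from neighbor u to v1 and v2.
pub fn differentiate(w1: &mut [f32], w2: &mut [f32], eta: f32, steps: usize) {
    assert_eq!(w1.len(), w2.len(), "both siblings must see the same neighbors");
    for _ in 0..steps {
        for u in 0..w1.len() {
            // Assumed attention: softmax restricted to the sibling pair.
            let (e1, e2) = (w1[u].exp(), w2[u].exp());
            let (a1, a2) = (e1 / (e1 + e2), e2 / (e1 + e2));
            if a1 > a2 {
                w1[u] += eta * a1;                   // Hebbian reinforcement of the winner
                w2[u] = (w2[u] - eta * a2).max(0.0); // competitive inhibition, floored at 0
            } else if a2 > a1 {
                w2[u] += eta * a2;
                w1[u] = (w1[u] - eta * a1).max(0.0);
            }
            // Exact ties are left unchanged.
        }
    }
}
```

Start from a slight preference (0.6 vs 0.4) with eta = 0.1. The weaker sibling's weight then reaches zero within about 13 steps, so each neighbor ends up served by exactly one specialist.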

2.4 Node Apoptosis (Programmed Death)

Underutilized nodes are removed:

APOPTOSIS(v):
  Trigger: if attention_received(v) < theta_min for T_grace consecutive steps

  1. Redistribute v's information to neighbors:
     For each neighbor u:
       h_u += (alpha_{v,u} / sum_{w in N(v)} alpha_{v,w}) * h_v
  2. Reconnect v's neighbors:
     For each pair (u, w) both in N(v):
       if not edge(u, w):
         add_edge(u, w, weight = alpha_{v,u} * alpha_{v,w})
  3. Remove v and all its edges
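Here is a sketch of APOPTOSIS(v). It assumes edge weights stand in for alpha_{v,u} and that edges are stored undirected in a `BTreeMap` keyed by (min id, max id). The function name and storage layout are hypothetical. The nested reconnection loop is the part of the cost that is quadratic in degree(v).

```rust
use std::collections::BTreeMap;

fn key(a: usize, b: usize) -> (usize, usize) {
    if a < b { (a, b) } else { (b, a) }
}

/// APOPTOSIS(v): redistribute h_v, reconnect former neighbors, drop v's edges.
/// The caller is responsible for marking v itself as dead.
pub fn apoptosis(features: &mut [Vec<f32>], edges: &mut BTreeMap<(usize, usize), f32>, v: usize) {
    // Neighbors of v with their attention weights alpha_{v,u}.
    let nbrs: Vec<(usize, f32)> = edges
        .iter()
        .filter_map(|(&(a, b), &w)| if a == v { Some((b, w)) } else if b == v { Some((a, w)) } else { None })
        .collect();
    let total: f32 = nbrs.iter().map(|&(_, a)| a).sum();

    // Step 1: h_u += (alpha_{v,u} / sum_w alpha_{v,w}) * h_v.
    if total > 0.0 {
        let hv = features[v].clone();
        for &(u, a) in &nbrs {
            for (x, y) in features[u].iter_mut().zip(&hv) {
                *x += (a / total) * y;
            }
        }
    }

    // Step 2: connect every unlinked pair of former neighbors with weight alpha_{v,u} * alpha_{v,w}.
    for i in 0..nbrs.len() {
        for j in (i + 1)..nbrs.len() {
            let (u, au) = nbrs[i];
            let (w, aw) = nbrs[j];
            edges.entry(key(u, w)).or_insert(au * aw);
        }
    }

    // Step 3: remove all of v's edges.
    edges.retain(|&(a, b), _| a != v && b != v);
}
```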

2.5 Edge Growth and Pruning

Synaptogenesis (edge creation):

For each pair (u, v) not connected:
  Compute predicted utility:
    utility(u, v) = |h_u . h_v| / (||h_u|| * ||h_v||)  // Cosine similarity
                    + beta * shared_neighbors(u, v) / max_degree
  If utility(u, v) > theta_synapse:
    add_edge(u, v, weight = utility(u, v))
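The next sketch implements the synaptogenesis rule, with adjacency stored as `BTreeSet`s (an assumption). Scanning every unconnected pair, as the rule is written, costs O(n^2) per step. A practical version would restrict candidates, for example to pairs within two hops. Both function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// utility(u, v) = |cos(h_u, h_v)| + beta * shared_neighbors(u, v) / max_degree
pub fn synapse_utility(hu: &[f32], hv: &[f32], shared: usize, max_degree: usize, beta: f32) -> f32 {
    let dot: f32 = hu.iter().zip(hv).map(|(a, b)| a * b).sum();
    let nu = hu.iter().map(|a| a * a).sum::<f32>().sqrt();
    let nv = hv.iter().map(|a| a * a).sum::<f32>().sqrt();
    // Zero vectors have no direction; treat their cosine term as 0.
    let cos = if nu > 0.0 && nv > 0.0 { (dot / (nu * nv)).abs() } else { 0.0 };
    let structural = if max_degree > 0 { shared as f32 / max_degree as f32 } else { 0.0 };
    cos + beta * structural
}

/// Scan all unconnected pairs and return (u, v, weight) for each pair above theta_synapse.
pub fn propose_synapses(h: &[Vec<f32>], adj: &[BTreeSet<usize>], beta: f32, theta_synapse: f32) -> Vec<(usize, usize, f32)> {
    let max_degree = adj.iter().map(|s| s.len()).max().unwrap_or(0);
    let mut proposals = Vec::new();
    for u in 0..h.len() {
        for v in (u + 1)..h.len() {
            if adj[u].contains(&v) {
                continue; // already connected
            }
            let shared = adj[u].intersection(&adj[v]).count();
            let utility = synapse_utility(&h[u], &h[v], shared, max_degree, beta);
            if utility > theta_synapse {
                proposals.push((u, v, utility)); // add_edge(u, v, weight = utility)
            }
        }
    }
    proposals
}
```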

Synaptic pruning (edge deletion):

For each edge (u, v):
  If attention_weight(u, v) < theta_prune for T_prune steps:
    remove_edge(u, v)
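The `T_prune` condition needs a per-edge counter of consecutive low-attention steps. The minimal sketch below uses a hypothetical `Pruner` type. It resets an edge's counter whenever that edge's attention climbs back above `theta_prune`.

```rust
use std::collections::BTreeMap;

/// Tracks, per edge, how many consecutive steps its attention weight has stayed below theta_prune.
pub struct Pruner {
    theta_prune: f32,
    t_prune: u32,
    below: BTreeMap<(usize, usize), u32>,
}

impl Pruner {
    pub fn new(theta_prune: f32, t_prune: u32) -> Self {
        Pruner { theta_prune, t_prune, below: BTreeMap::new() }
    }

    /// Feed one step of attention weights; returns the edges to pass to remove_edge.
    pub fn step(&mut self, attention: &BTreeMap<(usize, usize), f32>) -> Vec<(usize, usize)> {
        let mut to_remove = Vec::new();
        for (&edge, &a) in attention {
            if a < self.theta_prune {
                let streak = self.below.entry(edge).or_insert(0);
                *streak += 1;
                if *streak >= self.t_prune {
                    to_remove.push(edge);
                }
            } else {
                self.below.remove(&edge); // streak broken: reset the counter
            }
        }
        for edge in &to_remove {
            self.below.remove(edge);
        }
        to_remove
    }
}
```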

2.6 The Morphogenetic Program

All operations are governed by a learned "genetic program" -- a small regulatory network that controls growth:

Morphogenetic Controller:

Inputs:
  - Local features: h_v, gradient(v), loss_contribution(v)
  - Neighborhood signals: mean(h_u for u in N(v)), attention_entropy(v)
  - Global signals: total_nodes, total_loss, epoch

Outputs (per node):
  - p_divide: probability of division [0, 1]
  - p_differentiate: probability of specialization [0, 1]
  - p_apoptosis: probability of death [0, 1]
  - p_synapse_grow: probability of new edge [0, 1]
  - p_synapse_prune: probability of edge removal [0, 1]

Architecture:
  Small MLP (3 layers, 64 hidden units)
  Trained end-to-end with the main graph transformer
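This is a forward-pass sketch of the controller as a 3-layer MLP (n_in -> 64 -> 64 -> 5). It uses ReLU hidden layers and sigmoid outputs, which keeps each probability in (0, 1). The `Dense` and `ControllerSketch` types, the activation choices, and the LCG-based placeholder weights are assumptions for illustration. End-to-end training is not shown.

```rust
/// Dense layer y = W x + b with row-major W (n_out x n_in).
pub struct Dense {
    w: Vec<f32>,
    b: Vec<f32>,
    n_in: usize,
    n_out: usize,
}

impl Dense {
    /// Small deterministic weights from a 64-bit LCG: placeholders for trained parameters.
    pub fn new(n_in: usize, n_out: usize, seed: &mut u64) -> Self {
        let w = (0..n_in * n_out)
            .map(|_| {
                *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
                // Top 24 bits -> uniform in [0, 1), then scaled to [-0.1, 0.1).
                ((*seed >> 40) as f32 / (1u64 << 24) as f32 - 0.5) * 0.2
            })
            .collect();
        Dense { w, b: vec![0.0; n_out], n_in, n_out }
    }

    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        (0..self.n_out)
            .map(|o| {
                let row = &self.w[o * self.n_in..(o + 1) * self.n_in];
                row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f32>() + self.b[o]
            })
            .collect()
    }
}

/// Per-node output probabilities, named after the controller outputs listed above.
#[derive(Debug)]
pub struct MorphProbs {
    pub p_divide: f32,
    pub p_differentiate: f32,
    pub p_apoptosis: f32,
    pub p_synapse_grow: f32,
    pub p_synapse_prune: f32,
}

/// Hypothetical 3-layer controller: n_in -> 64 -> 64 -> 5.
pub struct ControllerSketch {
    l1: Dense,
    l2: Dense,
    l3: Dense,
}

impl ControllerSketch {
    pub fn new(n_in: usize, seed: u64) -> Self {
        let mut s = seed;
        let l1 = Dense::new(n_in, 64, &mut s);
        let l2 = Dense::new(64, 64, &mut s);
        let l3 = Dense::new(64, 5, &mut s);
        ControllerSketch { l1, l2, l3 }
    }

    /// x concatenates the local, neighborhood, and global signals for one node.
    pub fn forward(&self, x: &[f32]) -> MorphProbs {
        let relu = |v: Vec<f32>| -> Vec<f32> { v.into_iter().map(|a| a.max(0.0)).collect() };
        let sigmoid = |a: f32| 1.0 / (1.0 + (-a).exp());
        let h1 = relu(self.l1.forward(x));
        let h2 = relu(self.l2.forward(&h1));
        let o = self.l3.forward(&h2);
        MorphProbs {
            p_divide: sigmoid(o[0]),
            p_differentiate: sigmoid(o[1]),
            p_apoptosis: sigmoid(o[2]),
            p_synapse_grow: sigmoid(o[3]),
            p_synapse_prune: sigmoid(o[4]),
        }
    }
}
```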

RuVector trait design:

/// Morphogenetic graph transformer
pub trait MorphogeneticGraphTransformer {
    /// Execute one developmental step
    fn develop(
        &mut self,
        graph: &mut DynamicPropertyGraph,
        features: &mut DynamicTensor,
        controller: &MorphogeneticController,
    ) -> Result<DevelopmentReport, MorphError>;

    /// Get current architecture statistics
    fn architecture_stats(&self) -> ArchitectureStats;

    /// Freeze architecture (stop growth)
    fn freeze(&mut self);

    /// Resume growth
    fn unfreeze(&mut self);
}

pub struct DevelopmentReport {
    pub nodes_divided: Vec<(NodeId, NodeId, NodeId)>,  // (parent, child1, child2)
    pub nodes_differentiated: Vec<NodeId>,
    pub nodes_removed: Vec<NodeId>,
    pub edges_created: Vec<(NodeId, NodeId)>,
    pub edges_removed: Vec<(NodeId, NodeId)>,
    pub total_nodes_after: usize,
    pub total_edges_after: usize,
}

pub struct ArchitectureStats {
    pub total_nodes: usize,
    pub total_edges: usize,
    pub avg_degree: f64,
    pub max_degree: usize,
    pub num_connected_components: usize,
    pub spectral_gap: f64,
    pub avg_attention_entropy: f64,
    pub growth_rate: f64,  // nodes per step
}

pub struct MorphogeneticController {
    /// Regulatory network
    network: SmallMLP,
    /// Division threshold
    theta_divide: f32,
    /// Apoptosis threshold
    theta_apoptosis: f32,
    /// Synapse growth threshold
    theta_synapse: f32,
    /// Pruning threshold
    theta_prune: f32,
    /// Maximum allowed nodes
    max_nodes: usize,
    /// Minimum allowed nodes
    min_nodes: usize,
}

3. Autopoietic Graph Transformers

3.1 Autopoiesis: Self-Creating Networks

Autopoiesis (Maturana & Varela, 1973) describes systems that produce and maintain themselves. An autopoietic graph transformer is one where:

  1. The graph transformer produces its own components (nodes, edges, attention weights)
  2. The components interact to produce the transformer (self-referential)
  3. The system maintains its organizational identity despite continuous component replacement

3.2 Self-Producing Attention

In an autopoietic graph transformer, the attention mechanism produces the graph structure that defines the attention mechanism:

Cycle:
  1. Graph G defines attention: alpha = Attention(X, G)
  2. Attention defines new graph: G' = ReconstructGraph(alpha)
  3. New graph defines new attention: alpha' = Attention(X, G')
  4. ...

Fixed point: G* such that ReconstructGraph(Attention(X, G*)) = G*

Finding the fixed point:

Input: Initial graph G_0, features X
Output: Autopoietic fixed-point graph G*

G = G_0
for t = 1 to max_iter:
  // Compute attention on current graph
  alpha = GraphAttention(X, G)

  // Reconstruct graph from attention
  G_new = TopK(alpha, k=avg_degree)  // Keep top-k attention weights as edges

  // Check convergence
  if GraphDistance(G, G_new) < epsilon:
    return G_new

  // Damped update (G is treated as a weighted adjacency matrix, 0 < beta <= 1)
  G = (1 - beta) * G + beta * G_new

return G  // May not have converged
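The fixed-point search can be sketched with dense matrices. Here `GraphAttention` is assumed to be alpha_ij ∝ G_ij * exp(x_i . x_j), normalized per row. `GraphDistance` is taken to be the mean absolute difference between adjacency entries. Both are stand-ins, and all function names are hypothetical.

```rust
/// Dense weighted adjacency matrix or attention matrix.
pub type Mat = Vec<Vec<f32>>;

/// Assumed GraphAttention: alpha_ij proportional to G_ij * exp(x_i . x_j), normalized per row.
pub fn graph_attention(x: &Mat, g: &Mat) -> Mat {
    let n = x.len();
    let mut alpha = vec![vec![0.0_f32; n]; n];
    for i in 0..n {
        let scores: Vec<f32> = (0..n)
            .map(|j| {
                let s: f32 = x[i].iter().zip(&x[j]).map(|(a, b)| a * b).sum();
                g[i][j] * s.exp()
            })
            .collect();
        let z: f32 = scores.iter().sum();
        if z > 0.0 {
            for j in 0..n {
                alpha[i][j] = scores[j] / z;
            }
        }
    }
    alpha
}

/// TopK per row: the k largest positive attention weights become unit-weight edges.
pub fn top_k(alpha: &Mat, k: usize) -> Mat {
    alpha
        .iter()
        .map(|row| {
            let mut idx: Vec<usize> = (0..row.len()).collect();
            idx.sort_by(|&a, &b| row[b].total_cmp(&row[a])); // descending, stable on ties
            let mut out = vec![0.0_f32; row.len()];
            for &j in idx.iter().take(k) {
                if row[j] > 0.0 {
                    out[j] = 1.0;
                }
            }
            out
        })
        .collect()
}

/// GraphDistance: mean absolute difference between adjacency entries.
pub fn graph_distance(a: &Mat, b: &Mat) -> f32 {
    let (mut total, mut count) = (0.0_f32, 0usize);
    for (ra, rb) in a.iter().zip(b) {
        for (x, y) in ra.iter().zip(rb) {
            total += (x - y).abs();
            count += 1;
        }
    }
    if count == 0 { 0.0 } else { total / count as f32 }
}

/// Iterate G -> TopK(Attention(X, G)) with damping; returns (graph, converged).
pub fn autopoietic_fixed_point(x: &Mat, g0: &Mat, k: usize, beta: f32, eps: f32, max_iter: usize) -> (Mat, bool) {
    let mut g = g0.clone();
    for _ in 0..max_iter {
        let alpha = graph_attention(x, &g);
        let g_new = top_k(&alpha, k);
        if graph_distance(&g, &g_new) < eps {
            return (g_new, true);
        }
        // Damped update: G = (1 - beta) * G + beta * G_new.
        for (row, new_row) in g.iter_mut().zip(&g_new) {
            for (a, b) in row.iter_mut().zip(new_row) {
                *a = (1.0 - beta) * *a + beta * b;
            }
        }
    }
    (g, false) // may not have converged
}
```

As a usage example, take two feature clusters on a fully connected 4-node start graph, with k = 2, beta = 0.5, and epsilon = 0.01. The iteration settles after seven iterations on the block-diagonal graph that links each node to its own cluster (self-loops included).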

3.3 Component Replacement

An autopoietic system continuously replaces its components. In graph transformer terms:

At each time step:
  1. Select random fraction p of nodes for replacement
  2. For each selected node v:
     - Generate replacement features: h_v' = Generator(context(v))
     - context(v) = {h_u : u in N(v)} union {alpha_{uv} : u in N(v)}
  3. The network must maintain its function despite replacement

Training objective:
  L = TaskLoss(output) + lambda * ReconstructionLoss(replaced_nodes)

Key property: If the autopoietic graph transformer maintains performance despite continuous component replacement, it has learned the organization itself rather than one particular set of component values.
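Below is a sketch of one replacement step. In the proposal, `Generator` is learned. Here a hypothetical attention-weighted mean of neighbor features stands in for it. Nodes are picked with a fixed-seed LCG so runs are reproducible. The returned value is a mean-squared ReconstructionLoss over the replaced nodes; the task-loss term is omitted.

```rust
/// 64-bit LCG step, used only to pick nodes reproducibly.
fn lcg(state: u64) -> u64 {
    state.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407)
}

/// Stand-in for Generator(context(v)): attention-weighted mean of neighbor features.
/// nbrs holds (u, alpha_uv) pairs and must be non-empty with a positive weight sum.
pub fn generate(h: &[Vec<f32>], nbrs: &[(usize, f32)]) -> Vec<f32> {
    let d = h[nbrs[0].0].len();
    let total: f32 = nbrs.iter().map(|&(_, a)| a).sum();
    let mut out = vec![0.0_f32; d];
    for &(u, a) in nbrs {
        for k in 0..d {
            out[k] += a / total * h[u][k];
        }
    }
    out
}

/// Replace roughly a fraction p of nodes (nodes without neighbors are skipped) and return
/// the mean squared error between the old and regenerated features.
pub fn replace_components(h: &mut [Vec<f32>], adj: &[Vec<(usize, f32)>], p: f32, seed: &mut u64) -> f32 {
    let old = h.to_vec(); // context is read from the pre-replacement state
    let (mut sq_err, mut count) = (0.0_f32, 0usize);
    for v in 0..h.len() {
        *seed = lcg(*seed);
        let r = (*seed >> 40) as f32 / (1u64 << 24) as f32; // uniform in [0, 1)
        if r >= p || adj[v].is_empty() {
            continue;
        }
        let new_h = generate(&old, &adj[v]);
        for (a, b) in new_h.iter().zip(&old[v]) {
            sq_err += (a - b) * (a - b);
            count += 1;
        }
        h[v] = new_h;
    }
    if count == 0 { 0.0 } else { sq_err / count as f32 }
}
```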


4. Neural Cellular Automata on Graphs

4.1 Graph Neural Cellular Automata (GNCA)

Neural Cellular Automata (NCA) use local rules to produce emergent global behavior. On graphs, each node updates based only on its neighborhood:

h_v(t+1) = Update(h_v(t), Aggregate({h_u(t) : u in N(v)}))

The Update and Aggregate functions are learned, but the same functions are applied at every node (weight sharing).

Properties:

  • Scalability: O(n * avg_degree * d) per step -- linear in graph size
  • Robustness: Local rules are inherently fault-tolerant (damage is local)
  • Emergence: Complex global patterns from simple local rules
  • Self-repair: Damaged regions regenerate from surrounding healthy nodes
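The next sketch shows one GNCA step, with mean aggregation and a single weight matrix W shared by all nodes. The residual form h_v + tanh(...) is an assumption, chosen so that one step changes each feature by less than 1. The function and parameter names are hypothetical.

```rust
/// One GNCA step: h_v(t+1) = h_v(t) + tanh(W [h_v(t) ; mean_{u in N(v)} h_u(t)]).
/// `w` is a d x 2d row-major matrix shared by every node (weight sharing).
pub fn gnca_step(h: &[Vec<f32>], adj: &[Vec<usize>], w: &[f32]) -> Vec<Vec<f32>> {
    let d = h.first().map_or(0, |x| x.len());
    h.iter()
        .enumerate()
        .map(|(v, hv)| {
            // Aggregate: mean of neighbor states (zero vector for isolated nodes).
            let mut m = vec![0.0_f32; d];
            for &u in &adj[v] {
                for k in 0..d {
                    m[k] += h[u][k] / adj[v].len() as f32;
                }
            }
            // Update: the same W at every node; only local information is used.
            (0..d)
                .map(|i| {
                    let row = &w[i * 2 * d..(i + 1) * 2 * d];
                    let z: f32 = row[..d].iter().zip(hv).map(|(a, b)| a * b).sum::<f32>()
                        + row[d..].iter().zip(&m).map(|(a, b)| a * b).sum::<f32>();
                    hv[i] + z.tanh()
                })
                .collect()
        })
        .collect()
}
```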

4.2 Self-Repairing Graph Attention

Damage Protocol:
  1. Remove fraction p of nodes (simulate failure)
  2. Observe: remaining nodes detect damage via missing messages
  3. Repair: surviving nodes adjust attention to compensate

Repair mechanism:
  For each node v that detects missing neighbor u:
    1. Estimate u's contribution: h_u_hat = mean(h_w for w in N(u) - {v})
    2. Create virtual node u' with estimated features
    3. Gradually grow real replacement via morphogenetic program

Self-repair attention:
  alpha_{v,u}^{repair} = alpha_{v,u} * alive(u)
                        + alpha_{v,u} * (1 - alive(u)) * reconstruct_weight(v, u)
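These are sketches of the neighbor estimate and the repair-attention formula. `alive(u)` is treated as a boolean, and `reconstruct_weight(v, u)` is passed in as a number. Both function names are hypothetical.

```rust
/// Step 1 of the repair mechanism: h_u_hat = mean(h_w for w in N(u) - {v}).
/// Returns None when v was u's only neighbor.
pub fn estimate_missing(h: &[Vec<f32>], neighbors_of_u: &[usize], v: usize) -> Option<Vec<f32>> {
    let others: Vec<usize> = neighbors_of_u.iter().copied().filter(|&w| w != v).collect();
    let first = *others.first()?;
    let d = h[first].len();
    let mut est = vec![0.0_f32; d];
    for &w in &others {
        for k in 0..d {
            est[k] += h[w][k] / others.len() as f32;
        }
    }
    Some(est)
}

/// alpha^repair_{v,u} = alpha_{v,u} * alive(u) + alpha_{v,u} * (1 - alive(u)) * reconstruct_weight(v, u)
pub fn repair_attention(alpha: f32, alive: bool, reconstruct_weight: f32) -> f32 {
    let alive = if alive { 1.0 } else { 0.0 };
    alpha * alive + alpha * (1.0 - alive) * reconstruct_weight
}
```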

4.3 Emergent Specialization

When GNCA runs on a graph for many steps, nodes naturally specialize into roles:

Hypothesized emergent roles:
  - Hub nodes: High degree, diffuse attention (broadcast information)
  - Leaf nodes: Low degree, focused attention (specialize in subtasks)
  - Bridge nodes: Connect communities, high betweenness centrality
  - Memory nodes: Stable embeddings that store persistent information
  - Signal nodes: Oscillating embeddings that propagate temporal patterns

The morphogenetic controller can be trained to encourage or regulate this specialization.


5. Developmental Programs for Architecture Growth

5.1 Gene Regulatory Networks (GRN) for Graph Transformers

In biology, development is controlled by gene regulatory networks -- networks of transcription factors that activate or repress genes. We propose using GRNs to control graph transformer architecture:

GRN for graph transformer development:

Genes (outputs):
  - growth_factor: controls node division rate
  - differentiation_signal: controls specialization
  - apoptosis_signal: controls cell death
  - synapse_factor: controls edge creation
  - pruning_factor: controls edge deletion

Regulation (inputs):
  - local_activity: node's recent attention activity
  - neighbor_signals: morphogen concentrations from neighbors
  - global_signals: broadcast from the "body" (whole graph)
  - gradient_signals: loss gradient at this node
  - age: how many steps since this node was created

GRN dynamics:
  dg_i/dt = sigma(sum_j W_{ij} * g_j + b_i) - decay_i * g_i
  // g_i is gene i's expression level
  // W_{ij} is regulation weight (positive = activation, negative = repression)
  // sigma is sigmoid activation
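The GRN dynamics can be sketched with explicit Euler integration. The formula above has no separate input term, so this sketch assumes the regulation inputs (local activity, neighbor morphogens, gradients, age) are folded into the bias b_i. `grn_step` is a hypothetical name.

```rust
fn sigmoid(a: f32) -> f32 {
    1.0 / (1.0 + (-a).exp())
}

/// One explicit-Euler step (size dt) of dg_i/dt = sigmoid(sum_j W_ij * g_j + b_i) - decay_i * g_i.
pub fn grn_step(g: &[f32], w: &[Vec<f32>], b: &[f32], decay: &[f32], dt: f32) -> Vec<f32> {
    (0..g.len())
        .map(|i| {
            let drive: f32 = w[i].iter().zip(g).map(|(wij, gj)| wij * gj).sum::<f32>() + b[i];
            g[i] + dt * (sigmoid(drive) - decay[i] * g[i])
        })
        .collect()
}
```

In the test case, gene 1 represses gene 0 through W_01 = -10. With unit decay, gene 1 settles at sigmoid(0) = 0.5 and gene 0 at sigmoid(-5) ≈ 0.0067.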

5.2 Morphogen Gradients

Morphogens are signaling molecules that form concentration gradients, providing positional information to cells. In graph transformers:

Morphogen diffusion on graph:
  dc_v/dt = D * sum_{u in N(v)} (c_u - c_v) / |N(v)| - decay * c_v + source(v)

  D: diffusion coefficient
  decay: degradation rate
  source(v): production rate at node v

Positional information from morphogen:
  position_v = (c_1(v), c_2(v), ..., c_M(v))
  // M different morphogens give M-dimensional positional coordinates

Application: Morphogen-derived positions can replace or augment positional encodings in graph transformers. Unlike hand-crafted positional encodings (random walk, Laplacian eigenvectors), morphogen positions are learned and adaptive.
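Morphogen diffusion can be sketched with explicit Euler integration; the function names are hypothetical. On a path graph with a single source at one end, the steady-state concentration decreases monotonically with distance from the source, and that gradient is the positional signal. Running M morphogens with different sources and stacking the results gives position_v.

```rust
/// One explicit-Euler step of
/// dc_v/dt = D * sum_{u in N(v)} (c_u - c_v) / |N(v)| - decay * c_v + source(v).
pub fn morphogen_step(c: &[f32], adj: &[Vec<usize>], diff: f32, decay: f32, source: &[f32], dt: f32) -> Vec<f32> {
    (0..c.len())
        .map(|v| {
            let lap = if adj[v].is_empty() {
                0.0
            } else {
                adj[v].iter().map(|&u| c[u] - c[v]).sum::<f32>() / adj[v].len() as f32
            };
            c[v] + dt * (diff * lap - decay * c[v] + source[v])
        })
        .collect()
}

/// Run one morphogen from zero concentration for `steps` Euler steps (close to steady state).
pub fn steady_morphogen(adj: &[Vec<usize>], source: &[f32], diff: f32, decay: f32, dt: f32, steps: usize) -> Vec<f32> {
    let mut c = vec![0.0_f32; adj.len()];
    for _ in 0..steps {
        c = morphogen_step(&c, adj, diff, decay, source, dt);
    }
    c
}
```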

5.3 Developmental Stages

Graph transformer development can proceed in stages, analogous to embryonic development:

Stage 1: Blastula (steps 0-100)
  - Start with small graph (10-100 nodes)
  - Rapid node division
  - Uniform, undifferentiated nodes
  - No pruning

Stage 2: Gastrulation (steps 100-500)
  - Morphogen gradients establish axes
  - Nodes begin differentiating
  - Three "germ layers" emerge:
    - Ectoderm: attention (surface processing)
    - Mesoderm: message passing (structural)
    - Endoderm: memory (internal storage)

Stage 3: Organogenesis (steps 500-2000)
  - Specialized modules form
  - Edge pruning removes unnecessary connections
  - Modules develop distinct attention patterns
  - Architecture approaches final form

Stage 4: Maturation (steps 2000+)
  - Fine-tuning of weights (no more architectural changes)
  - Synaptic refinement
  - Performance optimization

6. Complexity Analysis

6.1 Growth Dynamics

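Claim (mean-field approximation). Suppose every node independently divides with probability p_div and undergoes apoptosis with probability p_apo per step, with both probabilities held constant. Then the expected number of nodes at time t is approximately: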

E[n(t)] = n(0) * exp((p_div - p_apo) * t)
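The exponential form is a continuous-time approximation. Assume each division replaces one node with two (net +1), each apoptosis removes one node, and a node does at most one of the two per step. The expected count then evolves as:

```latex
\begin{aligned}
\mathbb{E}[n(t+1) \mid n(t)] &= n(t) + p_{\mathrm{div}}\, n(t) - p_{\mathrm{apo}}\, n(t) \\
\mathbb{E}[n(t)] &= n(0)\,\bigl(1 + p_{\mathrm{div}} - p_{\mathrm{apo}}\bigr)^{t}
  = n(0)\, e^{\,t \ln(1 + p_{\mathrm{div}} - p_{\mathrm{apo}})}
  \approx n(0)\, e^{(p_{\mathrm{div}} - p_{\mathrm{apo}})\, t}
\end{aligned}
```

For example, take p_div - p_apo = 0.01 and t = 100. The exact discrete value is n(0) * 1.01^100 ≈ 2.705 n(0), while the exponential gives e * n(0) ≈ 2.718 n(0). The approximation slightly overstates growth because ln(1 + r) < r for r > 0.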

For a stable architecture, we need p_div = p_apo (zero growth rate) at equilibrium.

Steady-state analysis. At equilibrium:

  • Division rate: R_div = n * p_div(loss, architecture)
  • Death rate: R_apo = n * p_apo(loss, architecture)
  • Equilibrium: R_div = R_apo implies p_div = p_apo
  • Stability: d(p_div - p_apo)/dn < 0 (negative feedback)
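The stability condition can be checked numerically with the mean-field recurrence. The feedback rule in the test, p_div(n) = 0.05 * (1 - n / 2000) with constant p_apo = 0.01, is a hypothetical choice. It places the equilibrium at n* = 1600, and d(p_div - p_apo)/dn = -0.05 / 2000 < 0 there.

```rust
/// Mean-field node count: E[n(t+1)] = n(t) * (1 + p_div(n) - p_apo(n)).
/// Rates may depend on the current size n, which is how negative feedback enters.
pub fn expected_nodes(n0: f64, steps: usize, p_div: impl Fn(f64) -> f64, p_apo: impl Fn(f64) -> f64) -> f64 {
    let mut n = n0;
    for _ in 0..steps {
        n *= 1.0 + p_div(n) - p_apo(n);
    }
    n
}
```

Starting from 100 nodes, the recurrence approaches 1600 and stays there. A rule whose growth rate increased with n would instead push n away from equilibrium.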

6.2 Computational Overhead of Morphogenesis

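Operation               Cost per event          Expected events per step
Node division           O(degree(v) * d)        O(n * p_div)
Node apoptosis          O(degree(v)^2 * d)      O(n * p_apo)
Edge creation           O(d)                    O(n * p_synapse)
Edge pruning            O(1)                    O(|E| * p_prune)
Controller inference    O(d_controller)         n (every node, every step)

Total overhead per step, taking degree(v) ≈ avg_degree and ignoring the cheaper edge operations:

O(n * (avg_degree * d * (p_div + avg_degree * p_apo) + d_controller))

Take p_div = p_apo = 0.01 and d_controller = 64. Division then adds about 1% of the O(n * avg_degree * d) cost of one message-passing layer. Apoptosis adds about avg_degree times that, because of its quadratic reconnection step. A rough ~2% overhead on top of a standard graph transformer forward pass therefore holds only for low average degree, and only when the O(n * d_controller) controller term is small relative to the forward pass. For denser graphs the overhead grows roughly linearly with avg_degree.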


7. Projections

7.1 By 2030

Likely:

  • Neural cellular automata on graphs achieving competitive results on graph tasks
  • Simple morphogenetic programs (division + pruning) improving architecture efficiency
  • Self-repairing graph attention demonstrated for fault-tolerant applications

Possible:

  • GRN-controlled graph transformer development matching NAS quality at 100x lower cost
  • Autopoietic graph transformers maintaining function despite continuous component replacement
  • Morphogen-based positional encodings outperforming hand-crafted alternatives

Speculative:

  • Graph transformers that grow from a single node to a full architecture
  • Developmental programs discovered by evolution (genetic algorithms over GRN parameters)

7.2 By 2033

Likely:

  • Morphogenetic graph transformers as standard tool for adaptive architectures
  • Self-organizing graph attention for continual learning (grow new capacity for new tasks)

Possible:

  • Multi-organism graph transformers: separate developmental programs interacting
  • Morphogenetic graph transformers on neuromorphic hardware (brain-inspired development on brain-inspired hardware)

7.3 By 2036+

Possible:

  • Artificial embryogenesis: graph transformers that develop like organisms
  • Self-evolving graph transformers: mutation + selection over developmental programs

Speculative:

  • Open-ended evolution of graph transformer architectures
  • Graph transformers that reproduce: one network spawns a new network

8. RuVector Implementation Roadmap

Phase 1: Cellular Automata Foundation (2026-2027)

  • Implement GNCA layer in ruvector-gnn
  • Add dynamic graph operations to ruvector-graph (node/edge add/remove during forward pass)
  • Self-repair experiments on graph attention

Phase 2: Morphogenetic Programs (2027-2028)

  • Morphogenetic controller using ruvector-nervous-system competitive learning
  • Node division, differentiation, apoptosis operations
  • GRN implementation for developmental control
  • Integration with ruvector-gnn EWC for continual learning during growth

Phase 3: Autopoiesis (2028-2030)

  • Autopoietic fixed-point computation
  • Component replacement training
  • Morphogen diffusion on graphs
  • Developmental staging system

References

  1. Mordvintsev et al., "Growing Neural Cellular Automata," Distill 2020
  2. Maturana & Varela, "Autopoiesis and Cognition," 1980
  3. Turing, "The Chemical Basis of Morphogenesis," 1952
  4. Wolpert, "Positional Information and the Spatial Pattern of Cellular Differentiation," 1969
  5. Stanley & Miikkulainen, "A Taxonomy for Artificial Embryogeny," Artificial Life 2003
  6. Grattarola et al., "Learning Graph Cellular Automata," NeurIPS 2021

End of Document 25

Next: Doc 26 - Formal Verification: Proof-Carrying GNN