Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
This commit is contained in:
529
docs/research/gnn-v2/25-self-organizing-morphogenetic-nets.md
Normal file
529
docs/research/gnn-v2/25-self-organizing-morphogenetic-nets.md
Normal file
@@ -0,0 +1,529 @@
|
||||
# Axis 5: Self-Organizing Morphogenetic Networks
|
||||
|
||||
**Document:** 25 of 30
|
||||
**Series:** Graph Transformers: 2026-2036 and Beyond
|
||||
**Last Updated:** 2026-02-25
|
||||
**Status:** Research Prospectus
|
||||
|
||||
---
|
||||
|
||||
## 1. Problem Statement
|
||||
|
||||
Current graph transformers have fixed architectures: the number of nodes, edges, layers, and attention heads is determined before training and remains constant during inference. Biological neural systems, by contrast, grow, prune, specialize, and reorganize throughout their lifetime. The brain develops from a single cell to 86 billion neurons through a developmental program encoded in DNA.
|
||||
|
||||
The self-organizing axis asks: can graph transformers grow their own architecture?
|
||||
|
||||
### 1.1 The Architecture Search Problem
|
||||
|
||||
Current approaches to architecture search (NAS) are external: a controller searches over a space of architectures, trains each candidate, and selects the best. This is:
|
||||
- **Expensive**: Training thousands of candidate architectures
|
||||
- **Brittle**: The search space is hand-designed
|
||||
- **Static**: The architecture cannot adapt after deployment
|
||||
- **Unbiological**: No biological system uses external architecture search
|
||||
|
||||
**Morphogenetic graph transformers** solve this by making architecture growth *intrinsic* to the computation.
|
||||
|
||||
### 1.2 RuVector Baseline
|
||||
|
||||
- **`ruvector-nervous-system`**: Competitive learning (`compete/`), plasticity (`plasticity/`), routing (`routing/`), Hopfield nets (`hopfield/`)
|
||||
- **`ruvector-graph`**: Dynamic graph operations (add/remove nodes, edges), property graph with hyperedges
|
||||
- **`ruvector-gnn`**: Continual learning via EWC (`ewc.rs`), replay buffers (`replay.rs`)
|
||||
- **`ruvector-domain-expansion`**: Domain expansion mechanisms (a form of self-organization)
|
||||
|
||||
---
|
||||
|
||||
## 2. Morphogenetic Graph Transformers
|
||||
|
||||
### 2.1 The Biological Analogy
|
||||
|
||||
Biological development proceeds through:
|
||||
1. **Cell division**: One cell becomes two (node splitting)
|
||||
2. **Differentiation**: Cells specialize based on local signals (attention specialization)
|
||||
3. **Migration**: Cells move to their functional position (graph rewiring)
|
||||
4. **Apoptosis**: Programmed cell death removes unnecessary cells (node pruning)
|
||||
5. **Synaptogenesis**: Neurons form connections based on activity (edge creation)
|
||||
6. **Synaptic pruning**: Unused connections are removed (edge deletion)
|
||||
|
||||
We map each biological process to a graph transformer operation.
|
||||
|
||||
### 2.2 Node Division (Mitosis)
|
||||
|
||||
When a node v becomes "overloaded" (high information throughput, high gradient magnitude, or high attention diversity), it divides into two daughter nodes v1, v2:
|
||||
|
||||
```
|
||||
MITOSIS(v):
|
||||
1. Create daughter nodes v1, v2
|
||||
2. Split features: h_{v1} = h_v + epsilon_1, h_{v2} = h_v + epsilon_2
|
||||
(small perturbation to break symmetry)
|
||||
3. Distribute edges:
|
||||
- Edges to v: assign to v1 or v2 based on attention similarity
|
||||
- Edge (u, v): assign to argmax_{i in {1,2}} alpha_{u, vi}
|
||||
4. Create sibling edge: (v1, v2) with high initial weight
|
||||
5. Remove original node v
|
||||
|
||||
Trigger condition:
|
||||
divide(v) if:
|
||||
information_throughput(v) > theta_divide
|
||||
OR gradient_magnitude(v) > theta_grad
|
||||
OR attention_entropy(v) > theta_entropy
|
||||
```
|
||||
|
||||
**Complexity per division:** O(degree(v) * d) -- proportional to the number of edges being reassigned.
|
||||
|
||||
### 2.3 Node Differentiation
|
||||
|
||||
After division, daughter nodes differentiate by specializing their attention patterns:
|
||||
|
||||
```
|
||||
DIFFERENTIATE(v1, v2):
|
||||
// Over T time steps, v1 and v2 develop different attention profiles
|
||||
|
||||
For t = 1 to T:
|
||||
// Competitive Hebbian learning between siblings
|
||||
if alpha_{u, v1} > alpha_{u, v2} for neighbor u:
|
||||
w_{u, v1} += eta * alpha_{u, v1}
|
||||
w_{u, v2} -= eta * alpha_{u, v2} // Competitive inhibition
|
||||
|
||||
// v1 becomes specialist for one set of neighbors
|
||||
// v2 becomes specialist for the complementary set
|
||||
```
|
||||
|
||||
**RuVector connection:** This directly extends `ruvector-nervous-system/src/compete/` competitive learning mechanisms.
|
||||
|
||||
### 2.4 Node Apoptosis (Programmed Death)
|
||||
|
||||
Underutilized nodes are removed:
|
||||
|
||||
```
|
||||
APOPTOSIS(v):
|
||||
Trigger: if attention_received(v) < theta_min for T_grace consecutive steps
|
||||
|
||||
1. Redistribute v's information to neighbors:
|
||||
For each neighbor u:
|
||||
h_u += (alpha_{v,u} / sum_{w in N(v)} alpha_{v,w}) * h_v
|
||||
2. Reconnect v's neighbors:
|
||||
For each pair (u, w) both in N(v):
|
||||
if not edge(u, w):
|
||||
add_edge(u, w, weight = alpha_{v,u} * alpha_{v,w})
|
||||
3. Remove v and all its edges
|
||||
```
|
||||
|
||||
### 2.5 Edge Growth and Pruning
|
||||
|
||||
**Synaptogenesis (edge creation):**
|
||||
```
|
||||
For each pair (u, v) not connected:
|
||||
Compute predicted utility:
|
||||
utility(u, v) = |h_u . h_v| / (||h_u|| * ||h_v||) // Cosine similarity
|
||||
+ beta * shared_neighbors(u, v) / max_degree
|
||||
If utility(u, v) > theta_synapse:
|
||||
add_edge(u, v, weight = utility(u, v))
|
||||
```
|
||||
|
||||
**Synaptic pruning (edge deletion):**
|
||||
```
|
||||
For each edge (u, v):
|
||||
If attention_weight(u, v) < theta_prune for T_prune steps:
|
||||
remove_edge(u, v)
|
||||
```
|
||||
|
||||
### 2.6 The Morphogenetic Program
|
||||
|
||||
All operations are governed by a learned "genetic program" -- a small regulatory network that controls growth:
|
||||
|
||||
```
|
||||
Morphogenetic Controller:
|
||||
|
||||
Inputs:
|
||||
- Local features: h_v, gradient(v), loss_contribution(v)
|
||||
- Neighborhood signals: mean(h_u for u in N(v)), attention_entropy(v)
|
||||
- Global signals: total_nodes, total_loss, epoch
|
||||
|
||||
Outputs (per node):
|
||||
- p_divide: probability of division [0, 1]
|
||||
- p_differentiate: probability of specialization [0, 1]
|
||||
- p_apoptosis: probability of death [0, 1]
|
||||
- p_synapse_grow: probability of new edge [0, 1]
|
||||
- p_synapse_prune: probability of edge removal [0, 1]
|
||||
|
||||
Architecture:
|
||||
Small MLP (3 layers, 64 hidden units)
|
||||
Trained end-to-end with the main graph transformer
|
||||
```
|
||||
|
||||
**RuVector trait design:**
|
||||
|
||||
```rust
|
||||
/// Morphogenetic graph transformer
|
||||
pub trait MorphogeneticGraphTransformer {
|
||||
/// Execute one developmental step
|
||||
fn develop(
|
||||
&mut self,
|
||||
graph: &mut DynamicPropertyGraph,
|
||||
features: &mut DynamicTensor,
|
||||
controller: &MorphogeneticController,
|
||||
) -> Result<DevelopmentReport, MorphError>;
|
||||
|
||||
/// Get current architecture statistics
|
||||
fn architecture_stats(&self) -> ArchitectureStats;
|
||||
|
||||
/// Freeze architecture (stop growth)
|
||||
fn freeze(&mut self);
|
||||
|
||||
/// Resume growth
|
||||
fn unfreeze(&mut self);
|
||||
}
|
||||
|
||||
pub struct DevelopmentReport {
|
||||
pub nodes_divided: Vec<(NodeId, NodeId, NodeId)>, // (parent, child1, child2)
|
||||
pub nodes_differentiated: Vec<NodeId>,
|
||||
pub nodes_removed: Vec<NodeId>,
|
||||
pub edges_created: Vec<(NodeId, NodeId)>,
|
||||
pub edges_removed: Vec<(NodeId, NodeId)>,
|
||||
pub total_nodes_after: usize,
|
||||
pub total_edges_after: usize,
|
||||
}
|
||||
|
||||
pub struct ArchitectureStats {
|
||||
pub total_nodes: usize,
|
||||
pub total_edges: usize,
|
||||
pub avg_degree: f64,
|
||||
pub max_degree: usize,
|
||||
pub num_connected_components: usize,
|
||||
pub spectral_gap: f64,
|
||||
pub avg_attention_entropy: f64,
|
||||
pub growth_rate: f64, // nodes per step
|
||||
}
|
||||
|
||||
pub struct MorphogeneticController {
|
||||
/// Regulatory network
|
||||
network: SmallMLP,
|
||||
/// Division threshold
|
||||
theta_divide: f32,
|
||||
/// Apoptosis threshold
|
||||
theta_apoptosis: f32,
|
||||
/// Synapse growth threshold
|
||||
theta_synapse: f32,
|
||||
/// Pruning threshold
|
||||
theta_prune: f32,
|
||||
/// Maximum allowed nodes
|
||||
max_nodes: usize,
|
||||
/// Minimum allowed nodes
|
||||
min_nodes: usize,
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Autopoietic Graph Transformers
|
||||
|
||||
### 3.1 Autopoiesis: Self-Creating Networks
|
||||
|
||||
Autopoiesis (Maturana & Varela, 1973) describes systems that produce and maintain themselves. An autopoietic graph transformer is one where:
|
||||
1. The graph transformer produces its own components (nodes, edges, attention weights)
|
||||
2. The components interact to produce the transformer (self-referential)
|
||||
3. The system maintains its organizational identity despite continuous component replacement
|
||||
|
||||
### 3.2 Self-Producing Attention
|
||||
|
||||
In an autopoietic graph transformer, the attention mechanism produces the graph structure that defines the attention mechanism:
|
||||
|
||||
```
|
||||
Cycle:
|
||||
1. Graph G defines attention: alpha = Attention(X, G)
|
||||
2. Attention defines new graph: G' = ReconstructGraph(alpha)
|
||||
3. New graph defines new attention: alpha' = Attention(X, G')
|
||||
4. ...
|
||||
|
||||
Fixed point: G* such that ReconstructGraph(Attention(X, G*)) = G*
|
||||
```
|
||||
|
||||
**Finding the fixed point:**
|
||||
|
||||
```
|
||||
Input: Initial graph G_0, features X
|
||||
Output: Autopoietic fixed-point graph G*
|
||||
|
||||
G = G_0
|
||||
for t = 1 to max_iter:
|
||||
// Compute attention on current graph
|
||||
alpha = GraphAttention(X, G)
|
||||
|
||||
// Reconstruct graph from attention
|
||||
G_new = TopK(alpha, k=avg_degree) // Keep top-k attention weights as edges
|
||||
|
||||
// Check convergence
|
||||
if GraphDistance(G, G_new) < epsilon:
|
||||
return G_new
|
||||
|
||||
// Update with momentum
|
||||
G = (1 - beta) * G + beta * G_new
|
||||
|
||||
return G // May not have converged
|
||||
```
|
||||
|
||||
### 3.3 Component Replacement
|
||||
|
||||
An autopoietic system continuously replaces its components. In graph transformer terms:
|
||||
|
||||
```
|
||||
At each time step:
|
||||
1. Select random fraction p of nodes for replacement
|
||||
2. For each selected node v:
|
||||
- Generate replacement features: h_v' = Generator(context(v))
|
||||
- context(v) = {h_u : u in N(v)} union {alpha_{uv} : u in N(v)}
|
||||
3. The network must maintain its function despite replacement
|
||||
|
||||
Training objective:
|
||||
L = TaskLoss(output) + lambda * ReconstructionLoss(replaced_nodes)
|
||||
```
|
||||
|
||||
**Key property:** If the autopoietic graph transformer maintains performance despite continuous component replacement, it has truly learned the *organization*, not just the specific parameters.
|
||||
|
||||
---
|
||||
|
||||
## 4. Neural Cellular Automata on Graphs
|
||||
|
||||
### 4.1 Graph Neural Cellular Automata (GNCA)
|
||||
|
||||
Neural Cellular Automata (NCA) use local rules to produce emergent global behavior. On graphs, each node updates based only on its neighborhood:
|
||||
|
||||
```
|
||||
h_v(t+1) = Update(h_v(t), Aggregate({h_u(t) : u in N(v)}))
|
||||
```
|
||||
|
||||
The Update and Aggregate functions are learned, but the same functions are applied at every node (weight sharing).
|
||||
|
||||
**Properties:**
|
||||
- **Scalability**: O(n * avg_degree * d) per step -- linear in graph size
|
||||
- **Robustness**: Local rules are inherently fault-tolerant (damage is local)
|
||||
- **Emergence**: Complex global patterns from simple local rules
|
||||
- **Self-repair**: Damaged regions regenerate from surrounding healthy nodes
|
||||
|
||||
### 4.2 Self-Repairing Graph Attention
|
||||
|
||||
```
|
||||
Damage Protocol:
|
||||
1. Remove fraction p of nodes (simulate failure)
|
||||
2. Observe: remaining nodes detect damage via missing messages
|
||||
3. Repair: surviving nodes adjust attention to compensate
|
||||
|
||||
Repair mechanism:
|
||||
For each node v that detects missing neighbor u:
|
||||
1. Estimate u's contribution: h_u_hat = mean(h_w for w in N(u) - {v})
|
||||
2. Create virtual node u' with estimated features
|
||||
3. Gradually grow real replacement via morphogenetic program
|
||||
|
||||
Self-repair attention:
|
||||
alpha_{v,u}^{repair} = alpha_{v,u} * alive(u)
|
||||
+ alpha_{v,u} * (1 - alive(u)) * reconstruct_weight(v, u)
|
||||
```
|
||||
|
||||
### 4.3 Emergent Specialization
|
||||
|
||||
When GNCA runs on a graph for many steps, nodes naturally specialize into roles:
|
||||
|
||||
```
|
||||
Observed emergent roles:
|
||||
- Hub nodes: High degree, diffuse attention (broadcast information)
|
||||
- Leaf nodes: Low degree, focused attention (specialize in subtasks)
|
||||
- Bridge nodes: Connect communities, high betweenness centrality
|
||||
- Memory nodes: Stable embeddings that store persistent information
|
||||
- Signal nodes: Oscillating embeddings that propagate temporal patterns
|
||||
```
|
||||
|
||||
The morphogenetic controller can be trained to encourage or regulate this specialization.
|
||||
|
||||
---
|
||||
|
||||
## 5. Developmental Programs for Architecture Growth
|
||||
|
||||
### 5.1 Gene Regulatory Networks (GRN) for Graph Transformers
|
||||
|
||||
In biology, development is controlled by gene regulatory networks -- networks of transcription factors that activate or repress genes. We propose using GRNs to control graph transformer architecture:
|
||||
|
||||
```
|
||||
GRN for graph transformer development:
|
||||
|
||||
Genes (outputs):
|
||||
- growth_factor: controls node division rate
|
||||
- differentiation_signal: controls specialization
|
||||
- apoptosis_signal: controls cell death
|
||||
- synapse_factor: controls edge creation
|
||||
- pruning_factor: controls edge deletion
|
||||
|
||||
Regulation (inputs):
|
||||
- local_activity: node's recent attention activity
|
||||
- neighbor_signals: morphogen concentrations from neighbors
|
||||
- global_signals: broadcast from the "body" (whole graph)
|
||||
- gradient_signals: loss gradient at this node
|
||||
- age: how many steps since this node was created
|
||||
|
||||
GRN dynamics:
|
||||
dg_i/dt = sigma(sum_j W_{ij} * g_j + b_i) - decay_i * g_i
|
||||
// g_i is gene i's expression level
|
||||
// W_{ij} is regulation weight (positive = activation, negative = repression)
|
||||
// sigma is sigmoid activation
|
||||
```
|
||||
|
||||
### 5.2 Morphogen Gradients
|
||||
|
||||
Morphogens are signaling molecules that form concentration gradients, providing positional information to cells. In graph transformers:
|
||||
|
||||
```
|
||||
Morphogen diffusion on graph:
|
||||
dc_v/dt = D * sum_{u in N(v)} (c_u - c_v) / |N(v)| - decay * c_v + source(v)
|
||||
|
||||
D: diffusion coefficient
|
||||
decay: degradation rate
|
||||
source(v): production rate at node v
|
||||
|
||||
Positional information from morphogen:
|
||||
position_v = (c_1(v), c_2(v), ..., c_M(v))
|
||||
// M different morphogens give M-dimensional positional coordinates
|
||||
```
|
||||
|
||||
**Application:** Morphogen-derived positions can replace or augment positional encodings in graph transformers. Unlike hand-crafted positional encodings (random walk, Laplacian eigenvectors), morphogen positions are *learned* and *adaptive*.
|
||||
|
||||
### 5.3 Developmental Stages
|
||||
|
||||
Graph transformer development can proceed in stages, analogous to embryonic development:
|
||||
|
||||
```
|
||||
Stage 1: Blastula (steps 0-100)
|
||||
- Start with small graph (10-100 nodes)
|
||||
- Rapid node division
|
||||
- Uniform, undifferentiated nodes
|
||||
- No pruning
|
||||
|
||||
Stage 2: Gastrulation (steps 100-500)
|
||||
- Morphogen gradients establish axes
|
||||
- Nodes begin differentiating
|
||||
- Three "germ layers" emerge:
|
||||
- Ectoderm: attention (surface processing)
|
||||
- Mesoderm: message passing (structural)
|
||||
- Endoderm: memory (internal storage)
|
||||
|
||||
Stage 3: Organogenesis (steps 500-2000)
|
||||
- Specialized modules form
|
||||
- Edge pruning removes unnecessary connections
|
||||
- Modules develop distinct attention patterns
|
||||
- Architecture approaches final form
|
||||
|
||||
Stage 4: Maturation (steps 2000+)
|
||||
- Fine-tuning of weights (no more architectural changes)
|
||||
- Synaptic refinement
|
||||
- Performance optimization
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Complexity Analysis
|
||||
|
||||
### 6.1 Growth Dynamics
|
||||
|
||||
**Theorem.** Under the morphogenetic program with division probability p_div and apoptosis probability p_apo, the expected number of nodes at time t is:
|
||||
|
||||
```
|
||||
E[n(t)] = n(0) * exp((p_div - p_apo) * t)
|
||||
```
|
||||
|
||||
For a stable architecture, we need p_div = p_apo (zero growth rate) at equilibrium.
|
||||
|
||||
**Steady-state analysis.** At equilibrium:
|
||||
- Division rate: R_div = n * p_div(loss, architecture)
|
||||
- Death rate: R_apo = n * p_apo(loss, architecture)
|
||||
- Equilibrium: R_div = R_apo implies p_div = p_apo
|
||||
- Stability: d(p_div - p_apo)/dn < 0 (negative feedback)
|
||||
|
||||
### 6.2 Computational Overhead of Morphogenesis
|
||||
|
||||
| Operation | Cost per event | Expected events per step |
|
||||
|-----------|---------------|-------------------------|
|
||||
| Node division | O(degree(v) * d) | O(n * p_div) |
|
||||
| Node apoptosis | O(degree(v)^2 * d) | O(n * p_apo) |
|
||||
| Edge creation | O(d) | O(n * p_synapse) |
|
||||
| Edge pruning | O(1) | O(|E| * p_prune) |
|
||||
| Controller inference | O(n * d_controller) | n (every node, every step) |
|
||||
|
||||
**Total overhead per step:** O(n * (avg_degree * d * (p_div + p_apo) + d_controller))
|
||||
|
||||
For p_div = p_apo = 0.01 and d_controller = 64: **~2% overhead on top of standard graph transformer forward pass.**
|
||||
|
||||
---
|
||||
|
||||
## 7. Projections
|
||||
|
||||
### 7.1 By 2030
|
||||
|
||||
**Likely:**
|
||||
- Neural cellular automata on graphs achieving competitive results on graph tasks
|
||||
- Simple morphogenetic programs (division + pruning) improving architecture efficiency
|
||||
- Self-repairing graph attention demonstrated for fault-tolerant applications
|
||||
|
||||
**Possible:**
|
||||
- GRN-controlled graph transformer development matching NAS quality at 100x lower cost
|
||||
- Autopoietic graph transformers maintaining function despite continuous component replacement
|
||||
- Morphogen-based positional encodings outperforming hand-crafted alternatives
|
||||
|
||||
**Speculative:**
|
||||
- Graph transformers that grow from a single node to a full architecture
|
||||
- Developmental programs discovered by evolution (genetic algorithms over GRN parameters)
|
||||
|
||||
### 7.2 By 2033
|
||||
|
||||
**Likely:**
|
||||
- Morphogenetic graph transformers as standard tool for adaptive architectures
|
||||
- Self-organizing graph attention for continual learning (grow new capacity for new tasks)
|
||||
|
||||
**Possible:**
|
||||
- Multi-organism graph transformers: separate developmental programs interacting
|
||||
- Morphogenetic graph transformers on neuromorphic hardware (biological development on biological hardware)
|
||||
|
||||
### 7.3 By 2036+
|
||||
|
||||
**Possible:**
|
||||
- Artificial embryogenesis: graph transformers that develop like organisms
|
||||
- Self-evolving graph transformers: mutation + selection over developmental programs
|
||||
|
||||
**Speculative:**
|
||||
- Open-ended evolution of graph transformer architectures
|
||||
- Graph transformers that reproduce: one network spawns a new network
|
||||
|
||||
---
|
||||
|
||||
## 8. RuVector Implementation Roadmap
|
||||
|
||||
### Phase 1: Cellular Automata Foundation (2026-2027)
|
||||
- Implement GNCA layer in `ruvector-gnn`
|
||||
- Add dynamic graph operations to `ruvector-graph` (node/edge add/remove during forward pass)
|
||||
- Self-repair experiments on graph attention
|
||||
|
||||
### Phase 2: Morphogenetic Programs (2027-2028)
|
||||
- Morphogenetic controller using `ruvector-nervous-system` competitive learning
|
||||
- Node division, differentiation, apoptosis operations
|
||||
- GRN implementation for developmental control
|
||||
- Integration with `ruvector-gnn` EWC for continual learning during growth
|
||||
|
||||
### Phase 3: Autopoiesis (2028-2030)
|
||||
- Autopoietic fixed-point computation
|
||||
- Component replacement training
|
||||
- Morphogen diffusion on graphs
|
||||
- Developmental staging system
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
1. Mordvintsev et al., "Growing Neural Cellular Automata," Distill 2020
|
||||
2. Maturana & Varela, "Autopoiesis and Cognition," 1980
|
||||
3. Turing, "The Chemical Basis of Morphogenesis," 1952
|
||||
4. Wolpert, "Positional Information and the Spatial Pattern of Cellular Differentiation," 1969
|
||||
5. Stanley & Miikkulainen, "A Taxonomy for Artificial Embryogeny," Artificial Life 2003
|
||||
6. Grattarola et al., "Learning Graph Cellular Automata," NeurIPS 2021
|
||||
|
||||
---
|
||||
|
||||
**End of Document 25**
|
||||
|
||||
**Next:** [Doc 26 - Formal Verification: Proof-Carrying GNN](26-formal-verification-proof-carrying-gnn.md)
|
||||
Reference in New Issue
Block a user