# Axis 3: Biological -- Spiking Graph Transformers
**Document:** 23 of 30
**Series:** Graph Transformers: 2026-2036 and Beyond
**Last Updated:** 2026-02-25
**Status:** Research Prospectus
---
## 1. Problem Statement
The brain processes graph-structured information (connectomes, neural circuits, cortical columns) using mechanisms fundamentally different from backpropagation-trained transformers: discrete spikes, local Hebbian learning rules, dendritic computation, and spike-timing-dependent plasticity. These mechanisms are energy-efficient (the brain uses ~20 watts for ~86 billion neurons) and naturally parallel.
The biological axis asks: can we build graph transformers that compute like brains?
### 1.1 The Efficiency Gap
| System | Nodes | Power | Power/Node | Latency |
|--------|-------|-------|------------|---------|
| Human brain | 86 x 10^9 | 20 W | 0.23 nW | ~100ms |
| GPU graph transformer | 10^6 | 300 W | 300 uW | ~1ms |
| Neuromorphic (Loihi 2) | 10^6 | 1 W | 1 uW | ~10ms |
| Spiking graph transformer (proposed) | 10^8 | 10 W | 0.1 uW | ~50ms |
The brain achieves 6 orders of magnitude better power efficiency per node. Spiking graph transformers aim to close this gap by 3-4 orders of magnitude.
### 1.2 RuVector Baseline
- **`ruvector-mincut-gated-transformer`**: Spiking neurons (`spike.rs`), energy gates (`energy_gate.rs`)
- **`ruvector-nervous-system`**: Hopfield nets (`hopfield/`), HDC (`hdc/`), dendrite compute (`dendrite/`), plasticity (`plasticity/`), competitive learning (`compete/`), routing (`routing/`)
- **`ruvector-attention`**: Neighborhood attention (`graph/`), sparse attention (`sparse/`)
---
## 2. Spiking Graph Attention
### 2.1 From Softmax to Spikes
Standard graph attention:
```
alpha_{uv} = softmax_v(Q_u . K_v^T / sqrt(d))
z_u = sum_v alpha_{uv} * V_v
```
Spiking graph attention:
```
// Accumulate input current from neighbors
I_u(t) = sum_{v in N(u)} w_{uv} * S_v(t) * V_v

// Leaky integrate-and-fire (LIF) dynamics
tau * dU_u/dt = -U_u(t) + I_u(t)

// Spike when membrane potential exceeds threshold
if U_u(t) >= theta_u:
    S_u(t) = 1        // Emit spike
    U_u(t) = U_reset  // Reset potential
else:
    S_u(t) = 0
```
**Key differences from standard attention:**
1. **Temporal coding**: Information is in spike timing, not continuous values
2. **Winner-take-all**: High-attention nodes spike first (rate and temporal coding)
3. **Energy proportional to activity**: Silent nodes consume zero energy
4. **Local computation**: Each node only sees spikes from its graph neighbors
### 2.2 Spike-Based Attention Weights
We propose three mechanisms for spike-based attention:
**Mechanism 1: Rate-Coded Attention**
```
alpha_{uv} = spike_rate(v, window_T) / sum_{w in N(u)} spike_rate(w, window_T)
```
Attention weight proportional to how often a neighbor spikes. Reduces to standard attention in the continuous limit.
**Mechanism 2: Temporal-Coded Attention**
```
alpha_{uv} = exp(-|t_spike(u) - t_spike(v)| / tau) / Z
```
Nodes that spike close in time attend to each other. Implements temporal coincidence detection.
**Mechanism 3: Phase-Coded Attention**
```
alpha_{uv} = cos(phi_u(t) - phi_v(t)) / Z
```
Attention based on oscillatory phase coherence. Nodes oscillating in phase form attention groups. Related to gamma oscillations in the brain.
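A minimal, self-contained Rust sketch of the three mechanisms, independent of the RuVector types introduced below. The helper names, millisecond units, and the shift of the phase score into [0, 1] (borrowed from Section 6.1) are illustrative choices, not a fixed API; normalization over the neighborhood (the Z / denominator terms) is left to the caller.
```rust
/// Illustrative spike-attention scores for one edge u -> v.
fn rate_coded(spikes_v: &[f64], window_t: f64, now: f64) -> f64 {
    // Mechanism 1: spike count of v inside the trailing window, as a rate.
    let count = spikes_v.iter().filter(|&&t| now - t <= window_t).count();
    count as f64 / window_t
}

fn temporal_coded(t_spike_u: f64, t_spike_v: f64, tau: f64) -> f64 {
    // Mechanism 2: coincidence detection, decaying with the timing gap.
    (-(t_spike_u - t_spike_v).abs() / tau).exp()
}

fn phase_coded(phi_u: f64, phi_v: f64) -> f64 {
    // Mechanism 3: phase coherence, shifted to [0, 1] as in Section 6.1
    // so that anti-phase pairs get zero weight.
    (1.0 + (phi_u - phi_v).cos()) / 2.0
}

fn main() {
    let spikes_v = [92.0, 95.5, 99.0];
    println!("rate     = {:.3}", rate_coded(&spikes_v, 10.0, 100.0));
    println!("temporal = {:.3}", temporal_coded(99.0, 98.2, 5.0));
    println!("phase    = {:.3}", phase_coded(0.3, 0.5));
}
```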
### 2.3 Spiking Graph Attention Network (SGAT)
```
Architecture:

Input Layer: Encode features as spike trains
        |
Spiking Attention Layer 1:
    - Each node: LIF neuron
    - Attention: via spike timing (Mechanism 2)
    - Aggregation: spike-weighted sum
        |
Spiking Attention Layer 2:
    - Lateral inhibition for competition
    - Winner-take-all within neighborhoods
        |
       ...
        |
Readout Layer: Decode spike trains to continuous values
    - Population coding: average over neuron populations
    - Rate decoding: spike count in window
```
**RuVector integration:**
```rust
use std::collections::VecDeque;

/// Spiking graph attention layer
pub struct SpikingGraphAttention {
    /// Neuron parameters per node
    neurons: Vec<LIFNeuron>,
    /// Synaptic weights (graph edges)
    synapses: SparseMatrix<SynapticWeight>,
    /// Attention mechanism
    attention_mode: SpikeAttentionMode,
    /// Time step
    dt: f64,
    /// Current simulation time
    t: f64,
}

pub struct LIFNeuron {
    /// Membrane potential
    pub membrane_potential: f32,
    /// Resting potential
    pub v_rest: f32,
    /// Threshold
    pub threshold: f32,
    /// Reset potential
    pub v_reset: f32,
    /// Membrane time constant
    pub tau: f32,
    /// Refractory period counter
    pub refractory: f32,
    /// Last spike time
    pub last_spike: f64,
    /// Spike train history
    pub spike_train: VecDeque<f64>,
}

pub struct SynapticWeight {
    /// Base weight
    pub weight: f32,
    /// Plasticity trace (for STDP)
    pub trace: f32,
    /// Delay (in dt units)
    pub delay: u16,
}

pub enum SpikeAttentionMode {
    /// Attention proportional to spike rate
    RateCoded { window: f64 },
    /// Attention from spike timing coincidence
    TemporalCoded { tau: f64 },
    /// Attention from phase coherence
    PhaseCoded { frequency: f64 },
}

impl SpikingGraphAttention {
    /// Simulate one time step; returns which nodes spiked.
    pub fn step(
        &mut self,
        graph: &PropertyGraph,
        input_currents: &[f32],
    ) -> Vec<bool> {
        let n = self.neurons.len();
        let mut spikes = vec![false; n];

        // Phase 1: accumulate input from spiking neighbors. This runs as
        // a separate read-only pass so that the mutable neuron update
        // below does not alias `self.neurons`.
        let mut inputs = input_currents.to_vec();
        for v in 0..n {
            for (u, synapse) in self.incoming_synapses(v, graph) {
                // A spike from u arrives after the synaptic delay.
                let arrival = self.t - synapse.delay as f64 * self.dt;
                if self.neurons[u].spiked_at(arrival) {
                    inputs[v] += synapse.weight;
                }
            }
        }

        // Phase 2: membrane dynamics and spike generation.
        for (v, neuron) in self.neurons.iter_mut().enumerate() {
            // Skip if in refractory period
            if neuron.refractory > 0.0 {
                neuron.refractory -= self.dt as f32;
                continue;
            }
            // Forward-Euler LIF update: tau * dU/dt = -(U - U_rest) + I
            neuron.membrane_potential += self.dt as f32
                * (-neuron.membrane_potential + neuron.v_rest + inputs[v])
                / neuron.tau;
            // Spike check: threshold crossing emits a spike and resets
            if neuron.membrane_potential >= neuron.threshold {
                spikes[v] = true;
                neuron.membrane_potential = neuron.v_reset;
                neuron.refractory = 2.0; // 2 ms refractory period
                neuron.last_spike = self.t;
                neuron.spike_train.push_back(self.t);
            }
        }
        self.t += self.dt;
        spikes
    }
}
```
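The readout side can be sketched independently of the structs above. The `decode_rates` helper below is hypothetical, assuming spike times in milliseconds; it stands in for reading the per-neuron `spike_train` histories and implements the "spike count in window" rate decoding of the readout layer:
```rust
/// Hypothetical rate-decoding helper: convert each node's spike train
/// into a continuous value by counting spikes in a trailing window
/// (spikes per ms).
fn decode_rates(spike_trains: &[Vec<f64>], now: f64, window: f64) -> Vec<f32> {
    spike_trains
        .iter()
        .map(|train| {
            let count = train.iter().filter(|&&t| now - t <= window).count();
            count as f32 / window as f32
        })
        .collect()
}

fn main() {
    // Two nodes: one active in the last 10 ms, one long silent.
    let trains = vec![vec![90.0, 92.0, 94.0, 96.0, 98.0], vec![12.0]];
    let rates = decode_rates(&trains, 100.0, 10.0);
    println!("decoded rates: {rates:?}"); // ~[0.5, 0.0] spikes/ms
}
```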
---
## 3. Hebbian Learning on Graphs
### 3.1 Graph Hebbian Rules
Classical Hebb's rule: "Neurons that fire together, wire together."
**Graph Hebbian attention update:**
```
Delta_w_{uv} = eta * (
    pre_trace(u) * post_trace(v)   // Hebbian term
  - lambda * w_{uv}                // Weight decay
)
```
where pre_trace and post_trace are exponentially filtered spike trains:
```
pre_trace(u, t) = sum_{t_spike < t} exp(-(t - t_spike) / tau_pre)
post_trace(v, t) = sum_{t_spike < t} exp(-(t - t_spike) / tau_post)
```
### 3.2 Spike-Timing-Dependent Plasticity (STDP) on Graphs
STDP adjusts edge weights based on the relative timing of pre- and post-synaptic spikes:
```
Delta_w_{uv} =
     A_+ * exp(-(t_post - t_pre) / tau_+)   if t_post > t_pre  (LTP)
    -A_- * exp(-(t_pre - t_post) / tau_-)   if t_pre > t_post  (LTD)
```
- LTP (Long-Term Potentiation): Pre before post -> strengthen connection
- LTD (Long-Term Depression): Post before pre -> weaken connection
**Graph STDP attention:**
```
For each edge (u, v) in E:
    For each pair of spikes (t_u, t_v):
        dt = t_v - t_u
        if dt > 0:  // u spiked before v
            w_{uv} += A_+ * exp(-dt / tau_+)  // Strengthen u->v
        else:
            w_{uv} -= A_- * exp(dt / tau_-)   // Weaken u->v
```
**Interpretation as attention learning:** STDP automatically learns attention weights that encode causal influence in the graph. If node u's activity reliably precedes node v's, the u->v attention weight increases.
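A standard online formulation stores exponential traces per edge (a natural use for the `trace` field of `SynapticWeight` in Section 2.3): a postsynaptic spike reads the presynaptic trace (pre fired earlier, so LTP), and a presynaptic spike reads the postsynaptic trace (post fired earlier, so LTD). For nearest-neighbor spike pairs this reproduces the pairwise rule above. A sketch with illustrative constants:
```rust
/// Trace-based STDP for one edge u -> v.
struct StdpEdge {
    weight: f32,
    pre_trace: f32,
    post_trace: f32,
}

impl StdpEdge {
    fn step(&mut self, dt: f32, pre_spiked: bool, post_spiked: bool) {
        const A_PLUS: f32 = 0.010;   // LTP amplitude
        const A_MINUS: f32 = 0.012;  // LTD amplitude (slightly dominant)
        const TAU_PLUS: f32 = 20.0;  // ms
        const TAU_MINUS: f32 = 20.0; // ms

        // Exponential decay of both traces.
        self.pre_trace *= (-dt / TAU_PLUS).exp();
        self.post_trace *= (-dt / TAU_MINUS).exp();

        if pre_spiked {
            self.pre_trace += 1.0;
            self.weight -= A_MINUS * self.post_trace; // LTD
        }
        if post_spiked {
            self.post_trace += 1.0;
            self.weight += A_PLUS * self.pre_trace;   // LTP
        }
        self.weight = self.weight.clamp(0.0, 1.0);
    }
}

fn main() {
    let mut edge = StdpEdge { weight: 0.5, pre_trace: 0.0, post_trace: 0.0 };
    // Pre fires at t = 0, post at t = 5 ms: causal order, so LTP.
    edge.step(1.0, true, false);
    for _ in 0..4 { edge.step(1.0, false, false); }
    edge.step(1.0, false, true);
    println!("w = {:.4}", edge.weight); // > 0.5
}
```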
### 3.3 Homeostatic Plasticity for Attention Stability
Pure STDP can lead to runaway excitation or silencing. Homeostatic mechanisms maintain stable attention distributions:
**Intrinsic plasticity (threshold adaptation):**
```
theta_v += eta_theta * (spike_rate(v) - target_rate)
```
Nodes that spike too often raise their threshold; rarely-spiking nodes lower it.
**Synaptic scaling:**
```
w_{uv} *= (target_rate / actual_rate(v))^{1/3}
```
All incoming weights scale to maintain target activity.
**BCM rule (Bienenstock-Cooper-Munro):**
```
Delta_w_{uv} = eta * post_activity * (post_activity - theta_BCM) * pre_activity
```
The sliding threshold theta_BCM prevents both runaway excitation and complete depression.
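A sketch combining the first two mechanisms for a single node; the function shape, units (spikes/ms), and constants are illustrative assumptions:
```rust
/// Homeostatic update for one node v: intrinsic plasticity adapts the
/// spike threshold, synaptic scaling renormalizes all incoming weights.
fn homeostasis(
    threshold: &mut f32,
    incoming_weights: &mut [f32],
    actual_rate: f32,
    target_rate: f32,
) {
    const ETA_THETA: f32 = 0.001;
    // Intrinsic plasticity: over-active nodes raise their threshold,
    // under-active nodes lower it.
    *threshold += ETA_THETA * (actual_rate - target_rate);
    // Synaptic scaling: multiplicative drift toward the target rate.
    if actual_rate > 0.0 {
        let scale = (target_rate / actual_rate).powf(1.0 / 3.0);
        for w in incoming_weights.iter_mut() {
            *w *= scale;
        }
    }
}

fn main() {
    let mut theta = 1.0_f32;
    let mut weights = vec![0.5_f32, 0.8, 0.3];
    // Node firing at twice its target rate: threshold rises, weights shrink.
    homeostasis(&mut theta, &mut weights, 0.2, 0.1);
    println!("theta = {theta:.4}, weights = {weights:?}");
}
```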
---
## 4. Dendritic Graph Computation
### 4.1 Beyond Flat Embeddings
Standard GNNs treat each node as a single computational unit with a flat embedding vector. Real neurons have elaborate dendritic trees with nonlinear computation in individual branches.
**Dendritic graph node:**
```
Each node v has a dendritic tree D_v with:
    - B branches, each receiving input from a subset of neighbors
    - Nonlinear dendritic activation per branch
    - Somatic integration combining branch outputs

Node embedding:
    h_v = soma(
        branch_1(inputs from neighbors N_1(v)),
        branch_2(inputs from neighbors N_2(v)),
        ...
        branch_B(inputs from neighbors N_B(v))
    )
```
**Advantage:** A single dendritic node can compute functions (such as XOR) that would require multiple layers of flat neurons. Dendritic graph transformers therefore pack more computational depth into each layer, needing fewer layers overall.
### 4.2 Dendritic Attention Mechanism
```
For node v with B dendritic branches:

1. PARTITION neighbors into branches:
       N_1(v), N_2(v), ..., N_B(v) = partition(N(v))
       (partition can be learned or based on graph structure)

2. BRANCH computation:
       For each branch b:
           z_b = sigma(W_b * aggregate(h_u for u in N_b(v)))
       // Nonlinear dendritic activation per branch

3. BRANCH attention:
       alpha_b = softmax(W_attn * z_b)
       // Attention across branches (which branch is most relevant)

4. SOMATIC integration:
       h_v = soma(sum_b alpha_b * z_b)
       // Final node embedding
```
**Complexity:** O(|N(v)| * d + B * d) per node. The B-fold increase in parameters is compensated by the ability to use fewer layers.
**RuVector integration:** The `ruvector-nervous-system/src/dendrite/` module already implements dendritic computation. Extending it to graph attention requires:
1. Neighbor-to-branch assignment (can use graph clustering from `ruvector-mincut`)
2. Branch-level attention computation
3. Integration with the main attention trait system in `ruvector-attention`
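The following minimal sketch walks through the four steps of Section 4.2 with scalar (d = 1) embeddings. `DendriticNode` and the round-robin neighbor partition are illustrative stand-ins, not the `dendrite/` module's actual API:
```rust
/// Scalar dendritic attention node with B branches.
struct DendriticNode {
    branch_weights: Vec<f32>, // one weight per dendritic branch
    attn_weights: Vec<f32>,   // branch-attention weights
}

impl DendriticNode {
    fn forward(&self, neighbor_inputs: &[f32]) -> f32 {
        let b = self.branch_weights.len();
        // Step 1: partition neighbors into branches (round-robin here;
        // could instead come from graph clustering).
        let mut branch_sums = vec![0.0_f32; b];
        for (i, &x) in neighbor_inputs.iter().enumerate() {
            branch_sums[i % b] += x;
        }
        // Step 2: nonlinear dendritic activation per branch.
        let z: Vec<f32> = branch_sums
            .iter()
            .zip(&self.branch_weights)
            .map(|(&s, &w)| (w * s).tanh())
            .collect();
        // Step 3: softmax attention across branches.
        let logits: Vec<f32> = z
            .iter()
            .zip(&self.attn_weights)
            .map(|(&zb, &wa)| wa * zb)
            .collect();
        let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        // Step 4: somatic integration of attention-weighted branch outputs.
        z.iter().zip(&exps).map(|(&zb, &e)| zb * e / sum).sum()
    }
}

fn main() {
    let node = DendriticNode {
        branch_weights: vec![1.0, -1.0],
        attn_weights: vec![0.5, 0.5],
    };
    println!("h_v = {:.4}", node.forward(&[0.2, 0.7, -0.1, 0.4]));
}
```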
---
## 5. Neuromorphic Hardware Deployment
### 5.1 Target Platforms (2026-2030)
| Platform | Neurons | Synapses | Power | Architecture |
|----------|---------|----------|-------|-------------|
| Intel Loihi 2 | 1M per chip | 120M | 1W | Digital LIF, programmable |
| IBM NorthPole | 256M ops/cycle | - | 12W | Digital inference |
| SynSense Speck | 320K | 65M | 0.7mW | Dynamic vision |
| BrainChip Akida | 1.2M | 10B | 1W | Event-driven |
| SpiNNaker 2 | 10M per board | 10B | 10W | ARM cores + digital neurons |
### 5.2 Graph Transformer to Neuromorphic Compilation
```
Compilation pipeline:

Source: SpikingGraphAttention (RuVector Rust)
        |
        v
Step 1: Graph Partitioning
        - Partition graph to fit chip neuron limits
        - Use ruvector-mincut for optimal partitioning
        - Map partitions to neuromorphic cores
        |
        v
Step 2: Neuron Mapping
        - Map each graph node to a hardware neuron cluster
        - Map attention weights to synaptic connections
        - Configure LIF parameters (threshold, tau, etc.)
        |
        v
Step 3: Synapse Routing
        - Map graph edges to hardware synaptic routes
        - Handle multi-hop routing for non-local edges
        - Optimize for communication bandwidth
        |
        v
Step 4: STDP Configuration
        - Program learning rules into on-chip plasticity engines
        - Set STDP time constants and learning rates
        |
        v
Target: Neuromorphic binary (Loihi SLIF, SpiNNaker PyNN, etc.)
```
**RuVector compilation target:**
```rust
/// Trait for neuromorphic compilation targets
pub trait NeuromorphicTarget {
    type Config;
    type Binary;

    /// Maximum neurons per core
    fn neurons_per_core(&self) -> usize;

    /// Maximum synapses per neuron
    fn synapses_per_neuron(&self) -> usize;

    /// Supported neuron models
    fn supported_models(&self) -> Vec<NeuronModel>;

    /// Compile spiking graph attention to target
    fn compile(
        &self,
        sgat: &SpikingGraphAttention,
        graph: &PropertyGraph,
        config: &Self::Config,
    ) -> Result<Self::Binary, CompileError>;

    /// Estimated power consumption
    fn estimate_power(
        &self,
        binary: &Self::Binary,
        spike_rate: f64,
    ) -> PowerEstimate;
}

pub struct PowerEstimate {
    pub static_power_mw: f64,
    pub dynamic_power_mw: f64,
    pub total_power_mw: f64,
    pub energy_per_spike_nj: f64,
    pub energy_per_inference_uj: f64,
}
```
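One way `estimate_power` could be realized for an event-driven target is a two-term model: fixed static power per core plus spike-driven dynamic power. The sketch below is an assumption for illustration, not a measured model of any chip:
```rust
/// Two-term power model. Returns (static, dynamic, total) in milliwatts.
/// All parameters are illustrative, not measured chip figures.
fn estimate_power_mw(
    cores: usize,
    neurons: usize,
    spike_rate_hz: f64,       // mean spikes per neuron per second
    static_mw_per_core: f64,  // leakage + clocking per core
    energy_per_spike_nj: f64, // energy per delivered spike event
) -> (f64, f64, f64) {
    let static_mw = cores as f64 * static_mw_per_core;
    // nJ/spike * spikes/s = nW; divide by 1e6 to get mW.
    let dynamic_mw = neurons as f64 * spike_rate_hz * energy_per_spike_nj / 1.0e6;
    (static_mw, dynamic_mw, static_mw + dynamic_mw)
}

fn main() {
    // 10^6 neurons at a 10 Hz mean rate, 1 nJ/spike, 8 cores at 50 mW each.
    let (s, d, t) = estimate_power_mw(8, 1_000_000, 10.0, 50.0, 1.0);
    println!("static {s:.1} mW, dynamic {d:.1} mW, total {t:.1} mW");
}
```
Note how the energy-proportional-to-activity property of Section 2.1 shows up directly: halving the spike rate halves the dynamic term.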
---
## 6. Oscillatory Graph Attention
### 6.1 Gamma Oscillations and Binding
The brain uses oscillatory synchronization (gamma: 30-100 Hz) to bind features. Neurons representing the same object oscillate in phase; different objects oscillate out of phase.
**Oscillatory graph attention:**
```
Each node v has phase phi_v(t) and frequency omega_v:
    dphi_v/dt = omega_v + sum_{u in N(v)} K_{uv} * sin(phi_u - phi_v)
```
This is a Kuramoto model on the graph. Coupled nodes synchronize; uncoupled nodes desynchronize.
**Attention from synchronization:**
```
alpha_{uv}(t) = (1 + cos(phi_u(t) - phi_v(t))) / 2
```
Synchronized nodes have attention weight 1; anti-phase nodes have weight 0.
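A self-contained sketch of one forward-Euler step of these dynamics and the resulting attention weights; the toy graph, coupling constant, and step size are illustrative:
```rust
use std::f64::consts::PI;

/// One Euler step of the Kuramoto dynamics on an undirected edge list.
fn kuramoto_step(
    phases: &mut [f64],
    omega: &[f64],
    edges: &[(usize, usize)],
    k: f64,
    dt: f64,
) {
    let mut dphi: Vec<f64> = omega.to_vec();
    for &(u, v) in edges {
        // Symmetric coupling pulls connected phases together.
        dphi[v] += k * (phases[u] - phases[v]).sin();
        dphi[u] += k * (phases[v] - phases[u]).sin();
    }
    for (phi, d) in phases.iter_mut().zip(&dphi) {
        *phi = (*phi + dt * d).rem_euclid(2.0 * PI);
    }
}

/// Synchronization-based attention weight of Section 6.1.
fn attention(phases: &[f64], u: usize, v: usize) -> f64 {
    (1.0 + (phases[u] - phases[v]).cos()) / 2.0
}

fn main() {
    let mut phases = vec![0.0, 2.5, 4.0];
    let omega = vec![0.3; 3]; // identical natural frequencies
    let edges = [(0, 1), (1, 2)];
    for _ in 0..2000 {
        kuramoto_step(&mut phases, &omega, &edges, 0.5, 0.01);
    }
    // Coupled nodes synchronize, so their attention weight approaches 1.
    println!("alpha_01 = {:.3}", attention(&phases, 0, 1));
}
```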
### 6.2 Multi-Frequency Attention
Different attention heads operate at different frequencies:
```
Head h at frequency omega_h:
    phi_v^h(t) oscillates at omega_h + perturbations from neighbors
    alpha_{uv}^h(t) = (1 + cos(phi_u^h - phi_v^h)) / 2

Cross-frequency coupling:
    phi_v^{slow}(t) modulates amplitude of phi_v^{fast}(t)
    // Implements hierarchical binding:
    //   - slow oscillation groups communities
    //   - fast oscillation groups nodes within communities
```
**RuVector connection:** This connects to `ruvector-coherence`'s spectral coherence tracking. The oscillatory phases define a coherence metric on the graph.
---
## 7. Projections
### 7.1 By 2030
**Likely:**
- Spiking graph transformers achieving 100x energy efficiency over GPU versions on small graphs
- STDP-trained graph attention competitive with backprop on benchmark tasks
- Neuromorphic deployment of graph transformers on Loihi 3 / SpiNNaker 2+
**Possible:**
- Dendritic graph attention reducing required depth by 3-5x
- Oscillatory attention for temporal graph problems (event detection, anomaly detection)
- Hebbian graph learning for continual graph learning (no catastrophic forgetting)
**Speculative:**
- Brain-scale (10^10 neuron) spiking graph transformers on neuromorphic clusters
- Online unsupervised STDP learning matching supervised performance
### 7.2 By 2033
**Likely:**
- Neuromorphic graph transformer chips (custom silicon for spiking graph attention)
- Dendritic computation standard in graph attention toolkits
- 1000x energy efficiency over 2026 GPU baselines
**Possible:**
- Self-organizing spiking graph transformers that grow new neurons/connections
- Cross-frequency attention for multi-scale graph reasoning
- Neuromorphic edge AI: graph transformers in IoT sensors
### 7.3 By 2036+
**Possible:**
- Neuromorphic graph transformers matching brain efficiency (~1 nW/node)
- Spiking graph transformers with emergent cognitive-like capabilities
- Biological-digital hybrid systems (graph transformers interfacing with neural tissue)
**Speculative:**
- True neuromorphic graph intelligence: self-learning, self-organizing, self-repairing
- Graph transformers that implement cortical column dynamics
---
## 8. RuVector Implementation Roadmap
### Phase 1: Spiking Foundation (2026-2027)
- Extend `ruvector-mincut-gated-transformer/src/spike.rs` with full LIF graph dynamics
- Implement STDP learning rules in `ruvector-nervous-system/src/plasticity/`
- Add spike-based attention to `ruvector-attention` trait system
- Benchmark on neuromorphic graph datasets
### Phase 2: Dendritic & Oscillatory (2027-2028)
- Extend `ruvector-nervous-system/src/dendrite/` for graph attention
- Implement Kuramoto oscillatory attention
- Add dendritic branching strategies using `ruvector-mincut` partitioning
- Integration with `ruvector-coherence` for coherence tracking
### Phase 3: Neuromorphic Deployment (2028-2030)
- Neuromorphic compilation pipeline (Loihi, SpiNNaker targets)
- Power-optimized spiking graph attention
- Edge deployment for IoT graph processing
- WASM-based spiking graph simulation via existing WASM crates
---
## References
1. Zhu et al., "Spiking Graph Neural Networks," IEEE TNNLS 2023
2. Hazan et al., "BindsNET: A Machine Learning-Oriented Spiking Neural Networks Library in Python," Frontiers in Neuroinformatics 2018
3. Tavanaei et al., "Deep Learning in Spiking Neural Networks," Neural Networks 2019
4. London & Häusser, "Dendritic Computation," Annual Review of Neuroscience 2005
5. Poirazi & Papoutsi, "Illuminating dendritic function with computational models," Nature Reviews Neuroscience 2020
6. Breakspear, "Dynamic Models of Large-Scale Brain Activity," Nature Neuroscience 2017
7. Davies et al., "Loihi 2: A Neuromorphic Processor with Programmable Synapses and Neuron Models," IEEE Micro 2021
---
**End of Document 23**
**Next:** [Doc 24 - Quantum Graph Attention](24-quantum-graph-attention.md)