# Axis 3: Biological -- Spiking Graph Transformers

**Document:** 23 of 30
**Series:** Graph Transformers: 2026-2036 and Beyond
**Last Updated:** 2026-02-25
**Status:** Research Prospectus

---

## 1. Problem Statement

The brain processes graph-structured information (connectomes, neural circuits, cortical columns) using mechanisms fundamentally different from backpropagation-trained transformers: discrete spikes, local Hebbian learning rules, dendritic computation, and spike-timing-dependent plasticity. These mechanisms are energy-efficient (the brain uses ~20 watts for ~86 billion neurons) and naturally parallel. The biological axis asks: can we build graph transformers that compute like brains?

### 1.1 The Efficiency Gap

| System | Nodes | Power | Power/Node | Latency |
|--------|-------|-------|------------|---------|
| Human brain | 86 x 10^9 | 20 W | 0.23 nW | ~100 ms |
| GPU graph transformer | 10^6 | 300 W | 300 uW | ~1 ms |
| Neuromorphic (Loihi 2) | 10^6 | 1 W | 1 uW | ~10 ms |
| Spiking graph transformer (proposed) | 10^8 | 10 W | 0.1 uW | ~50 ms |

The brain achieves roughly six orders of magnitude better power efficiency per node than the GPU baseline. Spiking graph transformers aim to close this gap by 3-4 orders of magnitude.

### 1.2 RuVector Baseline

- **`ruvector-mincut-gated-transformer`**: Spiking neurons (`spike.rs`), energy gates (`energy_gate.rs`)
- **`ruvector-nervous-system`**: Hopfield nets (`hopfield/`), HDC (`hdc/`), dendrite compute (`dendrite/`), plasticity (`plasticity/`), competitive learning (`compete/`), routing (`routing/`)
- **`ruvector-attention`**: Neighborhood attention (`graph/`), sparse attention (`sparse/`)

---

## 2. Spiking Graph Attention

### 2.1 From Softmax to Spikes

Standard graph attention:

```
alpha_{uv} = softmax_v(Q_u . K_v^T / sqrt(d))
z_u = sum_v alpha_{uv} * V_v
```

Spiking graph attention:

```
// Accumulate input current from neighbors
I_u(t) = sum_{v in N(u)} w_{uv} * S_v(t) * V_v

// Leaky integrate-and-fire (LIF) dynamics
tau * dU_u/dt = -U_u(t) + I_u(t)

// Spike when membrane potential exceeds threshold
if U_u(t) >= theta_u:
    S_u(t) = 1        // Emit spike
    U_u(t) = U_reset  // Reset potential
else:
    S_u(t) = 0
```

**Key differences from standard attention:**

1. **Temporal coding**: Information is carried in spike timing, not continuous values
2. **Winner-take-all**: High-attention nodes spike first (rate and temporal coding)
3. **Energy proportional to activity**: Silent nodes consume zero energy
4. **Local computation**: Each node only sees spikes from its graph neighbors

### 2.2 Spike-Based Attention Weights

We propose three mechanisms for spike-based attention:

**Mechanism 1: Rate-Coded Attention**

```
alpha_{uv} = spike_rate(v, window_T) / sum_w spike_rate(w, window_T)
```

Attention weight proportional to how often a neighbor spikes. Reduces to standard attention in the continuous limit.

**Mechanism 2: Temporal-Coded Attention**

```
alpha_{uv} = exp(-|t_spike(u) - t_spike(v)| / tau) / Z
```

Nodes that spike close in time attend to each other. Implements temporal coincidence detection.

**Mechanism 3: Phase-Coded Attention**

```
alpha_{uv} = cos(phi_u(t) - phi_v(t)) / Z
```

Attention based on oscillatory phase coherence. Nodes oscillating in phase form attention groups. Related to gamma oscillations in the brain.
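To make Mechanisms 1 and 2 concrete, below is a minimal, self-contained Rust sketch; the helper names and signatures are illustrative only and are not part of the RuVector crates. It turns recorded spike times into normalized attention weights.

```rust
/// Illustrative helpers (not RuVector APIs): derive attention weights from
/// neighbor spike trains. `spike_times[v]` holds the spike timestamps of neighbor v.

/// Mechanism 1: rate-coded attention over a trailing window of length `window_t`.
fn rate_coded_attention(spike_times: &[Vec<f64>], now: f64, window_t: f64) -> Vec<f64> {
    // Count each neighbor's spikes inside the window, then normalize.
    let rates: Vec<f64> = spike_times
        .iter()
        .map(|ts| ts.iter().filter(|&&t| t > now - window_t && t <= now).count() as f64)
        .collect();
    let total: f64 = rates.iter().sum();
    if total == 0.0 {
        return vec![0.0; rates.len()]; // no activity in the window: all weights zero
    }
    rates.iter().map(|r| r / total).collect()
}

/// Mechanism 2: temporal-coded attention from spike-time coincidence.
/// `t_u` is the query node's most recent spike; `last_spikes[v]` that of neighbor v.
fn temporal_coded_attention(t_u: f64, last_spikes: &[f64], tau: f64) -> Vec<f64> {
    let raw: Vec<f64> = last_spikes
        .iter()
        .map(|&t_v| (-(t_u - t_v).abs() / tau).exp())
        .collect();
    let z: f64 = raw.iter().sum();
    raw.iter().map(|a| a / z).collect()
}
```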
### 2.3 Spiking Graph Attention Network (SGAT)

```
Architecture:

Input Layer: Encode features as spike trains
        |
Spiking Attention Layer 1:
    - Each node: LIF neuron
    - Attention: via spike timing (Mechanism 2)
    - Aggregation: spike-weighted sum
        |
Spiking Attention Layer 2:
    - Lateral inhibition for competition
    - Winner-take-all within neighborhoods
        |
       ...
        |
Readout Layer: Decode spike trains to continuous values
    - Population coding: average over neuron populations
    - Rate decoding: spike count in window
```

**RuVector integration:**

```rust
use std::collections::VecDeque;

/// Spiking graph attention layer
pub struct SpikingGraphAttention {
    /// Neuron parameters per node
    neurons: Vec<LIFNeuron>,
    /// Synaptic weights (graph edges)
    synapses: SparseMatrix<SynapticWeight>,
    /// Attention mechanism
    attention_mode: SpikeAttentionMode,
    /// Time step
    dt: f64,
    /// Current simulation time
    t: f64,
}

pub struct LIFNeuron {
    /// Membrane potential
    pub membrane_potential: f32,
    /// Resting potential
    pub v_rest: f32,
    /// Threshold
    pub threshold: f32,
    /// Reset potential
    pub v_reset: f32,
    /// Membrane time constant
    pub tau: f32,
    /// Refractory period counter
    pub refractory: f32,
    /// Last spike time
    pub last_spike: f64,
    /// Spike train history
    pub spike_train: VecDeque<f64>,
}

pub struct SynapticWeight {
    /// Base weight
    pub weight: f32,
    /// Plasticity trace (for STDP)
    pub trace: f32,
    /// Delay (in dt units)
    pub delay: u16,
}

pub enum SpikeAttentionMode {
    /// Attention proportional to spike rate
    RateCoded { window: f64 },
    /// Attention from spike timing coincidence
    TemporalCoded { tau: f64 },
    /// Attention from phase coherence
    PhaseCoded { frequency: f64 },
}

impl SpikingGraphAttention {
    /// Simulate one time step; returns which nodes spiked.
    pub fn step(
        &mut self,
        graph: &PropertyGraph,
        input_currents: &[f32],
    ) -> Vec<bool> {
        let mut spikes = vec![false; self.neurons.len()];

        // Pass 1: accumulate input currents from spiking neighbors
        // (computed before the mutable update to avoid aliasing the neuron list).
        let mut inputs: Vec<f32> = input_currents.to_vec();
        for v in 0..self.neurons.len() {
            for (u, synapse) in self.incoming_synapses(v, graph) {
                let arrival = self.t - synapse.delay as f64 * self.dt;
                if self.neurons[u].spiked_at(arrival) {
                    inputs[v] += synapse.weight;
                }
            }
        }

        // Pass 2: LIF dynamics and spike generation
        for (v, neuron) in self.neurons.iter_mut().enumerate() {
            // Skip if in refractory period
            if neuron.refractory > 0.0 {
                neuron.refractory -= self.dt as f32;
                continue;
            }

            // Leaky integration toward v_rest, driven by the input current
            neuron.membrane_potential += self.dt as f32
                * (-neuron.membrane_potential + neuron.v_rest + inputs[v])
                / neuron.tau;

            // Spike check
            if neuron.membrane_potential >= neuron.threshold {
                spikes[v] = true;
                neuron.membrane_potential = neuron.v_reset;
                neuron.refractory = 2.0; // 2 ms refractory period
                neuron.last_spike = self.t;
                neuron.spike_train.push_back(self.t);
            }
        }

        self.t += self.dt;
        spikes
    }
}
```
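To accompany the readout layer described above, a minimal rate-decoding sketch (illustrative only; `rate_decode` is not a RuVector API). Spike histories collected from repeated `step` calls are converted back to continuous per-node values by counting spikes over the simulation window.

```rust
/// Rate decoding: turn a recorded spike history into continuous node values.
/// `spike_history[t][v]` is true if node v spiked at time step t; `dt` is the step size.
fn rate_decode(spike_history: &[Vec<bool>], dt: f64) -> Vec<f64> {
    let num_nodes = spike_history.first().map_or(0, |s| s.len());
    let window = spike_history.len() as f64 * dt;
    let mut rates = vec![0.0; num_nodes];
    for step in spike_history {
        for (v, &spiked) in step.iter().enumerate() {
            if spiked {
                rates[v] += 1.0;
            }
        }
    }
    // Spike counts divided by window length give per-node firing rates.
    for r in rates.iter_mut() {
        *r /= window;
    }
    rates
}

// Driving the layer (sketch; assumes a constructed `layer`, `graph`, and `inputs`):
//
//     let mut history = Vec::new();
//     for _ in 0..num_steps {
//         history.push(layer.step(&graph, &inputs));
//     }
//     let node_values = rate_decode(&history, dt);
```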
---

## 3. Hebbian Learning on Graphs

### 3.1 Graph Hebbian Rules

Classical Hebb's rule: "Neurons that fire together, wire together."

**Graph Hebbian attention update:**

```
Delta_w_{uv} = eta * (
    pre_trace(u) * post_trace(v)   // Hebbian term
  - lambda * w_{uv}                // Weight decay
)
```

where pre_trace and post_trace are exponentially filtered spike trains:

```
pre_trace(u, t)  = sum_{t_spike < t} exp(-(t - t_spike) / tau_pre)
post_trace(v, t) = sum_{t_spike < t} exp(-(t - t_spike) / tau_post)
```

### 3.2 Spike-Timing-Dependent Plasticity (STDP) on Graphs

STDP adjusts edge weights based on the relative timing of pre- and post-synaptic spikes:

```
Delta_w_{uv} =  A_+ * exp(-(t_post - t_pre) / tau_+)   if t_post > t_pre  (LTP)
               -A_- * exp(-(t_pre - t_post) / tau_-)   if t_pre > t_post  (LTD)
```

- LTP (Long-Term Potentiation): Pre before post -> strengthen connection
- LTD (Long-Term Depression): Post before pre -> weaken connection

**Graph STDP attention:**

```
For each edge (u, v) in E:
    For each pair of spikes (t_u, t_v):
        dt = t_v - t_u
        if dt > 0:  // u spiked before v
            w_{uv} += A_+ * exp(-dt / tau_+)  // Strengthen u->v
        else:
            w_{uv} -= A_- * exp(dt / tau_-)   // Weaken u->v
```

**Interpretation as attention learning:** STDP automatically learns attention weights that encode causal influence in the graph. If node u's activity reliably precedes node v's, the u->v attention weight increases.

### 3.3 Homeostatic Plasticity for Attention Stability

Pure STDP can lead to runaway excitation or silencing. Homeostatic mechanisms maintain stable attention distributions:

**Intrinsic plasticity (threshold adaptation):**

```
theta_v += eta_theta * (spike_rate(v) - target_rate)
```

Nodes that spike too often raise their threshold; rarely-spiking nodes lower it.

**Synaptic scaling:**

```
w_{uv} *= (target_rate / actual_rate(v))^{1/3}
```

All incoming weights scale to maintain target activity.

**BCM rule (Bienenstock-Cooper-Munro):**

```
Delta_w_{uv} = eta * post_activity * (post_activity - theta_BCM) * pre_activity
```

The sliding threshold theta_BCM prevents both runaway excitation and complete depression.

---

## 4. Dendritic Graph Computation

### 4.1 Beyond Flat Embeddings

Standard GNNs treat each node as a single computational unit with a flat embedding vector. Real neurons have elaborate dendritic trees with nonlinear computation in individual branches.

**Dendritic graph node:**

```
Each node v has a dendritic tree D_v with:
- B branches, each receiving input from a subset of neighbors
- Nonlinear dendritic activation per branch
- Somatic integration combining branch outputs

Node embedding:
h_v = soma(
    branch_1(inputs from neighbors N_1(v)),
    branch_2(inputs from neighbors N_2(v)),
    ...
    branch_B(inputs from neighbors N_B(v))
)
```

**Advantage:** A single dendritic node can compute functions (like XOR) that require multiple layers of flat neurons. This makes dendritic graph transformers deeper in computational power despite being shallower in layer count.

### 4.2 Dendritic Attention Mechanism

```
For node v with B dendritic branches:

1. PARTITION neighbors into branches:
   N_1(v), N_2(v), ..., N_B(v) = partition(N(v))
   (partition can be learned or based on graph structure)

2. BRANCH computation:
   For each branch b:
       z_b = sigma(W_b * aggregate(h_u for u in N_b(v)))
       // Nonlinear dendritic activation per branch

3. BRANCH attention:
   alpha_b = softmax(W_attn * z_b)
   // Attention across branches (which branch is most relevant)

4. SOMATIC integration:
   h_v = soma(sum_b alpha_b * z_b)
   // Final node embedding
```

**Complexity:** O(|N(v)| * d + B * d) per node. The B-fold increase in parameters is compensated by the ability to use fewer layers.
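As a shape-level illustration of steps 2-4 above, here is a minimal Rust sketch with hypothetical names (not the `dendrite/` module's API). Step 1 is assumed to have already produced per-branch neighbor sets, tanh stands in for the dendritic nonlinearity, and the soma is taken to be the plain attention-weighted sum.

```rust
/// Dendritic attention sketch (hypothetical helper, not the dendrite/ module API).
/// `branch_inputs[b]` holds the embeddings of neighbors assigned to branch b,
/// `branch_weights[b]` is W_b as a row-major d x d matrix,
/// `attn_weights` is the d-dimensional branch-scoring vector W_attn.
fn dendritic_attention(
    branch_inputs: &[Vec<Vec<f32>>],
    branch_weights: &[Vec<Vec<f32>>],
    attn_weights: &[f32],
) -> Vec<f32> {
    let d = attn_weights.len();

    // Step 2: per-branch nonlinear activation z_b = tanh(W_b * mean(h_u))
    let branch_outputs: Vec<Vec<f32>> = branch_inputs
        .iter()
        .zip(branch_weights)
        .map(|(inputs, w_b)| {
            let n = inputs.len().max(1) as f32; // guard against empty branches
            let mut agg = vec![0.0f32; d];
            for h_u in inputs {
                for i in 0..d {
                    agg[i] += h_u[i] / n;
                }
            }
            (0..d)
                .map(|i| w_b[i].iter().zip(&agg).map(|(w, a)| w * a).sum::<f32>().tanh())
                .collect()
        })
        .collect();

    // Step 3: softmax attention over branch scores W_attn . z_b
    let scores: Vec<f32> = branch_outputs
        .iter()
        .map(|z_b| attn_weights.iter().zip(z_b).map(|(w, z)| w * z).sum())
        .collect();
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let z_sum: f32 = exps.iter().sum();

    // Step 4: somatic integration as the attention-weighted sum of branch outputs
    let mut h_v = vec![0.0f32; d];
    for (b, z_b) in branch_outputs.iter().enumerate() {
        let alpha_b = exps[b] / z_sum;
        for i in 0..d {
            h_v[i] += alpha_b * z_b[i];
        }
    }
    h_v
}
```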
**RuVector integration:** The `ruvector-nervous-system/src/dendrite/` module already implements dendritic computation. Extending it to graph attention requires:

1. Neighbor-to-branch assignment (can use graph clustering from `ruvector-mincut`)
2. Branch-level attention computation
3. Integration with the main attention trait system in `ruvector-attention`

---

## 5. Neuromorphic Hardware Deployment

### 5.1 Target Platforms (2026-2030)

| Platform | Neurons | Synapses | Power | Architecture |
|----------|---------|----------|-------|--------------|
| Intel Loihi 2 | 1M per chip | 120M | 1 W | Digital LIF, programmable |
| IBM NorthPole | 256M ops/cycle | - | 12 W | Digital inference |
| SynSense Speck | 320K | 65M | 0.7 mW | Dynamic vision |
| BrainChip Akida | 1.2M | 10B | 1 W | Event-driven |
| SpiNNaker 2 | 10M per board | 10B | 10 W | ARM cores + digital neurons |

### 5.2 Graph Transformer to Neuromorphic Compilation

```
Compilation pipeline:

Source: SpikingGraphAttention (RuVector Rust)
        |
        v
Step 1: Graph Partitioning
    - Partition graph to fit chip neuron limits
    - Use ruvector-mincut for optimal partitioning
    - Map partitions to neuromorphic cores
        |
        v
Step 2: Neuron Mapping
    - Map each graph node to a hardware neuron cluster
    - Map attention weights to synaptic connections
    - Configure LIF parameters (threshold, tau, etc.)
        |
        v
Step 3: Synapse Routing
    - Map graph edges to hardware synaptic routes
    - Handle multi-hop routing for non-local edges
    - Optimize for communication bandwidth
        |
        v
Step 4: STDP Configuration
    - Program learning rules into on-chip plasticity engines
    - Set STDP time constants and learning rates
        |
        v
Target: Neuromorphic binary (Loihi SLIF, SpiNNaker PyNN, etc.)
```

**RuVector compilation target:**

```rust
/// Trait for neuromorphic compilation targets
pub trait NeuromorphicTarget {
    type Config;
    type Binary;

    /// Maximum neurons per core
    fn neurons_per_core(&self) -> usize;

    /// Maximum synapses per neuron
    fn synapses_per_neuron(&self) -> usize;

    /// Supported neuron models
    fn supported_models(&self) -> Vec<String>;

    /// Compile spiking graph attention to target
    fn compile(
        &self,
        sgat: &SpikingGraphAttention,
        graph: &PropertyGraph,
        config: &Self::Config,
    ) -> Result<Self::Binary, String>;

    /// Estimated power consumption
    fn estimate_power(
        &self,
        binary: &Self::Binary,
        spike_rate: f64,
    ) -> PowerEstimate;
}

pub struct PowerEstimate {
    pub static_power_mw: f64,
    pub dynamic_power_mw: f64,
    pub total_power_mw: f64,
    pub energy_per_spike_nj: f64,
    pub energy_per_inference_uj: f64,
}
```

---

## 6. Oscillatory Graph Attention

### 6.1 Gamma Oscillations and Binding

The brain uses oscillatory synchronization (gamma: 30-100 Hz) to bind features. Neurons representing the same object oscillate in phase; different objects oscillate out of phase.

**Oscillatory graph attention:**

```
Each node v has phase phi_v(t) and frequency omega_v:

dphi_v/dt = omega_v + sum_{u in N(v)} K_{uv} * sin(phi_u - phi_v)
```

This is a Kuramoto model on the graph. Coupled nodes synchronize; uncoupled nodes desynchronize.

**Attention from synchronization:**

```
alpha_{uv}(t) = (1 + cos(phi_u(t) - phi_v(t))) / 2
```

Synchronized nodes have attention weight 1; anti-phase nodes have weight 0.
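A minimal sketch of the Kuramoto phase update and the resulting synchronization-based attention weight (self-contained and illustrative only; the edge list and coupling constants are assumed given and the function names are not RuVector APIs):

```rust
use std::f64::consts::TAU;

/// One Euler step of the Kuramoto dynamics on a graph.
/// `phases[v]` is phi_v, `omega[v]` its natural frequency, and
/// `edges` lists (u, v, K_uv) couplings pulling phi_v toward phi_u.
fn kuramoto_step(phases: &mut [f64], omega: &[f64], edges: &[(usize, usize, f64)], dt: f64) {
    let mut dphi: Vec<f64> = omega.to_vec();
    for &(u, v, k_uv) in edges {
        dphi[v] += k_uv * (phases[u] - phases[v]).sin();
    }
    for (phi, d) in phases.iter_mut().zip(&dphi) {
        *phi = (*phi + d * dt).rem_euclid(TAU); // keep phases in [0, 2*pi)
    }
}

/// Attention from phase synchronization: 1 when in phase, 0 when anti-phase.
fn sync_attention(phi_u: f64, phi_v: f64) -> f64 {
    (1.0 + (phi_u - phi_v).cos()) / 2.0
}
```

A multi-frequency variant (Section 6.2 below) would run this update once per attention head with a different `omega` vector per head.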
### 6.2 Multi-Frequency Attention

Different attention heads operate at different frequencies:

```
Head h at frequency omega_h:
    phi_v^h(t) oscillates at omega_h + perturbations from neighbors
    alpha_{uv}^h(t) = (1 + cos(phi_u^h - phi_v^h)) / 2

Cross-frequency coupling:
    phi_v^{slow}(t) modulates amplitude of phi_v^{fast}(t)
    // Implements hierarchical binding:
    //   slow oscillation groups communities
    //   fast oscillation groups nodes within communities
```

**RuVector connection:** This connects to `ruvector-coherence`'s spectral coherence tracking. The oscillatory phases define a coherence metric on the graph.

---

## 7. Projections

### 7.1 By 2030

**Likely:**

- Spiking graph transformers achieving 100x energy efficiency over GPU versions on small graphs
- STDP-trained graph attention competitive with backprop on benchmark tasks
- Neuromorphic deployment of graph transformers on Loihi 3 / SpiNNaker 2+

**Possible:**

- Dendritic graph attention reducing required depth by 3-5x
- Oscillatory attention for temporal graph problems (event detection, anomaly detection)
- Hebbian graph learning for continual graph learning (no catastrophic forgetting)

**Speculative:**

- Brain-scale (10^10 neuron) spiking graph transformers on neuromorphic clusters
- Online unsupervised STDP learning matching supervised performance

### 7.2 By 2033

**Likely:**

- Neuromorphic graph transformer chips (custom silicon for spiking graph attention)
- Dendritic computation standard in graph attention toolkits
- 1000x energy efficiency over 2026 GPU baselines

**Possible:**

- Self-organizing spiking graph transformers that grow new neurons/connections
- Cross-frequency attention for multi-scale graph reasoning
- Neuromorphic edge AI: graph transformers in IoT sensors

### 7.3 By 2036+

**Possible:**

- Neuromorphic graph transformers matching brain efficiency (~1 nW/node)
- Spiking graph transformers with emergent cognitive-like capabilities
- Biological-digital hybrid systems (graph transformers interfacing with neural tissue)

**Speculative:**

- True neuromorphic graph intelligence: self-learning, self-organizing, self-repairing
- Graph transformers that implement cortical column dynamics

---

## 8. RuVector Implementation Roadmap

### Phase 1: Spiking Foundation (2026-2027)

- Extend `ruvector-mincut-gated-transformer/src/spike.rs` with full LIF graph dynamics
- Implement STDP learning rules in `ruvector-nervous-system/src/plasticity/`
- Add spike-based attention to the `ruvector-attention` trait system
- Benchmark on neuromorphic graph datasets

### Phase 2: Dendritic & Oscillatory (2027-2028)

- Extend `ruvector-nervous-system/src/dendrite/` for graph attention
- Implement Kuramoto oscillatory attention
- Add dendritic branching strategies using `ruvector-mincut` partitioning
- Integration with `ruvector-coherence` for coherence tracking

### Phase 3: Neuromorphic Deployment (2028-2030)

- Neuromorphic compilation pipeline (Loihi and SpiNNaker targets)
- Power-optimized spiking graph attention
- Edge deployment for IoT graph processing
- WASM-based spiking graph simulation via existing WASM crates

---

## References

1. Zhu et al., "Spiking Graph Neural Networks," IEEE TNNLS 2023
2. Hazan et al., "BindsNET: A Machine Learning-Oriented Spiking Neural Networks Library in Python," Frontiers in Neuroinformatics 2018
3. Tavanaei et al., "Deep Learning in Spiking Neural Networks," Neural Networks 2019
4. London & Hausser, "Dendritic Computation," Annual Review of Neuroscience 2005
5. Poirazi & Papoutsi, "Illuminating Dendritic Function with Computational Models," Nature Reviews Neuroscience 2020
6. Breakspear, "Dynamic Models of Large-Scale Brain Activity," Nature Neuroscience 2017
7. Davies et al., "Loihi 2: A Neuromorphic Processor with Programmable Synapses and Neuron Models," IEEE Micro 2021

---

**End of Document 23**

**Next:** [Doc 24 - Quantum Graph Attention](24-quantum-graph-attention.md)