# Axis 3: Biological -- Spiking Graph Transformers
**Document:** 23 of 30

**Series:** Graph Transformers: 2026-2036 and Beyond

**Last Updated:** 2026-02-25

**Status:** Research Prospectus

---
## 1. Problem Statement
The brain processes graph-structured information (connectomes, neural circuits, cortical columns) using mechanisms fundamentally different from those of backpropagation-trained transformers: discrete spikes, local Hebbian learning rules, dendritic computation, and spike-timing-dependent plasticity. These mechanisms are energy-efficient (the brain uses ~20 watts for ~86 billion neurons) and naturally parallel.

The biological axis asks: can we build graph transformers that compute like brains?

### 1.1 The Efficiency Gap

| System | Nodes | Power | Power/Node | Latency |
|--------|-------|-------|------------|---------|
| Human brain | 86 x 10^9 | 20 W | 0.23 nW | ~100 ms |
| GPU graph transformer | 10^6 | 300 W | 300 uW | ~1 ms |
| Neuromorphic (Loihi 2) | 10^6 | 1 W | 1 uW | ~10 ms |
| Spiking graph transformer (proposed) | 10^8 | 10 W | 0.1 uW | ~50 ms |

The brain achieves roughly six orders of magnitude better power efficiency per node than a GPU graph transformer. Spiking graph transformers aim to close this gap by 3-4 orders of magnitude.
### 1.2 RuVector Baseline

- **`ruvector-mincut-gated-transformer`**: Spiking neurons (`spike.rs`), energy gates (`energy_gate.rs`)
- **`ruvector-nervous-system`**: Hopfield nets (`hopfield/`), HDC (`hdc/`), dendrite compute (`dendrite/`), plasticity (`plasticity/`), competitive learning (`compete/`), routing (`routing/`)
- **`ruvector-attention`**: Neighborhood attention (`graph/`), sparse attention (`sparse/`)

---
## 2. Spiking Graph Attention
### 2.1 From Softmax to Spikes

Standard graph attention:

```
alpha_{uv} = softmax_v(Q_u . K_v^T / sqrt(d))
z_u = sum_{v in N(u)} alpha_{uv} * V_v
```

Spiking graph attention:

```
// Accumulate input current from spiking neighbors
I_u(t) = sum_{v in N(u)} w_{uv} * S_v(t) * V_v

// Leaky integrate-and-fire (LIF) dynamics
tau * dU_u/dt = -U_u(t) + I_u(t)

// Spike when the membrane potential exceeds threshold
if U_u(t) >= theta_u:
    S_u(t) = 1        // Emit spike
    U_u(t) = U_reset  // Reset potential
else:
    S_u(t) = 0
```

**Key differences from standard attention:**

1. **Temporal coding**: Information is carried in spike timing, not continuous values
2. **Winner-take-all**: High-attention nodes spike first (rate and temporal coding)
3. **Energy proportional to activity**: Silent nodes consume zero energy
4. **Local computation**: Each node only sees spikes from its graph neighbors
### 2.2 Spike-Based Attention Weights

We propose three mechanisms for spike-based attention:

**Mechanism 1: Rate-Coded Attention**

```
alpha_{uv} = spike_rate(v, window_T) / sum_{w in N(u)} spike_rate(w, window_T)
```

Attention weight is proportional to how often a neighbor spikes. Reduces to standard attention in the continuous limit.

**Mechanism 2: Temporal-Coded Attention**

```
alpha_{uv} = exp(-|t_spike(u) - t_spike(v)| / tau) / Z
```

Nodes that spike close together in time attend to each other. Implements temporal coincidence detection.

**Mechanism 3: Phase-Coded Attention**

```
alpha_{uv} = cos(phi_u(t) - phi_v(t)) / Z
```

Attention based on oscillatory phase coherence: nodes oscillating in phase form attention groups. Related to gamma oscillations in the brain (see Section 6).
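To make Mechanism 2 concrete, here is a minimal Rust sketch that turns last-spike times into normalized attention weights over one node's neighborhood. The function name and the flat `last_spike` array are illustrative assumptions, not existing RuVector APIs.

```rust
/// Temporal-coded attention (Mechanism 2): weight each neighbor by how close
/// its most recent spike is to node u's, then normalize over the neighborhood.
fn temporal_attention(u: usize, neighbors: &[usize], last_spike: &[f64], tau: f64) -> Vec<f64> {
    // Unnormalized coincidence scores: exp(-|t_u - t_v| / tau)
    let raw: Vec<f64> = neighbors
        .iter()
        .map(|&v| (-(last_spike[u] - last_spike[v]).abs() / tau).exp())
        .collect();
    // Normalizer Z (guarded against an all-silent neighborhood)
    let z: f64 = raw.iter().sum::<f64>().max(f64::MIN_POSITIVE);
    raw.into_iter().map(|a| a / z).collect()
}
```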
### 2.3 Spiking Graph Attention Network (SGAT)

```
Architecture:

Input Layer: Encode features as spike trains

Spiking Attention Layer 1:
  - Each node: LIF neuron
  - Attention: via spike timing (Mechanism 2)
  - Aggregation: spike-weighted sum

Spiking Attention Layer 2:
  - Lateral inhibition for competition
  - Winner-take-all within neighborhoods

...

Readout Layer: Decode spike trains to continuous values
  - Population coding: average over neuron populations
  - Rate decoding: spike count in window
```
**RuVector integration:**

```rust
use std::collections::VecDeque;

/// Spiking graph attention layer.
///
/// `PropertyGraph`, `SparseMatrix`, and the `incoming_synapses` / `spiked_at`
/// helpers are assumed to be provided by the surrounding RuVector crates.
pub struct SpikingGraphAttention {
    /// Neuron state per graph node
    neurons: Vec<LIFNeuron>,
    /// Synaptic weights (graph edges)
    synapses: SparseMatrix<SynapticWeight>,
    /// Attention mechanism
    attention_mode: SpikeAttentionMode,
    /// Simulation time step (ms)
    dt: f64,
    /// Current simulation time (ms)
    t: f64,
}

pub struct LIFNeuron {
    /// Membrane potential
    pub membrane_potential: f32,
    /// Resting potential
    pub v_rest: f32,
    /// Spike threshold
    pub threshold: f32,
    /// Reset potential
    pub v_reset: f32,
    /// Membrane time constant (ms)
    pub tau: f32,
    /// Remaining refractory time (ms)
    pub refractory: f32,
    /// Last spike time
    pub last_spike: f64,
    /// Spike train history
    pub spike_train: VecDeque<f64>,
}

pub struct SynapticWeight {
    /// Base weight
    pub weight: f32,
    /// Plasticity trace (for STDP)
    pub trace: f32,
    /// Conduction delay (in dt units)
    pub delay: u16,
}

pub enum SpikeAttentionMode {
    /// Attention proportional to spike rate
    RateCoded { window: f64 },
    /// Attention from spike-timing coincidence
    TemporalCoded { tau: f64 },
    /// Attention from phase coherence
    PhaseCoded { frequency: f64 },
}

impl SpikingGraphAttention {
    /// Simulate one time step; returns which nodes spiked.
    pub fn step(
        &mut self,
        graph: &PropertyGraph,
        input_currents: &[f32],
    ) -> Vec<bool> {
        let n = self.neurons.len();
        let mut spikes = vec![false; n];

        // Pass 1 (read-only): accumulate input current from neighbors whose
        // spikes arrive now, i.e. that spiked `delay` time steps ago.
        let mut inputs = input_currents.to_vec();
        for v in 0..n {
            for (u, synapse) in self.incoming_synapses(v, graph) {
                let arrival = self.t - synapse.delay as f64 * self.dt;
                if self.neurons[u].spiked_at(arrival) {
                    inputs[v] += synapse.weight;
                }
            }
        }

        // Pass 2 (mutable): update membrane potentials and emit spikes.
        for (v, neuron) in self.neurons.iter_mut().enumerate() {
            // Skip neurons still in their refractory period.
            if neuron.refractory > 0.0 {
                neuron.refractory -= self.dt as f32;
                continue;
            }

            // Leaky integrate-and-fire dynamics (forward Euler step).
            neuron.membrane_potential += self.dt as f32
                * (-(neuron.membrane_potential - neuron.v_rest) + inputs[v])
                / neuron.tau;

            // Threshold crossing: emit spike and reset.
            if neuron.membrane_potential >= neuron.threshold {
                spikes[v] = true;
                neuron.membrane_potential = neuron.v_reset;
                neuron.refractory = 2.0; // 2 ms refractory period
                neuron.last_spike = self.t;
                neuron.spike_train.push_back(self.t);
            }
        }

        self.t += self.dt;
        spikes
    }
}
```
---

## 3. Hebbian Learning on Graphs

### 3.1 Graph Hebbian Rules

Classical Hebb's rule: "Neurons that fire together, wire together."

**Graph Hebbian attention update:**

```
Delta_w_{uv} = eta * (
    pre_trace(u) * post_trace(v)   // Hebbian term
    - lambda * w_{uv}              // Weight decay
)
```

where pre_trace and post_trace are exponentially filtered spike trains:

```
pre_trace(u, t)  = sum_{t_spike < t} exp(-(t - t_spike) / tau_pre)
post_trace(v, t) = sum_{t_spike < t} exp(-(t - t_spike) / tau_post)
```
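A minimal sketch of this trace-based Hebbian update for a single edge, with per-edge state and illustrative names (none of these are existing RuVector items):

```rust
/// Edge state for the graph Hebbian rule above.
struct HebbianEdge { weight: f32, pre_trace: f32, post_trace: f32 }

impl HebbianEdge {
    /// One update step of length `dt`, given whether u (pre) and v (post) spiked.
    fn step(&mut self, pre_spiked: bool, post_spiked: bool, dt: f32,
            tau_pre: f32, tau_post: f32, eta: f32, lambda: f32) {
        // Decay the exponentially filtered spike trains, then bump them on spikes.
        self.pre_trace *= (-dt / tau_pre).exp();
        self.post_trace *= (-dt / tau_post).exp();
        if pre_spiked { self.pre_trace += 1.0; }
        if post_spiked { self.post_trace += 1.0; }
        // Hebbian term minus weight decay.
        self.weight += eta * (self.pre_trace * self.post_trace - lambda * self.weight);
    }
}
```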
### 3.2 Spike-Timing-Dependent Plasticity (STDP) on Graphs

STDP adjusts edge weights based on the relative timing of pre- and post-synaptic spikes:

```
Delta_w_{uv} =
     A_+ * exp(-(t_post - t_pre) / tau_+)   if t_post > t_pre   (LTP)
    -A_- * exp(-(t_pre - t_post) / tau_-)   if t_pre  > t_post  (LTD)
```

- LTP (Long-Term Potentiation): Pre before post -> strengthen connection
- LTD (Long-Term Depression): Post before pre -> weaken connection

**Graph STDP attention:**

```
For each edge (u, v) in E:
    For each pair of spikes (t_u, t_v):
        dt = t_v - t_u
        if dt > 0:                             // u spiked before v
            w_{uv} += A_+ * exp(-dt / tau_+)   // Strengthen u -> v
        else:
            w_{uv} -= A_- * exp(dt / tau_-)    // Weaken u -> v
```

**Interpretation as attention learning:** STDP automatically learns attention weights that encode causal influence in the graph. If node u's activity reliably precedes node v's, the u -> v attention weight increases.
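The pairwise rule transcribes directly into a small helper; the function and parameter names are illustrative, not RuVector APIs:

```rust
/// Pairwise STDP weight change for edge u -> v, given one pre-spike time `t_u`
/// and one post-spike time `t_v`.
fn stdp_delta(t_u: f64, t_v: f64, a_plus: f64, a_minus: f64,
              tau_plus: f64, tau_minus: f64) -> f64 {
    let dt = t_v - t_u;
    if dt > 0.0 {
        a_plus * (-dt / tau_plus).exp()    // LTP: u preceded v, strengthen
    } else {
        -a_minus * (dt / tau_minus).exp()  // LTD: v preceded u (dt <= 0), weaken
    }
}
```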
### 3.3 Homeostatic Plasticity for Attention Stability

Pure STDP can lead to runaway excitation or silencing. Homeostatic mechanisms maintain stable attention distributions:

**Intrinsic plasticity (threshold adaptation):**

```
theta_v += eta_theta * (spike_rate(v) - target_rate)
```

Nodes that spike too often raise their threshold; rarely spiking nodes lower it.

**Synaptic scaling:**

```
w_{uv} *= (target_rate / actual_rate(v))^{1/3}
```

All incoming weights of a node scale multiplicatively to maintain its target activity.

**BCM rule (Bienenstock-Cooper-Munro):**

```
Delta_w_{uv} = eta * post_activity * (post_activity - theta_BCM) * pre_activity
```

The sliding threshold theta_BCM prevents both runaway excitation and complete depression.
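The first two homeostatic rules are essentially one-liners in code; a sketch with assumed names, not part of `ruvector-nervous-system`:

```rust
/// Intrinsic plasticity: nudge a node's threshold toward its target firing rate.
fn adapt_threshold(theta: &mut f32, spike_rate: f32, target_rate: f32, eta_theta: f32) {
    *theta += eta_theta * (spike_rate - target_rate);
}

/// Synaptic scaling: multiplicatively rescale all incoming weights of node v
/// so its activity drifts toward the target rate.
fn scale_incoming(weights: &mut [f32], actual_rate: f32, target_rate: f32) {
    let s = (target_rate / actual_rate.max(f32::EPSILON)).powf(1.0 / 3.0);
    for w in weights.iter_mut() {
        *w *= s;
    }
}
```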
---

## 4. Dendritic Graph Computation

### 4.1 Beyond Flat Embeddings

Standard GNNs treat each node as a single computational unit with a flat embedding vector. Real neurons have elaborate dendritic trees with nonlinear computation in individual branches.

**Dendritic graph node:**

```
Each node v has a dendritic tree D_v with:
  - B branches, each receiving input from a subset of neighbors
  - A nonlinear dendritic activation per branch
  - Somatic integration combining branch outputs

Node embedding:
  h_v = soma(
      branch_1(inputs from neighbors N_1(v)),
      branch_2(inputs from neighbors N_2(v)),
      ...
      branch_B(inputs from neighbors N_B(v))
  )
```

**Advantage:** A single dendritic node can compute functions (like XOR) that require multiple layers of flat neurons. This makes dendritic graph transformers deeper in computational power despite being shallower in layer count.
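As a toy illustration of that claim (not RuVector code): two dendritic branches with threshold nonlinearities let a single node compute XOR, which no single flat linear-threshold unit can.

```rust
/// One "dendritic node" with an OR-like branch and an AND-like branch;
/// the soma subtracts them, yielding XOR. Purely illustrative.
fn dendritic_xor(a: f32, b: f32) -> f32 {
    let step = |x: f32| if x >= 0.0 { 1.0 } else { 0.0 };
    let branch_or = step(a + b - 0.5);   // fires if at least one input is active
    let branch_and = step(a + b - 1.5);  // fires only if both inputs are active
    step(branch_or - branch_and - 0.5)   // soma: OR minus AND = XOR
}

// dendritic_xor(0.0, 0.0) == 0.0, (1.0, 0.0) == 1.0, (0.0, 1.0) == 1.0, (1.0, 1.0) == 0.0
```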
### 4.2 Dendritic Attention Mechanism

```
For node v with B dendritic branches:

1. PARTITION neighbors into branches:
   N_1(v), N_2(v), ..., N_B(v) = partition(N(v))
   (the partition can be learned or based on graph structure)

2. BRANCH computation:
   For each branch b:
       z_b = sigma(W_b * aggregate(h_u for u in N_b(v)))
       // Nonlinear dendritic activation per branch

3. BRANCH attention:
   alpha_b = softmax(W_attn * z_b)
   // Attention across branches (which branch is most relevant)

4. SOMATIC integration:
   h_v = soma(sum_b alpha_b * z_b)
   // Final node embedding
```

**Complexity:** O(|N(v)| * d + B * d) per node. The B-fold increase in parameters is compensated by the ability to use fewer layers.
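A minimal Rust sketch of steps 1-4 for a single node, with scalar per-branch gates standing in for W_attn and tanh standing in for the dendritic nonlinearity sigma; all names are illustrative assumptions, not part of `ruvector-nervous-system`:

```rust
/// Dendritic attention for one node v: partitioned aggregation, per-branch
/// nonlinearity, softmax over branches, and somatic integration.
fn dendritic_node_embedding(
    neighbor_feats: &[Vec<f32>], // h_u for u in N(v)
    branch_of: &[usize],         // step 1: branch index assigned to each neighbor
    num_branches: usize,
    attn_gate: &[f32],           // one learned gate per branch (stand-in for W_attn)
) -> Vec<f32> {
    let d = neighbor_feats.first().map_or(0, |h| h.len());

    // Step 2: per-branch aggregation followed by a nonlinear (tanh) activation.
    let mut branch_out = vec![vec![0.0f32; d]; num_branches];
    for (h, &b) in neighbor_feats.iter().zip(branch_of) {
        for (z, x) in branch_out[b].iter_mut().zip(h) {
            *z += *x;
        }
    }
    for z in branch_out.iter_mut() {
        for x in z.iter_mut() {
            *x = x.tanh();
        }
    }

    // Step 3: softmax attention across branches.
    let scores: Vec<f32> = (0..num_branches)
        .map(|b| attn_gate[b] * branch_out[b].iter().sum::<f32>())
        .collect();
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (*s - max).exp()).collect();
    let z_sum: f32 = exps.iter().sum();

    // Step 4: somatic integration as the attention-weighted sum of branch outputs.
    let mut h_v = vec![0.0f32; d];
    for (b, z) in branch_out.iter().enumerate() {
        let alpha = exps[b] / z_sum;
        for (o, x) in h_v.iter_mut().zip(z) {
            *o += alpha * *x;
        }
    }
    h_v
}
```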
**RuVector integration:** The `ruvector-nervous-system/src/dendrite/` module already implements dendritic computation. Extending it to graph attention requires:

1. Neighbor-to-branch assignment (can use graph clustering from `ruvector-mincut`)
2. Branch-level attention computation
3. Integration with the main attention trait system in `ruvector-attention`

---
## 5. Neuromorphic Hardware Deployment

### 5.1 Target Platforms (2026-2030)

| Platform | Neurons | Synapses | Power | Architecture |
|----------|---------|----------|-------|--------------|
| Intel Loihi 2 | 1M per chip | 120M | 1 W | Digital LIF, programmable |
| IBM NorthPole | 256M ops/cycle | - | 12 W | Digital inference |
| SynSense Speck | 320K | 65M | 0.7 mW | Dynamic vision |
| BrainChip Akida | 1.2M | 10B | 1 W | Event-driven |
| SpiNNaker 2 | 10M per board | 10B | 10 W | ARM cores + digital neurons |
### 5.2 Graph Transformer to Neuromorphic Compilation
```
Compilation pipeline:

Source: SpikingGraphAttention (RuVector Rust)
        |
        v
Step 1: Graph Partitioning
  - Partition the graph to fit chip neuron limits
  - Use ruvector-mincut for optimal partitioning
  - Map partitions to neuromorphic cores
        |
        v
Step 2: Neuron Mapping
  - Map each graph node to a hardware neuron cluster
  - Map attention weights to synaptic connections
  - Configure LIF parameters (threshold, tau, etc.)
        |
        v
Step 3: Synapse Routing
  - Map graph edges to hardware synaptic routes
  - Handle multi-hop routing for non-local edges
  - Optimize for communication bandwidth
        |
        v
Step 4: STDP Configuration
  - Program learning rules into on-chip plasticity engines
  - Set STDP time constants and learning rates
        |
        v
Target: Neuromorphic binary (Loihi SLIF, SpiNNaker PyNN, etc.)
```
**RuVector compilation target:**

```rust
/// Trait for neuromorphic compilation targets.
/// (`NeuronModel`, `CompileError`, and `PropertyGraph` are assumed to be
/// defined elsewhere in the RuVector workspace.)
pub trait NeuromorphicTarget {
    type Config;
    type Binary;

    /// Maximum neurons per core
    fn neurons_per_core(&self) -> usize;

    /// Maximum synapses per neuron
    fn synapses_per_neuron(&self) -> usize;

    /// Supported neuron models
    fn supported_models(&self) -> Vec<NeuronModel>;

    /// Compile spiking graph attention to the target
    fn compile(
        &self,
        sgat: &SpikingGraphAttention,
        graph: &PropertyGraph,
        config: &Self::Config,
    ) -> Result<Self::Binary, CompileError>;

    /// Estimated power consumption at a given average spike rate
    fn estimate_power(
        &self,
        binary: &Self::Binary,
        spike_rate: f64,
    ) -> PowerEstimate;
}

pub struct PowerEstimate {
    pub static_power_mw: f64,
    pub dynamic_power_mw: f64,
    pub total_power_mw: f64,
    pub energy_per_spike_nj: f64,
    pub energy_per_inference_uj: f64,
}
```
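As a usage sketch, here is a back-of-envelope, activity-proportional power model that a target implementation might use when filling in `PowerEstimate`. The helper and its inputs are assumptions for illustration, not datasheet figures for any platform:

```rust
/// Rough power model: static power plus spikes/s times energy per spike.
fn rough_power_estimate(
    num_neurons: usize,
    avg_spike_rate_hz: f64,   // spikes per neuron per second
    energy_per_spike_nj: f64, // platform-dependent; assumed, not measured
    static_power_mw: f64,
) -> PowerEstimate {
    let spikes_per_s = num_neurons as f64 * avg_spike_rate_hz;
    // nJ per second -> mW (1 nJ/s = 1e-9 W = 1e-6 mW)
    let dynamic_power_mw = spikes_per_s * energy_per_spike_nj * 1e-6;
    PowerEstimate {
        static_power_mw,
        dynamic_power_mw,
        total_power_mw: static_power_mw + dynamic_power_mw,
        energy_per_spike_nj,
        energy_per_inference_uj: 0.0, // fill in once per-inference spike counts are known
    }
}
```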
---

## 6. Oscillatory Graph Attention

### 6.1 Gamma Oscillations and Binding

The brain uses oscillatory synchronization (gamma: 30-100 Hz) to bind features. Neurons representing the same object oscillate in phase; different objects oscillate out of phase.

**Oscillatory graph attention:**

```
Each node v has phase phi_v(t) and frequency omega_v:

dphi_v/dt = omega_v + sum_{u in N(v)} K_{uv} * sin(phi_u - phi_v)
```

This is a Kuramoto model on the graph. Coupled nodes synchronize; uncoupled nodes desynchronize.

**Attention from synchronization:**

```
alpha_{uv}(t) = (1 + cos(phi_u(t) - phi_v(t))) / 2
```

Synchronized nodes have attention weight 1; anti-phase nodes have weight 0.
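A minimal sketch of these dynamics: one forward-Euler step of the graph Kuramoto model over an edge list, plus the synchronization-based attention weight. The names and the symmetric-coupling convention are assumptions:

```rust
/// One Euler step of dphi_v/dt = omega_v + sum_u K_uv * sin(phi_u - phi_v).
fn kuramoto_step(
    phases: &mut [f64],
    omegas: &[f64],
    edges: &[(usize, usize, f64)], // (u, v, coupling K_uv), applied symmetrically
    dt: f64,
) {
    let mut dphi = omegas.to_vec();
    for &(u, v, k) in edges {
        let s = (phases[u] - phases[v]).sin();
        dphi[v] += k * s; // u pulls v toward its phase
        dphi[u] -= k * s; // and v pulls u
    }
    for (phi, d) in phases.iter_mut().zip(&dphi) {
        *phi = (*phi + dt * d).rem_euclid(2.0 * std::f64::consts::PI);
    }
}

/// alpha_uv(t) = (1 + cos(phi_u - phi_v)) / 2
fn sync_attention(phi_u: f64, phi_v: f64) -> f64 {
    0.5 * (1.0 + (phi_u - phi_v).cos())
}
```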
### 6.2 Multi-Frequency Attention

Different attention heads operate at different frequencies:

```
Head h at frequency omega_h:
    phi_v^h(t) oscillates at omega_h plus perturbations from neighbors
    alpha_{uv}^h(t) = (1 + cos(phi_u^h - phi_v^h)) / 2

Cross-frequency coupling:
    the slow phase phi_v^{slow}(t) modulates the amplitude of the fast oscillation at node v
    // Implements hierarchical binding:
    //   the slow oscillation groups communities
    //   the fast oscillation groups nodes within communities
```

**RuVector connection:** This connects to `ruvector-coherence`'s spectral coherence tracking: the oscillatory phases define a coherence metric on the graph.

---
## 7. Projections

### 7.1 By 2030

**Likely:**

- Spiking graph transformers achieving 100x energy efficiency over GPU versions on small graphs
- STDP-trained graph attention competitive with backprop on benchmark tasks
- Neuromorphic deployment of graph transformers on Loihi 3 / SpiNNaker 2+

**Possible:**

- Dendritic graph attention reducing required depth by 3-5x
- Oscillatory attention for temporal graph problems (event detection, anomaly detection)
- Hebbian graph learning for continual graph learning (no catastrophic forgetting)

**Speculative:**

- Brain-scale (10^10 neuron) spiking graph transformers on neuromorphic clusters
- Online unsupervised STDP learning matching supervised performance

### 7.2 By 2033

**Likely:**

- Neuromorphic graph transformer chips (custom silicon for spiking graph attention)
- Dendritic computation standard in graph attention toolkits
- 1000x energy efficiency over 2026 GPU baselines

**Possible:**

- Self-organizing spiking graph transformers that grow new neurons and connections
- Cross-frequency attention for multi-scale graph reasoning
- Neuromorphic edge AI: graph transformers in IoT sensors

### 7.3 By 2036+

**Possible:**

- Neuromorphic graph transformers approaching brain efficiency (~1 nW/node)
- Spiking graph transformers with emergent cognitive-like capabilities
- Biological-digital hybrid systems (graph transformers interfacing with neural tissue)

**Speculative:**

- True neuromorphic graph intelligence: self-learning, self-organizing, self-repairing
- Graph transformers that implement cortical column dynamics

---
## 8. RuVector Implementation Roadmap

### Phase 1: Spiking Foundation (2026-2027)

- Extend `ruvector-mincut-gated-transformer/src/spike.rs` with full LIF graph dynamics
- Implement STDP learning rules in `ruvector-nervous-system/src/plasticity/`
- Add spike-based attention to the `ruvector-attention` trait system
- Benchmark on neuromorphic graph datasets

### Phase 2: Dendritic & Oscillatory (2027-2028)

- Extend `ruvector-nervous-system/src/dendrite/` for graph attention
- Implement Kuramoto oscillatory attention
- Add dendritic branching strategies using `ruvector-mincut` partitioning
- Integrate with `ruvector-coherence` for coherence tracking

### Phase 3: Neuromorphic Deployment (2028-2030)

- Build the neuromorphic compilation pipeline (Loihi and SpiNNaker targets)
- Power-optimize spiking graph attention
- Edge deployment for IoT graph processing
- WASM-based spiking graph simulation via the existing WASM crates

---
## References

1. Zhu et al., "Spiking Graph Neural Networks," IEEE TNNLS, 2023.
2. Hazan et al., "BindsNET: A Machine Learning-Oriented Spiking Neural Networks Library in Python," Frontiers in Neuroinformatics, 2018.
3. Tavanaei et al., "Deep Learning in Spiking Neural Networks," Neural Networks, 2019.
4. London & Hausser, "Dendritic Computation," Annual Review of Neuroscience, 2005.
5. Poirazi & Papoutsi, "Illuminating Dendritic Function with Computational Models," Nature Reviews Neuroscience, 2020.
6. Breakspear, "Dynamic Models of Large-Scale Brain Activity," Nature Neuroscience, 2017.
7. Davies et al., "Loihi 2: A Neuromorphic Processor with Programmable Synapses and Neuron Models," IEEE Micro, 2021.

---
**End of Document 23**

**Next:** [Doc 24 - Quantum Graph Attention](24-quantum-graph-attention.md)