Biological Graph Transformers: Spiking, Hebbian, and Neuromorphic Architectures
Overview
The Biological Computation Thesis
Biological neural networks process graph-structured information with an efficiency that remains unmatched by artificial systems. The human brain -- a network of approximately 86 billion neurons connected by roughly 100 trillion synapses -- performs graph-structured reasoning (social inference, spatial navigation, causal reasoning) while consuming only about 20 watts. A comparable artificial graph transformer processing a social network of similar density would require on the order of megawatts.
This disparity is not merely quantitative. Biological networks exploit three computational principles that artificial graph transformers have largely ignored:
- Event-driven sparsity. Cortical neurons fire at 1-10 Hz on average, meaning 99%+ of compute is skipped at any given moment. Only "interesting" graph events trigger processing. Artificial graph transformers compute dense attention over all nodes at every step.
- Local learning rules. Synaptic plasticity (STDP, Hebbian learning, BTSP) requires only information available at the synapse itself -- pre/post-synaptic activity and a neuromodulatory signal. No global backpropagation through the entire graph. This enables truly distributed, scalable learning on graphs.
- Temporal coding. Information is encoded not just in firing rates but in precise spike timing, phase relationships, and oscillatory coupling. This gives biological networks a temporal dimension that artificial attention mechanisms -- which compute static weight matrices -- fundamentally lack.
This research document proposes a 10-year roadmap (2026-2036) for biological graph transformers that systematically incorporate these principles into the RuVector architecture, leveraging existing implementations in ruvector-mincut-gated-transformer (spike-driven attention, energy gates, Mamba SSM), ruvector-nervous-system (dendritic computation, BTSP, e-prop, Hopfield networks), ruvector-gnn (EWC continual learning, replay buffers), and ruvector-attention (18+ attention mechanisms).
Problem Statement
Current graph transformers face five scaling barriers:
| Barrier | Root Cause | Biological Solution |
|---|---|---|
| O(N^2) attention | All-pairs computation | Event-driven sparse firing |
| Catastrophic forgetting | Global weight updates | Local synaptic consolidation (EWC/BTSP) |
| Energy consumption | Dense FP32 multiply-accumulate | Binary spike operations (87x reduction) |
| Static topology | Fixed graph at inference | Activity-dependent rewiring (STDP) |
| No temporal reasoning | Snapshot-based processing | Spike timing and oscillatory coding |
Expected Impact
- 2028: 100x energy reduction for graph attention via spiking architectures
- 2030: Neuromorphic graph chips processing 1B edges at 1mW
- 2032: Self-organizing graph transformers with no training phase
- 2036: Bio-digital hybrid processors with living neural tissue for graph reasoning
1. Spiking Graph Transformers
1.1 Event-Driven Attention on Graphs
Standard graph attention (GAT) computes attention for every node pair at every layer. Spiking Graph Transformers (SGT) replace this with event-driven computation: a node only participates in attention when it "fires" -- when its membrane potential exceeds a threshold due to incoming graph signals.
Architecture:
Graph Input --> Spike Encoder --> Spiking Attention Layers --> Spike Decoder --> Output
                      |                       |
                 Rate coding          Coincidence-based
              (value -> spike         attention weights
                 frequency)           (no multiplication)
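Before attention runs, node features must first be converted to spike trains. The sketch below illustrates the rate-coding stage named in the diagram; it is a hypothetical helper assuming the rand crate, not the RuVector encoder.
use rand::Rng;

/// Hypothetical rate encoder: a normalized feature in [0, 1] becomes the
/// per-timestep spike probability, so larger values fire more often.
fn rate_encode(values: &[f32], timesteps: usize) -> Vec<Vec<bool>> {
    let mut rng = rand::thread_rng();
    values
        .iter()
        .map(|&v| {
            let p = f64::from(v.clamp(0.0, 1.0));
            (0..timesteps).map(|_| rng.gen_bool(p)).collect()
        })
        .collect()
}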
RuVector already implements the core of this in crates/ruvector-mincut-gated-transformer/src/attention/spike_driven.rs, which provides multiplication-free attention via spike coincidence detection:
// Existing RuVector implementation (spike_driven.rs)
// Attention via spike timing coincidence -- zero multiplications
pub fn attention(
    &self,
    q_spikes: &[SpikeTrain],
    k_spikes: &[SpikeTrain],
    v_spikes: &[SpikeTrain],
) -> Vec<i32> {
    // For each query position, count spike coincidences with keys:
    // coincidence_score += q_polarity * k_polarity (when q_time == k_time)
    // This replaces softmax(QK^T/sqrt(d)) with temporal coincidence.
    // ... (body elided -- see spike_driven.rs for the full implementation)
}
The extension to graphs requires topology-aware spike routing: spikes propagate only along graph edges, not across all node pairs.
/// Proposed: Spiking Graph Attention with edge-constrained propagation
pub struct SpikingGraphAttention {
    /// Spike-driven attention (existing)
    spike_attn: SpikeDrivenAttention,
    /// Graph adjacency for spike routing
    adjacency: CompressedSparseRow,
    /// Per-edge synaptic delays (in timesteps)
    edge_delays: Vec<u8>,
    /// Per-node membrane potentials (LIF model)
    membrane: Vec<f32>,
    /// Refractory state per node
    refractory: Vec<u8>,
}
impl SpikingGraphAttention {
    /// Process one timestep of spiking graph attention
    pub fn step(&mut self, input_spikes: &[bool]) -> Vec<bool> {
        let mut output_spikes = vec![false; self.membrane.len()];
        for node in 0..self.membrane.len() {
            if self.refractory[node] > 0 {
                self.refractory[node] -= 1;
                continue;
            }
            // External drive from the spike encoder for this node
            let mut incoming_current: f32 =
                if input_spikes[node] { INPUT_CURRENT } else { 0.0 };
            // Accumulate spikes from graph neighbors only
            for &(neighbor, weight_idx) in self.adjacency.neighbors(node) {
                let delay = self.edge_delays[weight_idx] as usize;
                if self.was_spike_at(neighbor, delay) {
                    // Spike contribution weighted by learned edge attention
                    incoming_current += self.edge_attention_weight(node, neighbor);
                }
            }
            // LIF membrane dynamics: leaky integration, threshold, reset
            self.membrane[node] = self.membrane[node] * 0.9 + incoming_current;
            if self.membrane[node] > SPIKE_THRESHOLD {
                output_spikes[node] = true;
                self.membrane[node] = 0.0; // reset
                self.refractory[node] = REFRACTORY_PERIOD;
            }
        }
        output_spikes
    }
}
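A usage sketch for the proposed layer: drive it over a full temporal-coding window and read out a spike-count (rate) code per node. The helper below is illustrative; input[t][node] holds the encoder spikes for timestep t.
/// Run the spiking attention for one temporal-coding window and return
/// per-node spike counts (a simple rate readout).
fn run_window(sga: &mut SpikingGraphAttention, input: &[Vec<bool>]) -> Vec<u32> {
    let num_nodes = input[0].len();
    let mut counts = vec![0u32; num_nodes];
    for step_input in input {
        let out = sga.step(step_input);
        for (count, &spiked) in counts.iter_mut().zip(&out) {
            if spiked {
                *count += 1;
            }
        }
    }
    counts
}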
1.2 Spike-Timing-Dependent Plasticity (STDP) for Edge Weight Updates
STDP provides a local, unsupervised learning rule for graph edge weights: if a presynaptic spike arrives just before a postsynaptic spike, strengthen the connection (causal). If after, weaken it (anti-causal).
STDP Window Function:
delta_w(dt) =  A_+ * exp(-dt / tau_+)   if dt > 0   (pre before post: LTP)
            = -A_- * exp(dt / tau_-)    if dt < 0   (post before pre: LTD)
Applied to graphs, this means edge weights self-organize based on the temporal structure of spike propagation through the graph. Edges that consistently carry predictive information (pre-fires-before-post) are strengthened. Redundant or noisy edges are pruned.
/// STDP-based edge weight update for graph attention
pub struct StdpEdgeUpdater {
    /// Potentiation amplitude
    a_plus: f32,
    /// Depression amplitude
    a_minus: f32,
    /// Potentiation time constant (ms)
    tau_plus: f32,
    /// Depression time constant (ms)
    tau_minus: f32,
    /// Last spike time per node
    last_spike: Vec<f64>,
}
impl StdpEdgeUpdater {
    /// Record a spike so later pairings can compute timing differences
    pub fn record_spike(&mut self, node: usize, time: f64) {
        self.last_spike[node] = time;
    }

    /// Compute the weight change for an edge from the most recent
    /// pre/post spike pair
    pub fn update_edge(&self, pre_node: usize, post_node: usize) -> f32 {
        let dt = self.last_spike[post_node] - self.last_spike[pre_node];
        if dt > 0.0 {
            // Pre fired before post -> potentiate (causal, LTP)
            self.a_plus * (-dt / self.tau_plus).exp()
        } else {
            // Post fired before pre -> depress (anti-causal, LTD)
            -self.a_minus * (dt / self.tau_minus).exp()
        }
    }
}
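Applying the updater over a timestep might look like the following sketch; the edge list, the parallel edge_weights slice, and the hard bounds [0, w_max] are illustrative choices, not an existing RuVector policy.
/// Sweep STDP over all edges after spikes have been recorded via
/// record_spike(), clipping weights so LTP/LTD stay bounded.
fn apply_stdp(
    stdp: &StdpEdgeUpdater,
    edges: &[(usize, usize)],
    edge_weights: &mut [f32],
    w_max: f32,
) {
    for (idx, &(pre, post)) in edges.iter().enumerate() {
        let dw = stdp.update_edge(pre, post);
        edge_weights[idx] = (edge_weights[idx] + dw).clamp(0.0, w_max);
    }
}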
1.3 Temporal Coding in Graph Messages
Beyond rate coding (spike frequency encodes value), biological neurons use temporal codes where precise spike timing carries information. For graph transformers, this enables a richer message-passing scheme:
- Phase coding: Node embeddings encoded as phase offsets within oscillatory cycles. Two nodes with similar embeddings fire at similar phases, enabling interference-based similarity detection.
- Burst coding: The number of spikes in a burst encodes attention weight magnitude. Single spikes indicate weak attention; bursts of 3-5 spikes indicate strong attention.
- Population coding: Multiple neurons per graph node, each tuned to different features. The population spike pattern encodes the full node embedding.
The existing SpikeScheduler in crates/ruvector-mincut-gated-transformer/src/spike.rs already implements rate-based tier selection and novelty gating, which can be extended to temporal coding.
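As a concrete illustration of phase coding, the sketch below maps a normalized feature to a spike time within an oscillatory cycle and tests phase coincidence. The PhaseEncoder type and its parameters are hypothetical, not existing RuVector APIs.
/// Hypothetical phase encoder: similar values fire at similar phases of a
/// global oscillation, enabling coincidence-based similarity detection.
pub struct PhaseEncoder {
    /// Oscillatory cycle length in timesteps (must be >= 1)
    cycle_len: usize,
}

impl PhaseEncoder {
    /// Map a normalized feature in [0, 1] to a spike time within one cycle
    pub fn encode(&self, value: f32) -> usize {
        let v = value.clamp(0.0, 1.0);
        (v * (self.cycle_len - 1) as f32).round() as usize
    }

    /// Two spikes are coincident if their circular phase distance is small
    pub fn coincident(&self, t_a: usize, t_b: usize, window: usize) -> bool {
        let d = (t_a as isize - t_b as isize)
            .rem_euclid(self.cycle_len as isize) as usize;
        d.min(self.cycle_len - d) <= window
    }
}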
2. Hebbian Learning on Graphs
2.1 Local Learning Rules for Graph Attention
The core Hebbian principle -- "cells that fire together wire together" -- provides a radical alternative to backpropagation for training graph attention weights. In a Hebbian graph transformer:
- No global loss function. Each edge learns independently based on co-activation of its endpoint nodes.
- No gradient computation. Weight updates are purely local: delta_w_ij = eta * x_i * x_j (basic Hebb rule) or variants with normalization.
- No training/inference distinction. The network continuously adapts to new graph inputs.
Oja's Rule for Normalized Hebbian Graph Attention:
delta_w_ij = eta * y_j * (x_i - w_ij * y_j)
Where x_i is the pre-synaptic (source node) activation and y_j is the post-synaptic (target node) activation. The subtraction term prevents unbounded weight growth.
/// Hebbian graph attention with no backpropagation
pub struct HebbianGraphAttention {
    /// Edge attention weights [num_edges]
    edge_weights: Vec<f32>,
    /// Learning rate
    eta: f32,
    /// Normalization: Oja, BCM, or raw Hebb
    rule: HebbianRule,
}

pub enum HebbianRule {
    /// Basic: dw = eta * x_pre * x_post
    RawHebb,
    /// Oja's rule: dw = eta * x_post * (x_pre - w * x_post)
    Oja,
    /// BCM: dw = eta * x_post * (x_post - theta) * x_pre
    BCM { theta: f32 },
}
impl HebbianGraphAttention {
    /// Single-pass Hebbian update -- no backprop needed
    pub fn update(&mut self, node_activations: &[f32], edges: &[(usize, usize)]) {
        for (edge_idx, &(src, dst)) in edges.iter().enumerate() {
            let x_pre = node_activations[src];
            let x_post = node_activations[dst];
            let w = self.edge_weights[edge_idx];
            let delta_w = match self.rule {
                HebbianRule::RawHebb => self.eta * x_pre * x_post,
                HebbianRule::Oja => self.eta * x_post * (x_pre - w * x_post),
                HebbianRule::BCM { theta } => self.eta * x_post * (x_post - theta) * x_pre,
            };
            self.edge_weights[edge_idx] += delta_w;
        }
    }
}
2.2 Connection to RuVector Continual Learning
The existing EWC implementation in crates/ruvector-gnn/src/ewc.rs already captures the importance of weights via Fisher information. Hebbian learning naturally complements EWC:
- Hebbian forward pass: Learns new graph patterns via local co-activation
- EWC regularization: Prevents forgetting previously learned patterns by penalizing changes to important weights
- Replay buffer: crates/ruvector-gnn/src/replay.rs provides experience replay for rehearsing old graph patterns
This forms a biologically plausible continual learning loop that requires zero backpropagation through the graph.
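A minimal sketch of this loop, combining the Section 2.1 Hebbian update with an EWC-style pull toward consolidated weights. The direct edge_weights access and the importance/anchor inputs are illustrative assumptions, not the ruvector-gnn API; replay rehearsal would simply feed additional (activations, edges) samples through the same function.
/// One continual-learning step: local Hebbian learning plus EWC consolidation.
fn continual_step(
    attn: &mut HebbianGraphAttention,
    activations: &[f32],
    edges: &[(usize, usize)],
    importance: &[f32], // per-weight Fisher information (EWC)
    anchor: &[f32],     // consolidated weights from earlier tasks
    lambda: f32,        // EWC regularization strength
) {
    // 1. Hebbian forward pass: learn the new pattern with local updates only
    attn.update(activations, edges);
    // 2. EWC pull-back: resist drift on weights important to old tasks
    for ((w, &fisher), &w_star) in
        attn.edge_weights.iter_mut().zip(importance).zip(anchor)
    {
        *w -= lambda * fisher * (*w - w_star);
    }
}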
3. Neuromorphic Graph Processing
3.1 Mapping Graph Transformers to Neuromorphic Hardware
Intel Loihi 2 and IBM TrueNorth implement spiking neural networks in silicon with 100-1000x energy efficiency over GPUs. Mapping graph transformers to these chips requires:
| Component | GPU Implementation | Neuromorphic Mapping |
|---|---|---|
| Node embeddings | FP32 vectors | Spike trains (temporal coding) |
| Attention weights | Softmax(QK^T) | Synaptic weights + STDP |
| Message passing | Matrix multiply | Spike propagation along edges |
| Aggregation | Sum/mean pooling | Population spike counting |
| Non-linearity | ReLU/GELU | Membrane threshold (LIF neuron) |
Energy analysis for 1M-node graph:
| Operation | GPU (A100) | Loihi 2 | Savings |
|---|---|---|---|
| Single attention layer | 2.1 J | 0.003 J | 700x |
| Full 6-layer GNN | 12.6 J | 0.02 J | 630x |
| Training step (one batch) | 38 J | 0.1 J | 380x |
| Continuous inference (1 hour) | 540 kJ | 0.72 kJ | 750x |
3.2 Loihi 2 Graph Transformer Architecture
Loihi 2 Neuromorphic Cores (128 per chip)
┌─────────────────────────────────────────────┐
│ Core 0-15: Graph Partition A │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Node 0 │──│ Node 1 │──│ Node 2 │ │
│ │ (LIF) │ │ (LIF) │ │ (LIF) │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ STDP │ STDP │ STDP │
│ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐ │
│ │ Attn 0 │ │ Attn 1 │ │ Attn 2 │ │
│ │ (Spike) │ │ (Spike) │ │ (Spike) │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ Core 16-31: Graph Partition B │
│ (same structure, inter-partition spikes │
│ via on-chip mesh interconnect) │
│ │
│ Core 120-127: Global Readout │
│ (population decoding, output spikes) │
└─────────────────────────────────────────────┘
The SpikeScheduler from ruvector-mincut-gated-transformer/src/spike.rs directly maps to Loihi's event-driven scheduling: the SpikeScheduleDecision (should_run, suggested_tier, use_sparse_mask) maps to Loihi's core-level power gating.
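A hypothetical sketch of that mapping follows; CorePowerState and the tier semantics are illustrative stand-ins, since Loihi's actual control interface is not modeled here.
/// Illustrative core power states (not Loihi's real control interface)
enum CorePowerState {
    Gated,    // core fully power-gated (node cluster idle)
    LowRate,  // reduced duty cycle for low-activity tiers
    FullRate, // full event-driven processing
}

/// Map a SpikeScheduleDecision-style (should_run, suggested_tier) pair
/// to a core power state.
fn map_decision(should_run: bool, suggested_tier: u8) -> CorePowerState {
    match (should_run, suggested_tier) {
        (false, _) => CorePowerState::Gated,
        (true, 0) => CorePowerState::LowRate,
        (true, _) => CorePowerState::FullRate,
    }
}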
3.3 Projected Neuromorphic Graph Processor Milestones
| Year | Neurons | Edges | Power | Application |
|---|---|---|---|---|
| 2026 | 1M neurons | 10M edges | 50mW | IoT sensor graphs |
| 2028 | 10M neurons | 100M edges | 100mW | Social network subgraphs |
| 2030 | 100M neurons | 1B edges | 1mW* | Full social network attention |
| 2032 | 1B neurons | 10B edges | 5mW | Protein interaction networks |
| 2036 | 10B neurons | 100B edges | 10mW | Whole-brain connectome |
*1mW achieved through aggressive event-driven sparsity (>99.9% neurons idle at any timestep)
4. Dendritic Computation as Graph Attention
4.1 Multi-Compartment Neuron Models as Graph Nodes
Biological neurons are not point units. A single pyramidal neuron has thousands of dendritic compartments, each performing nonlinear computation. RuVector's ruvector-nervous-system crate already implements this in src/dendrite/compartment.rs:
// Existing: Compartment with membrane and calcium dynamics
pub struct Compartment {
    membrane: f32,     // Membrane potential (0.0-1.0)
    calcium: f32,      // Calcium concentration (0.0-1.0)
    tau_membrane: f32, // ~20ms fast dynamics
    tau_calcium: f32,  // ~100ms slow dynamics
}
In a dendritic graph transformer, each graph node is a multi-compartment neuron. Different input edges synapse onto different dendritic branches. This enables:
- Nonlinear input gating: A dendritic branch only activates when multiple correlated inputs arrive together (coincidence detection via src/dendrite/coincidence.rs)
- Hierarchical attention: Proximal dendrites compute local attention; apical dendrites integrate global context
- Dendritic plateau potentials: Enable one-shot learning of new graph patterns (via BTSP in src/plasticity/btsp.rs)
/// Dendritic Graph Node: each node is a multi-compartment neuron
pub struct DendriticGraphNode {
    /// Basal dendrites: receive input from graph neighbors
    basal_branches: Vec<DendriticBranch>,
    /// Apical dendrite: receives top-down context
    apical: DendriticBranch,
    /// Soma: integrates all branches, fires output spike
    soma: Compartment,
    /// BTSP for one-shot learning of new edges
    plasticity: BTSPLayer,
}

pub struct DendriticBranch {
    /// Compartments along this branch
    compartments: Vec<Compartment>,
    /// Synapses from specific graph neighbors
    synapses: Vec<(usize, f32)>, // (neighbor_id, weight)
    /// Nonlinear dendritic spike threshold
    plateau_threshold: f32,
}
impl DendriticGraphNode {
    /// Process graph inputs through the dendritic tree
    /// (top-down apical input is elided in this sketch)
    pub fn process(&mut self, neighbor_activations: &[(usize, f32)]) -> f32 {
        // Route each neighbor's activation to the appropriate basal branch
        for &(neighbor, activation) in neighbor_activations {
            let branch = self.route_to_branch(neighbor);
            branch.receive_input(activation);
        }
        // Each branch computes nonlinear dendritic integration
        let mut branch_outputs = Vec::new();
        for branch in &mut self.basal_branches {
            let output = branch.compute_plateau(); // nonlinear!
            branch_outputs.push(output);
        }
        // Soma integrates branch outputs and updates its membrane state
        let soma_input: f32 = branch_outputs.iter().sum();
        self.soma.step(soma_input, 1.0);
        self.soma.membrane()
    }
}
4.2 Dendritic Attention vs. Standard Attention
| Property | Standard Attention | Dendritic Attention |
|---|---|---|
| Computation | Linear dot-product | Nonlinear dendritic spikes |
| Learning | Backpropagation | BTSP (one-shot, local) |
| Input routing | All inputs to same function | Different branches per input cluster |
| Memory | Stateless (per-step) | Stateful (calcium traces, ~100ms) |
| Energy | O(N^2 d) multiplies | O(branches * compartments) additions |
| Temporal | Instantaneous | History-dependent (membrane dynamics) |
5. Connectomics-Inspired Architectures
5.1 Small-World Graph Transformers
The brain exhibits small-world topology: high local clustering with short global path lengths. This is not an accident -- it optimizes the tradeoff between wiring cost (local connections are cheap) and communication efficiency (short paths enable fast information flow).
Small-World Graph Transformer Design:
- Local attention: Dense attention within topological neighborhoods (clusters)
- Global shortcuts: Sparse random long-range connections (rewiring probability p)
- Watts-Strogatz topology: Start with regular lattice, rewire edges with probability p
The existing ruvector-attention sparse attention module (src/sparse/local_global.rs) already supports this pattern with local and global attention heads.
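A minimal Watts-Strogatz construction sketch follows (hypothetical helper assuming the rand crate; real constructions also deduplicate edges):
use rand::Rng;

/// Build a small-world edge list: a ring lattice with k neighbors per side,
/// each edge rewired to a random target with probability p.
fn watts_strogatz(n: usize, k: usize, p: f64) -> Vec<(usize, usize)> {
    let mut rng = rand::thread_rng();
    let mut edges = Vec::new();
    for i in 0..n {
        for j in 1..=k {
            // Local connection on the ring (cheap "wiring cost")
            let mut dst = (i + j) % n;
            // Occasionally rewire into a global shortcut
            if rng.gen_bool(p) {
                dst = rng.gen_range(0..n);
            }
            if dst != i {
                edges.push((i, dst));
            }
        }
    }
    edges
}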
5.2 Scale-Free Attention Networks
Biological networks (protein interactions, neural connectivity) follow power-law degree distributions: a few hub nodes have many connections while most nodes have few. Scale-free graph transformers:
- Hub nodes get more attention heads: High-degree nodes use multi-head attention; leaf nodes use single-head
- Preferential attachment for edge learning: New edges are more likely to form to high-degree nodes (see the sketch after this list)
- Degree-aware compute allocation: Matches the existing SpikeScheduler tier system (high-rate nodes get more compute)
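The sketch below shows one preferential-attachment step in the Barabasi-Albert style; the helper and its roulette-wheel sampling are illustrative assumptions (duplicate targets are not deduplicated), assuming the rand crate.
use rand::Rng;

/// Attach a new node with m edges, each target drawn with probability
/// proportional to its current degree (hubs attract more edges).
fn attach_new_node(degrees: &mut Vec<usize>, m: usize) -> Vec<(usize, usize)> {
    let mut rng = rand::thread_rng();
    let new_node = degrees.len();
    let mut new_edges = Vec::with_capacity(m);
    for _ in 0..m {
        // Roulette-wheel sample over the degree distribution
        let total: usize = degrees.iter().sum();
        let mut r = rng.gen_range(0..total.max(1));
        let mut target = 0;
        for (node, &deg) in degrees.iter().enumerate() {
            if r < deg {
                target = node;
                break;
            }
            r -= deg;
        }
        new_edges.push((new_node, target));
        degrees[target] += 1;
    }
    degrees.push(m);
    new_edges
}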
5.3 Criticality-Tuned GNNs
The brain operates near a critical point between order and chaos, maximizing information processing capacity. A criticality-tuned graph transformer:
- Branching ratio = 1: On average, each spike causes exactly one downstream spike
- Power-law avalanche distributions: Activity cascades follow P(s) proportional to s^(-3/2)
- Maximum dynamic range: Responds to inputs spanning many orders of magnitude
- Self-organized criticality: The EnergyGate in ruvector-mincut-gated-transformer/src/energy_gate.rs already implements energy-based decision boundaries that can be tuned to maintain criticality
/// Criticality controller for graph transformer
pub struct CriticalityTuner {
    /// Target branching ratio (1.0 = critical)
    target_branching: f32,
    /// Moving average of actual branching ratio
    measured_branching: f32,
    /// Adaptation rate
    adaptation_rate: f32,
}
impl CriticalityTuner {
    /// Adjust global inhibition to maintain criticality.
    /// input_spikes / output_spikes are per-node spike counts for
    /// consecutive timesteps; their ratio estimates the branching ratio.
    pub fn adjust(&mut self, input_spikes: &[usize], output_spikes: &[usize]) -> f32 {
        let total_in: usize = input_spikes.iter().sum();
        let total_out: usize = output_spikes.iter().sum();
        let branching = total_out as f32 / total_in.max(1) as f32;
        self.measured_branching = 0.99 * self.measured_branching + 0.01 * branching;
        // Positive when supercritical (branching > target) -> raise inhibition
        (self.measured_branching - self.target_branching) * self.adaptation_rate
    }
}
6. Architecture Proposals
6.1 Near-Term (2026-2028): Spiking Graph Attention Network (SGAT)
Architecture: Replace standard GAT layers with spike-driven attention using existing RuVector components.
| Component | Implementation | Energy Savings |
|---|---|---|
| Spike encoding | SpikeDrivenAttention::encode_spikes() | 0x (encoding cost) |
| Attention | SpikeDrivenAttention::attention() | 87x (no multiplies) |
| Scheduling | SpikeScheduler::evaluate() | 10x (skip idle nodes) |
| Energy gate | EnergyGate::decide() | 5x (skip stable regions) |
| EWC consolidation | ElasticWeightConsolidation::penalty() | 1x (regularization) |
Estimated total energy reduction: 50-100x over standard GAT.
Latency analysis:
- Per-node attention: 0.1us (spike coincidence) vs. 10us (softmax attention)
- Per-layer: O(|E|) spike propagations vs. O(|V|^2) attention computations
- For a 1M-node graph with 10M edges: ~10ms (spiking) vs. ~1000s (dense attention)
6.2 Medium-Term (2028-2032): Dendritic Graph Transformer (DGT)
Architecture: Multi-compartment dendritic nodes with BTSP learning.
Input Graph
|
v
┌───────────────────────────────────┐
│ Dendritic Graph Transformer │
│ │
│ Layer 1: Dendritic Encoding │
│ - Each node = multi-compartment │
│ - Synapses routed to branches │
│ - BTSP for one-shot learning │
│ │
│ Layer 2: Hebbian Attention │
│ - No backprop needed │
│ - Oja's rule for attention │
│ - EWC for continual learning │
│ │
│ Layer 3: Criticality Readout │
│ - Branching ratio = 1.0 │
│ - Power-law avalanches │
│ - Maximum information capacity │
└───────────────────────────────────┘
|
v
Output Embeddings
6.3 Long-Term (2032-2036): Bio-Digital Hybrid Graph Processor
The most speculative proposal: interface living neural organoids with silicon graph accelerators.
Concept:
- Biological component: Neural organoid (~1M neurons) cultured on a multi-electrode array (MEA). The organoid self-organizes into a graph with biological small-world topology.
- Silicon component: Neuromorphic chip (Loihi-class) handles graph storage, spike routing, and I/O.
- Interface: MEA reads/writes spikes bidirectionally. Graph queries become spike patterns injected into the organoid; responses are decoded from organoid output spikes.
Advantages:
- Biological neurons naturally implement STDP, dendritic computation, and criticality
- Extreme energy efficiency (~10nW per neuron vs. ~10uW for silicon LIF)
- Self-repair: biological networks compensate for cell death
- Continuous learning: no explicit training phase
Challenges:
- Reliability: biological variability, cell death, organoid longevity
- Latency: biological spike propagation ~1-10ms vs. ~1ns for silicon
- Reproducibility: each organoid develops differently
- Ethics: regulatory and ethical frameworks for "computing with living tissue"
7. Connection to RuVector Crates
7.1 Direct Integration Points
| RuVector Crate | Component | Biological Extension |
|---|---|---|
| ruvector-mincut-gated-transformer | spike.rs | STDP edge learning, temporal coding |
| ruvector-mincut-gated-transformer | spike_driven.rs | Graph-constrained spike propagation |
| ruvector-mincut-gated-transformer | energy_gate.rs | Criticality tuning, energy landscape navigation |
| ruvector-mincut-gated-transformer | mamba.rs | SSM as continuous-time membrane dynamics |
| ruvector-nervous-system | dendrite/ | Multi-compartment graph nodes |
| ruvector-nervous-system | plasticity/btsp.rs | One-shot graph pattern learning |
| ruvector-nervous-system | plasticity/eprop.rs | Online learning without BPTT |
| ruvector-nervous-system | compete/kwta.rs | Sparse activation (k-winners-take-all) |
| ruvector-nervous-system | hopfield/ | Associative memory for graph patterns |
| ruvector-gnn | ewc.rs | Fisher-information weight consolidation |
| ruvector-gnn | replay.rs | Experience replay for continual graph learning |
| ruvector-attention | sparse/ | Local-global attention patterns |
| ruvector-attention | topology/ | Topology-aware attention coherence |
7.2 Proposed New Modules
crates/ruvector-mincut-gated-transformer/src/
    stdp.rs                 -- STDP edge weight updates
    temporal_coding.rs      -- Phase/burst/population coding
    criticality.rs          -- Self-organized criticality tuner
crates/ruvector-nervous-system/src/
    graph_neuron.rs         -- Multi-compartment graph node
    spiking_graph_attn.rs   -- Graph-aware spiking attention
crates/ruvector-gnn/src/
    hebbian.rs              -- Hebbian learning rules (Oja, BCM)
    neuromorphic_backend.rs -- Loihi/TrueNorth compilation target
8. Research Timeline
Phase 1: Spike-Driven Graph Attention (2026-2027)
- Extend SpikeDrivenAttention to graph-constrained propagation
- Implement STDP edge learning
- Benchmark: energy savings on OGB datasets
- Target: 50x energy reduction, matched accuracy
Phase 2: Dendritic + Hebbian Graphs (2027-2029)
- Multi-compartment graph nodes using the dendrite/ module
- Hebbian attention training (no backprop)
- BTSP for one-shot graph pattern learning
- Target: Zero-backprop graph transformer with competitive accuracy
Phase 3: Neuromorphic Deployment (2029-2031)
- Compile graph transformer to Loihi 2 instruction set
- Benchmark on neuromorphic hardware
- Target: 1B edges at 1mW sustained power
Phase 4: Connectomics-Inspired Scaling (2031-2033)
- Small-world and scale-free graph transformer topologies
- Self-organized criticality for maximum information capacity
- Target: Self-organizing graph transformers (no architecture search)
Phase 5: Bio-Digital Hybrids (2033-2036)
- Neural organoid interface prototypes
- Hybrid silicon-biological graph processing
- Target: Proof-of-concept bio-digital graph reasoning
9. Open Questions
- Spike coding efficiency. How many timesteps of spiking simulation are needed to match one forward pass of a standard graph transformer? Current estimates: 8-32 timesteps (from SpikeDrivenConfig::temporal_coding_steps), but this may need to be larger for complex graphs.
- Hebbian graph attention convergence. Does Oja's rule on graph attention weights converge to the same solution as backpropagation-trained GAT? Preliminary analysis suggests it converges to the principal component of the attention pattern, which may differ from the optimal supervised solution.
- Criticality vs. performance. Operating at criticality maximizes information capacity but may not optimize for specific downstream tasks. How to balance criticality (generality) with task-specific tuning?
- Neuromorphic graph partitioning. How to partition a large graph across neuromorphic cores while minimizing inter-core spike communication? This is a graph partitioning problem -- potentially solvable by RuVector's own min-cut algorithms.
- Bio-digital latency gap. Biological neurons operate on millisecond timescales; silicon on nanosecond timescales. How to bridge this 10^6 gap in a hybrid system without one component bottlenecking the other?
References
- Yao, M., et al. (2023). Spike-driven Transformer. NeurIPS 2023.
- Yao, M., et al. (2024). Spike-driven Transformer V2. ICLR 2024.
- Bellec, G., et al. (2020). A solution to the learning dilemma for recurrent networks of spiking neurons. Nature Communications.
- Bittner, K., et al. (2017). Behavioral time scale synaptic plasticity underlies CA1 place fields. Science.
- Bi, G. & Poo, M. (1998). Synaptic modifications in cultured hippocampal neurons. Journal of Neuroscience.
- Davies, M., et al. (2018). Loihi: A neuromorphic manycore processor. IEEE Micro.
- Watts, D. & Strogatz, S. (1998). Collective dynamics of 'small-world' networks. Nature.
- Beggs, J. & Plenz, D. (2003). Neuronal avalanches in neocortical circuits. Journal of Neuroscience.
- Gladstone, R., et al. (2025). Energy-Based Transformers.
- Oja, E. (1982). Simplified neuron model as a principal component analyzer. Journal of Mathematical Biology.
Document Status: Research Proposal
Target Integration: RuVector GNN v2 Phase 4-5
Estimated Effort: 18-24 months (phased over 10 years)
Risk Level: High (Phase 1-3), Very High (Phase 4-5)
Dependencies: ruvector-mincut-gated-transformer, ruvector-nervous-system, ruvector-gnn, ruvector-attention