Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00
parent 7885bf6278 d803bfe2b1
commit cd5943df23
7854 changed files with 3522914 additions and 0 deletions
--- a/vendor/ruvector/crates/ruvector-attn-mincut/README.md
+++ b/vendor/ruvector/crates/ruvector-attn-mincut/README.md
@@ -0,0 +1,247 @@
+# ruvector-attn-mincut
+
+[![Crates.io](https://img.shields.io/crates/v/ruvector-attn-mincut.svg)](https://crates.io/crates/ruvector-attn-mincut)
+[![docs.rs](https://docs.rs/ruvector-attn-mincut/badge.svg)](https://docs.rs/ruvector-attn-mincut)
+[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+
+**Dynamic min-cut gating as an alternative to softmax attention — prune low-value attention edges via graph theory.**
+
+| | Softmax Attention | Min-Cut Gated |
+|---|---|---|
+| **Attention pattern** | All-to-all (dense) | Structure-aware (sparse) |
+| **KV-cache usage** | Full | 15-40% reduction |
+| **Energy per sample** | Baseline | 10-20% lower |
+| **Coherence** | Reference | < 1% degradation |
+| **Deterministic replay** | No | SHA-256 witness chain |
+
+## Overview
+
+Standard transformer attention applies softmax uniformly across all Q*K^T logits,
+forcing every token to attend to every other token. This crate replaces that
+all-to-all pattern with a graph-theoretic approach: attention logits are modelled
+as a weighted directed graph, and Dinic's max-flow / min-cut algorithm partitions
+the graph to prune low-value attention edges. Only surviving edges pass through
+row-softmax before multiplying by V.
+
+The result is a **sparse, structure-aware attention mask** that can reduce KV-cache
+pressure, lower memory footprint, and cut energy-per-sample -- while preserving
+output coherence within configurable bounds.
+
+## Concept: How Min-Cut Replaces Softmax
+
+```
+Standard attention:           Min-cut gated attention:
+
+  Q*K^T --> softmax --> W*V     Q*K^T --> graph --> min-cut --> mask
+                                                      |
+                                              surviving edges only
+                                                      |
+                                               softmax --> W*V
+```
+
+1. **Graph construction** -- Positive logits become weighted directed edges.
+   Non-positive entries are discarded.
+2. **Min-cut gating** -- Dinic's algorithm computes an s-t min-cut. Edges whose
+   removal cost falls below `lambda * mean_weight` are pruned.
+3. **Masked softmax** -- Pruned positions are set to `-INF` so softmax zeros
+   them out. The remaining edges are normalized per row.
+4. **Hysteresis** -- A temporal tracker prevents gate masks from oscillating
+   between steps; a flip only commits after `tau` consecutive agreements.
+5. **Witness logging** -- Every gating decision is hashed with SHA-256 for
+   deterministic verification on a second machine.
+
+## Modules
+
+| Module | Purpose |
+|--------|---------|
+| `config` | `MinCutConfig` with tunable parameters and serde support |
+| `graph` | `graph_from_logits` builds an `AttentionGraph` with `Edge` list |
+| `mincut` | `DinicSolver` and `dynamic_min_cut` for s-t partitioning |
+| `gating` | `attn_softmax` (baseline) and `attn_mincut` (gated) operators |
+| `hysteresis` | `HysteresisTracker` for temporally stable gate masks |
+| `witness` | `hash_tensor` and `witness_log` for SHA-256 witness entries |
+
+## Configuration Parameters
+
+```rust
+pub struct MinCutConfig {
+    pub lambda: f32,           // Sparsity budget (0.0 = keep all, 1.0 = aggressive pruning)
+    pub tau: usize,            // Temporal hysteresis steps before a gate flip commits
+    pub eps: f32,              // Safety floor -- logits below eps are clamped to zero
+    pub seed: u64,             // RNG seed for reproducibility
+    pub witness_enabled: bool, // Whether to emit witness logs
+}
+```
+
+| Parameter | Default | Range | Effect |
+|-----------|---------|-------|--------|
+| `lambda` | 0.5 | 0.0 -- 1.0 | Higher values prune more aggressively |
+| `tau` | 2 | 0+ | Higher values stabilize masks at the cost of adaptation speed |
+| `eps` | 0.01 | > 0 | Filters near-zero logits before graph construction |
+| `seed` | 42 | any u64 | Deterministic witness hashing |
+
+## API Highlights
+
+### Build a graph and run gating
+
+```rust
+use ruvector_attn_mincut::{graph_from_logits, dynamic_min_cut};
+
+// Flattened 3x3 logit matrix (seq_len = 3)
+let logits = vec![
+    1.0, 0.5, 0.0,
+    0.0, 1.0, 0.5,
+    0.0, 0.0, 1.0,
+];
+
+let graph = graph_from_logits(&logits, 3);
+println!("Edges: {}", graph.edges.len()); // only positive logits
+
+let result = dynamic_min_cut(&logits, 3, 0.5, 2, 0.01);
+println!("Kept {}/{} edges", result.edges_kept, result.edges_total);
+println!("Cut cost: {:.3}", result.cut_cost);
+```
+
+### End-to-end gated attention
+
+```rust
+use ruvector_attn_mincut::{attn_softmax, attn_mincut};
+
+let seq_len = 4;
+let d = 8;
+let q = vec![0.1f32; seq_len * d];
+let k = vec![0.1f32; seq_len * d];
+let v = vec![1.0f32; seq_len * d];
+
+// Baseline
+let baseline = attn_softmax(&q, &k, &v, d, seq_len);
+
+// Min-cut gated (lambda=0.5, tau=2, eps=0.01)
+let gated = attn_mincut(&q, &k, &v, d, seq_len, 0.5, 2, 0.01);
+
+println!("Output length: {}", gated.output.len());        // seq_len * d
+println!("Edges kept: {}", gated.gating.edges_kept);
+println!("Edges total: {}", gated.gating.edges_total);
+```
+
+### Temporal hysteresis
+
+```rust
+use ruvector_attn_mincut::HysteresisTracker;
+
+let mut tracker = HysteresisTracker::new(3); // tau = 3 steps
+
+let mask_a = vec![true, true, false, true];
+let stabilized = tracker.apply(&mask_a);     // first call passes through
+
+// Subsequent calls require tau consecutive disagreements to flip
+let mask_b = vec![false, true, true, true];
+let still_a = tracker.apply(&mask_b);        // not flipped yet (1/3)
+```
+
+### Witness logging
+
+```rust
+use ruvector_attn_mincut::{hash_tensor, witness_log, WitnessEntry};
+
+let q_hash = hash_tensor(&[1.0, 2.0, 3.0]);
+let entry = WitnessEntry {
+    q_hash,
+    k_hash: hash_tensor(&[4.0, 5.0, 6.0]),
+    keep_mask: vec![true, false, true, true],
+    cut_cost: 1.5,
+    lambda: 0.5,
+    tau: 2,
+    eps: 0.01,
+    timestamp: 1700000000,
+};
+let jsonl_line = witness_log(&entry);
+```
+
+## Expected Benefits
+
+| Metric | Typical improvement | Notes |
+|--------|-------------------|-------|
+| KV-cache memory | 15--40% reduction | Pruned edges skip cache allocation |
+| Peak RSS | 10--25% reduction | Fewer active attention paths |
+| Energy per sample | 10--20% reduction | Less compute on sparse masks |
+| Coherence delta | < 1% degradation | Tunable via lambda/tau |
+
+## Dependencies
+
+- `serde` / `serde_json` -- serialization for configs and witness entries
+- `sha2` -- SHA-256 hashing for deterministic witness chain
+
+## Architecture
+
+```
+attn_mincut --> coherence metrics --> profiler CSV --> analysis
+```
+
+All public types implement `Debug` and `Clone`. Config and result types implement
+`Serialize` / `Deserialize` for JSON round-tripping.
+
+<details>
+<summary><strong>Tutorial: End-to-End Min-Cut Benchmark</strong></summary>
+
+### Step 1: Configure and run gated attention
+
+```rust
+use ruvector_attn_mincut::{MinCutConfig, attn_softmax, attn_mincut};
+
+let config = MinCutConfig {
+    lambda: 0.5,     // moderate pruning
+    tau: 2,          // 2-step hysteresis
+    eps: 0.01,       // filter near-zero logits
+    seed: 42,
+    witness_enabled: true,
+};
+
+let (seq_len, d) = (64, 128);
+let q = vec![0.1f32; seq_len * d];
+let k = vec![0.1f32; seq_len * d];
+let v = vec![1.0f32; seq_len * d];
+
+let baseline = attn_softmax(&q, &k, &v, d, seq_len);
+let gated = attn_mincut(&q, &k, &v, d, seq_len, config.lambda, config.tau, config.eps);
+
+println!("Pruned {}/{} edges",
+    gated.gating.edges_total - gated.gating.edges_kept,
+    gated.gating.edges_total);
+```
+
+### Step 2: Measure coherence
+
+```rust
+use ruvector_coherence::{quality_check, evaluate_batch};
+
+let result = quality_check(&baseline.output, &gated.output, 0.99);
+println!("Cosine sim: {:.4} | Passes: {}", result.cosine_sim, result.passes_threshold);
+```
+
+### Step 3: Profile and export
+
+```rust
+use ruvector_profiler::{compute_latency_stats, write_results_csv};
+// ... collect timing data, export CSV
+```
+
+### Step 4: Run the benchmark grid
+
+```bash
+./scripts/run_mincut_bench.sh --samples 1000 --lambda "0.3 0.5 0.7" --tau "0 2"
+```
+
+</details>
+
+## Related Crates
+
+| Crate | Role |
+|-------|------|
+| [`ruvector-coherence`](../ruvector-coherence/README.md) | Measures output quality after gating |
+| [`ruvector-profiler`](../ruvector-profiler/README.md) | Memory, power, latency benchmarking |
+| [`ruvector-solver`](../ruvector-solver/README.md) | Sublinear solvers powering the graph algorithms |
+
+## License
+
+Licensed under the [MIT License](../../LICENSE).