Files
wifi-densepose/docs/implementation/SPECULATIVE_DECODING.md
ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00

149 lines
4.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# EAGLE-3 Speculative Decoding
Implementation of EAGLE-3 style speculative decoding for the mincut-gated-transformer crate.
## Overview
Speculative decoding accelerates inference by drafting multiple tokens in parallel and verifying them against the target model using rejection sampling. This implementation uses mincut λ-stability as a confidence signal to guide draft tree generation.
## Files
- `/home/user/ruvector/crates/ruvector-mincut-gated-transformer/src/speculative.rs` - Core implementation
## Key Features
### 1. Draft Tree Generation
Dynamic tree structure that adapts based on model confidence:
```rust
let config = SpeculativeConfig {
max_draft_tokens: 5, // Draft up to 5 tokens ahead
tree_width: 3, // Up to 3 branches per node
acceptance_threshold: 0.7, // 70% confidence for acceptance
use_lambda_guidance: true, // Use λ as confidence signal
};
let decoder = SpeculativeDecoder::new(config);
let tree = decoder.generate_draft_tree(lambda, lambda_prev, draft_logits);
```
### 2. λ-Guided Confidence
Uses mincut λ-stability to scale draft confidence:
- **Higher λ** = More stable partitioning = Higher draft confidence
- **Increasing λ** = Improving stability = Confidence bonus
- **Decreasing λ** = Degrading stability = Confidence penalty
### 3. Adaptive Tree Width
Tree branching adapts to confidence levels:
- **High confidence (≥0.9)**: Narrow tree (fewer branches)
- **Medium confidence (0.6-0.9)**: Normal width
- **Low confidence (0.3-0.6)**: Wider tree (more exploration)
- **Very low confidence (<0.3)**: Minimal branching
### 4. Rejection Sampling Verification
EAGLE-3 style verification using:
```
accept_prob = min(1, target_prob / draft_prob)
```
Drafts are accepted if they match the target model's distribution.
### 5. Tree Attention Masks
Parallel verification of draft tokens using causal tree attention:
```rust
let mask = generate_tree_attention_mask(&tree, seq_len);
// Each token can attend to all ancestors in its path
```
## Usage Example
```rust
use ruvector_mincut_gated_transformer::prelude::*;
// Create decoder
let config = SpeculativeConfig::default();
let decoder = SpeculativeDecoder::new(config);
// Generate draft tree (5 tokens, dynamic structure)
let lambda = 100; // Current mincut stability
let lambda_prev = 95; // Previous stability
let draft_logits = vec![vec![0.0; 1000]; 5]; // Draft model outputs
let tree = decoder.generate_draft_tree(lambda, lambda_prev, &draft_logits);
// Verify against target model
let target_logits = vec![vec![0.0; 1000]; 5]; // Target model outputs
let result = decoder.verify_drafts(&tree, &target_logits, 1.0);
println!("Accepted {} tokens with {:.1}% acceptance rate",
result.accepted_count,
result.acceptance_rate * 100.0);
```
## Performance Characteristics
- **Speedup**: 2-5x for high acceptance rates
- **Memory**: O(max_draft_tokens × tree_width × vocab_size)
- **Overhead**: ~10% for low acceptance rates
- **Best case**: Stable models (high λ) with predictable outputs
## Academic Foundation
Based on **EAGLE-3** (NeurIPS 2025):
1. **Dynamic tree structure**: Adapts to model confidence
2. **Multi-level feature fusion**: Uses λ-stability as confidence signal
3. **Rejection sampling**: Mathematically correct acceptance criteria
4. **Tree attention**: Parallel draft verification
## Integration with Mincut Gating
The speculative decoder integrates with the mincut-gated-transformer's coherence signals:
- **λ-stability** guides draft confidence
- **High λ** (stable partitioning) → More aggressive speculation
- **Low λ** (unstable partitioning) → Conservative speculation
- **λ trends** influence tree width adaptation
## Testing
Comprehensive test suite covering:
- ✓ Single-path speculation (sequential drafting)
- ✓ Tree speculation with branching (parallel drafting)
- ✓ Rejection sampling correctness
- ✓ λ-guided confidence scaling
- ✓ Draft verification against target model
- ✓ Tree attention mask generation
- ✓ Adaptive tree width calculation
- ✓ Edge cases (empty inputs, etc.)
Run tests:
```bash
cd crates/ruvector-mincut-gated-transformer
cargo test --lib speculative
```
All 8 tests pass successfully.
## Future Enhancements
Potential improvements:
1. **Multi-token drafting**: Draft multiple positions simultaneously
2. **Learned draft models**: Train lightweight draft models
3. **Dynamic threshold adaptation**: Adjust acceptance threshold based on λ
4. **Quantized drafting**: Use INT8/INT4 for draft model
5. **Cached drafts**: Reuse draft trees across timesteps
6. **Hybrid verification**: Combine rejection sampling with direct comparison