Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

vendor/ruvector/crates/ruvector-sparse-inference/tests/README.md (new file, vendored, 188 lines)
@@ -0,0 +1,188 @@
# Sparse Inference Engine - Test Suite

Comprehensive test suite for the RuVector sparse inference engine: 78+ tests and 10 benchmarks across 1516 lines of test code.

## Test Structure

### Unit Tests (`tests/unit/`)

**Predictor Tests** (`predictor_tests.rs` - 12 tests)
- Low-rank predictor creation and configuration
- Active neuron prediction validation
- Top-K mode functionality
- Calibration effectiveness
- Input validation and edge cases
- Consistency and determinism

**Sparse FFN Tests** (`sparse_ffn_tests.rs` - 14 tests)
- Sparse vs dense computation equivalence
- Different activation functions (ReLU, GeLU, SiLU)
- SwiGLU paired neuron handling
- Empty and partial activation sets
- Out-of-bounds and duplicate neuron handling
- Deterministic output verification

**Quantization Tests** (`quantization_tests.rs` - 15 tests)
- INT8 quantization roundtrip accuracy
- INT4 compression ratios
- Different group sizes (16, 32, 64, 128)
- Selective row dequantization
- Range preservation
- Uniform and zero value handling
- Odd-length array support

### Integration Tests (`tests/integration/`)

**Model Loading Tests** (`model_loading_tests.rs` - 15 tests)
- GGUF header parsing
- Invalid format detection
- Model structure validation
- Forward pass execution
- Configuration handling
- Multiple model sizes

**Sparse Inference Tests** (`sparse_inference_tests.rs` - 12 tests)
- Full sparse pipeline execution
- Dense vs sparse accuracy comparison
- Batch processing
- Calibration improvements
- Different sparsity levels (10%-90%)
- Consistency verification
- Extreme input handling

### Property-Based Tests (`tests/property/mod.rs` - 10 tests)

Using `proptest` for generative testing:
- Output finiteness invariants
- Valid index generation
- Dense/sparse equivalence
- Quantization ordering preservation
- Top-K constraints
- Dimension correctness
- INT4 roundtrip properties
- Output dimension consistency
- SwiGLU output validation
- Calibration robustness

### Benchmark Tests (`benches/sparse_inference_bench.rs` - 10 benchmarks)

**Performance Comparisons** (a minimal harness sketch follows this list):
1. **Sparse vs Dense**: Baseline comparison
2. **Sparsity Levels**: 30%, 50%, 70%, 90% sparsity
3. **Predictor Performance**: Prediction latency
4. **Top-K Modes**: K=100, 500, 1000, 2000
5. **Sparse FFN**: Dense vs 10% vs 50% sparse
6. **Activation Functions**: ReLU, GeLU, SiLU comparison
7. **Quantization**: Dequantization of 1, 10, 100 rows
8. **INT4 vs INT8**: Quantization speed and accuracy
9. **Calibration**: Sample sizes 10, 50, 100, 500
10. **SwiGLU**: Dense vs sparse comparison
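
The benchmark source itself is not part of this diff; as a rough sketch only (using the `criterion` crate and the engine API visible in the integration tests below; the bench names here are illustrative, not the actual benchmark code), the first comparison could look like:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use ruvector_sparse_inference::{model::LlamaModel, SparseInferenceEngine};

fn sparse_vs_dense(c: &mut Criterion) {
    // Same toy model shape the tests use: (hidden, intermediate, layers, vocab).
    let model = LlamaModel::new(512, 2048, 4, 32000);
    let dense = SparseInferenceEngine::new_dense(model.clone());
    let sparse = SparseInferenceEngine::new_sparse(model, 0.3);
    let input = vec![0.1f32; 512];

    c.bench_function("dense_baseline", |b| b.iter(|| dense.infer(&input).unwrap()));
    c.bench_function("sparse_30pct", |b| b.iter(|| sparse.infer(&input).unwrap()));
}

criterion_group!(benches, sparse_vs_dense);
criterion_main!(benches);
```
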
## Common Test Utilities (`tests/common/mod.rs`)

Helper functions shared by all tests (a short usage sketch follows the list):
- `random_vector(dim)` - Generate test vectors
- `random_activations(max)` - Generate activation patterns
- `create_test_ffn(input, hidden)` - FFN factory
- `create_calibrated_predictor()` - Pre-calibrated predictor
- `create_quantized_matrix(rows, cols)` - Quantized weights
- `load_test_llama_model()` - Test model loader
- `assert_vectors_close(a, b, tol)` - Approximate equality
- `mse(a, b)` - Mean squared error
- `generate_calibration_data(n)` - Calibration dataset
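
A typical test composes these helpers like this (a sketch mirroring `test_sparse_ffn_matches_dense` from `tests/unit/sparse_ffn_tests.rs` in this commit):

```rust
mod common;
use common::*;

#[test]
fn sparse_matches_dense_sketch() {
    // Small FFN and random input built from the shared helpers.
    let ffn = create_test_ffn(512, 2048);
    let input = random_vector(512);

    // With every neuron active, the sparse path should agree with the dense one.
    let all_neurons: Vec<usize> = (0..2048).collect();
    let dense = ffn.forward_dense(&input);
    let sparse = ffn.forward_sparse(&input, &all_neurons);
    assert_vectors_close(&dense, &sparse, 1e-5);
}
```
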
## Running Tests

```bash
# Run all tests
cargo test -p ruvector-sparse-inference

# Run specific test categories
cargo test -p ruvector-sparse-inference --test unit
cargo test -p ruvector-sparse-inference --test integration
cargo test -p ruvector-sparse-inference --test property

# Run unit tests for a specific module
cargo test -p ruvector-sparse-inference predictor_tests
cargo test -p ruvector-sparse-inference quantization_tests
cargo test -p ruvector-sparse-inference sparse_ffn_tests

# Run benchmarks
cargo bench -p ruvector-sparse-inference

# Run a specific benchmark
cargo bench -p ruvector-sparse-inference -- sparse_vs_dense
cargo bench -p ruvector-sparse-inference -- sparsity_levels
cargo bench -p ruvector-sparse-inference -- quantization
```

## Test Coverage Goals

- **Statements**: >80%
- **Branches**: >75%
- **Functions**: >80%
- **Lines**: >80%

## Test Characteristics

Tests follow the **FIRST** principles:
- **Fast**: Unit tests run in <100ms
- **Isolated**: No dependencies between tests
- **Repeatable**: Same result every time
- **Self-validating**: Clear pass/fail
- **Timely**: Written alongside the implementation

## Property-Based Testing

Tests use `proptest` to verify invariants across wide input ranges (see the strategy snippet after this list):
- Input values: -10.0 to 10.0
- Vector dimensions: 256 to 1024
- Hidden dimensions: 512 to 4096
- Group sizes: 16, 32, 64, 128
- Sample counts: 1 to 100
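
These ranges appear directly as proptest strategies; for example (adapted from `tests/property/mod.rs` in this commit):

```rust
use proptest::prelude::*;
use ruvector_sparse_inference::*;

proptest! {
    #[test]
    fn sparse_output_dimension_correct(
        // Vector dimensions 256..=1024, input values in -10.0..10.0
        input in prop::collection::vec(-10.0f32..10.0, 256..=1024),
        // Hidden dimensions 512..=4096
        hidden_dim in 512usize..=4096
    ) {
        let ffn = sparse::SparseFfn::new(input.len(), hidden_dim, sparse::ActivationType::Relu);
        let active: Vec<usize> = (0..hidden_dim.min(100)).collect();
        let output = ffn.forward_sparse(&input, &active);
        prop_assert_eq!(output.len(), input.len());
    }
}
```
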
## Edge Cases Tested

1. **Empty inputs**: Zero-length vectors, no active neurons
2. **Boundary values**: Maximum dimensions, extreme values
3. **Invalid inputs**: Wrong dimensions, out-of-bounds indices
4. **Numerical stability**: Very large/small values, precision loss
5. **Concurrent operations**: Parallel inference requests (see the sketch after this list)
6. **Memory efficiency**: Large datasets, quantization compression
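
No dedicated concurrency test ships in this commit; a minimal sketch of item 5, assuming `SparseInferenceEngine` is `Send + Sync` (`infer` takes `&self` throughout the tests, and helper names come from `tests/common/mod.rs`), might look like:

```rust
use std::sync::Arc;
use std::thread;

use ruvector_sparse_inference::SparseInferenceEngine;

mod common;
use common::*;

#[test]
fn concurrent_inference_sketch() {
    let model = load_test_llama_model();
    let engine = Arc::new(SparseInferenceEngine::new_sparse(model, 0.3));

    // Four threads hammer the same engine with independent inputs.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let engine = Arc::clone(&engine);
            thread::spawn(move || {
                let output = engine.infer(&vec![0.1f32; 512]).unwrap();
                assert!(output.iter().all(|x| x.is_finite()));
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
```
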
## Test Organization

```
tests/
├── common/
│   └── mod.rs                     # Shared test utilities
├── unit/
│   ├── predictor_tests.rs         # Neuron prediction tests
│   ├── sparse_ffn_tests.rs        # Sparse computation tests
│   └── quantization_tests.rs      # Weight compression tests
├── integration/
│   ├── model_loading_tests.rs     # GGUF parsing tests
│   └── sparse_inference_tests.rs  # End-to-end pipeline tests
└── property/
    └── mod.rs                     # Property-based tests

benches/
└── sparse_inference_bench.rs      # Performance benchmarks
```

## Future Test Additions

Potential areas for expansion:
1. Stress tests for memory limits
2. Concurrent inference benchmarks
3. Hardware-specific SIMD tests
4. Model-specific accuracy tests
5. Calibration strategy comparisons
6. Cache effectiveness tests
7. Quantization accuracy analysis

---

**Total Test Coverage**: 78+ tests across 1516 lines
- 68 unit/integration tests
- 10 property-based tests
- 10 performance benchmarks

vendor/ruvector/crates/ruvector-sparse-inference/tests/backend_simd_tests.rs (new file, vendored, 207 lines)
@@ -0,0 +1,207 @@
//! Standalone tests for SIMD backend kernels

use ndarray::Array2;
use ruvector_sparse_inference::backend::{cpu::CpuBackend, get_backend, Backend};
use ruvector_sparse_inference::config::ActivationType;

#[test]
fn test_cpu_backend_dot_product() {
    let backend = CpuBackend;

    // Test small vector
    let a = vec![1.0, 2.0, 3.0, 4.0];
    let b = vec![2.0, 3.0, 4.0, 5.0];
    let result = backend.dot_product(&a, &b);
    assert!(
        (result - 40.0).abs() < 1e-5,
        "Expected 40.0, got {}",
        result
    );

    // Test larger vector (exercises SIMD paths)
    let a: Vec<f32> = (0..256).map(|i| i as f32).collect();
    let b: Vec<f32> = (0..256).map(|i| (i * 2) as f32).collect();
    let result = backend.dot_product(&a, &b);
    let expected: f32 = (0..256).map(|i| (i * i * 2) as f32).sum();
    assert!(
        (result - expected).abs() < 1.0,
        "Expected {}, got {}",
        expected,
        result
    );
}

#[test]
fn test_cpu_backend_relu() {
    let backend = CpuBackend;

    let mut data = vec![-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, -4.0, 5.0];
    backend.activation(&mut data, ActivationType::Relu);
    assert_eq!(data, vec![0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 5.0]);

    // Test larger array (exercises SIMD paths)
    let mut data: Vec<f32> = (0..256).map(|i| i as f32 - 128.0).collect();
    backend.activation(&mut data, ActivationType::Relu);
    for (i, &val) in data.iter().enumerate() {
        let expected = (i as f32 - 128.0).max(0.0);
        assert!(
            (val - expected).abs() < 1e-5,
            "Index {}: expected {}, got {}",
            i,
            expected,
            val
        );
    }
}

#[test]
fn test_cpu_backend_gelu() {
    let backend = CpuBackend;

    let mut data = vec![0.0, 1.0, -1.0, 2.0];
    backend.activation(&mut data, ActivationType::Gelu);

    // GELU(0) ≈ 0
    assert!(
        data[0].abs() < 0.01,
        "GELU(0) should be ≈0, got {}",
        data[0]
    );

    // GELU(1) ≈ 0.841
    assert!(
        (data[1] - 0.841).abs() < 0.01,
        "GELU(1) should be ≈0.841, got {}",
        data[1]
    );

    // GELU(-1) ≈ -0.159 (GELU is NOT an odd function)
    assert!(
        (data[2] + 0.159).abs() < 0.1,
        "GELU(-1) should be ≈-0.159, got {}",
        data[2]
    );
}
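
// Note: the reference values above match both exact GELU, x·Φ(x), and the common
// tanh approximation 0.5·x·(1 + tanh(√(2/π)·(x + 0.044715·x³))); either way
// GELU(1) ≈ 0.841 and GELU(-1) ≈ -0.159 within the tolerances used here.
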
#[test]
fn test_cpu_backend_silu() {
    let backend = CpuBackend;

    let mut data = vec![0.0, 1.0, -1.0, 2.0];
    backend.activation(&mut data, ActivationType::Silu);

    // SiLU(0) ≈ 0
    assert!(
        data[0].abs() < 0.01,
        "SiLU(0) should be ≈0, got {}",
        data[0]
    );

    // SiLU(1) ≈ 0.731
    assert!(
        (data[1] - 0.731).abs() < 0.01,
        "SiLU(1) should be ≈0.731, got {}",
        data[1]
    );
}

#[test]
fn test_cpu_backend_add() {
    let backend = CpuBackend;

    let mut a = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
    let b = vec![10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0];
    backend.add(&mut a, &b);
    assert_eq!(a, vec![11.0, 22.0, 33.0, 44.0, 55.0, 66.0, 77.0, 88.0]);
}

#[test]
fn test_cpu_backend_axpy() {
    let backend = CpuBackend;

    let mut a = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
    let b = vec![1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0];
    backend.axpy(&mut a, &b, 2.5);
    assert_eq!(a, vec![3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]);
}

#[test]
fn test_cpu_backend_sparse_matmul() {
    let backend = CpuBackend;

    // Create a 4x4 matrix
    let matrix = Array2::from_shape_vec(
        (4, 4),
        vec![
            1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0, 4.0, 5.0, 0.0, 6.0, 0.0, 0.0, 7.0, 0.0, 8.0,
        ],
    )
    .unwrap();

    let input = vec![1.0, 2.0, 3.0, 4.0];

    // Only compute rows 0 and 2
    let active_rows = vec![0, 2];
    let output = backend.sparse_matmul(&matrix, &input, &active_rows);

    // Row 0: 1*1 + 0*2 + 2*3 + 0*4 = 7
    // Row 2: 5*1 + 0*2 + 6*3 + 0*4 = 23
    assert_eq!(output.len(), 2);
    assert!((output[0] - 7.0).abs() < 1e-5);
    assert!((output[1] - 23.0).abs() < 1e-5);
}

#[test]
fn test_cpu_backend_sparse_matmul_accumulate() {
    let backend = CpuBackend;

    let matrix = Array2::from_shape_vec(
        (4, 4),
        vec![
            1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0,
        ],
    )
    .unwrap();

    let input = vec![1.0, 2.0];
    let active_cols = vec![0, 2];
    let mut output = vec![0.0; 4];

    backend.sparse_matmul_accumulate(&matrix, &input, &active_cols, &mut output);

    // Column 0 * 1.0 + Column 2 * 2.0
    // [1, 5, 9, 13] * 1.0 + [3, 7, 11, 15] * 2.0
    assert!((output[0] - 7.0).abs() < 1e-5); // 1 + 6
    assert!((output[1] - 19.0).abs() < 1e-5); // 5 + 14
    assert!((output[2] - 31.0).abs() < 1e-5); // 9 + 22
    assert!((output[3] - 43.0).abs() < 1e-5); // 13 + 30
}

#[test]
fn test_get_backend() {
    let backend = get_backend();
    println!("Using backend: {}", backend.name());
    println!("SIMD width: {}", backend.simd_width());

    // Verify backend works
    let a = vec![1.0, 2.0, 3.0, 4.0];
    let b = vec![2.0, 3.0, 4.0, 5.0];
    let result = backend.dot_product(&a, &b);
    assert!((result - 40.0).abs() < 1e-5);
}

#[test]
fn test_backend_simd_width() {
    let backend = CpuBackend;
    let width = backend.simd_width();

    // Width should be 1, 4, or 8 depending on CPU features
    assert!(
        width == 1 || width == 4 || width == 8,
        "Unexpected SIMD width: {}",
        width
    );

    println!("Backend: {}", backend.name());
    println!("SIMD width: {}", width);
}

vendor/ruvector/crates/ruvector-sparse-inference/tests/common/mod.rs (new file, vendored, 106 lines)
@@ -0,0 +1,106 @@
//! Common test utilities for sparse inference tests

use rand::Rng;
use ruvector_sparse_inference::*;

/// Generate a random vector of given dimension
pub fn random_vector(dim: usize) -> Vec<f32> {
    let mut rng = rand::thread_rng();
    (0..dim).map(|_| rng.gen_range(-1.0..1.0)).collect()
}

/// Generate an activation pattern of random size. Note that only the count is
/// random (between 25% and 50% of `max_neurons`); the indices themselves are
/// the first `num_active` neurons in order.
pub fn random_activations(max_neurons: usize) -> Vec<usize> {
    let mut rng = rand::thread_rng();
    let num_active = rng.gen_range(max_neurons / 4..max_neurons / 2);

    let mut activations: Vec<usize> = (0..max_neurons).collect();
    activations.truncate(num_active);
    activations
}

/// Create a test FFN with known dimensions
pub fn create_test_ffn(input_dim: usize, hidden_dim: usize) -> sparse::SparseFfn {
    sparse::SparseFfn::new(input_dim, hidden_dim, sparse::ActivationType::Silu)
}

/// Create a calibrated predictor for testing
pub fn create_calibrated_predictor() -> predictor::LowRankPredictor {
    let mut predictor = predictor::LowRankPredictor::new(512, 4096, 128, 0.1);

    // Generate some calibration data
    let samples: Vec<Vec<f32>> = (0..50).map(|_| random_vector(512)).collect();
    let activations: Vec<Vec<usize>> = (0..50).map(|_| random_activations(4096)).collect();

    predictor.calibrate(&samples, &activations);
    predictor
}

/// Create a quantized matrix for testing
pub fn create_quantized_matrix(rows: usize, cols: usize) -> memory::quantization::QuantizedWeights {
    let data: Vec<f32> = (0..rows * cols).map(|i| (i as f32) * 0.01).collect();

    memory::quantization::QuantizedWeights::quantize_int8(&data)
}

/// Create a test LLaMA model
pub fn load_test_llama_model() -> model::LlamaModel {
    model::LlamaModel::new(512, 2048, 4, 32000)
}

/// Create a test model for benchmarks
pub fn load_benchmark_model() -> model::LlamaModel {
    model::LlamaModel::new(512, 2048, 4, 32000)
}

/// Create a mock GGUF header (v3 layout: 4-byte magic, u32 version,
/// u64 tensor count, u64 metadata KV count; all little-endian)
pub fn create_mock_gguf_header() -> Vec<u8> {
    let mut data = Vec::new();
    data.extend_from_slice(&0x46554747u32.to_le_bytes()); // "GGUF" magic
    data.extend_from_slice(&3u32.to_le_bytes()); // version 3
    data.extend_from_slice(&0u64.to_le_bytes()); // tensor count
    data.extend_from_slice(&0u64.to_le_bytes()); // metadata kv count
    data
}

/// Assert two vectors are close within tolerance
pub fn assert_vectors_close(a: &[f32], b: &[f32], tolerance: f32) {
    assert_eq!(a.len(), b.len(), "Vector lengths don't match");
    for (i, (&x, &y)) in a.iter().zip(b.iter()).enumerate() {
        let diff = (x - y).abs();
        assert!(
            diff < tolerance,
            "Vectors differ at index {}: {} vs {} (diff: {})",
            i, x, y, diff
        );
    }
}

/// Calculate mean squared error between two vectors
pub fn mse(a: &[f32], b: &[f32]) -> f64 {
    assert_eq!(a.len(), b.len(), "Vector lengths don't match");

    let sum: f64 = a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| {
            let diff = (x - y) as f64;
            diff * diff
        })
        .sum();

    sum / a.len() as f64
}

/// Generate calibration data for testing
pub fn generate_calibration_data(num_samples: usize) -> Vec<Vec<f32>> {
    (0..num_samples).map(|_| random_vector(512)).collect()
}

vendor/ruvector/crates/ruvector-sparse-inference/tests/integration/model_loading_tests.rs (new file, vendored, 166 lines)
@@ -0,0 +1,166 @@
//! Integration tests for model loading

use ruvector_sparse_inference::model::*;

mod common;
use common::*;

#[test]
fn test_gguf_header_parsing() {
    let mock_gguf = create_mock_gguf_header();
    let header = GgufParser::parse_header(&mock_gguf).unwrap();

    assert_eq!(header.magic, 0x46554747); // "GGUF"
    assert_eq!(header.version, 3);
}

#[test]
fn test_gguf_invalid_magic() {
    let mut invalid_gguf = vec![0u8; 8];
    invalid_gguf[0..4].copy_from_slice(&0x12345678u32.to_le_bytes()); // Wrong magic
    invalid_gguf[4..8].copy_from_slice(&3u32.to_le_bytes());

    let result = GgufParser::parse_header(&invalid_gguf);
    assert!(result.is_err(), "Should fail with invalid magic number");
}

#[test]
fn test_gguf_too_small() {
    let tiny_data = vec![0u8; 4]; // Too small
    let result = GgufParser::parse_header(&tiny_data);
    assert!(result.is_err(), "Should fail with too small data");
}

#[test]
fn test_llama_model_structure() {
    let model = load_test_llama_model();

    assert!(model.metadata().hidden_size > 0);
    assert!(!model.layers.is_empty());
    assert!(model.embed_tokens.vocab_size() > 0);
}

#[test]
fn test_llama_model_dimensions() {
    let model = load_test_llama_model();

    assert_eq!(model.hidden_size(), 512);
    assert_eq!(model.intermediate_size(), 2048);
    assert_eq!(model.layers.len(), 4);
    assert_eq!(model.embed_tokens.vocab_size(), 32000);
}

#[test]
fn test_model_forward_pass() {
    let model = load_test_llama_model();
    let input = ModelInput::TokenIds(vec![1, 2, 3, 4, 5]);
    let config = InferenceConfig::default();

    let output = model.forward(&input, &config).unwrap();

    assert!(!output.logits.is_empty());
    assert_eq!(output.logits.len(), model.embed_tokens.vocab_size());
}

#[test]
fn test_model_forward_with_embeddings() {
    let model = load_test_llama_model();
    let embeddings = vec![random_vector(512), random_vector(512), random_vector(512)];
    let input = ModelInput::Embeddings(embeddings);
    let config = InferenceConfig::default();

    let output = model.forward(&input, &config).unwrap();
    assert!(!output.logits.is_empty());
}

#[test]
fn test_inference_config_default() {
    let config = InferenceConfig::default();

    assert_eq!(config.temperature, 1.0);
    assert_eq!(config.top_k, None);
    assert_eq!(config.top_p, None);
}

#[test]
fn test_inference_config_custom() {
    let config = InferenceConfig {
        temperature: 0.8,
        top_k: Some(50),
        top_p: Some(0.95),
    };

    assert_eq!(config.temperature, 0.8);
    assert_eq!(config.top_k, Some(50));
    assert_eq!(config.top_p, Some(0.95));
}

#[test]
fn test_model_metadata_access() {
    let model = load_test_llama_model();
    let metadata = model.metadata();

    assert_eq!(metadata.hidden_size, 512);
    assert_eq!(metadata.intermediate_size, 2048);
    assert_eq!(metadata.num_layers, 4);
    assert_eq!(metadata.vocab_size, 32000);
}

#[test]
fn test_embed_tokens_vocab_size() {
    let embed = EmbedTokens::new(50000, 768);
    assert_eq!(embed.vocab_size(), 50000);
}

#[test]
fn test_transformer_layer_indices() {
    let model = load_test_llama_model();

    for (i, layer) in model.layers.iter().enumerate() {
        assert_eq!(layer.layer_idx, i, "Layer index should match position");
    }
}

#[test]
fn test_model_creation_various_sizes() {
    // Test different model sizes
    let small = LlamaModel::new(256, 1024, 2, 10000);
    assert_eq!(small.hidden_size(), 256);
    assert_eq!(small.layers.len(), 2);

    let large = LlamaModel::new(2048, 8192, 32, 100000);
    assert_eq!(large.hidden_size(), 2048);
    assert_eq!(large.layers.len(), 32);
}

#[test]
fn test_gguf_header_version() {
    let mut data = create_mock_gguf_header();

    // Modify version
    data[4..8].copy_from_slice(&2u32.to_le_bytes());

    let header = GgufParser::parse_header(&data).unwrap();
    assert_eq!(header.version, 2);
}

#[test]
fn test_model_forward_deterministic() {
    let model = load_test_llama_model();
    let input = ModelInput::TokenIds(vec![1, 2, 3]);
    let config = InferenceConfig::default();

    let output1 = model.forward(&input, &config).unwrap();
    let output2 = model.forward(&input, &config).unwrap();

    // Same input should produce same output
    assert_eq!(output1.logits.len(), output2.logits.len());
    for (a, b) in output1.logits.iter().zip(output2.logits.iter()) {
        assert_eq!(a, b);
    }
}

vendor/ruvector/crates/ruvector-sparse-inference/tests/integration/sparse_inference_tests.rs (new file, vendored, 206 lines)
@@ -0,0 +1,206 @@
//! Integration tests for sparse inference pipeline

use ruvector_sparse_inference::*;

mod common;
use common::*;

#[test]
fn test_full_sparse_pipeline() {
    let model = load_test_llama_model();
    let mut engine = SparseInferenceEngine::new_sparse(model, 0.3);

    // Calibrate
    let calibration_samples = generate_calibration_data(100);
    engine.calibrate(&calibration_samples).unwrap();

    // Run inference
    let input = random_vector(512);
    let output = engine.infer(&input).unwrap();

    // Verify output
    assert_eq!(output.len(), 512, "Output dimension should match input");
    assert!(output.iter().all(|&x| x.is_finite()), "All outputs should be finite");

    // Check sparsity was applied
    let stats = engine.sparsity_statistics();
    assert!(stats.average_active_ratio < 0.5, "Should have at least 50% sparsity");
}

#[test]
fn test_dense_vs_sparse_accuracy() {
    let model = load_test_llama_model();
    let dense_engine = SparseInferenceEngine::new_dense(model.clone());
    let sparse_engine = SparseInferenceEngine::new_sparse(model, 0.1);

    let inputs: Vec<_> = (0..100).map(|_| random_vector(512)).collect();

    let mut total_error = 0.0;
    for input in &inputs {
        let dense_out = dense_engine.infer(input).unwrap();
        let sparse_out = sparse_engine.infer(input).unwrap();

        let error = mse(&dense_out, &sparse_out);
        total_error += error;
    }

    let avg_error = total_error / inputs.len() as f64;
    assert!(avg_error < 0.1, "Average error too high: {}", avg_error);
}

#[test]
fn test_sparse_inference_batch_processing() {
    let model = load_test_llama_model();
    let engine = SparseInferenceEngine::new_sparse(model, 0.2);

    let batch_size = 10;
    let inputs: Vec<_> = (0..batch_size).map(|_| random_vector(512)).collect();

    let mut outputs = Vec::new();
    for input in &inputs {
        let output = engine.infer(input).unwrap();
        outputs.push(output);
    }

    assert_eq!(outputs.len(), batch_size);
    for output in &outputs {
        assert_eq!(output.len(), 512);
        assert!(output.iter().all(|&x| x.is_finite()));
    }
}

#[test]
fn test_calibration_improves_accuracy() {
    let model = load_test_llama_model();

    // Create two engines: one calibrated, one not. The assertions below check
    // that both stay well-formed; they do not enforce a strict accuracy ordering.
    let mut calibrated = SparseInferenceEngine::new_sparse(model.clone(), 0.3);
    let uncalibrated = SparseInferenceEngine::new_sparse(model, 0.3);

    // Calibrate one
    let calibration_samples = generate_calibration_data(50);
    calibrated.calibrate(&calibration_samples).unwrap();

    // Test both
    let test_inputs: Vec<_> = (0..20).map(|_| random_vector(512)).collect();

    for input in &test_inputs {
        let cal_output = calibrated.infer(input).unwrap();
        let uncal_output = uncalibrated.infer(input).unwrap();

        assert_eq!(cal_output.len(), uncal_output.len());
        assert!(cal_output.iter().all(|&x| x.is_finite()));
        assert!(uncal_output.iter().all(|&x| x.is_finite()));
    }
}

#[test]
fn test_different_sparsity_levels() {
    let model = load_test_llama_model();
    let input = random_vector(512);

    for sparsity in [0.1, 0.3, 0.5, 0.7, 0.9] {
        let engine = SparseInferenceEngine::new_sparse(model.clone(), sparsity);
        let output = engine.infer(&input).unwrap();

        assert_eq!(output.len(), 512, "Output dimension mismatch for sparsity {}", sparsity);
        assert!(output.iter().all(|&x| x.is_finite()), "Non-finite output for sparsity {}", sparsity);
    }
}

#[test]
fn test_sparse_inference_consistency() {
    let model = load_test_llama_model();
    let engine = SparseInferenceEngine::new_sparse(model, 0.3);
    let input = random_vector(512);

    // Same input should produce same output
    let output1 = engine.infer(&input).unwrap();
    let output2 = engine.infer(&input).unwrap();

    assert_vectors_close(&output1, &output2, 1e-10);
}

#[test]
fn test_sparsity_statistics() {
    let model = load_test_llama_model();
    let engine = SparseInferenceEngine::new_sparse(model, 0.4);

    let stats = engine.sparsity_statistics();

    assert!(stats.average_active_ratio >= 0.0);
    assert!(stats.average_active_ratio <= 1.0);
    assert!(stats.min_active <= stats.max_active);
}

#[test]
fn test_dense_engine_activates_all_neurons() {
    let model = load_test_llama_model();
    let dense_engine = SparseInferenceEngine::new_dense(model);

    let stats = dense_engine.sparsity_statistics();

    // Dense engine should have statistics indicating all neurons are active
    // (exact values depend on implementation, but the ratio should be high)
    assert!(stats.average_active_ratio >= 0.0);
}

#[test]
fn test_multiple_inferences() {
    let model = load_test_llama_model();
    let engine = SparseInferenceEngine::new_sparse(model, 0.2);

    // Run many inferences to ensure stability
    for _ in 0..100 {
        let input = random_vector(512);
        let output = engine.infer(&input).unwrap();

        assert_eq!(output.len(), 512);
        assert!(output.iter().all(|&x| x.is_finite()));
    }
}

#[test]
fn test_extreme_input_values() {
    let model = load_test_llama_model();
    let engine = SparseInferenceEngine::new_sparse(model, 0.3);

    // Test with very large values
    let large_input = vec![1000.0f32; 512];
    let output_large = engine.infer(&large_input).unwrap();
    assert!(output_large.iter().all(|&x| x.is_finite()));

    // Test with very large negative values
    let small_input = vec![-1000.0f32; 512];
    let output_small = engine.infer(&small_input).unwrap();
    assert!(output_small.iter().all(|&x| x.is_finite()));

    // Test with zero
    let zero_input = vec![0.0f32; 512];
    let output_zero = engine.infer(&zero_input).unwrap();
    assert!(output_zero.iter().all(|&x| x.is_finite()));
}

#[test]
fn test_calibration_with_empty_samples() {
    let model = load_test_llama_model();
    let mut engine = SparseInferenceEngine::new_sparse(model, 0.3);

    let empty_samples: Vec<Vec<f32>> = vec![];
    let result = engine.calibrate(&empty_samples);

    // Should handle empty calibration gracefully
    assert!(result.is_ok());
}

#[test]
fn test_calibration_with_many_samples() {
    let model = load_test_llama_model();
    let mut engine = SparseInferenceEngine::new_sparse(model, 0.3);

    // Large calibration set
    let samples = generate_calibration_data(1000);
    let result = engine.calibrate(&samples);

    assert!(result.is_ok());
}

vendor/ruvector/crates/ruvector-sparse-inference/tests/property/mod.rs (new file, vendored, 157 lines)
@@ -0,0 +1,157 @@
//! Property-based tests using proptest

use proptest::prelude::*;
use ruvector_sparse_inference::*;

proptest! {
    #[test]
    fn sparse_output_finite(input in prop::collection::vec(-10.0f32..10.0, 512)) {
        let ffn = sparse::SparseFfn::new(512, 2048, sparse::ActivationType::Silu);
        let active: Vec<usize> = (0..1024).collect();

        let output = ffn.forward_sparse(&input, &active);

        prop_assert!(output.iter().all(|x| x.is_finite()));
    }

    #[test]
    fn predictor_returns_valid_indices(
        input in prop::collection::vec(-1.0f32..1.0, 512)
    ) {
        let predictor = predictor::LowRankPredictor::new(512, 4096, 128, 0.1);
        let active = predictor.predict(&input);

        prop_assert!(active.iter().all(|&i| i < 4096));
        prop_assert!(active.len() <= 4096);
    }

    #[test]
    fn sparse_matches_dense_with_all_neurons(
        input in prop::collection::vec(-5.0f32..5.0, 512)
    ) {
        let ffn = sparse::SparseFfn::new(512, 2048, sparse::ActivationType::Silu);
        let all_neurons: Vec<usize> = (0..2048).collect();

        let dense = ffn.forward_dense(&input);
        let sparse = ffn.forward_sparse(&input, &all_neurons);

        // Allow small numerical differences
        for (d, s) in dense.iter().zip(sparse.iter()) {
            prop_assert!((d - s).abs() < 1e-4);
        }
    }

    #[test]
    fn quantization_preserves_order(
        mut values in prop::collection::vec(-100.0f32..100.0, 1..1000)
    ) {
        values.sort_by(|a, b| a.partial_cmp(b).unwrap());

        let quantized = memory::quantization::QuantizedWeights::quantize_int8(&values);
        let dequantized = quantized.dequantize_row(0);

        // Dequantized values should maintain relative ordering (mostly)
        for i in 1..dequantized.len() {
            // Allow for some quantization error
            prop_assert!(
                dequantized[i] >= dequantized[i - 1] - 0.5,
                "Order not preserved at index {}: {} vs {}",
                i, dequantized[i - 1], dequantized[i]
            );
        }
    }

    #[test]
    fn predictor_top_k_returns_k_neurons(
        input in prop::collection::vec(-1.0f32..1.0, 512),
        k in 1usize..=2048
    ) {
        let mut predictor = predictor::LowRankPredictor::new(512, 4096, 128, 0.0);
        predictor.set_top_k(Some(k));

        let active = predictor.predict(&input);

        prop_assert_eq!(active.len(), k);
        prop_assert!(active.iter().all(|&i| i < 4096));
    }

    #[test]
    fn sparse_output_dimension_correct(
        input in prop::collection::vec(-10.0f32..10.0, 256..=1024),
        hidden_dim in 512usize..=4096
    ) {
        let input_dim = input.len();
        let ffn = sparse::SparseFfn::new(input_dim, hidden_dim, sparse::ActivationType::Relu);
        let active: Vec<usize> = (0..hidden_dim.min(100)).collect();

        let output = ffn.forward_sparse(&input, &active);

        prop_assert_eq!(output.len(), input_dim);
    }

    #[test]
    fn quantization_int4_roundtrip(
        values in prop::collection::vec(-50.0f32..50.0, 64..=512),
        group_size in prop::sample::select(vec![16, 32, 64, 128])
    ) {
        let quantized = memory::quantization::QuantizedWeights::quantize_int4(&values, group_size);
        let dequantized = quantized.dequantize_row(0);

        prop_assert_eq!(values.len(), dequantized.len());

        // Check approximate equality (int4 has lower precision)
        for (orig, deq) in values.iter().zip(dequantized.iter()) {
            prop_assert!(
                (orig - deq).abs() < 5.0,
                "Too much error: {} vs {}",
                orig, deq
            );
        }
    }

    #[test]
    fn sparse_inference_output_dimension(
        input in prop::collection::vec(-5.0f32..5.0, 512)
    ) {
        let model = model::LlamaModel::new(512, 2048, 4, 32000);
        let engine = SparseInferenceEngine::new_sparse(model, 0.3);

        let output = engine.infer(&input).unwrap();

        prop_assert_eq!(output.len(), 512);
        prop_assert!(output.iter().all(|x| x.is_finite()));
    }

    #[test]
    fn swiglu_output_finite(
        input in prop::collection::vec(-10.0f32..10.0, 512)
    ) {
        let ffn = sparse::SwiGLUFfn::new(512, 2048);
        let active: Vec<usize> = (0..500).map(|i| i * 2).collect();

        let output = ffn.forward_sparse(&input, &active);

        prop_assert!(output.iter().all(|x| x.is_finite()));
        prop_assert_eq!(output.len(), 512);
    }

    #[test]
    fn calibration_handles_any_samples(
        num_samples in 1usize..=100
    ) {
        let mut predictor = predictor::LowRankPredictor::new(512, 4096, 128, 0.1);

        let samples: Vec<Vec<f32>> = (0..num_samples).map(|_| vec![0.1; 512]).collect();
        let activations: Vec<Vec<usize>> = (0..num_samples).map(|_| (0..100).collect()).collect();

        predictor.calibrate(&samples, &activations);

        // Should complete without panicking
        prop_assert!(true);
    }
}

vendor/ruvector/crates/ruvector-sparse-inference/tests/unit/predictor_tests.rs (new file, vendored, 159 lines)
@@ -0,0 +1,159 @@
//! Unit tests for neuron predictors

use ruvector_sparse_inference::predictor::*;

mod common;
use common::*;

#[test]
fn test_lowrank_predictor_creation() {
    let predictor = LowRankPredictor::new(512, 4096, 128, 0.1);
    assert_eq!(predictor.input_dim(), 512);
    assert_eq!(predictor.hidden_dim(), 4096);
    assert_eq!(predictor.rank(), 128);
}
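
// Note: from the accessors asserted above and the `set_top_k` usage below, the
// constructor arguments read as (input_dim, hidden_dim, rank, threshold): scores
// for all hidden_dim neurons pass through a rank-128 bottleneck, and neurons
// scoring above the threshold (or within the Top-K override) are reported active.
// This is an interpretation of the test-facing API, not taken from the crate docs.
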
#[test]
fn test_predictor_predicts_active_neurons() {
    let predictor = create_calibrated_predictor();
    let input = vec![0.1f32; 512];

    let active = predictor.predict(&input);

    // Should predict some neurons as active
    assert!(!active.is_empty(), "Predictor should activate some neurons");
    // Should predict fewer than total neurons (sparsity)
    assert!(active.len() < 4096, "Predictor should be sparse");
    // All indices should be valid
    assert!(active.iter().all(|&i| i < 4096), "All indices should be valid");
}

#[test]
fn test_predictor_top_k_mode() {
    let mut predictor = LowRankPredictor::new(512, 4096, 128, 0.0);
    predictor.set_top_k(Some(100));

    let input = vec![0.1f32; 512];
    let active = predictor.predict(&input);

    assert_eq!(active.len(), 100, "Top-K should return exactly K neurons");
}

#[test]
fn test_predictor_top_k_larger_than_hidden() {
    let mut predictor = LowRankPredictor::new(512, 100, 64, 0.0);
    predictor.set_top_k(Some(200)); // More than hidden_dim

    let input = random_vector(512);
    let active = predictor.predict(&input);

    // Should return at most hidden_dim neurons
    assert!(active.len() <= 100);
}

#[test]
fn test_predictor_calibration() {
    let mut predictor = LowRankPredictor::new(512, 4096, 128, 0.5);

    // Generate calibration data simulating a 30% activation rate
    let samples: Vec<_> = (0..100).map(|_| random_vector(512)).collect();
    let activations: Vec<_> = (0..100)
        .map(|_| {
            let num_active = (4096.0f32 * 0.3) as usize;
            (0..num_active).collect::<Vec<_>>()
        })
        .collect();

    predictor.calibrate(&samples, &activations);

    // After calibration, the predictor should make non-trivial predictions
    let test_input = random_vector(512);
    let active = predictor.predict(&test_input);
    assert!(!active.is_empty(), "Calibrated predictor should activate neurons");
}

#[test]
fn test_predictor_different_inputs_different_outputs() {
    let predictor = LowRankPredictor::new(512, 4096, 128, 0.1);

    let input1 = random_vector(512);
    let input2 = random_vector(512);

    let active1 = predictor.predict(&input1);
    let active2 = predictor.predict(&input2);

    // Different inputs should generally produce different activations
    // (this can occasionally fail by chance, but should pass most of the time)
    assert_ne!(active1, active2, "Different inputs should produce different activations");
}

#[test]
fn test_dense_predictor_activates_all() {
    let predictor = DensePredictor::new(4096);
    let input = random_vector(512);

    let active = predictor.predict(&input);

    assert_eq!(active.len(), 4096, "Dense predictor should activate all neurons");
    assert_eq!(active, (0..4096).collect::<Vec<_>>(), "Should be sequential indices");
}

#[test]
fn test_dense_predictor_num_neurons() {
    let predictor = DensePredictor::new(2048);
    assert_eq!(predictor.num_neurons(), 2048);
}

#[test]
#[should_panic(expected = "Input dimension mismatch")]
fn test_predictor_wrong_input_dimension() {
    let predictor = LowRankPredictor::new(512, 4096, 128, 0.1);
    let wrong_input = vec![0.1f32; 256]; // Wrong dimension

    predictor.predict(&wrong_input);
}

#[test]
fn test_predictor_zero_input() {
    let predictor = LowRankPredictor::new(512, 4096, 128, 0.1);
    let zero_input = vec![0.0f32; 512];

    let active = predictor.predict(&zero_input);

    // Zero input should still produce a valid (possibly empty, threshold-dependent) set
    assert!(active.len() <= 4096, "Should not exceed total neurons");
}

#[test]
fn test_predictor_extreme_values() {
    let predictor = LowRankPredictor::new(512, 4096, 128, 0.1);

    // Test with very large values
    let large_input = vec![1000.0f32; 512];
    let active_large = predictor.predict(&large_input);
    assert!(active_large.iter().all(|&i| i < 4096));

    // Test with very large negative values
    let small_input = vec![-1000.0f32; 512];
    let active_small = predictor.predict(&small_input);
    assert!(active_small.iter().all(|&i| i < 4096));
}

#[test]
fn test_predictor_consistent_predictions() {
    let predictor = LowRankPredictor::new(512, 4096, 128, 0.1);
    let input = random_vector(512);

    // Same input should produce same output
    let active1 = predictor.predict(&input);
    let active2 = predictor.predict(&input);

    assert_eq!(active1, active2, "Same input should produce same output");
}

vendor/ruvector/crates/ruvector-sparse-inference/tests/unit/quantization_tests.rs (new file, vendored, 193 lines)
@@ -0,0 +1,193 @@
//! Unit tests for weight quantization

use ruvector_sparse_inference::memory::quantization::*;

mod common;
use common::*;

#[test]
fn test_int8_quantization_roundtrip() {
    let original = random_vector(1024);
    let quantized = QuantizedWeights::quantize_int8(&original);
    let dequantized = quantized.dequantize_row(0);

    // Should be close after dequantization
    assert_vectors_close(&original, &dequantized, 0.01);
}

#[test]
fn test_int8_quantization_dimensions() {
    let original = random_vector(1024);
    let quantized = QuantizedWeights::quantize_int8(&original);

    assert_eq!(quantized.nrows(), 1);
    assert_eq!(quantized.ncols(), 1024);
}

#[test]
fn test_int4_quantization_compression() {
    let original: Vec<f32> = (0..1024).map(|i| (i as f32) * 0.01).collect();
    let quantized = QuantizedWeights::quantize_int4(&original, 64); // group_size=64

    // Int4 should be significantly smaller than the original (4 bytes per f32)
    let original_size = original.len() * 4;
    let quantized_size = quantized.size_bytes();

    assert!(quantized_size < original_size / 4,
        "Int4 quantization should compress data (original: {}, quantized: {})",
        original_size, quantized_size);
}

#[test]
fn test_int4_quantization_roundtrip() {
    let original: Vec<f32> = (0..256).map(|i| (i as f32) * 0.01).collect();
    let quantized = QuantizedWeights::quantize_int4(&original, 32);
    let dequantized = quantized.dequantize_row(0);

    // Int4 has lower precision, so tolerance is higher
    assert_vectors_close(&original, &dequantized, 0.05);
}

#[test]
fn test_int4_different_group_sizes() {
    let original = random_vector(512);

    for group_size in [16, 32, 64, 128] {
        let quantized = QuantizedWeights::quantize_int4(&original, group_size);
        let dequantized = quantized.dequantize_row(0);

        assert_eq!(original.len(), dequantized.len(),
            "Length mismatch for group_size {}", group_size);
        assert_vectors_close(&original, &dequantized, 0.1);
    }
}

#[test]
fn test_selective_dequantization() {
    // Create a larger dataset to test selective dequantization
    let rows_data: Vec<Vec<f32>> = (0..100).map(|_| random_vector(512)).collect();

    // For this test, we quantize a single row and select it back
    // (in the real implementation, you'd have a multi-row quantization)
    let quantized = QuantizedWeights::quantize_int8(&rows_data[0]);

    let selected_rows = vec![0];
    let dequantized = quantized.dequantize_rows(&selected_rows);

    assert_eq!(dequantized.nrows(), selected_rows.len());
    assert_eq!(dequantized.ncols(), 512);
}

#[test]
fn test_quantization_preserves_range() {
    let original: Vec<f32> = vec![-5.0, -2.5, 0.0, 2.5, 5.0];
    let quantized = QuantizedWeights::quantize_int8(&original);
    let dequantized = quantized.dequantize_row(0);

    // Check that min and max are approximately preserved
    let orig_min = original.iter().cloned().fold(f32::INFINITY, f32::min);
    let orig_max = original.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let deq_min = dequantized.iter().cloned().fold(f32::INFINITY, f32::min);
    let deq_max = dequantized.iter().cloned().fold(f32::NEG_INFINITY, f32::max);

    assert!((orig_min - deq_min).abs() < 0.1);
    assert!((orig_max - deq_max).abs() < 0.1);
}
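
// Note: the range check above assumes symmetric linear quantization, i.e. roughly
// scale = max|x| / 127 for int8 (and per-group scales for int4), with
// dequantized = q * scale, under which min/max survive to within one quantization
// step. This is an assumption about the implementation, not a documented guarantee.
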
#[test]
fn test_quantization_uniform_values() {
    let original = vec![3.14f32; 100];
    let quantized = QuantizedWeights::quantize_int8(&original);
    let dequantized = quantized.dequantize_row(0);

    // All values should be approximately the same
    for &val in &dequantized {
        assert!((val - 3.14).abs() < 0.1);
    }
}

#[test]
fn test_quantization_zero_values() {
    let original = vec![0.0f32; 100];
    let quantized = QuantizedWeights::quantize_int8(&original);
    let dequantized = quantized.dequantize_row(0);

    // All values should be close to zero
    for &val in &dequantized {
        assert!(val.abs() < 0.01);
    }
}

#[test]
fn test_int4_odd_length() {
    // Test with an odd number of elements (exercises padding, since int4
    // packs two values per byte)
    let original = random_vector(513); // Odd number
    let quantized = QuantizedWeights::quantize_int4(&original, 32);
    let dequantized = quantized.dequantize_row(0);

    assert_eq!(original.len(), dequantized.len());
}

#[test]
fn test_quantization_size_reduction() {
    let original = random_vector(4096);
    let original_size = original.len() * std::mem::size_of::<f32>();

    let int8_quantized = QuantizedWeights::quantize_int8(&original);
    let int8_size = int8_quantized.size_bytes();

    let int4_quantized = QuantizedWeights::quantize_int4(&original, 64);
    let int4_size = int4_quantized.size_bytes();

    // Verify compression ratios: int8 stores one byte per weight plus scale
    // metadata, so it should be well under half the f32 size
    assert!(int8_size < original_size / 2, "Int8 should be well under half the f32 size");
    assert!(int4_size < int8_size, "Int4 should be smaller than Int8");
}

#[test]
fn test_multiple_row_dequantization() {
    let quantized = create_quantized_matrix(100, 512);
    let rows = vec![10, 50, 99];

    let dequantized = quantized.dequantize_rows(&rows);

    assert_eq!(dequantized.nrows(), rows.len());
    assert_eq!(dequantized.ncols(), 512);

    // All values should be finite
    for i in 0..dequantized.nrows() {
        for j in 0..dequantized.ncols() {
            assert!(dequantized[[i, j]].is_finite());
        }
    }
}

#[test]
#[should_panic(expected = "Row index out of bounds")]
fn test_dequantize_out_of_bounds_row() {
    let quantized = QuantizedWeights::quantize_int8(&random_vector(512));
    quantized.dequantize_row(5); // Only 1 row exists
}

#[test]
fn test_quantization_large_values() {
    let original = vec![1000.0, 5000.0, -3000.0, 10000.0];
    let quantized = QuantizedWeights::quantize_int8(&original);
    let dequantized = quantized.dequantize_row(0);

    // Should handle large values reasonably; tolerance scales with the range
    assert_vectors_close(&original, &dequantized, 100.0);
}

#[test]
fn test_int4_group_boundary() {
    // Test that group boundaries are handled correctly
    let original = random_vector(128);
    let quantized = QuantizedWeights::quantize_int4(&original, 32); // 4 groups exactly
    let dequantized = quantized.dequantize_row(0);

    assert_eq!(original.len(), dequantized.len());
    assert_vectors_close(&original, &dequantized, 0.1);
}

vendor/ruvector/crates/ruvector-sparse-inference/tests/unit/sparse_ffn_tests.rs (new file, vendored, 187 lines)
@@ -0,0 +1,187 @@
//! Unit tests for sparse feed-forward networks

use ruvector_sparse_inference::sparse::*;

mod common;
use common::*;

#[test]
fn test_sparse_ffn_matches_dense() {
    let ffn = create_test_ffn(512, 2048);
    let input = random_vector(512);
    let all_neurons: Vec<usize> = (0..2048).collect();

    let dense_output = ffn.forward_dense(&input);
    let sparse_output = ffn.forward_sparse(&input, &all_neurons);

    // When all neurons are active, sparse should match dense
    assert_vectors_close(&dense_output, &sparse_output, 1e-5);
}

#[test]
fn test_sparse_ffn_with_subset() {
    let ffn = create_test_ffn(512, 2048);
    let input = random_vector(512);
    let active_neurons: Vec<usize> = (0..1024).collect(); // 50% sparsity

    let output = ffn.forward_sparse(&input, &active_neurons);

    assert_eq!(output.len(), 512, "Output dimension should match input dimension");
    assert!(output.iter().all(|&x| x.is_finite()), "All outputs should be finite");
}

#[test]
fn test_sparse_ffn_empty_activations() {
    let ffn = create_test_ffn(512, 2048);
    let input = random_vector(512);
    let no_neurons: Vec<usize> = vec![];

    let output = ffn.forward_sparse(&input, &no_neurons);

    assert_eq!(output.len(), 512);
    // With no active neurons, output should be near zero
    assert!(output.iter().all(|&x| x.abs() < 1e-5), "Output should be near zero with no active neurons");
}

#[test]
fn test_different_activations() {
    for activation in [ActivationType::Relu, ActivationType::Gelu, ActivationType::Silu] {
        let ffn = SparseFfn::new(512, 2048, activation);
        let input = random_vector(512);
        let active: Vec<usize> = (0..500).collect();

        let output = ffn.forward_sparse(&input, &active);
        assert_eq!(output.len(), 512, "Output dimension should be 512 for {:?}", activation);
        assert!(output.iter().all(|&x| x.is_finite()), "All outputs should be finite for {:?}", activation);
    }
}

#[test]
fn test_relu_activation_properties() {
    let ffn = SparseFfn::new(512, 2048, ActivationType::Relu);
    let input = vec![-1.0f32; 512]; // Negative input

    let output = ffn.forward_dense(&input);

    // ReLU zeroes out negative hidden activations
    // (though the final output may still be negative due to the w2 projection)
    assert!(output.iter().all(|&x| x.is_finite()));
}

#[test]
fn test_swiglu_paired_neurons() {
    // SwiGLU uses paired neurons (gate and up projections)
    let ffn = SwiGLUFfn::new(512, 2048);
    let input = random_vector(512);

    // Active neurons should be pairs
    let active_pairs: Vec<usize> = (0..500).map(|i| i * 2).collect();
    let output = ffn.forward_sparse(&input, &active_pairs);

    assert_eq!(output.len(), 512);
    assert!(output.iter().all(|&x| x.is_finite()));
}
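
// For reference: SwiGLU computes down(SiLU(gate(x)) ⊙ up(x)), so "paired" here
// is read as the gate and up projections sharing one active-neuron index set —
// an interpretation of the comment above, not taken from the crate docs.
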
#[test]
fn test_swiglu_matches_dense() {
    let ffn = SwiGLUFfn::new(512, 2048);
    let input = random_vector(512);
    let all_neurons: Vec<usize> = (0..2048).collect();

    let dense_output = ffn.forward_dense(&input);
    let sparse_output = ffn.forward_sparse(&input, &all_neurons);

    assert_vectors_close(&dense_output, &sparse_output, 1e-5);
}

#[test]
fn test_swiglu_empty_activations() {
    let ffn = SwiGLUFfn::new(512, 2048);
    let input = random_vector(512);
    let no_neurons: Vec<usize> = vec![];

    let output = ffn.forward_sparse(&input, &no_neurons);

    assert_eq!(output.len(), 512);
    assert!(output.iter().all(|&x| x.abs() < 1e-5));
}

#[test]
#[should_panic(expected = "Input dimension mismatch")]
fn test_sparse_ffn_wrong_input_dimension() {
    let ffn = create_test_ffn(512, 2048);
    let wrong_input = vec![0.1f32; 256];
    let active: Vec<usize> = (0..100).collect();

    ffn.forward_sparse(&wrong_input, &active);
}

#[test]
fn test_sparse_ffn_out_of_bounds_neurons() {
    let ffn = create_test_ffn(512, 2048);
    let input = random_vector(512);

    // Include some out-of-bounds indices
    let mut active: Vec<usize> = (0..100).collect();
    active.push(5000); // Out of bounds
    active.push(10000); // Out of bounds

    let output = ffn.forward_sparse(&input, &active);

    // Should handle gracefully
    assert_eq!(output.len(), 512);
    assert!(output.iter().all(|&x| x.is_finite()));
}

#[test]
fn test_sparse_ffn_duplicate_neurons() {
    let ffn = create_test_ffn(512, 2048);
    let input = random_vector(512);

    // Include duplicate indices
    let active = vec![10, 20, 10, 30, 20, 10];

    let output = ffn.forward_sparse(&input, &active);

    assert_eq!(output.len(), 512);
    assert!(output.iter().all(|&x| x.is_finite()));
}

#[test]
fn test_sparse_ffn_sparsity_reduces_computation() {
    let ffn = create_test_ffn(512, 2048);
    let input = random_vector(512);

    // 10% sparsity (204 of 2048 neurons)
    let sparse_neurons: Vec<usize> = (0..204).collect();

    let sparse_output = ffn.forward_sparse(&input, &sparse_neurons);

    // Should still produce valid output with much less computation
    assert_eq!(sparse_output.len(), 512);
    assert!(sparse_output.iter().all(|&x| x.is_finite()));
}

#[test]
fn test_dense_output_deterministic() {
    let ffn = create_test_ffn(512, 2048);
    let input = random_vector(512);

    let output1 = ffn.forward_dense(&input);
    let output2 = ffn.forward_dense(&input);

    assert_vectors_close(&output1, &output2, 1e-10);
}

#[test]
fn test_sparse_output_deterministic() {
    let ffn = create_test_ffn(512, 2048);
    let input = random_vector(512);
    let active: Vec<usize> = (0..500).collect();

    let output1 = ffn.forward_sparse(&input, &active);
    let output2 = ffn.forward_sparse(&input, &active);

    assert_vectors_close(&output1, &output2, 1e-10);
}