Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
This commit is contained in:
671
docs/adr/delta-behavior/ADR-DB-006-delta-compression-strategy.md
Normal file
671
docs/adr/delta-behavior/ADR-DB-006-delta-compression-strategy.md
Normal file
@@ -0,0 +1,671 @@
|
||||
# ADR-DB-006: Delta Compression Strategy
|
||||
|
||||
**Status**: Proposed
|
||||
**Date**: 2026-01-28
|
||||
**Authors**: RuVector Architecture Team
|
||||
**Deciders**: Architecture Review Board
|
||||
**Parent**: ADR-DB-001 Delta Behavior Core Architecture
|
||||
|
||||
## Version History
|
||||
|
||||
| Version | Date | Author | Changes |
|
||||
|---------|------|--------|---------|
|
||||
| 0.1 | 2026-01-28 | Architecture Team | Initial proposal |
|
||||
|
||||
---
|
||||
|
||||
## Context and Problem Statement
|
||||
|
||||
### The Compression Challenge
|
||||
|
||||
Delta-first architecture generates significant data volume:
|
||||
- Each delta includes metadata (IDs, clocks, timestamps)
|
||||
- Delta chains accumulate over time
|
||||
- Network transmission requires bandwidth
|
||||
- Storage persists all deltas for history
|
||||
|
||||
### Compression Opportunities
|
||||
|
||||
| Data Type | Characteristics | Compression Potential |
|
||||
|-----------|-----------------|----------------------|
|
||||
| Delta values (f32) | Smooth distributions | 2-4x with quantization |
|
||||
| Indices (u32) | Sparse, sorted | 3-5x with delta+varint |
|
||||
| Metadata | Repetitive strings | 5-10x with dictionary |
|
||||
| Batches | Similar patterns | 10-50x with deduplication |
|
||||
|
||||
### Requirements
|
||||
|
||||
1. **Speed**: Compression/decompression < 1ms for typical deltas
|
||||
2. **Ratio**: >3x compression for storage, >5x for network
|
||||
3. **Streaming**: Support for streaming compression/decompression
|
||||
4. **Lossless Option**: Must support exact reconstruction
|
||||
5. **WASM Compatible**: Must work in browser environment
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
### Adopt Multi-Tier Compression Strategy
|
||||
|
||||
We implement a tiered compression system that adapts to data characteristics and use case requirements.
|
||||
|
||||
### Compression Tiers
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ COMPRESSION TIER SELECTION │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
|
||||
Input Delta
|
||||
│
|
||||
v
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ TIER 0: ENCODING │
|
||||
│ Format selection (Sparse/Dense/RLE/Dict) │
|
||||
│ Typical: 1-10x compression, <10us │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
v
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ TIER 1: VALUE COMPRESSION │
|
||||
│ Quantization (f32 -> f16/i8/i4) │
|
||||
│ Typical: 2-8x compression, <50us │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
v
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ TIER 2: ENTROPY CODING │
|
||||
│ LZ4 (fast) / Zstd (balanced) / Brotli (max) │
|
||||
│ Typical: 1.5-3x additional, 10us-1ms │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
v
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ TIER 3: BATCH COMPRESSION │
|
||||
│ Dictionary, deduplication, delta-of-deltas │
|
||||
│ Typical: 2-10x additional for batches │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Tier 0: Encoding Layer
|
||||
|
||||
See ADR-DB-002 for format selection. This tier handles:
|
||||
- Sparse vs Dense vs RLE vs Dictionary encoding
|
||||
- Index delta-encoding
|
||||
- Varint encoding for integers
|
||||
|
||||
### Tier 1: Value Compression
|
||||
|
||||
```rust
|
||||
/// Value quantization for delta compression
|
||||
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
|
||||
pub enum QuantizationLevel {
|
||||
/// No quantization (f32)
|
||||
None,
|
||||
/// Half precision (f16)
|
||||
Float16,
|
||||
/// 8-bit scaled integers
|
||||
Int8 { scale: f32, offset: f32 },
|
||||
/// 4-bit scaled integers
|
||||
Int4 { scale: f32, offset: f32 },
|
||||
/// Binary (sign only)
|
||||
Binary,
|
||||
}
|
||||
|
||||
/// Quantize delta values
|
||||
pub fn quantize_values(
|
||||
values: &[f32],
|
||||
level: QuantizationLevel,
|
||||
) -> QuantizedValues {
|
||||
match level {
|
||||
QuantizationLevel::None => {
|
||||
QuantizedValues::Float32(values.to_vec())
|
||||
}
|
||||
|
||||
QuantizationLevel::Float16 => {
|
||||
let quantized: Vec<u16> = values.iter()
|
||||
.map(|&v| half::f16::from_f32(v).to_bits())
|
||||
.collect();
|
||||
QuantizedValues::Float16(quantized)
|
||||
}
|
||||
|
||||
QuantizationLevel::Int8 { scale, offset } => {
|
||||
let quantized: Vec<i8> = values.iter()
|
||||
.map(|&v| ((v - offset) / scale).round().clamp(-128.0, 127.0) as i8)
|
||||
.collect();
|
||||
QuantizedValues::Int8 {
|
||||
values: quantized,
|
||||
scale,
|
||||
offset,
|
||||
}
|
||||
}
|
||||
|
||||
QuantizationLevel::Int4 { scale, offset } => {
|
||||
// Pack two 4-bit values per byte
|
||||
let packed: Vec<u8> = values.chunks(2)
|
||||
.map(|chunk| {
|
||||
let v0 = ((chunk[0] - offset) / scale).round().clamp(-8.0, 7.0) as i8;
|
||||
let v1 = chunk.get(1)
|
||||
.map(|&v| ((v - offset) / scale).round().clamp(-8.0, 7.0) as i8)
|
||||
.unwrap_or(0);
|
||||
((v0 as u8 & 0x0F) << 4) | (v1 as u8 & 0x0F)
|
||||
})
|
||||
.collect();
|
||||
QuantizedValues::Int4 {
|
||||
packed,
|
||||
count: values.len(),
|
||||
scale,
|
||||
offset,
|
||||
}
|
||||
}
|
||||
|
||||
QuantizationLevel::Binary => {
|
||||
// Pack 8 signs per byte
|
||||
let packed: Vec<u8> = values.chunks(8)
|
||||
.map(|chunk| {
|
||||
chunk.iter().enumerate().fold(0u8, |acc, (i, &v)| {
|
||||
if v >= 0.0 {
|
||||
acc | (1 << i)
|
||||
} else {
|
||||
acc
|
||||
}
|
||||
})
|
||||
})
|
||||
.collect();
|
||||
QuantizedValues::Binary {
|
||||
packed,
|
||||
count: values.len(),
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Adaptive quantization based on value distribution
|
||||
pub fn select_quantization(values: &[f32], config: &QuantizationConfig) -> QuantizationLevel {
|
||||
// Compute statistics
|
||||
let min = values.iter().cloned().fold(f32::INFINITY, f32::min);
|
||||
let max = values.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
|
||||
let range = max - min;
|
||||
|
||||
// Check if values are clustered enough for aggressive quantization
|
||||
let variance = compute_variance(values);
|
||||
let coefficient_of_variation = variance.sqrt() / (values.iter().sum::<f32>() / values.len() as f32).abs();
|
||||
|
||||
if config.allow_lossy {
|
||||
if coefficient_of_variation < 0.01 {
|
||||
// Very uniform - use binary
|
||||
return QuantizationLevel::Binary;
|
||||
} else if range < 0.1 {
|
||||
// Small range - use int4
|
||||
return QuantizationLevel::Int4 {
|
||||
scale: range / 15.0,
|
||||
offset: min,
|
||||
};
|
||||
} else if range < 2.0 {
|
||||
// Medium range - use int8
|
||||
return QuantizationLevel::Int8 {
|
||||
scale: range / 255.0,
|
||||
offset: min,
|
||||
};
|
||||
} else {
|
||||
// Large range - use float16
|
||||
return QuantizationLevel::Float16;
|
||||
}
|
||||
}
|
||||
|
||||
QuantizationLevel::None
|
||||
}
|
||||
```
|
||||
|
||||
### Tier 2: Entropy Coding
|
||||
|
||||
```rust
|
||||
/// Entropy compression with algorithm selection
|
||||
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
|
||||
pub enum EntropyCodec {
|
||||
/// No entropy coding
|
||||
None,
|
||||
/// LZ4: Fastest, moderate compression
|
||||
Lz4 { level: i32 },
|
||||
/// Zstd: Balanced speed/compression
|
||||
Zstd { level: i32 },
|
||||
/// Brotli: Maximum compression (for cold storage)
|
||||
Brotli { level: u32 },
|
||||
}
|
||||
|
||||
impl EntropyCodec {
|
||||
/// Compress data
|
||||
pub fn compress(&self, data: &[u8]) -> Result<Vec<u8>> {
|
||||
match self {
|
||||
EntropyCodec::None => Ok(data.to_vec()),
|
||||
|
||||
EntropyCodec::Lz4 { level } => {
|
||||
let mut encoder = lz4_flex::frame::FrameEncoder::new(Vec::new());
|
||||
encoder.write_all(data)?;
|
||||
Ok(encoder.finish()?)
|
||||
}
|
||||
|
||||
EntropyCodec::Zstd { level } => {
|
||||
Ok(zstd::encode_all(data, *level)?)
|
||||
}
|
||||
|
||||
EntropyCodec::Brotli { level } => {
|
||||
let mut output = Vec::new();
|
||||
let mut params = brotli::enc::BrotliEncoderParams::default();
|
||||
params.quality = *level as i32;
|
||||
brotli::BrotliCompress(&mut data.as_ref(), &mut output, ¶ms)?;
|
||||
Ok(output)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Decompress data
|
||||
pub fn decompress(&self, data: &[u8]) -> Result<Vec<u8>> {
|
||||
match self {
|
||||
EntropyCodec::None => Ok(data.to_vec()),
|
||||
|
||||
EntropyCodec::Lz4 { .. } => {
|
||||
let mut decoder = lz4_flex::frame::FrameDecoder::new(data);
|
||||
let mut output = Vec::new();
|
||||
decoder.read_to_end(&mut output)?;
|
||||
Ok(output)
|
||||
}
|
||||
|
||||
EntropyCodec::Zstd { .. } => {
|
||||
Ok(zstd::decode_all(data)?)
|
||||
}
|
||||
|
||||
EntropyCodec::Brotli { .. } => {
|
||||
let mut output = Vec::new();
|
||||
brotli::BrotliDecompress(&mut data.as_ref(), &mut output)?;
|
||||
Ok(output)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Select optimal entropy codec based on requirements
|
||||
pub fn select_entropy_codec(
|
||||
size: usize,
|
||||
latency_budget: Duration,
|
||||
use_case: CompressionUseCase,
|
||||
) -> EntropyCodec {
|
||||
match use_case {
|
||||
CompressionUseCase::RealTimeNetwork => {
|
||||
// Prioritize speed
|
||||
if size < 1024 {
|
||||
EntropyCodec::None // Overhead not worth it
|
||||
} else {
|
||||
EntropyCodec::Lz4 { level: 1 }
|
||||
}
|
||||
}
|
||||
|
||||
CompressionUseCase::BatchNetwork => {
|
||||
// Balance speed and compression
|
||||
EntropyCodec::Zstd { level: 3 }
|
||||
}
|
||||
|
||||
CompressionUseCase::HotStorage => {
|
||||
// Fast decompression
|
||||
EntropyCodec::Lz4 { level: 9 }
|
||||
}
|
||||
|
||||
CompressionUseCase::ColdStorage => {
|
||||
// Maximum compression
|
||||
EntropyCodec::Brotli { level: 6 }
|
||||
}
|
||||
|
||||
CompressionUseCase::Archive => {
|
||||
// Maximum compression, slow is OK
|
||||
EntropyCodec::Brotli { level: 11 }
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Tier 3: Batch Compression
|
||||
|
||||
```rust
|
||||
/// Batch-level compression optimizations
|
||||
pub struct BatchCompressor {
|
||||
/// Shared dictionary for string compression
|
||||
string_dict: DeltaDictionary,
|
||||
/// Value pattern dictionary
|
||||
value_patterns: PatternDictionary,
|
||||
/// Deduplication table
|
||||
dedup_table: DashMap<DeltaHash, DeltaId>,
|
||||
/// Configuration
|
||||
config: BatchCompressionConfig,
|
||||
}
|
||||
|
||||
impl BatchCompressor {
|
||||
/// Compress a batch of deltas
|
||||
pub fn compress_batch(&self, deltas: &[VectorDelta]) -> Result<CompressedBatch> {
|
||||
// Step 1: Deduplication
|
||||
let (unique_deltas, dedup_refs) = self.deduplicate(deltas);
|
||||
|
||||
// Step 2: Extract common patterns
|
||||
let patterns = self.extract_patterns(&unique_deltas);
|
||||
|
||||
// Step 3: Build batch-specific dictionary
|
||||
let batch_dict = self.build_batch_dictionary(&unique_deltas);
|
||||
|
||||
// Step 4: Encode deltas using patterns and dictionary
|
||||
let encoded: Vec<_> = unique_deltas.iter()
|
||||
.map(|d| self.encode_with_context(d, &patterns, &batch_dict))
|
||||
.collect();
|
||||
|
||||
// Step 5: Pack into batch format
|
||||
let packed = self.pack_batch(&encoded, &patterns, &batch_dict, &dedup_refs);
|
||||
|
||||
// Step 6: Apply entropy coding
|
||||
let compressed = self.config.entropy_codec.compress(&packed)?;
|
||||
|
||||
Ok(CompressedBatch {
|
||||
compressed_data: compressed,
|
||||
original_count: deltas.len(),
|
||||
unique_count: unique_deltas.len(),
|
||||
compression_ratio: deltas.len() as f32 * std::mem::size_of::<VectorDelta>() as f32
|
||||
/ compressed.len() as f32,
|
||||
})
|
||||
}
|
||||
|
||||
/// Deduplicate deltas (same vector, same operation)
|
||||
fn deduplicate(&self, deltas: &[VectorDelta]) -> (Vec<VectorDelta>, Vec<DedupRef>) {
|
||||
let mut unique = Vec::new();
|
||||
let mut refs = Vec::new();
|
||||
|
||||
for delta in deltas {
|
||||
let hash = compute_delta_hash(delta);
|
||||
|
||||
if let Some(existing_id) = self.dedup_table.get(&hash) {
|
||||
refs.push(DedupRef::Existing(*existing_id));
|
||||
} else {
|
||||
self.dedup_table.insert(hash, delta.delta_id.clone());
|
||||
refs.push(DedupRef::New(unique.len()));
|
||||
unique.push(delta.clone());
|
||||
}
|
||||
}
|
||||
|
||||
(unique, refs)
|
||||
}
|
||||
|
||||
/// Extract common patterns from deltas
|
||||
fn extract_patterns(&self, deltas: &[VectorDelta]) -> Vec<DeltaPattern> {
|
||||
// Find common index sets
|
||||
let mut index_freq: HashMap<Vec<u32>, u32> = HashMap::new();
|
||||
|
||||
for delta in deltas {
|
||||
if let DeltaOperation::Sparse { indices, .. } = &delta.operation {
|
||||
*index_freq.entry(indices.clone()).or_insert(0) += 1;
|
||||
}
|
||||
}
|
||||
|
||||
// Patterns that appear > threshold times
|
||||
index_freq.into_iter()
|
||||
.filter(|(_, count)| *count >= self.config.pattern_threshold)
|
||||
.map(|(indices, count)| DeltaPattern {
|
||||
indices,
|
||||
frequency: count,
|
||||
})
|
||||
.collect()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Compression Ratios and Speed
|
||||
|
||||
### Single Delta Compression
|
||||
|
||||
| Configuration | Ratio | Compress Time | Decompress Time |
|
||||
|---------------|-------|---------------|-----------------|
|
||||
| Encoding only | 1-10x | 5us | 2us |
|
||||
| + Float16 | 2-20x | 15us | 8us |
|
||||
| + Int8 | 4-40x | 20us | 10us |
|
||||
| + LZ4 | 6-50x | 50us | 20us |
|
||||
| + Zstd | 8-60x | 200us | 50us |
|
||||
|
||||
### Batch Compression (100 deltas)
|
||||
|
||||
| Configuration | Ratio | Compress Time | Decompress Time |
|
||||
|---------------|-------|---------------|-----------------|
|
||||
| Individual Zstd | 8x | 20ms | 5ms |
|
||||
| Batch + Dedup | 15x | 5ms | 2ms |
|
||||
| Batch + Patterns + Zstd | 25x | 8ms | 3ms |
|
||||
| Batch + Full Pipeline | 40x | 12ms | 4ms |
|
||||
|
||||
### Network vs Storage Tradeoffs
|
||||
|
||||
| Use Case | Target Ratio | Max Latency | Recommended |
|
||||
|----------|--------------|-------------|-------------|
|
||||
| Real-time sync | >3x | <1ms | Encode + LZ4 |
|
||||
| Batch sync | >10x | <100ms | Batch + Zstd |
|
||||
| Hot storage | >5x | <10ms | Encode + Zstd |
|
||||
| Cold storage | >20x | <1s | Full pipeline + Brotli |
|
||||
| Archive | >50x | N/A | Max compression |
|
||||
|
||||
---
|
||||
|
||||
## Considered Options
|
||||
|
||||
### Option 1: Single Codec (LZ4/Zstd)
|
||||
|
||||
**Description**: Apply one compression algorithm to everything.
|
||||
|
||||
**Pros**:
|
||||
- Simple implementation
|
||||
- Predictable performance
|
||||
- No decision overhead
|
||||
|
||||
**Cons**:
|
||||
- Suboptimal for varied data
|
||||
- Misses domain-specific opportunities
|
||||
- Either too slow or poor ratio
|
||||
|
||||
**Verdict**: Rejected - vectors benefit from tiered approach.
|
||||
|
||||
### Option 2: Learned Compression
|
||||
|
||||
**Description**: ML model learns optimal compression.
|
||||
|
||||
**Pros**:
|
||||
- Potentially optimal compression
|
||||
- Adapts to data patterns
|
||||
|
||||
**Cons**:
|
||||
- Training complexity
|
||||
- Inference overhead
|
||||
- Hard to debug
|
||||
|
||||
**Verdict**: Deferred - consider for future version.
|
||||
|
||||
### Option 3: Delta-Specific Codecs
|
||||
|
||||
**Description**: Custom codec designed for vector deltas.
|
||||
|
||||
**Pros**:
|
||||
- Maximum compression for vectors
|
||||
- No general overhead
|
||||
|
||||
**Cons**:
|
||||
- Development effort
|
||||
- Maintenance burden
|
||||
- Limited reuse
|
||||
|
||||
**Verdict**: Partially adopted - value quantization is delta-specific.
|
||||
|
||||
### Option 4: Multi-Tier Pipeline (Selected)
|
||||
|
||||
**Description**: Layer encoding, quantization, and entropy coding.
|
||||
|
||||
**Pros**:
|
||||
- Each tier optimized for its purpose
|
||||
- Configurable tradeoffs
|
||||
- Reuses proven components
|
||||
|
||||
**Cons**:
|
||||
- Configuration complexity
|
||||
- Multiple code paths
|
||||
|
||||
**Verdict**: Adopted - best balance of compression and flexibility.
|
||||
|
||||
---
|
||||
|
||||
## Technical Specification
|
||||
|
||||
### Compression API
|
||||
|
||||
```rust
|
||||
/// Delta compression pipeline
|
||||
pub struct CompressionPipeline {
|
||||
/// Encoding configuration
|
||||
encoding: EncodingConfig,
|
||||
/// Quantization settings
|
||||
quantization: QuantizationConfig,
|
||||
/// Entropy codec
|
||||
entropy: EntropyCodec,
|
||||
/// Batch compression (optional)
|
||||
batch: Option<BatchCompressor>,
|
||||
}
|
||||
|
||||
impl CompressionPipeline {
|
||||
/// Compress a single delta
|
||||
pub fn compress(&self, delta: &VectorDelta) -> Result<CompressedDelta> {
|
||||
// Tier 0: Encoding
|
||||
let encoded = encode_delta(&delta.operation, &self.encoding);
|
||||
|
||||
// Tier 1: Quantization
|
||||
let quantized = quantize_encoded(&encoded, &self.quantization);
|
||||
|
||||
// Tier 2: Entropy coding
|
||||
let compressed = self.entropy.compress(&quantized.to_bytes())?;
|
||||
|
||||
Ok(CompressedDelta {
|
||||
delta_id: delta.delta_id.clone(),
|
||||
vector_id: delta.vector_id.clone(),
|
||||
metadata: compress_metadata(&delta, &self.encoding),
|
||||
compressed_data: compressed,
|
||||
original_size: estimated_delta_size(delta),
|
||||
})
|
||||
}
|
||||
|
||||
/// Decompress a single delta
|
||||
pub fn decompress(&self, compressed: &CompressedDelta) -> Result<VectorDelta> {
|
||||
// Reverse: entropy -> quantization -> encoding
|
||||
let decoded_bytes = self.entropy.decompress(&compressed.compressed_data)?;
|
||||
let dequantized = dequantize(&decoded_bytes, &self.quantization);
|
||||
let operation = decode_delta(&dequantized, &self.encoding)?;
|
||||
|
||||
Ok(VectorDelta {
|
||||
delta_id: compressed.delta_id.clone(),
|
||||
vector_id: compressed.vector_id.clone(),
|
||||
operation,
|
||||
..decompress_metadata(&compressed.metadata)?
|
||||
})
|
||||
}
|
||||
|
||||
/// Compress batch of deltas
|
||||
pub fn compress_batch(&self, deltas: &[VectorDelta]) -> Result<CompressedBatch> {
|
||||
match &self.batch {
|
||||
Some(batch_compressor) => batch_compressor.compress_batch(deltas),
|
||||
None => {
|
||||
// Fall back to individual compression
|
||||
let compressed: Vec<_> = deltas.iter()
|
||||
.map(|d| self.compress(d))
|
||||
.collect::<Result<_>>()?;
|
||||
Ok(CompressedBatch::from_individuals(compressed))
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Configuration
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct CompressionConfig {
|
||||
/// Enable/disable tiers
|
||||
pub enable_quantization: bool,
|
||||
pub enable_entropy: bool,
|
||||
pub enable_batch: bool,
|
||||
|
||||
/// Quantization settings
|
||||
pub quantization: QuantizationConfig,
|
||||
|
||||
/// Entropy codec selection
|
||||
pub entropy_codec: EntropyCodec,
|
||||
|
||||
/// Batch compression settings
|
||||
pub batch_config: BatchCompressionConfig,
|
||||
|
||||
/// Compression level presets
|
||||
pub preset: CompressionPreset,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
|
||||
pub enum CompressionPreset {
|
||||
/// Minimize latency
|
||||
Fastest,
|
||||
/// Balance speed and ratio
|
||||
Balanced,
|
||||
/// Maximize compression
|
||||
Maximum,
|
||||
/// Custom configuration
|
||||
Custom,
|
||||
}
|
||||
|
||||
impl Default for CompressionConfig {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
enable_quantization: true,
|
||||
enable_entropy: true,
|
||||
enable_batch: true,
|
||||
quantization: QuantizationConfig::default(),
|
||||
entropy_codec: EntropyCodec::Zstd { level: 3 },
|
||||
batch_config: BatchCompressionConfig::default(),
|
||||
preset: CompressionPreset::Balanced,
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Consequences
|
||||
|
||||
### Benefits
|
||||
|
||||
1. **High Compression**: 5-50x reduction in storage and network
|
||||
2. **Configurable**: Choose speed vs ratio tradeoff
|
||||
3. **Adaptive**: Automatic format selection
|
||||
4. **Streaming**: Works with real-time delta flows
|
||||
5. **WASM Compatible**: All codecs work in browser
|
||||
|
||||
### Risks and Mitigations
|
||||
|
||||
| Risk | Probability | Impact | Mitigation |
|
||||
|------|-------------|--------|------------|
|
||||
| Compression overhead | Medium | Medium | Fast path for small deltas |
|
||||
| Quality loss | Low | High | Lossless option always available |
|
||||
| Codec incompatibility | Low | Medium | Version headers, fallback |
|
||||
| Memory pressure | Medium | Medium | Streaming decompression |
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
1. Lemire, D., & Boytsov, L. "Decoding billions of integers per second through vectorization."
|
||||
2. LZ4 Frame Format. https://github.com/lz4/lz4/blob/dev/doc/lz4_Frame_format.md
|
||||
3. Zstandard Compression. https://facebook.github.io/zstd/
|
||||
4. ADR-DB-002: Delta Encoding Format
|
||||
|
||||
---
|
||||
|
||||
## Related Decisions
|
||||
|
||||
- **ADR-DB-001**: Delta Behavior Core Architecture
|
||||
- **ADR-DB-002**: Delta Encoding Format
|
||||
- **ADR-DB-003**: Delta Propagation Protocol
|
||||
Reference in New Issue
Block a user