Files
wifi-densepose/crates/ruvllm-wasm/docs/MICRO_LORA.md
ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00

378 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# MicroLoRA - Browser-Compatible Lightweight LoRA Adaptation
MicroLoRA provides ultra-lightweight LoRA (Low-Rank Adaptation) for real-time adaptation of language models directly in web browsers.
## Features
- **Tiny Memory Footprint**: Rank 1-4 adapters use <50KB per adapter
- **Pure WASM**: No threading, no file I/O, fully browser-compatible
- **Real-time Adaptation**: Update weights based on user feedback with <1ms latency
- **Serialization**: JSON-based persistence for localStorage/IndexedDB
- **TypeScript-Friendly**: Full type definitions with getter/setter patterns
## Architecture
```
┌─────────────────┐
│ Base LLM │
│ (frozen) │
└────────┬────────┘
├──────────┐
│ │
┌────────▼────────┐ │
│ Input │ │
│ (768-dim) │ │
└────────┬────────┘ │
│ │
▼ │
┌─────────────────┐ │
│ LoRA A │ │ Down projection
│ (768 x 2) │ │ (in_features x rank)
└────────┬────────┘ │
│ │
▼ │
┌─────────────────┐ │
│ Intermediate │ │
│ (2-dim) │ │
└────────┬────────┘ │
│ │
▼ │
┌─────────────────┐ │
│ LoRA B │ │ Up projection
│ (2 x 768) │ │ (rank x out_features)
└────────┬────────┘ │
│ │
▼ │
┌─────────────────┐ │
│ LoRA Output │ │ Scaled by (alpha / rank)
│ (768-dim) │ │
└────────┬────────┘ │
│ │
└──────────┤
┌──────────▼───────┐
│ Final Output │
│ (base + LoRA) │
└──────────────────┘
```
## Quick Start
### Basic Usage
```javascript
import init, { MicroLoraWasm, MicroLoraConfigWasm, AdaptFeedbackWasm } from 'ruvllm-wasm';
// Initialize WASM
await init();
// Create adapter config
const config = new MicroLoraConfigWasm();
config.rank = 2; // Rank 1-4 (2 recommended for browser)
config.alpha = 4.0; // Scaling factor
config.inFeatures = 768; // Match your model's hidden size
config.outFeatures = 768;
// Create the adapter
const lora = new MicroLoraWasm(config);
// Apply LoRA to hidden states
const hiddenState = new Float32Array(768);
const output = lora.apply(hiddenState);
```
### Real-time Adaptation
```javascript
// User provides feedback on model output
const feedback = new AdaptFeedbackWasm(0.8); // Quality score [0.0, 1.0]
feedback.learningRate = 0.01;
// Adapt weights based on feedback
lora.adapt(hiddenState, feedback);
// Apply updates (can batch multiple adapt calls)
lora.applyUpdates(0.01);
// Get statistics
const stats = lora.stats();
console.log(`Average quality: ${stats.avgQuality}`);
console.log(`Samples seen: ${stats.samplesSeen}`);
```
### Persistence
```javascript
// Save to localStorage
const json = lora.toJson();
localStorage.setItem('lora-state', json);
// Restore from localStorage
const saved = localStorage.getItem('lora-state');
const restored = MicroLoraWasm.fromJson(saved);
```
## API Reference
### MicroLoraConfigWasm
Configuration for the LoRA adapter.
**Properties:**
- `rank: number` - LoRA rank (1-4, clamped). Default: 2
- `alpha: number` - Scaling factor. Default: 4.0
- `inFeatures: number` - Input dimension. Default: 768
- `outFeatures: number` - Output dimension. Default: 768
**Methods:**
- `memoryBytes(): number` - Calculate memory footprint in bytes
- `computeScaling(): number` - Get computed scaling (alpha / rank)
### MicroLoraWasm
The main LoRA adapter.
**Constructor:**
- `new MicroLoraWasm(config: MicroLoraConfigWasm)`
**Methods:**
- `apply(input: Float32Array): Float32Array` - Apply LoRA transformation
- `adapt(input: Float32Array, feedback: AdaptFeedbackWasm): void` - Accumulate gradients
- `applyUpdates(learningRate: number): void` - Apply accumulated gradients
- `reset(): void` - Reset to initial state
- `stats(): MicroLoraStatsWasm` - Get adapter statistics
- `toJson(): string` - Serialize to JSON
- `fromJson(json: string): MicroLoraWasm` - Deserialize from JSON (static)
- `pendingUpdates(): number` - Get number of pending gradient updates
- `getConfig(): MicroLoraConfigWasm` - Get current configuration
### AdaptFeedbackWasm
Feedback for weight adaptation.
**Constructor:**
- `new AdaptFeedbackWasm(quality: number)` - Quality score [0.0, 1.0]
**Properties:**
- `quality: number` - Quality/reward signal [0.0, 1.0]
- `learningRate: number` - Learning rate. Default: 0.01
### MicroLoraStatsWasm
Adapter statistics.
**Properties:**
- `samplesSeen: number` - Total samples seen
- `avgQuality: number` - Average quality score
- `memoryBytes: number` - Memory usage in bytes
- `paramCount: number` - Total parameter count
**Methods:**
- `toJson(): string` - Convert to JSON string
## Memory Footprint
Memory usage for different configurations:
| Config | Memory | Parameters |
|--------|--------|------------|
| Rank 1, 768×768 | 6KB | 1,536 |
| Rank 2, 768×768 | 12KB | 3,072 |
| Rank 4, 768×768 | 24KB | 6,144 |
| Rank 2, 512×512 | 8KB | 2,048 |
Formula: `(in_features × rank + rank × out_features) × 4 bytes`
## Use Cases
### 1. Personalized Chat Interface
```javascript
// Adapt based on user thumbs up/down
async function handleUserFeedback(hiddenStates, wasHelpful) {
const feedback = new AdaptFeedbackWasm(wasHelpful ? 0.9 : 0.3);
lora.adapt(hiddenStates, feedback);
// Apply after every 5 interactions
if (interactionCount % 5 === 0) {
lora.applyUpdates(0.02);
// Persist to localStorage
localStorage.setItem('chat-lora', lora.toJson());
}
}
```
### 2. Domain-Specific Fine-tuning
```javascript
// Adapt to technical domain over time
const conversations = [
{ input: codeHelpQuery, quality: 0.85 },
{ input: technicalExplanation, quality: 0.92 },
// ...
];
for (const conv of conversations) {
const feedback = new AdaptFeedbackWasm(conv.quality);
lora.adapt(conv.input, feedback);
}
lora.applyUpdates(0.01);
```
### 3. Multi-User Adapters
```javascript
// Store separate adapters per user
function getUserLora(userId) {
const key = `lora-${userId}`;
const saved = localStorage.getItem(key);
if (saved) {
return MicroLoraWasm.fromJson(saved);
}
const config = new MicroLoraConfigWasm();
return new MicroLoraWasm(config);
}
function saveUserLora(userId, lora) {
localStorage.setItem(`lora-${userId}`, lora.toJson());
}
```
## Performance Tips
### 1. Batch Gradient Updates
```javascript
// ❌ Bad: Update after every sample
for (const sample of samples) {
lora.adapt(sample.input, sample.feedback);
lora.applyUpdates(0.01); // Expensive!
}
// ✅ Good: Batch updates
for (const sample of samples) {
lora.adapt(sample.input, sample.feedback);
}
lora.applyUpdates(0.01); // Once at the end
```
### 2. Choose Optimal Rank
- **Rank 1**: Fastest, minimal memory (~6KB), good for simple adaptations
- **Rank 2**: Best balance, recommended for most use cases (~12KB)
- **Rank 4**: More expressive, use when quality matters more than size (~24KB)
### 3. Learning Rate Guidelines
- Start with `0.01` for general use
- Increase to `0.02-0.05` for faster adaptation
- Decrease to `0.001-0.005` for fine-grained control
- Use adaptive rates based on quality variance
```javascript
const variance = computeQualityVariance(recentSamples);
const adaptiveLR = 0.01 * (1 + variance);
lora.applyUpdates(adaptiveLR);
```
## Comparison with Full LoRA
| Feature | MicroLoRA | Standard LoRA |
|---------|-----------|---------------|
| Memory | 6-24KB | 50-500KB |
| Rank | 1-4 | 8-64 |
| Adaptation | Real-time (<1ms) | Batch (>100ms) |
| Threading | None | Multi-threaded |
| Platform | Browser only | Any |
| Gradients | Simplified | Full backprop |
## Browser Compatibility
Requires:
- WebAssembly support
- Float32Array support
- localStorage for persistence (optional)
Tested on:
- Chrome 90+
- Firefox 88+
- Safari 14+
- Edge 90+
## Advanced: Integration with Base Model
```javascript
async function generateWithLoRA(prompt, lora) {
// 1. Get base model output and hidden states
const { output, hiddenStates } = await baseModel.generate(prompt);
// 2. Apply LoRA transformation to hidden states
const loraOutput = lora.apply(hiddenStates);
// 3. Combine (additive)
const finalHidden = hiddenStates.map((h, i) => h + loraOutput[i]);
// 4. Project to tokens
const tokens = await baseModel.projectToTokens(finalHidden);
return tokens;
}
```
## Troubleshooting
### High Memory Usage
```javascript
// Check actual memory usage
const stats = lora.stats();
console.log(`Memory: ${stats.memoryBytes} bytes`);
// If too high, reduce rank
config.rank = 1; // Instead of 2 or 4
```
### Slow Adaptation
```javascript
// Increase learning rate
feedback.learningRate = 0.05; // Instead of 0.01
// Or apply updates more frequently
if (sampleCount % 3 === 0) { // Instead of % 10
lora.applyUpdates(0.02);
}
```
### Quality Not Improving
```javascript
// Check if feedback is balanced
const stats = lora.stats();
if (stats.avgQuality < 0.4 || stats.avgQuality > 0.9) {
console.warn('Feedback may be too one-sided');
}
// Add quality normalization
const normalizedQuality = (rawQuality - minQuality) / (maxQuality - minQuality);
feedback.quality = normalizedQuality;
```
## Examples
See `examples/micro_lora_example.ts` for complete working examples including:
- Basic usage
- Online learning loop
- Serialization/deserialization
- Browser storage integration
- Multi-user scenarios
## License
MIT License - see LICENSE file for details