# MicroLoRA - Browser-Compatible Lightweight LoRA Adaptation

MicroLoRA provides ultra-lightweight LoRA (Low-Rank Adaptation) for real-time adaptation of language models directly in web browsers.

## Features

- **Tiny Memory Footprint**: Rank 1-4 adapters use <50KB per adapter
- **Pure WASM**: No threading, no file I/O, fully browser-compatible
- **Real-time Adaptation**: Update weights based on user feedback with <1ms latency
- **Serialization**: JSON-based persistence for localStorage/IndexedDB
- **TypeScript-Friendly**: Full type definitions with getter/setter patterns

## Architecture

```
┌─────────────────┐
│  Base LLM       │
│  (frozen)       │
└────────┬────────┘
         │
         ├──────────┐
         │          │
┌────────▼────────┐ │
│  Input          │ │
│  (768-dim)      │ │
└────────┬────────┘ │
         │          │
         ▼          │
┌─────────────────┐ │
│  LoRA A         │ │  Down projection
│  (768 x 2)      │ │  (in_features x rank)
└────────┬────────┘ │
         │          │
         ▼          │
┌─────────────────┐ │
│  Intermediate   │ │
│  (2-dim)        │ │
└────────┬────────┘ │
         │          │
         ▼          │
┌─────────────────┐ │
│  LoRA B         │ │  Up projection
│  (2 x 768)      │ │  (rank x out_features)
└────────┬────────┘ │
         │          │
         ▼          │
┌─────────────────┐ │
│  LoRA Output    │ │  Scaled by (alpha / rank)
│  (768-dim)      │ │
└────────┬────────┘ │
         │          │
         └──────────┤
                    │
         ┌──────────▼───────┐
         │  Final Output    │
         │  (base + LoRA)   │
         └──────────────────┘
```
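
To make the diagram concrete, here is a plain-JavaScript sketch of the same computation, `y = (x · A) · B · (alpha / rank)`, followed by the additive combination with the base output. It is an illustration, not the crate's implementation: the function name `loraForward` and the row-major layout of `A` (`inFeatures × rank`) and `B` (`rank × outFeatures`) are assumptions.

```javascript
// Hypothetical reference of the LoRA path in the diagram (not the ruvllm-wasm code).
// A is row-major [inFeatures x rank], B is row-major [rank x outFeatures].
function loraForward(x, A, B, rank, alpha) {
    const inFeatures = x.length;
    const outFeatures = B.length / rank;
    const scaling = alpha / rank;

    // Down projection: intermediate[r] = sum_i x[i] * A[i][r]
    const intermediate = new Float32Array(rank);
    for (let i = 0; i < inFeatures; i++) {
        for (let r = 0; r < rank; r++) {
            intermediate[r] += x[i] * A[i * rank + r];
        }
    }

    // Up projection: out[o] = sum_r intermediate[r] * B[r][o], then scale by alpha / rank
    const out = new Float32Array(outFeatures);
    for (let r = 0; r < rank; r++) {
        for (let o = 0; o < outFeatures; o++) {
            out[o] += intermediate[r] * B[r * outFeatures + o];
        }
    }
    for (let o = 0; o < outFeatures; o++) out[o] *= scaling;
    return out;
}

// Tiny example: inFeatures = 2, rank = 1, outFeatures = 2
const x = new Float32Array([1, 2]);
const A = new Float32Array([0.5, 0.25]); // 2 x 1
const B = new Float32Array([1, -1]);     // 1 x 2
const delta = loraForward(x, A, B, 1, 2.0);
// intermediate = 1*0.5 + 2*0.25 = 1.0; out = [1, -1] * 2.0 = [2, -2]
const finalOutput = x.map((h, i) => h + delta[i]); // base + LoRA, as in the diagram
```

With a rank of 2 and 768-dim input, the intermediate vector holds just two values, which is why the adapter stays so small.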

## Quick Start

### Basic Usage

```javascript
import init, { MicroLoraWasm, MicroLoraConfigWasm, AdaptFeedbackWasm } from 'ruvllm-wasm';

// Initialize WASM
await init();

// Create adapter config
const config = new MicroLoraConfigWasm();
config.rank = 2;              // Rank 1-4 (2 recommended for browser)
config.alpha = 4.0;           // Scaling factor
config.inFeatures = 768;      // Match your model's hidden size
config.outFeatures = 768;

// Create the adapter
const lora = new MicroLoraWasm(config);

// Apply LoRA to hidden states
const hiddenState = new Float32Array(768);
const output = lora.apply(hiddenState);
```

### Real-time Adaptation

```javascript
// User provides feedback on model output
const feedback = new AdaptFeedbackWasm(0.8); // Quality score [0.0, 1.0]
feedback.learningRate = 0.01;

// Adapt weights based on feedback
lora.adapt(hiddenState, feedback);

// Apply the accumulated gradients (several adapt() calls can be batched before this)
lora.applyUpdates(0.01);

// Get statistics
const stats = lora.stats();
console.log(`Average quality: ${stats.avgQuality}`);
console.log(`Samples seen: ${stats.samplesSeen}`);
```

### Persistence

```javascript
// Save to localStorage
const json = lora.toJson();
localStorage.setItem('lora-state', json);

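// Restore from localStorage (getItem returns null if nothing was saved)
const saved = localStorage.getItem('lora-state');
const restored = saved !== null
    ? MicroLoraWasm.fromJson(saved)
    : new MicroLoraWasm(new MicroLoraConfigWasm());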
```
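
Stored JSON can also be stale or corrupted, for example after a change to `rank`. A defensive loader is sketched below; `loadOrCreate` is a hypothetical helper, the storage object and factory callbacks are passed in so it also runs outside a browser, and it assumes `MicroLoraWasm.fromJson` throws on input it cannot parse.

```javascript
// Hypothetical helper: restore from storage, or fall back to a fresh adapter.
// `storage` is anything with getItem/setItem (e.g. window.localStorage).
function loadOrCreate(storage, key, fromJson, create) {
    const saved = storage.getItem(key);
    if (saved !== null) {
        try {
            return fromJson(saved);
        } catch (err) {
            // Assumes fromJson throws on malformed or incompatible JSON
            console.warn(`Discarding unreadable adapter state "${key}":`, err);
        }
    }
    return create();
}

// In the browser:
// const lora = loadOrCreate(localStorage, 'lora-state',
//     (json) => MicroLoraWasm.fromJson(json),
//     () => new MicroLoraWasm(new MicroLoraConfigWasm()));
```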

## API Reference

### MicroLoraConfigWasm

Configuration for the LoRA adapter.

**Properties:**
- `rank: number` - LoRA rank (1-4, clamped). Default: 2
- `alpha: number` - Scaling factor. Default: 4.0
- `inFeatures: number` - Input dimension. Default: 768
- `outFeatures: number` - Output dimension. Default: 768

**Methods:**
- `memoryBytes(): number` - Calculate memory footprint in bytes
- `computeScaling(): number` - Get computed scaling (alpha / rank)
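
The rank clamping and the `alpha / rank` scaling above amount to simple arithmetic. A minimal sketch with hypothetical helpers (not the crate's API), assuming the scaling is computed from the already-clamped rank:

```javascript
// Sketch of the documented config arithmetic (hypothetical helpers).
const MIN_RANK = 1;
const MAX_RANK = 4;

// rank is clamped into [1, 4]
function clampRank(rank) {
    return Math.min(MAX_RANK, Math.max(MIN_RANK, rank));
}

// scaling = alpha / rank, assuming the clamped rank is used
function computeScaling(alpha, rank) {
    return alpha / clampRank(rank);
}

console.log(clampRank(8));           // 4
console.log(computeScaling(4.0, 2)); // 2 (the defaults)
console.log(computeScaling(4.0, 1)); // 4
```

With the defaults (alpha 4.0, rank 2) the LoRA output is doubled; dropping to rank 1 without changing `alpha` doubles the scaling again, so it can be worth retuning `alpha` whenever `rank` changes.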

### MicroLoraWasm

The main LoRA adapter.

**Constructor:**
- `new MicroLoraWasm(config: MicroLoraConfigWasm)`

**Methods:**
- `apply(input: Float32Array): Float32Array` - Apply LoRA transformation
- `adapt(input: Float32Array, feedback: AdaptFeedbackWasm): void` - Accumulate gradients
- `applyUpdates(learningRate: number): void` - Apply accumulated gradients
- `reset(): void` - Reset to initial state
- `stats(): MicroLoraStatsWasm` - Get adapter statistics
- `toJson(): string` - Serialize to JSON
- `fromJson(json: string): MicroLoraWasm` - Deserialize from JSON (static)
- `pendingUpdates(): number` - Get number of pending gradient updates
- `getConfig(): MicroLoraConfigWasm` - Get current configuration

### AdaptFeedbackWasm

Feedback for weight adaptation.

**Constructor:**
- `new AdaptFeedbackWasm(quality: number)` - Quality score [0.0, 1.0]

**Properties:**
- `quality: number` - Quality/reward signal [0.0, 1.0]
- `learningRate: number` - Learning rate. Default: 0.01

### MicroLoraStatsWasm

Adapter statistics.

**Properties:**
- `samplesSeen: number` - Total samples seen
- `avgQuality: number` - Average quality score
- `memoryBytes: number` - Memory usage in bytes
- `paramCount: number` - Total parameter count

**Methods:**
- `toJson(): string` - Convert to JSON string
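
How `samplesSeen` and `avgQuality` relate can be sketched with an incremental mean. This assumes `avgQuality` is a plain arithmetic mean over all samples; the crate may use a different average (for example an exponential moving average), and `QualityTracker` is a hypothetical class for illustration only.

```javascript
// Hypothetical tracker mirroring samplesSeen / avgQuality (not the crate's code).
class QualityTracker {
    constructor() {
        this.samplesSeen = 0;
        this.avgQuality = 0;
    }

    // Incremental mean: updates the average without storing past scores
    record(quality) {
        this.samplesSeen += 1;
        this.avgQuality += (quality - this.avgQuality) / this.samplesSeen;
    }
}

const tracker = new QualityTracker();
for (const q of [0.9, 0.3, 0.6]) tracker.record(q);
console.log(tracker.samplesSeen); // 3
```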

## Memory Footprint

Memory usage for different configurations:

| Config | Memory | Parameters |
|--------|--------|------------|
| Rank 1, 768×768 | 6KB | 1,536 |
| Rank 2, 768×768 | 12KB | 3,072 |
| Rank 4, 768×768 | 24KB | 6,144 |
| Rank 2, 512×512 | 8KB | 2,048 |

Formula: `(in_features × rank + rank × out_features) × 4 bytes`
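
The table follows directly from this formula with 1 KB = 1024 bytes. A small sketch that reproduces it (the helper names are illustrative, not part of ruvllm-wasm):

```javascript
// Sketch of the memory formula above: A + B weights as f32, 4 bytes each.
function loraParamCount(inFeatures, outFeatures, rank) {
    return inFeatures * rank + rank * outFeatures; // A + B
}

function loraMemoryBytes(inFeatures, outFeatures, rank) {
    return loraParamCount(inFeatures, outFeatures, rank) * 4;
}

for (const [inF, outF, r] of [[768, 768, 1], [768, 768, 2], [768, 768, 4], [512, 512, 2]]) {
    const bytes = loraMemoryBytes(inF, outF, r);
    console.log(`rank ${r}, ${inF}x${outF}: ${loraParamCount(inF, outF, r)} params, ${bytes / 1024} KB`);
}
// rank 1, 768x768: 1536 params, 6 KB
// rank 2, 768x768: 3072 params, 12 KB
// rank 4, 768x768: 6144 params, 24 KB
// rank 2, 512x512: 2048 params, 8 KB
```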

## Use Cases

### 1. Personalized Chat Interface

```javascript
// Adapt based on user thumbs up/down
let interactionCount = 0;

function handleUserFeedback(hiddenStates, wasHelpful) {
    const feedback = new AdaptFeedbackWasm(wasHelpful ? 0.9 : 0.3);
    lora.adapt(hiddenStates, feedback);
    interactionCount += 1;

    // Apply accumulated updates after every 5 interactions
    if (interactionCount % 5 === 0) {
        lora.applyUpdates(0.02);

        // Persist to localStorage
        localStorage.setItem('chat-lora', lora.toJson());
    }
}
```

### 2. Domain-Specific Fine-tuning

```javascript
// Adapt to a technical domain over time.
// Each `input` is a Float32Array hidden state of length inFeatures.
const conversations = [
    { input: codeHelpQuery, quality: 0.85 },
    { input: technicalExplanation, quality: 0.92 },
    // ...
];

for (const conv of conversations) {
    const feedback = new AdaptFeedbackWasm(conv.quality);
    lora.adapt(conv.input, feedback);
}

lora.applyUpdates(0.01);
```

### 3. Multi-User Adapters

```javascript
// Store separate adapters per user
function getUserLora(userId) {
    const key = `lora-${userId}`;
    const saved = localStorage.getItem(key);

    if (saved) {
        return MicroLoraWasm.fromJson(saved);
    }

    const config = new MicroLoraConfigWasm();
    return new MicroLoraWasm(config);
}

function saveUserLora(userId, lora) {
    localStorage.setItem(`lora-${userId}`, lora.toJson());
}
```
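
Calling `getUserLora` on every request re-parses the stored JSON each time. One option is to keep deserialized adapters in a `Map` and write them back only when needed. The sketch below uses a hypothetical `createAdapterCache` that takes the load and save functions as parameters, so it works with `getUserLora` and `saveUserLora` or with any other storage layer.

```javascript
// Hypothetical in-memory cache in front of getUserLora / saveUserLora.
function createAdapterCache(load, save) {
    const cache = new Map();
    return {
        // Deserialize at most once per session for each user
        get(userId) {
            if (!cache.has(userId)) {
                cache.set(userId, load(userId));
            }
            return cache.get(userId);
        },
        // Write a cached adapter back to storage (no-op if not loaded)
        persist(userId) {
            if (cache.has(userId)) save(userId, cache.get(userId));
        },
        // Drop an adapter from the cache
        evict(userId) {
            cache.delete(userId);
        },
    };
}

// const adapters = createAdapterCache(getUserLora, saveUserLora);
// const lora = adapters.get('alice');
```

If the bindings are generated by wasm-bindgen (an assumption), evicted adapters should also be released with their `free()` method so their WASM memory is reclaimed.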

## Performance Tips

### 1. Batch Gradient Updates

```javascript
// ❌ Bad: Update after every sample
for (const sample of samples) {
    lora.adapt(sample.input, sample.feedback);
    lora.applyUpdates(0.01); // Expensive!
}

// ✅ Good: Batch updates
for (const sample of samples) {
    lora.adapt(sample.input, sample.feedback);
}
lora.applyUpdates(0.01); // Once at the end
```
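
For long or streaming sample lists, a middle ground is to apply once per fixed-size batch and flush the remainder at the end. A sketch with a hypothetical `adaptInBatches` helper; it only uses the documented `adapt` and `applyUpdates` methods, and the default batch size of 16 is an arbitrary choice.

```javascript
// Hypothetical helper: adapt on every sample, apply once per batch.
function adaptInBatches(lora, samples, learningRate, batchSize = 16) {
    let sinceLastApply = 0;
    for (const sample of samples) {
        lora.adapt(sample.input, sample.feedback);
        sinceLastApply += 1;
        if (sinceLastApply === batchSize) {
            lora.applyUpdates(learningRate);
            sinceLastApply = 0;
        }
    }
    // Flush the remainder so no accumulated gradients are left unapplied
    if (sinceLastApply > 0) {
        lora.applyUpdates(learningRate);
    }
}
```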

### 2. Choose Optimal Rank

- **Rank 1**: Fastest, minimal memory (~6KB), good for simple adaptations
- **Rank 2**: Best balance, recommended for most use cases (~12KB)
- **Rank 4**: More expressive, use when quality matters more than size (~24KB)
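
When memory is the binding constraint, the largest rank that fits a byte budget can be computed from the Memory Footprint formula. A sketch with a hypothetical helper:

```javascript
// Hypothetical helper: largest rank (1-4) whose A + B weights fit a byte budget.
function largestRankForBudget(inFeatures, outFeatures, budgetBytes) {
    for (let rank = 4; rank >= 1; rank--) {
        const bytes = (inFeatures * rank + rank * outFeatures) * 4; // f32 weights
        if (bytes <= budgetBytes) return rank;
    }
    return null; // even rank 1 does not fit
}

console.log(largestRankForBudget(768, 768, 16 * 1024)); // 2
```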

### 3. Learning Rate Guidelines

- Start with `0.01` for general use
- Increase to `0.02-0.05` for faster adaptation
- Decrease to `0.001-0.005` for fine-grained control
- Use adaptive rates based on quality variance

```javascript
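// Population variance of recent quality scores (numbers in [0.0, 1.0])
function computeQualityVariance(qualities) {
    const mean = qualities.reduce((sum, q) => sum + q, 0) / qualities.length;
    return qualities.reduce((sum, q) => sum + (q - mean) ** 2, 0) / qualities.length;
}
const variance = computeQualityVariance(recentQualities);
const adaptiveLR = 0.01 * (1 + variance);
lora.applyUpdates(adaptiveLR);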
```

## Comparison with Full LoRA

| Feature | MicroLoRA | Standard LoRA |
|---------|-----------|---------------|
| Memory | 6-24KB | 50-500KB |
| Rank | 1-4 | 8-64 |
| Adaptation | Real-time (<1ms) | Batch (>100ms) |
| Threading | None | Multi-threaded |
| Platform | Browser only | Any |
| Gradients | Simplified | Full backprop |

## Browser Compatibility

Requires:
- WebAssembly support
- Float32Array support
- localStorage for persistence (optional)

Tested on:
- Chrome 90+
- Firefox 88+
- Safari 14+
- Edge 90+

## Advanced: Integration with Base Model

```javascript
// `baseModel` is a placeholder for your inference backend
async function generateWithLoRA(prompt, lora) {
    // 1. Get hidden states from the frozen base model
    const { hiddenStates } = await baseModel.generate(prompt);

    // 2. Apply LoRA transformation (inFeatures and outFeatures must equal the hidden size)
    const loraOutput = lora.apply(hiddenStates);

    // 3. Combine (additive): final = base + LoRA
    const finalHidden = hiddenStates.map((h, i) => h + loraOutput[i]);

    // 4. Project to tokens
    const tokens = await baseModel.projectToTokens(finalHidden);

    return tokens;
}
```
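
Step 3 silently yields `NaN` entries if the LoRA output is shorter than the hidden state (`h + undefined` is `NaN`). A sketch of a hypothetical helper that checks the sizes before adding:

```javascript
// Hypothetical helper for the additive step: element-wise base + LoRA with a size check.
function addLoraDelta(hidden, delta) {
    if (hidden.length !== delta.length) {
        throw new RangeError(
            `LoRA output has ${delta.length} values but hidden state has ${hidden.length}; ` +
            'check that outFeatures matches the model hidden size');
    }
    const out = new Float32Array(hidden.length);
    for (let i = 0; i < hidden.length; i++) {
        out[i] = hidden[i] + delta[i];
    }
    return out;
}
```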

## Troubleshooting

### High Memory Usage

```javascript
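// Check actual memory usage
const stats = lora.stats();
console.log(`Memory: ${stats.memoryBytes} bytes`);

// If too high, build a new adapter with a lower rank
// (rank is set at construction; the new adapter starts from fresh weights)
const smallConfig = new MicroLoraConfigWasm();
smallConfig.rank = 1; // Instead of 2 or 4
const smallLora = new MicroLoraWasm(smallConfig);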
```

### Slow Adaptation

```javascript
// Increase learning rate
feedback.learningRate = 0.05; // Instead of 0.01

// Or apply updates more frequently
if (sampleCount % 3 === 0) { // Instead of % 10
    lora.applyUpdates(0.02);
}
```

### Quality Not Improving

```javascript
// Check if feedback is balanced
const stats = lora.stats();
if (stats.avgQuality < 0.4 || stats.avgQuality > 0.9) {
    console.warn('Feedback may be too one-sided');
}

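// Add quality normalization (min-max scaling into [0.0, 1.0];
// falls back to a neutral 0.5 when all raw scores are equal)
const range = maxQuality - minQuality;
const normalizedQuality = range > 0 ? (rawQuality - minQuality) / range : 0.5;
feedback.quality = normalizedQuality;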
```
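
Min-max normalization needs a `minQuality` and `maxQuality` to scale against. One option is a sliding window over recent raw scores, sketched below with a hypothetical `createQualityNormalizer`; the window size and the 0.5 fallback are arbitrary choices.

```javascript
// Hypothetical normalizer: min-max scaling over the last `windowSize` raw scores.
function createQualityNormalizer(windowSize = 50) {
    const recent = [];
    return function normalize(rawQuality) {
        recent.push(rawQuality);
        if (recent.length > windowSize) recent.shift();
        const min = Math.min(...recent);
        const max = Math.max(...recent);
        // With no spread yet, fall back to the neutral midpoint
        return max > min ? (rawQuality - min) / (max - min) : 0.5;
    };
}

const normalize = createQualityNormalizer(3);
console.log(normalize(4)); // 0.5 (only one score so far)
console.log(normalize(8)); // 1
console.log(normalize(6)); // 0.5
```

The result can be passed straight to `new AdaptFeedbackWasm(...)`.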

## Examples

See `examples/micro_lora_example.ts` for complete working examples including:
- Basic usage
- Online learning loop
- Serialization/deserialization
- Browser storage integration
- Multi-user scenarios

## License

MIT License - see LICENSE file for details