# MicroLoRA - Browser-Compatible Lightweight LoRA Adaptation
MicroLoRA provides ultra-lightweight LoRA (Low-Rank Adaptation) for real-time adaptation of language models directly in web browsers.
## Features
- Tiny Memory Footprint: Rank 1-4 adapters use <50KB per adapter
- Pure WASM: No threading, no file I/O, fully browser-compatible
- Real-time Adaptation: Update weights based on user feedback with <1ms latency
- Serialization: JSON-based persistence for localStorage/IndexedDB
- TypeScript-Friendly: Full type definitions with getter/setter patterns
## Architecture

```
┌─────────────────┐
│ Base LLM │
│ (frozen) │
└────────┬────────┘
│
├──────────┐
│ │
┌────────▼────────┐ │
│ Input │ │
│ (768-dim) │ │
└────────┬────────┘ │
│ │
▼ │
┌─────────────────┐ │
│ LoRA A │ │ Down projection
│ (768 x 2) │ │ (in_features x rank)
└────────┬────────┘ │
│ │
▼ │
┌─────────────────┐ │
│ Intermediate │ │
│ (2-dim) │ │
└────────┬────────┘ │
│ │
▼ │
┌─────────────────┐ │
│ LoRA B │ │ Up projection
│ (2 x 768) │ │ (rank x out_features)
└────────┬────────┘ │
│ │
▼ │
┌─────────────────┐ │
│ LoRA Output │ │ Scaled by (alpha / rank)
│ (768-dim) │ │
└────────┬────────┘ │
│ │
└──────────┤
│
┌──────────▼───────┐
│ Final Output │
│ (base + LoRA) │
└──────────────────┘
```
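In other words, the adapter projects the input down through `A` to `rank` dimensions, back up through `B`, scales by `alpha / rank`, and adds the result to the frozen base output. The sketch below walks through that math in plain TypeScript; it is illustrative only, not the library's WASM implementation:

```ts
// Illustrative LoRA forward pass matching the diagram above (not the WASM internals).
// A is stored row-major as [inFeatures x rank], B as [rank x outFeatures].
function loraDelta(
  x: Float32Array,   // input hidden state, length inFeatures
  A: Float32Array,   // down projection, inFeatures * rank values
  B: Float32Array,   // up projection, rank * outFeatures values
  rank: number,
  alpha: number
): Float32Array {
  const inFeatures = x.length;
  const outFeatures = B.length / rank;

  // Down-project: intermediate h = A^T x (rank-dimensional)
  const h = new Float32Array(rank);
  for (let i = 0; i < inFeatures; i++) {
    for (let r = 0; r < rank; r++) {
      h[r] += x[i] * A[i * rank + r];
    }
  }

  // Up-project through B and scale by alpha / rank
  const scale = alpha / rank;
  const delta = new Float32Array(outFeatures);
  for (let r = 0; r < rank; r++) {
    for (let j = 0; j < outFeatures; j++) {
      delta[j] += scale * h[r] * B[r * outFeatures + j];
    }
  }
  return delta; // caller adds this to the base model's output
}
```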
## Quick Start

### Basic Usage

```ts
import init, { MicroLoraWasm, MicroLoraConfigWasm, AdaptFeedbackWasm } from 'ruvllm-wasm';
// Initialize WASM
await init();
// Create adapter config
const config = new MicroLoraConfigWasm();
config.rank = 2; // Rank 1-4 (2 recommended for browser)
config.alpha = 4.0; // Scaling factor
config.inFeatures = 768; // Match your model's hidden size
config.outFeatures = 768;
// Create the adapter
const lora = new MicroLoraWasm(config);
// Apply LoRA to hidden states
const hiddenState = new Float32Array(768);
const output = lora.apply(hiddenState);
```
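Note that `apply` returns only the LoRA contribution; add it to the base hidden state to get the adapted output, as shown in "Advanced: Integration with Base Model" below.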
### Real-time Adaptation

```ts
// User provides feedback on model output
const feedback = new AdaptFeedbackWasm(0.8); // Quality score [0.0, 1.0]
feedback.learningRate = 0.01;
// Adapt weights based on feedback
lora.adapt(hiddenState, feedback);
// Apply updates (can batch multiple adapt calls)
lora.applyUpdates(0.01);
// Get statistics
const stats = lora.stats();
console.log(`Average quality: ${stats.avgQuality}`);
console.log(`Samples seen: ${stats.samplesSeen}`);
```
### Persistence

```ts
// Save to localStorage
const json = lora.toJson();
localStorage.setItem('lora-state', json);
// Restore from localStorage
const saved = localStorage.getItem('lora-state');
const restored = MicroLoraWasm.fromJson(saved);
```
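Because the serialized state is a plain JSON string, it can equally go into IndexedDB when you store many adapters (localStorage quotas are typically only a few MB). A minimal sketch using only standard Web APIs; the database and store names here are arbitrary, not part of ruvllm-wasm:

```ts
// Hypothetical helper: store the adapter JSON in IndexedDB.
// 'lora-db' and 'adapters' are arbitrary names chosen for this sketch.
function saveAdapter(key: string, json: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const open = indexedDB.open('lora-db', 1);
    open.onupgradeneeded = () => open.result.createObjectStore('adapters');
    open.onerror = () => reject(open.error);
    open.onsuccess = () => {
      const tx = open.result.transaction('adapters', 'readwrite');
      tx.objectStore('adapters').put(json, key);
      tx.oncomplete = () => resolve();
      tx.onerror = () => reject(tx.error);
    };
  });
}

// Usage: await saveAdapter('lora-state', lora.toJson());
```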
## API Reference

### MicroLoraConfigWasm
Configuration for the LoRA adapter.
Properties:
- `rank: number` - LoRA rank (1-4, clamped). Default: 2
- `alpha: number` - Scaling factor. Default: 4.0
- `inFeatures: number` - Input dimension. Default: 768
- `outFeatures: number` - Output dimension. Default: 768
Methods:
- `memoryBytes(): number` - Calculate memory footprint in bytes
- `computeScaling(): number` - Get computed scaling (alpha / rank)
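For example, with the defaults above (the byte count follows the formula in the Memory Footprint section below):

```ts
const config = new MicroLoraConfigWasm();
config.rank = 2; // inFeatures = outFeatures = 768, alpha = 4.0 by default

console.log(config.memoryBytes());    // (768*2 + 2*768) * 4 = 12288
console.log(config.computeScaling()); // alpha / rank = 4.0 / 2 = 2.0
```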
### MicroLoraWasm
The main LoRA adapter.
Constructor:

```ts
new MicroLoraWasm(config: MicroLoraConfigWasm)
```
Methods:
- `apply(input: Float32Array): Float32Array` - Apply LoRA transformation
- `adapt(input: Float32Array, feedback: AdaptFeedbackWasm): void` - Accumulate gradients
- `applyUpdates(learningRate: number): void` - Apply accumulated gradients
- `reset(): void` - Reset to initial state
- `stats(): MicroLoraStatsWasm` - Get adapter statistics
- `toJson(): string` - Serialize to JSON
- `fromJson(json: string): MicroLoraWasm` - Deserialize from JSON (static)
- `pendingUpdates(): number` - Get number of pending gradient updates
- `getConfig(): MicroLoraConfigWasm` - Get current configuration
### AdaptFeedbackWasm
Feedback for weight adaptation.
Constructor:

- `new AdaptFeedbackWasm(quality: number)` - Quality score [0.0, 1.0]
Properties:
- `quality: number` - Quality/reward signal [0.0, 1.0]
- `learningRate: number` - Learning rate. Default: 0.01
### MicroLoraStatsWasm
Adapter statistics.
Properties:
- `samplesSeen: number` - Total samples seen
- `avgQuality: number` - Average quality score
- `memoryBytes: number` - Memory usage in bytes
- `paramCount: number` - Total parameter count
Methods:
- `toJson(): string` - Convert to JSON string
## Memory Footprint
Memory usage for different configurations:
| Config | Memory | Parameters |
|---|---|---|
| Rank 1, 768×768 | 6KB | 1,536 |
| Rank 2, 768×768 | 12KB | 3,072 |
| Rank 4, 768×768 | 24KB | 6,144 |
| Rank 2, 512×512 | 8KB | 2,048 |
Formula: `(in_features × rank + rank × out_features) × 4 bytes`
## Use Cases

### 1. Personalized Chat Interface

```ts
// Track interactions so weight updates can be batched.
let interactionCount = 0;

// Adapt based on user thumbs up/down
async function handleUserFeedback(hiddenStates, wasHelpful) {
  interactionCount++;
  const feedback = new AdaptFeedbackWasm(wasHelpful ? 0.9 : 0.3);
  lora.adapt(hiddenStates, feedback);
  // Apply after every 5 interactions
  if (interactionCount % 5 === 0) {
    lora.applyUpdates(0.02);
    // Persist to localStorage
    localStorage.setItem('chat-lora', lora.toJson());
  }
}
```
### 2. Domain-Specific Fine-tuning

```ts
// Adapt to technical domain over time
const conversations = [
  { input: codeHelpQuery, quality: 0.85 },
  { input: technicalExplanation, quality: 0.92 },
  // ...
];

for (const conv of conversations) {
  const feedback = new AdaptFeedbackWasm(conv.quality);
  lora.adapt(conv.input, feedback);
}
lora.applyUpdates(0.01);
```
### 3. Multi-User Adapters

```ts
// Store separate adapters per user
function getUserLora(userId) {
  const key = `lora-${userId}`;
  const saved = localStorage.getItem(key);
  if (saved) {
    return MicroLoraWasm.fromJson(saved);
  }
  const config = new MicroLoraConfigWasm();
  return new MicroLoraWasm(config);
}

function saveUserLora(userId, lora) {
  localStorage.setItem(`lora-${userId}`, lora.toJson());
}
```
## Performance Tips

### 1. Batch Gradient Updates

```ts
// ❌ Bad: Update after every sample
for (const sample of samples) {
  lora.adapt(sample.input, sample.feedback);
  lora.applyUpdates(0.01); // Expensive!
}

// ✅ Good: Batch updates
for (const sample of samples) {
  lora.adapt(sample.input, sample.feedback);
}
lora.applyUpdates(0.01); // Once at the end
```
### 2. Choose Optimal Rank
- Rank 1: Fastest, minimal memory (~6KB), good for simple adaptations
- Rank 2: Best balance, recommended for most use cases (~12KB)
- Rank 4: More expressive, use when quality matters more than size (~24KB)
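If a memory budget drives the choice, a hypothetical helper like this picks the largest rank that fits, using the size formula from the Memory Footprint section:

```ts
// Hypothetical helper: largest rank in {4, 2, 1} whose adapter fits the budget.
function chooseRank(budgetBytes: number, inFeatures = 768, outFeatures = 768): number {
  for (const rank of [4, 2, 1]) {
    const bytes = (inFeatures * rank + rank * outFeatures) * 4;
    if (bytes <= budgetBytes) return rank;
  }
  return 1; // smallest adapter as a fallback
}

// chooseRank(16 * 1024) === 2  (rank 4 needs ~24KB, rank 2 needs ~12KB)
```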
### 3. Learning Rate Guidelines

- Start with `0.01` for general use
- Increase to `0.02`-`0.05` for faster adaptation
- Decrease to `0.001`-`0.005` for fine-grained control
- Use adaptive rates based on quality variance:

```ts
const variance = computeQualityVariance(recentSamples);
const adaptiveLR = 0.01 * (1 + variance);
lora.applyUpdates(adaptiveLR);
```
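`computeQualityVariance` and `recentSamples` are application-side; one possible sketch, assuming each sample records its quality score:

```ts
// Hypothetical helper: population variance of recent quality scores.
function computeQualityVariance(samples: { quality: number }[]): number {
  if (samples.length === 0) return 0;
  const mean = samples.reduce((sum, s) => sum + s.quality, 0) / samples.length;
  return samples.reduce((sum, s) => sum + (s.quality - mean) ** 2, 0) / samples.length;
}
```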
## Comparison with Full LoRA
| Feature | MicroLoRA | Standard LoRA |
|---|---|---|
| Memory | 6-24KB | 50-500KB |
| Rank | 1-4 | 8-64 |
| Adaptation | Real-time (<1ms) | Batch (>100ms) |
| Threading | None | Multi-threaded |
| Platform | Browser only | Any |
| Gradients | Simplified | Full backprop |
## Browser Compatibility
Requires:
- WebAssembly support
- Float32Array support
- localStorage for persistence (optional)
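A quick capability check before calling `init()` can fail fast on unsupported browsers; this sketch uses only standard Web APIs and is not part of ruvllm-wasm:

```ts
// Minimal capability check before loading the WASM module.
function supportsMicroLora(): boolean {
  return typeof WebAssembly === 'object'
    && typeof WebAssembly.instantiate === 'function'
    && typeof Float32Array === 'function';
}

// localStorage is optional and may throw in private-browsing modes.
function hasLocalStorage(): boolean {
  try {
    localStorage.setItem('__probe', '1');
    localStorage.removeItem('__probe');
    return true;
  } catch {
    return false;
  }
}
```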
Tested on:
- Chrome 90+
- Firefox 88+
- Safari 14+
- Edge 90+
## Advanced: Integration with Base Model

```ts
// `baseModel` stands in for your own inference wrapper.
async function generateWithLoRA(prompt, lora) {
  // 1. Get base model output and hidden states
  const { output, hiddenStates } = await baseModel.generate(prompt);
  // 2. Apply LoRA transformation to hidden states
  const loraOutput = lora.apply(hiddenStates);
  // 3. Combine (additive)
  const finalHidden = hiddenStates.map((h, i) => h + loraOutput[i]);
  // 4. Project to tokens
  const tokens = await baseModel.projectToTokens(finalHidden);
  return tokens;
}
```
## Troubleshooting

### High Memory Usage

```ts
// Check actual memory usage
const stats = lora.stats();
console.log(`Memory: ${stats.memoryBytes} bytes`);
// If too high, reduce rank
config.rank = 1; // Instead of 2 or 4
```
### Slow Adaptation

```ts
// Increase learning rate
feedback.learningRate = 0.05; // Instead of 0.01
// Or apply updates more frequently
if (sampleCount % 3 === 0) { // Instead of % 10
  lora.applyUpdates(0.02);
}
```
### Quality Not Improving

```ts
// Check if feedback is balanced
const stats = lora.stats();
if (stats.avgQuality < 0.4 || stats.avgQuality > 0.9) {
  console.warn('Feedback may be too one-sided');
}

// Add quality normalization (minQuality/maxQuality tracked by your app)
const normalizedQuality = (rawQuality - minQuality) / (maxQuality - minQuality);
feedback.quality = normalizedQuality;
```
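The normalization above assumes the application tracks `minQuality` and `maxQuality` over the raw feedback it has seen; a hypothetical running tracker:

```ts
// Hypothetical running min/max tracker for raw quality signals.
let minQuality = Infinity;
let maxQuality = -Infinity;

function observeQuality(raw: number): void {
  minQuality = Math.min(minQuality, raw);
  maxQuality = Math.max(maxQuality, raw);
}

function normalizeQuality(raw: number): number {
  if (maxQuality <= minQuality) return 0.5; // no spread observed yet
  return (raw - minQuality) / (maxQuality - minQuality);
}
```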
## Examples

See `examples/micro_lora_example.ts` for complete working examples, including:
- Basic usage
- Online learning loop
- Serialization/deserialization
- Browser storage integration
- Multi-user scenarios
## License
MIT License - see LICENSE file for details