Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

This commit is contained in:
ruv
2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions

# MicroLoRA - Browser-Compatible Lightweight LoRA Adaptation
MicroLoRA provides ultra-lightweight LoRA (Low-Rank Adaptation) for real-time adaptation of language models directly in web browsers.
## Features
- **Tiny Memory Footprint**: Rank 1-4 adapters use <50KB per adapter
- **Pure WASM**: No threading, no file I/O, fully browser-compatible
- **Real-time Adaptation**: Update weights based on user feedback with <1ms latency
- **Serialization**: JSON-based persistence for localStorage/IndexedDB
- **TypeScript-Friendly**: Full type definitions with getter/setter patterns
## Architecture
```
┌─────────────────┐
│    Base LLM     │
│    (frozen)     │
└────────┬────────┘
         ├──────────┐
         │          │
┌────────▼────────┐ │
│      Input      │ │
│    (768-dim)    │ │
└────────┬────────┘ │
         │          │
         ▼          │
┌─────────────────┐ │
│     LoRA A      │ │  Down projection
│    (768 x 2)    │ │  (in_features x rank)
└────────┬────────┘ │
         │          │
         ▼          │
┌─────────────────┐ │
│  Intermediate   │ │
│     (2-dim)     │ │
└────────┬────────┘ │
         │          │
         ▼          │
┌─────────────────┐ │
│     LoRA B      │ │  Up projection
│    (2 x 768)    │ │  (rank x out_features)
└────────┬────────┘ │
         │          │
         ▼          │
┌─────────────────┐ │
│   LoRA Output   │ │  Scaled by (alpha / rank)
│    (768-dim)    │ │
└────────┬────────┘ │
         │          │
         └──────────┤
          ┌──────────▼───────┐
          │   Final Output   │
          │  (base + LoRA)   │
          └──────────────────┘
```
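The data flow above can be sketched in plain JavaScript. This is a minimal reference sketch of the math, not the WASM implementation; it assumes `A` and `B` are stored as flat row-major `Float32Array`s.

```javascript
// Minimal LoRA forward pass: y = (alpha / rank) * B^T (A^T x)
// a: (inFeatures x rank), b: (rank x outFeatures), both flat row-major.
function loraForward(x, a, b, inFeatures, rank, outFeatures, alpha) {
  const scaling = alpha / rank;
  // Down projection: intermediate = A^T x  (rank-dim)
  const mid = new Float32Array(rank);
  for (let i = 0; i < inFeatures; i++) {
    for (let r = 0; r < rank; r++) {
      mid[r] += x[i] * a[i * rank + r];
    }
  }
  // Up projection: out = B^T mid, scaled by (alpha / rank)
  const out = new Float32Array(outFeatures);
  for (let r = 0; r < rank; r++) {
    for (let j = 0; j < outFeatures; j++) {
      out[j] += mid[r] * b[r * outFeatures + j] * scaling;
    }
  }
  return out;
}
```

Because the intermediate vector has only `rank` entries, the cost is `O(rank × (in + out))` multiply-adds per call, which is why rank 1-4 stays comfortably under a millisecond in the browser.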
## Quick Start
### Basic Usage
```javascript
import init, { MicroLoraWasm, MicroLoraConfigWasm, AdaptFeedbackWasm } from 'ruvllm-wasm';
// Initialize WASM
await init();
// Create adapter config
const config = new MicroLoraConfigWasm();
config.rank = 2; // Rank 1-4 (2 recommended for browser)
config.alpha = 4.0; // Scaling factor
config.inFeatures = 768; // Match your model's hidden size
config.outFeatures = 768;
// Create the adapter
const lora = new MicroLoraWasm(config);
// Apply LoRA to hidden states
const hiddenState = new Float32Array(768);
const output = lora.apply(hiddenState);
```
### Real-time Adaptation
```javascript
// User provides feedback on model output
const feedback = new AdaptFeedbackWasm(0.8); // Quality score [0.0, 1.0]
feedback.learningRate = 0.01;
// Adapt weights based on feedback
lora.adapt(hiddenState, feedback);
// Apply updates (can batch multiple adapt calls)
lora.applyUpdates(0.01);
// Get statistics
const stats = lora.stats();
console.log(`Average quality: ${stats.avgQuality}`);
console.log(`Samples seen: ${stats.samplesSeen}`);
```
### Persistence
```javascript
// Save to localStorage
const json = lora.toJson();
localStorage.setItem('lora-state', json);
// Restore from localStorage
const saved = localStorage.getItem('lora-state');
const restored = MicroLoraWasm.fromJson(saved);
```
## API Reference
### MicroLoraConfigWasm
Configuration for the LoRA adapter.
**Properties:**
- `rank: number` - LoRA rank (1-4, clamped). Default: 2
- `alpha: number` - Scaling factor. Default: 4.0
- `inFeatures: number` - Input dimension. Default: 768
- `outFeatures: number` - Output dimension. Default: 768
**Methods:**
- `memoryBytes(): number` - Calculate memory footprint in bytes
- `computeScaling(): number` - Get computed scaling (alpha / rank)
### MicroLoraWasm
The main LoRA adapter.
**Constructor:**
- `new MicroLoraWasm(config: MicroLoraConfigWasm)`
**Methods:**
- `apply(input: Float32Array): Float32Array` - Apply LoRA transformation
- `adapt(input: Float32Array, feedback: AdaptFeedbackWasm): void` - Accumulate gradients
- `applyUpdates(learningRate: number): void` - Apply accumulated gradients
- `reset(): void` - Reset to initial state
- `stats(): MicroLoraStatsWasm` - Get adapter statistics
- `toJson(): string` - Serialize to JSON
- `fromJson(json: string): MicroLoraWasm` - Deserialize from JSON (static)
- `pendingUpdates(): number` - Get number of pending gradient updates
- `getConfig(): MicroLoraConfigWasm` - Get current configuration
### AdaptFeedbackWasm
Feedback for weight adaptation.
**Constructor:**
- `new AdaptFeedbackWasm(quality: number)` - Quality score [0.0, 1.0]
**Properties:**
- `quality: number` - Quality/reward signal [0.0, 1.0]
- `learningRate: number` - Learning rate. Default: 0.01
### MicroLoraStatsWasm
Adapter statistics.
**Properties:**
- `samplesSeen: number` - Total samples seen
- `avgQuality: number` - Average quality score
- `memoryBytes: number` - Memory usage in bytes
- `paramCount: number` - Total parameter count
**Methods:**
- `toJson(): string` - Convert to JSON string
## Memory Footprint
Memory usage for different configurations:

| Config | Memory | Parameters |
|--------|--------|------------|
| Rank 1, 768×768 | 6KB | 1,536 |
| Rank 2, 768×768 | 12KB | 3,072 |
| Rank 4, 768×768 | 24KB | 6,144 |
| Rank 2, 512×512 | 8KB | 2,048 |
Formula: `(in_features × rank + rank × out_features) × 4 bytes`
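The formula is easy to check against the table (`loraMemoryBytes` here is a hypothetical helper mirroring the library's `memoryBytes()`, not part of its API):

```javascript
// f32 weights: A is (in_features x rank), B is (rank x out_features),
// 4 bytes per element.
function loraMemoryBytes(inFeatures, rank, outFeatures) {
  return (inFeatures * rank + rank * outFeatures) * 4;
}

console.log(loraMemoryBytes(768, 2, 768)); // 12288 bytes = 12KB, matching the table
```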
## Use Cases
### 1. Personalized Chat Interface
```javascript
// Adapt based on user thumbs up/down
let interactionCount = 0; // incremented once per user interaction
function handleUserFeedback(hiddenStates, wasHelpful) {
  const feedback = new AdaptFeedbackWasm(wasHelpful ? 0.9 : 0.3);
  lora.adapt(hiddenStates, feedback);
  // Apply after every 5 interactions
  if (++interactionCount % 5 === 0) {
    lora.applyUpdates(0.02);
    // Persist to localStorage
    localStorage.setItem('chat-lora', lora.toJson());
  }
}
```
### 2. Domain-Specific Fine-tuning
```javascript
// Adapt to technical domain over time
const conversations = [
  { input: codeHelpQuery, quality: 0.85 },
  { input: technicalExplanation, quality: 0.92 },
  // ...
];
for (const conv of conversations) {
  const feedback = new AdaptFeedbackWasm(conv.quality);
  lora.adapt(conv.input, feedback);
}
lora.applyUpdates(0.01);
```
### 3. Multi-User Adapters
```javascript
// Store separate adapters per user
function getUserLora(userId) {
  const key = `lora-${userId}`;
  const saved = localStorage.getItem(key);
  if (saved) {
    return MicroLoraWasm.fromJson(saved);
  }
  const config = new MicroLoraConfigWasm();
  return new MicroLoraWasm(config);
}
function saveUserLora(userId, lora) {
  localStorage.setItem(`lora-${userId}`, lora.toJson());
}
```
## Performance Tips
### 1. Batch Gradient Updates
```javascript
// ❌ Bad: Update after every sample
for (const sample of samples) {
  lora.adapt(sample.input, sample.feedback);
  lora.applyUpdates(0.01); // Expensive!
}
// ✅ Good: Batch updates
for (const sample of samples) {
  lora.adapt(sample.input, sample.feedback);
}
lora.applyUpdates(0.01); // Once at the end
```
### 2. Choose Optimal Rank
- **Rank 1**: Fastest, minimal memory (~6KB), good for simple adaptations
- **Rank 2**: Best balance, recommended for most use cases (~12KB)
- **Rank 4**: More expressive, use when quality matters more than size (~24KB)
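Given a memory budget, the choice can be automated with the footprint formula above (`pickRank` is an illustrative helper, not a library API):

```javascript
// Pick the largest rank in [1, 4] whose f32 weights fit the byte budget.
function pickRank(inFeatures, outFeatures, budgetBytes) {
  for (let rank = 4; rank >= 1; rank--) {
    const bytes = (inFeatures * rank + rank * outFeatures) * 4;
    if (bytes <= budgetBytes) return rank;
  }
  return 0; // nothing fits
}
```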
### 3. Learning Rate Guidelines
- Start with `0.01` for general use
- Increase to `0.02-0.05` for faster adaptation
- Decrease to `0.001-0.005` for fine-grained control
- Use adaptive rates based on quality variance
```javascript
const variance = computeQualityVariance(recentSamples);
const adaptiveLR = 0.01 * (1 + variance);
lora.applyUpdates(adaptiveLR);
```
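The `computeQualityVariance` helper above is not part of the library; a plain-JS sketch over recent quality scores might look like:

```javascript
// Population variance of recent quality scores (each in [0.0, 1.0]).
function computeQualityVariance(samples) {
  if (samples.length === 0) return 0;
  const mean = samples.reduce((sum, q) => sum + q, 0) / samples.length;
  return samples.reduce((sum, q) => sum + (q - mean) ** 2, 0) / samples.length;
}
```

High variance means the feedback signal is noisy or shifting, so a slightly larger learning rate helps the adapter track it.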
## Comparison with Full LoRA
| Feature | MicroLoRA | Standard LoRA |
|---------|-----------|---------------|
| Memory | 6-24KB | 50-500KB |
| Rank | 1-4 | 8-64 |
| Adaptation | Real-time (<1ms) | Batch (>100ms) |
| Threading | None | Multi-threaded |
| Platform | Browser only | Any |
| Gradients | Simplified | Full backprop |
## Browser Compatibility
Requires:
- WebAssembly support
- Float32Array support
- localStorage for persistence (optional)
Tested on:
- Chrome 90+
- Firefox 88+
- Safari 14+
- Edge 90+
## Advanced: Integration with Base Model
```javascript
async function generateWithLoRA(prompt, lora) {
  // 1. Get base model output and hidden states
  const { output, hiddenStates } = await baseModel.generate(prompt);
  // 2. Apply LoRA transformation to hidden states
  const loraOutput = lora.apply(hiddenStates);
  // 3. Combine (additive)
  const finalHidden = hiddenStates.map((h, i) => h + loraOutput[i]);
  // 4. Project to tokens
  const tokens = await baseModel.projectToTokens(finalHidden);
  return tokens;
}
```
## Troubleshooting
### High Memory Usage
```javascript
// Check actual memory usage
const stats = lora.stats();
console.log(`Memory: ${stats.memoryBytes} bytes`);
// If too high, reduce rank
config.rank = 1; // Instead of 2 or 4
```
### Slow Adaptation
```javascript
// Increase learning rate
feedback.learningRate = 0.05; // Instead of 0.01
// Or apply updates more frequently
if (sampleCount % 3 === 0) { // Instead of % 10
  lora.applyUpdates(0.02);
}
```
### Quality Not Improving
```javascript
// Check if feedback is balanced
const stats = lora.stats();
if (stats.avgQuality < 0.4 || stats.avgQuality > 0.9) {
  console.warn('Feedback may be too one-sided');
}
// Add quality normalization
const normalizedQuality = (rawQuality - minQuality) / (maxQuality - minQuality);
feedback.quality = normalizedQuality;
```
## Examples
See `examples/micro_lora_example.ts` for complete working examples including:
- Basic usage
- Online learning loop
- Serialization/deserialization
- Browser storage integration
- Multi-user scenarios
## License
MIT License - see LICENSE file for details