wifi-densepose/examples/edge-net/docs/architecture/MODEL_OPTIMIZATION_DISTRIBUTION.md

# RuVector Edge-Net Model Optimization and Distribution System

## Architecture Document v1.0

### Executive Summary

This document defines a comprehensive model optimization and distribution system for RuVector Edge-Net, enabling browser-based AI inference with optimized models delivered via CDN and IPFS. The system supports end-user fine-tuning through MicroLoRA adapters and provides a complete CLI/SDK for model management.

---

## 1. System Architecture Overview

```
+==============================================================================+
|                    MODEL OPTIMIZATION & DISTRIBUTION SYSTEM                   |
+==============================================================================+
|                                                                              |
|  +---------------------------+     +---------------------------+             |
|  |   OPTIMIZATION PIPELINE   |     |   DISTRIBUTION SYSTEM     |             |
|  +---------------------------+     +---------------------------+             |
|  |                           |     |                           |             |
|  |  +---------------------+  |     |  +---------------------+  |             |
|  |  | Quantization Engine |  |     |  | Google Cloud CDN    |  |             |
|  |  | - INT8/INT4/FP16    |  |     |  | - Primary delivery  |  |             |
|  |  | - GPTQ/AWQ/GGML     |  |---->|  | - Global edge PoPs  |  |             |
|  |  +---------------------+  |     |  +---------------------+  |             |
|  |           |               |     |           |               |             |
|  |  +---------------------+  |     |  +---------------------+  |             |
|  |  | Pruning & Distill   |  |     |  | IPFS Gateway        |  |             |
|  |  | - Unstructured      |  |     |  | - Decentralized     |  |             |
|  |  | - Knowledge distill |  |     |  | - Content-addressed |  |             |
|  |  +---------------------+  |     |  +---------------------+  |             |
|  |           |               |     |           |               |             |
|  |  +---------------------+  |     |  +---------------------+  |             |
|  |  | ONNX Export         |  |     |  | Model Registry      |  |             |
|  |  | - Graph optimization|  |     |  | - Versioning        |  |             |
|  |  | - WASM compatible   |  |     |  | - Checksums/Sigs    |  |             |
|  |  +---------------------+  |     |  +---------------------+  |             |
|  |                           |     |                           |             |
|  +---------------------------+     +---------------------------+             |
|                                                                              |
|  +---------------------------+     +---------------------------+             |
|  |   MICROLORA SYSTEM        |     |   CLI & SDK               |             |
|  +---------------------------+     +---------------------------+             |
|  |                           |     |                           |             |
|  |  +---------------------+  |     |  +---------------------+  |             |
|  |  | Browser Training    |  |     |  | edge-net-models CLI |  |             |
|  |  | - WebGPU/WASM       |  |     |  | - pull/push/list    |  |             |
|  |  | - Rank 1-16 adapters|  |<--->|  | - optimize/verify   |  |             |
|  |  +---------------------+  |     |  +---------------------+  |             |
|  |           |               |     |           |               |             |
|  |  +---------------------+  |     |  +---------------------+  |             |
|  |  | Adapter Merging     |  |     |  | JavaScript SDK      |  |             |
|  |  | - TIES-Merging      |  |     |  | - ModelLoader       |  |             |
|  |  | - DARE weights      |  |     |  | - Cache/Fallbacks   |  |             |
|  |  +---------------------+  |     |  +---------------------+  |             |
|  |           |               |     |           |               |             |
|  |  +---------------------+  |     |  +---------------------+  |             |
|  |  | P2P Adapter Sharing |  |     |  | Model Loader        |  |             |
|  |  | - Reputation-gated  |  |     |  | - Progressive load  |  |             |
|  |  | - Domain-specific   |  |     |  | - Streaming decode  |  |             |
|  |  +---------------------+  |     |  +---------------------+  |             |
|  |                           |     |                           |             |
|  +---------------------------+     +---------------------------+             |
|                                                                              |
+==============================================================================+
```

---

## 2. Supported Models

### 2.1 Language Models (Text Generation)

| Model | Params | Quantized Size | Use Case |
|-------|--------|----------------|----------|
| **Phi-1.5** | 1.3B | 700MB (Q4) | Code generation, reasoning |
| **Phi-2** | 2.7B | 1.4GB (Q4) | General text, complex tasks |
| **Qwen-0.5B** | 0.5B | 280MB (Q4) | Fast inference, edge |
| **Gemma-2B** | 2B | 1.1GB (Q4) | General purpose |

### 2.2 Embedding Models (Semantic Search)

| Model | Dims | Quantized Size | Use Case |
|-------|------|----------------|----------|
| **MiniLM-L6-v2** | 384 | 23MB (Q8) | Fast semantic search |
| **E5-small** | 384 | 34MB (Q8) | High-quality embeddings |
| **BGE-small-en** | 384 | 34MB (Q8) | Retrieval-optimized |
| **GTE-small** | 384 | 34MB (Q8) | General text embeddings |

### 2.3 Vision Models (Future)

| Model | Params | Quantized Size | Use Case |
|-------|--------|----------------|----------|
| **SigLIP-small** | 86M | 45MB (Q8) | Image understanding |
| **MobileCLIP** | 65M | 35MB (Q8) | Fast image-text matching |

---

## 3. Model Optimization Pipeline

### 3.1 Quantization Engine

```
+--------------------------------------------------------------------------+
|                         QUANTIZATION PIPELINE                             |
+--------------------------------------------------------------------------+
|                                                                          |
|   Source Model (FP32/FP16)                                               |
|         |                                                                |
|         v                                                                |
|   +----------------+     +----------------+     +----------------+       |
|   | Calibration    |---->| Quantization   |---->| Validation     |       |
|   | Dataset        |     | Algorithm      |     | (Perplexity)   |       |
|   +----------------+     +----------------+     +----------------+       |
|         |                      |                      |                  |
|         v                      v                      v                  |
|   +-----------+          +-----------+          +-----------+           |
|   | WikiText  |          | INT8      |          | PPL < 10  |           |
|   | C4        |          | INT4      |          | Accuracy  |           |
|   | Custom    |          | FP16      |          | Latency   |           |
|   +-----------+          +-----------+          +-----------+           |
|                                |                                         |
|                                v                                         |
|                    +------------------------+                            |
|                    | Optimized ONNX Model   |                            |
|                    | - Quantized weights    |                            |
|                    | - Fused operators      |                            |
|                    | - WASM compatible      |                            |
|                    +------------------------+                            |
|                                                                          |
+--------------------------------------------------------------------------+
```

### 3.2 Quantization Formats

| Format | Bits | Memory Reduction | Speed | Quality |
|--------|------|------------------|-------|---------|
| **FP16** | 16 | 2x | 1.2x | ~100% |
| **INT8** | 8 | 4x | 1.5x | 99%+ |
| **INT4** | 4 | 8x | 2x | 95-98% |
| **GPTQ** | 4 | 8x | 1.8x | 98%+ |
| **AWQ** | 4 | 8x | 2x | 97%+ |
| **GGML Q4_K** | 4 | 8x | 2.2x | 96%+ |

### 3.3 Optimization Configuration

```typescript
interface OptimizationConfig {
  // Quantization settings
  quantization: {
    format: 'INT8' | 'INT4' | 'FP16' | 'GPTQ' | 'AWQ';
    calibrationSamples: number;        // 512 recommended
    groupSize: number;                  // 128 for 4-bit
    dampingFactor: number;              // 0.01 for GPTQ
    useActivationQuantization: boolean; // Dynamic INT8 activations
  };

  // Pruning settings (optional)
  pruning?: {
    method: 'magnitude' | 'movement' | 'sparsegpt';
    targetSparsity: number;             // 0.0-0.9
    blockSize: number;                  // 4 or 8
  };

  // ONNX export settings
  onnx: {
    opset: number;                      // 17 recommended
    dynamicAxes: boolean;               // For variable seq length
    simplify: boolean;                  // Graph optimization
    optimizationLevel: 'basic' | 'extended' | 'all';
  };

  // Target runtime
  target: 'browser' | 'node' | 'edge';

  // Output artifacts
  outputs: {
    model: boolean;                     // Main ONNX model
    tokenizer: boolean;                 // Tokenizer config
    config: boolean;                    // Model config
    adapters: boolean;                  // LoRA adapter slots
  };
}
```

---

## 4. Distribution System

### 4.1 Storage Architecture

```
+--------------------------------------------------------------------------+
|                      MULTI-TIER STORAGE ARCHITECTURE                      |
+--------------------------------------------------------------------------+
|                                                                          |
|   +------------------------+                                             |
|   |    MODEL REGISTRY      |  (PostgreSQL + Redis)                       |
|   |    - Metadata          |                                             |
|   |    - Versions          |                                             |
|   |    - Checksums         |                                             |
|   |    - Signatures        |                                             |
|   +------------------------+                                             |
|              |                                                           |
|              v                                                           |
|   +------------------------+    +------------------------+               |
|   |  PRIMARY: GCS CDN      |    |  BACKUP: IPFS          |               |
|   |  - Fastest delivery    |    |  - Decentralized       |               |
|   |  - Global edge caching |<-->|  - Content-addressed   |               |
|   |  - 99.99% SLA          |    |  - Censorship-resistant|               |
|   +------------------------+    +------------------------+               |
|              |                              |                            |
|              v                              v                            |
|   +------------------------+    +------------------------+               |
|   |  Browser Cache         |    |  P2P Cache (WebRTC)    |               |
|   |  - IndexedDB           |    |  - Peer-to-peer        |               |
|   |  - Service Worker      |    |  - BitTorrent-style    |               |
|   |  - Cache API           |    |  - Gossip protocol     |               |
|   +------------------------+    +------------------------+               |
|                                                                          |
+--------------------------------------------------------------------------+
```

### 4.2 CDN Configuration (Google Cloud Storage)

```yaml
# bucket: edge-net-models.storage.googleapis.com
storage:
  bucket: edge-net-models
  region: us-central1
  class: STANDARD  # Multi-region for hot data

cdn:
  enabled: true
  cacheMode: CACHE_ALL_STATIC
  defaultTtl: 86400        # 24 hours
  maxTtl: 2592000          # 30 days
  negativeCaching: true
  signedUrls: true         # For private models

compression:
  enabled: true
  types:
    - application/octet-stream  # ONNX models
    - application/json          # Tokenizers/configs

cors:
  origins:
    - "*"
  methods: ["GET", "HEAD", "OPTIONS"]
  responseHeaders:
    - Content-Type
    - Content-Length
    - Content-Encoding
    - X-Model-Version
    - X-Model-Checksum
```

### 4.3 IPFS Integration

```typescript
interface IPFSModelConfig {
  // IPFS gateway endpoints (fallback order)
  gateways: [
    'https://cloudflare-ipfs.com/ipfs/',
    'https://ipfs.io/ipfs/',
    'https://gateway.pinata.cloud/ipfs/',
    'https://w3s.link/ipfs/',
  ];

  // Pinning services
  pinning: {
    primary: 'pinata';    // Pinata Cloud
    backup: 'web3storage'; // web3.storage
  };

  // Content addressing
  cid: {
    version: 1;           // CIDv1
    codec: 'dag-pb';      // For chunked files
    hashAlg: 'sha2-256';
  };

  // Chunking for large models
  chunking: {
    enabled: true;
    chunkSize: 262144;    // 256KB chunks
    parallel: 4;          // Parallel downloads
  };
}
```

### 4.4 Model Manifest Format

```json
{
  "$schema": "https://edge-net.ruvector.dev/schemas/model-manifest-v1.json",
  "id": "phi-1.5-q4",
  "name": "Phi-1.5 INT4 Quantized",
  "version": "1.0.0",
  "type": "text-generation",
  "family": "phi",

  "architecture": {
    "type": "transformer",
    "layers": 24,
    "hiddenSize": 2048,
    "vocabSize": 51200,
    "contextLength": 2048
  },

  "optimization": {
    "quantization": "INT4",
    "format": "GPTQ",
    "groupSize": 128,
    "originalSize": "2.7GB",
    "optimizedSize": "700MB"
  },

  "artifacts": {
    "model": {
      "filename": "model.onnx",
      "size": 734003200,
      "sha256": "a3f8c2d1e4b5...",
      "signature": "MEUCIQDx..."
    },
    "tokenizer": {
      "filename": "tokenizer.json",
      "size": 2500000,
      "sha256": "b4e9d3c2a1f6..."
    },
    "config": {
      "filename": "config.json",
      "size": 1024,
      "sha256": "c5f0e4d3b2a7..."
    }
  },

  "distribution": {
    "primary": {
      "type": "gcs",
      "baseUrl": "https://storage.googleapis.com/edge-net-models/phi-1.5-q4/v1.0.0/",
      "region": "us-central1"
    },
    "fallback": {
      "type": "ipfs",
      "cid": "bafybeig5yh2lj3k4m5n6o7p8q9r0s1t2u3v4w5x6y7z8a9b0c1d2e3f4g5h6",
      "gateways": ["cloudflare-ipfs.com", "ipfs.io"]
    }
  },

  "requirements": {
    "minMemory": 1073741824,   // 1GB
    "simdRequired": false,
    "webgpuOptional": true,
    "browserSupport": ["chrome-88", "firefox-89", "safari-15"]
  },

  "adapters": {
    "baseRank": 8,
    "microRank": 2,
    "slots": ["reasoning", "coding", "chat", "custom"]
  },

  "benchmarks": {
    "perplexity": 9.2,
    "tokensPerSecond": {
      "browser": 12,
      "node": 45,
      "webgpu": 85
    },
    "memoryUsage": 850000000
  },

  "license": "MIT",
  "homepage": "https://github.com/ruvnet/ruvector",
  "publishedAt": "2026-01-03T00:00:00Z",
  "publishedBy": "ruvector-team"
}
```

---

## 5. MicroLoRA Customization System

### 5.1 Architecture Overview

```
+--------------------------------------------------------------------------+
|                         MICROLORA CUSTOMIZATION                           |
+--------------------------------------------------------------------------+
|                                                                          |
|   +----------------------------+                                         |
|   |     BROWSER TRAINING       |                                         |
|   +----------------------------+                                         |
|   |                            |                                         |
|   |  User Data                 |                                         |
|   |    |                       |                                         |
|   |    v                       |                                         |
|   |  +----------------------+  |                                         |
|   |  | Training Pipeline    |  |     +-------------------+               |
|   |  | - WebGPU/WASM        |  |     | Adapter Storage   |               |
|   |  | - Gradient accum     |  |---->| - IndexedDB       |               |
|   |  | - Low-rank updates   |  |     | - P2P sharing     |               |
|   |  +----------------------+  |     +-------------------+               |
|   |           |                |              |                          |
|   |           v                |              v                          |
|   |  +----------------------+  |     +-------------------+               |
|   |  | LoRA Adapter         |  |     | Adapter Pool      |               |
|   |  | - Rank 1-16          |  |     | - LRU eviction    |               |
|   |  | - ~1-10MB per adapter|  |<----| - 16 slots max    |               |
|   |  +----------------------+  |     +-------------------+               |
|   |           |                |              |                          |
|   +-----------|----------------+              |                          |
|               v                               v                          |
|   +----------------------------+    +-------------------+                |
|   |     ADAPTER MERGING        |    | P2P SHARING       |                |
|   +----------------------------+    +-------------------+                |
|   |                            |    |                   |                |
|   |  +----------------------+  |    | - Reputation gate |                |
|   |  | TIES-Merging         |  |    | - Domain-specific |                |
|   |  | - Sign agreement     |  |    | - Version control |                |
|   |  | - Trim & scale       |  |    | - Quality scores  |                |
|   |  +----------------------+  |    +-------------------+                |
|   |           |                |                                         |
|   |           v                |                                         |
|   |  +----------------------+  |                                         |
|   |  | Merged Adapter       |  |                                         |
|   |  | - Combined expertise |  |                                         |
|   |  | - Optimized weights  |  |                                         |
|   |  +----------------------+  |                                         |
|   |                            |                                         |
|   +----------------------------+                                         |
|                                                                          |
+--------------------------------------------------------------------------+
```

### 5.2 MicroLoRA Training Configuration

```typescript
interface MicroLoRAConfig {
  // Adapter architecture
  adapter: {
    rank: number;                // 1-16, typically 2-8
    alpha: number;               // Scaling factor (rank * 2 typical)
    dropout: number;             // 0.0-0.1
    targetModules: string[];     // ['q_proj', 'v_proj', 'k_proj', 'o_proj']
  };

  // Training hyperparameters
  training: {
    batchSize: number;           // 1-8 for browser
    gradientAccumulation: number;// 4-16 to simulate larger batch
    learningRate: number;        // 1e-4 to 1e-3
    warmupSteps: number;         // 10% of total
    maxSteps: number;            // 100-1000 for quick adapt
    optimizer: 'adamw' | 'sgd';
    scheduler: 'cosine' | 'linear' | 'constant';
  };

  // Memory optimization
  memory: {
    gradientCheckpointing: boolean;
    mixedPrecision: boolean;     // FP16 forward, FP32 gradients
    offloadOptimizer: boolean;   // Use IndexedDB for optimizer state
  };

  // Compute backend
  backend: 'wasm' | 'webgpu' | 'auto';

  // Data settings
  data: {
    maxLength: number;           // Max sequence length
    maskProbability: number;     // For MLM tasks
    shuffleBuffer: number;       // Data shuffling
  };
}
```

### 5.3 Adapter Format

```typescript
interface LoRAAdapter {
  // Metadata
  id: string;                    // UUID
  name: string;                  // Human-readable name
  description: string;
  version: string;
  createdAt: string;

  // Architecture
  baseModel: string;             // e.g., 'phi-1.5-q4'
  rank: number;
  alpha: number;
  targetModules: string[];

  // Weights (per target module)
  weights: {
    [moduleName: string]: {
      lora_A: Float32Array;      // Down projection
      lora_B: Float32Array;      // Up projection
    };
  };

  // Training info
  training: {
    steps: number;
    finalLoss: number;
    samples: number;
    duration: number;            // ms
  };

  // Quality metrics
  quality: {
    perplexityDelta: number;     // Change from base
    taskAccuracy?: number;       // If evaluated
    humanScore?: number;         // 1-5 rating
  };

  // Sharing
  sharing: {
    public: boolean;
    domain: string;              // 'general' | 'code' | 'legal' | etc.
    license: string;
  };

  // Integrity
  checksum: string;              // SHA-256 of weights
  signature?: string;            // Ed25519 signature
}
```

### 5.4 Adapter Merging Algorithms

```typescript
// TIES-Merging: Trim, Elect Sign, Disjoint Merge
interface TIESMergeConfig {
  method: 'ties';
  trimFraction: number;          // 0.1-0.3, remove low-magnitude deltas
  electSign: 'majority' | 'weighted';
  scalingCoef: number;           // 0.5-1.0
}

// DARE: Drop And Rescale
interface DAREMergeConfig {
  method: 'dare';
  dropRate: number;              // 0.1-0.5
  rescale: boolean;
}

// Task Arithmetic
interface TaskArithmeticConfig {
  method: 'task_arithmetic';
  weights: { [adapterId: string]: number };  // Weighted combination
}

function mergeAdapters(
  adapters: LoRAAdapter[],
  config: TIESMergeConfig | DAREMergeConfig | TaskArithmeticConfig
): LoRAAdapter {
  // Implementation based on method
  // Returns merged adapter with combined expertise
}
```

---

## 6. CLI: edge-net-models

### 6.1 Command Reference

```bash
# Installation
npm install -g @ruvector/edge-net-models

# ======== Model Management ========

# List available models
edge-net-models list
edge-net-models list --type embedding
edge-net-models list --quantization INT4

# Pull a model
edge-net-models pull phi-1.5-q4
edge-net-models pull phi-1.5-q4 --version 1.0.0
edge-net-models pull phi-1.5-q4 --source ipfs    # Use IPFS instead of CDN
edge-net-models pull phi-1.5-q4 --output ./models/

# Verify model integrity
edge-net-models verify phi-1.5-q4
edge-net-models verify phi-1.5-q4 --check-signature

# Show model info
edge-net-models info phi-1.5-q4
edge-net-models info phi-1.5-q4 --benchmarks
edge-net-models info phi-1.5-q4 --requirements

# ======== Model Optimization ========

# Quantize a model
edge-net-models optimize ./my-model.onnx \
  --quantization INT4 \
  --calibration-data ./calibration.jsonl \
  --output ./my-model-q4.onnx

# Full optimization pipeline
edge-net-models optimize ./my-model.onnx \
  --quantization INT4 \
  --pruning magnitude --sparsity 0.3 \
  --onnx-simplify \
  --target browser

# ======== Model Publishing ========

# Publish to registry
edge-net-models publish ./my-model-q4.onnx \
  --name "my-custom-model" \
  --version 1.0.0 \
  --type text-generation

# Sign model
edge-net-models sign ./my-model-q4.onnx \
  --key ./private-key.pem

# ======== Adapter Management ========

# List local adapters
edge-net-models adapters list

# Export adapter
edge-net-models adapters export my-adapter --output ./adapter.json

# Import adapter
edge-net-models adapters import ./adapter.json

# Share adapter to P2P network
edge-net-models adapters share my-adapter --domain coding

# Merge adapters
edge-net-models adapters merge adapter1 adapter2 \
  --method ties \
  --output merged-adapter

# ======== Cache Management ========

# Show cache info
edge-net-models cache info

# Clear cache
edge-net-models cache clear
edge-net-models cache clear --older-than 30d

# Prune unused models
edge-net-models cache prune

# ======== Configuration ========

# Show config
edge-net-models config show

# Set default source
edge-net-models config set defaultSource gcs
edge-net-models config set cacheDir ~/.edge-net/models
```

### 6.2 CLI Configuration File

```yaml
# ~/.edge-net/models.yaml
version: 1

# Default model source
defaultSource: gcs

# Cache settings
cache:
  directory: ~/.edge-net/models
  maxSize: 10GB
  autoCleanup: true

# CDN settings
cdn:
  endpoint: https://storage.googleapis.com/edge-net-models
  timeout: 30000
  retries: 3

# IPFS settings
ipfs:
  gateways:
    - https://cloudflare-ipfs.com/ipfs/
    - https://ipfs.io/ipfs/
  parallel: 4
  timeout: 60000

# Model preferences
preferences:
  preferQuantization: INT4
  preferSmallModels: true
  autoUpdate: true

# Signing key (for publishing)
signing:
  keyPath: ~/.edge-net/keys/private.pem
  keyId: ruvector-team
```

---

## 7. JavaScript SDK

### 7.1 ModelLoader API

```typescript
import { ModelLoader, ModelConfig } from '@ruvector/edge-net-models';

// ======== Basic Usage ========

// Create loader with default settings
const loader = new ModelLoader();

// Load a model
const model = await loader.load('phi-1.5-q4');

// Use the model
const output = await model.generate('Hello, world!', {
  maxTokens: 100,
  temperature: 0.7,
});

// ======== Advanced Configuration ========

const loader = new ModelLoader({
  // Cache settings
  cache: {
    enabled: true,
    storage: 'indexeddb',        // 'indexeddb' | 'memory' | 'filesystem'
    maxSize: 2 * 1024 * 1024 * 1024,  // 2GB
  },

  // Source preferences
  sources: {
    primary: 'gcs',              // Google Cloud Storage CDN
    fallback: ['ipfs', 'p2p'],   // Fallback sources
    timeout: 30000,
  },

  // Download settings
  download: {
    parallel: 4,                  // Parallel chunk downloads
    retries: 3,
    progressCallback: (progress) => {
      console.log(`${progress.loaded}/${progress.total} bytes`);
    },
  },

  // Compute backend
  backend: 'auto',               // 'wasm' | 'webgpu' | 'auto'

  // Memory management
  memory: {
    maxModels: 2,                // Max models in memory
    evictionPolicy: 'lru',       // 'lru' | 'fifo' | 'priority'
  },
});

// ======== Model Loading ========

// Load with specific version
const model = await loader.load('phi-1.5-q4', { version: '1.0.0' });

// Load with adapter
const model = await loader.load('phi-1.5-q4', {
  adapter: 'coding-assistant',
});

// Load multiple models
const [llm, embedder] = await Promise.all([
  loader.load('phi-1.5-q4'),
  loader.load('minilm-l6-v2-q8'),
]);

// ======== Caching ========

// Check if model is cached
const isCached = await loader.isCached('phi-1.5-q4');

// Get cached models
const cachedModels = await loader.getCachedModels();

// Preload model
await loader.preload('phi-1.5-q4');

// Clear specific model from cache
await loader.clearCache('phi-1.5-q4');

// Clear all cache
await loader.clearAllCache();

// ======== Model Info ========

// Get model manifest
const manifest = await loader.getManifest('phi-1.5-q4');

// Get available models
const models = await loader.listModels({
  type: 'text-generation',
  quantization: 'INT4',
});

// Check requirements
const compatible = await loader.checkCompatibility('phi-1.5-q4');
```

### 7.2 InferenceSession API

```typescript
import { InferenceSession, GenerateConfig } from '@ruvector/edge-net-models';

// ======== Text Generation ========

const session = await InferenceSession.create('phi-1.5-q4');

// Simple generation
const text = await session.generate('Write a haiku about AI:');

// Streaming generation
const stream = session.generateStream('Once upon a time');
for await (const token of stream) {
  process.stdout.write(token);
}

// Generation with config
const text = await session.generate('Explain quantum computing:', {
  maxTokens: 200,
  temperature: 0.7,
  topP: 0.9,
  topK: 40,
  repetitionPenalty: 1.1,
  stopSequences: ['\n\n', '###'],
});

// Chat completion
const response = await session.chat([
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is 2+2?' },
]);

// ======== Embeddings ========

const embedder = await InferenceSession.create('minilm-l6-v2-q8');

// Single embedding
const embedding = await embedder.embed('Hello world');

// Batch embeddings
const embeddings = await embedder.embedBatch([
  'Hello world',
  'How are you?',
  'Nice to meet you',
]);

// Similarity
const similarity = await embedder.similarity(
  'king is to queen',
  'man is to woman'
);

// ======== MicroLoRA Integration ========

// Enable adapter
await session.loadAdapter('coding-assistant');

// Train adapter on custom data
const trainer = session.createTrainer({
  rank: 4,
  learningRate: 1e-4,
  batchSize: 2,
});

await trainer.train([
  { input: 'Write a Python function', output: 'def hello():...' },
  { input: 'Create a class', output: 'class MyClass:...' },
]);

// Save adapter
const adapter = await trainer.getAdapter();
await adapter.save('my-coding-adapter');

// ======== Memory Management ========

// Get memory usage
const memoryInfo = session.getMemoryInfo();
console.log(`Model memory: ${memoryInfo.modelBytes / 1024 / 1024}MB`);
console.log(`KV cache: ${memoryInfo.kvCacheBytes / 1024 / 1024}MB`);

// Clear KV cache
session.clearKVCache();

// Dispose session
session.dispose();
```

### 7.3 Web Worker Integration

```typescript
import { ModelWorker } from '@ruvector/edge-net-models/worker';

// ======== Main Thread ========

const worker = new ModelWorker();

// Load model in worker
await worker.load('phi-1.5-q4');

// Generate in worker (non-blocking)
const result = await worker.generate('Hello!', { maxTokens: 50 });

// Streaming from worker
worker.onToken((token) => console.log(token));
await worker.generateStream('Once upon a time');

// Batch processing in worker
const results = await worker.batch([
  { id: '1', prompt: 'Hello' },
  { id: '2', prompt: 'World' },
]);

// Terminate worker
worker.terminate();

// ======== Worker Pool ========

import { ModelWorkerPool } from '@ruvector/edge-net-models/worker';

const pool = new ModelWorkerPool({
  workers: 4,           // Number of workers
  model: 'phi-1.5-q4',  // Pre-load model
});

// Parallel inference
const results = await Promise.all([
  pool.generate('Task 1'),
  pool.generate('Task 2'),
  pool.generate('Task 3'),
  pool.generate('Task 4'),
]);

pool.terminate();
```

---

## 8. File Structure

```
examples/edge-net/
|-- models/                          # Model optimization & distribution
|   |-- src/
|   |   |-- optimization/
|   |   |   |-- mod.rs               # Optimization pipeline
|   |   |   |-- quantization.rs      # INT8/INT4/FP16 quantization
|   |   |   |-- pruning.rs           # Magnitude/movement pruning
|   |   |   |-- distillation.rs      # Knowledge distillation
|   |   |   |-- onnx_export.rs       # ONNX conversion & optimization
|   |   |   |-- calibration.rs       # Calibration dataset handling
|   |   |   `-- validation.rs        # Quality validation
|   |   |
|   |   |-- distribution/
|   |   |   |-- mod.rs               # Distribution system
|   |   |   |-- gcs.rs               # Google Cloud Storage CDN
|   |   |   |-- ipfs.rs              # IPFS integration
|   |   |   |-- registry.rs          # Model registry
|   |   |   |-- manifest.rs          # Model manifest handling
|   |   |   |-- integrity.rs         # Checksums & signatures
|   |   |   `-- p2p.rs               # P2P model distribution
|   |   |
|   |   |-- lora/
|   |   |   |-- mod.rs               # MicroLoRA system
|   |   |   |-- adapter.rs           # LoRA adapter types
|   |   |   |-- training.rs          # Browser-based training
|   |   |   |-- merging.rs           # TIES/DARE merging
|   |   |   |-- pool.rs              # Adapter pool management
|   |   |   `-- sharing.rs           # P2P adapter sharing
|   |   |
|   |   |-- loader/
|   |   |   |-- mod.rs               # Model loader
|   |   |   |-- cache.rs             # Caching system
|   |   |   |-- streaming.rs         # Streaming decode
|   |   |   `-- fallback.rs          # Fallback handling
|   |   |
|   |   `-- lib.rs                   # Library entry point
|   |
|   |-- cli/
|   |   |-- src/
|   |   |   |-- main.rs              # CLI entry point
|   |   |   |-- commands/
|   |   |   |   |-- mod.rs           # Command handlers
|   |   |   |   |-- pull.rs          # Pull model
|   |   |   |   |-- push.rs          # Push model
|   |   |   |   |-- list.rs          # List models
|   |   |   |   |-- optimize.rs      # Optimize model
|   |   |   |   |-- verify.rs        # Verify integrity
|   |   |   |   |-- adapters.rs      # Adapter management
|   |   |   |   `-- cache.rs         # Cache management
|   |   |   |-- config.rs            # CLI configuration
|   |   |   `-- output.rs            # Output formatting
|   |   |-- Cargo.toml
|   |   `-- README.md
|   |
|   |-- sdk/
|   |   |-- package.json
|   |   |-- tsconfig.json
|   |   |-- src/
|   |   |   |-- index.ts             # SDK entry point
|   |   |   |-- loader.ts            # ModelLoader class
|   |   |   |-- session.ts           # InferenceSession class
|   |   |   |-- cache.ts             # Browser caching
|   |   |   |-- sources/
|   |   |   |   |-- gcs.ts           # GCS CDN source
|   |   |   |   |-- ipfs.ts          # IPFS source
|   |   |   |   `-- p2p.ts           # P2P source
|   |   |   |-- adapters/
|   |   |   |   |-- trainer.ts       # Browser training
|   |   |   |   |-- merger.ts        # Adapter merging
|   |   |   |   `-- pool.ts          # Adapter pool
|   |   |   |-- worker/
|   |   |   |   |-- index.ts         # Worker exports
|   |   |   |   |-- model-worker.ts  # Model worker class
|   |   |   |   |-- pool.ts          # Worker pool
|   |   |   |   `-- inference.worker.ts  # Worker implementation
|   |   |   `-- types.ts             # TypeScript types
|   |   `-- README.md
|   |
|   |-- manifests/                   # Model manifests
|   |   |-- phi-1.5-q4.json
|   |   |-- phi-2-q4.json
|   |   |-- qwen-0.5b-q4.json
|   |   |-- gemma-2b-q4.json
|   |   |-- minilm-l6-v2-q8.json
|   |   |-- e5-small-q8.json
|   |   `-- bge-small-q8.json
|   |
|   |-- Cargo.toml
|   `-- README.md
|
|-- src/
|   |-- ai/
|   |   |-- models/
|   |   |   |-- mod.rs               # Model integration
|   |   |   |-- inference.rs         # Inference engine
|   |   |   `-- tokenizer.rs         # Tokenization
|   |   |-- sona/
|   |   |   |-- lora.rs              # MicroLoRA (existing)
|   |   |   `-- ...
|   |   `-- ...
|   `-- ...
|
`-- ...
```

---

## 9. API Contracts

### 9.1 Model Registry API

```yaml
openapi: 3.0.3
info:
  title: Edge-Net Model Registry API
  version: 1.0.0

paths:
  /v1/models:
    get:
      summary: List available models
      parameters:
        - name: type
          in: query
          schema:
            type: string
            enum: [text-generation, embedding, vision]
        - name: quantization
          in: query
          schema:
            type: string
            enum: [FP16, INT8, INT4]
      responses:
        '200':
          description: List of models
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/ModelSummary'

  /v1/models/{modelId}:
    get:
      summary: Get model details
      parameters:
        - name: modelId
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Model manifest
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ModelManifest'

  /v1/models/{modelId}/download:
    get:
      summary: Get download URLs
      parameters:
        - name: modelId
          in: path
          required: true
          schema:
            type: string
        - name: source
          in: query
          schema:
            type: string
            enum: [gcs, ipfs]
            default: gcs
      responses:
        '200':
          description: Download URLs
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DownloadUrls'

  /v1/adapters:
    get:
      summary: List public adapters
      parameters:
        - name: baseModel
          in: query
          schema:
            type: string
        - name: domain
          in: query
          schema:
            type: string
      responses:
        '200':
          description: List of adapters
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/AdapterSummary'

    post:
      summary: Share adapter
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/AdapterUpload'
      responses:
        '201':
          description: Adapter shared
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/AdapterSummary'

components:
  schemas:
    ModelSummary:
      type: object
      properties:
        id:
          type: string
        name:
          type: string
        type:
          type: string
        quantization:
          type: string
        size:
          type: integer
        version:
          type: string

    ModelManifest:
      type: object
      # Full manifest as defined in section 4.4

    DownloadUrls:
      type: object
      properties:
        model:
          type: string
          format: uri
        tokenizer:
          type: string
          format: uri
        config:
          type: string
          format: uri
        signature:
          type: string
          format: uri

    AdapterSummary:
      type: object
      properties:
        id:
          type: string
        name:
          type: string
        baseModel:
          type: string
        rank:
          type: integer
        domain:
          type: string
        quality:
          type: number
        downloads:
          type: integer
```

### 9.2 P2P Adapter Protocol

```protobuf
syntax = "proto3";
package edgenet.adapters;

// Adapter discovery message
message AdapterAnnounce {
  string adapter_id = 1;
  string name = 2;
  string base_model = 3;
  uint32 rank = 4;
  string domain = 5;
  float quality_score = 6;
  bytes checksum = 7;
  bytes signature = 8;
  string peer_id = 9;
  uint64 timestamp = 10;
}

// Adapter request
message AdapterRequest {
  string adapter_id = 1;
  string requester_id = 2;
  uint64 timestamp = 3;
}

// Adapter response (chunked for large adapters)
message AdapterChunk {
  string adapter_id = 1;
  uint32 chunk_index = 2;
  uint32 total_chunks = 3;
  bytes data = 4;
  bytes chunk_hash = 5;
}

// Adapter quality report
message AdapterReport {
  string adapter_id = 1;
  string reporter_id = 2;
  float quality_score = 3;
  string feedback = 4;
  uint64 timestamp = 5;
  bytes signature = 6;
}
```

---

## 10. Security Considerations

### 10.1 Model Integrity

```
+--------------------------------------------------------------------------+
|                         MODEL INTEGRITY CHAIN                             |
+--------------------------------------------------------------------------+
|                                                                          |
|   Publisher                                                              |
|       |                                                                  |
|       v                                                                  |
|   +------------------------+                                             |
|   | 1. Generate Checksums  |                                             |
|   |    - SHA-256 per file  |                                             |
|   |    - Merkle root       |                                             |
|   +------------------------+                                             |
|              |                                                           |
|              v                                                           |
|   +------------------------+                                             |
|   | 2. Sign Manifest       |                                             |
|   |    - Ed25519 signature |                                             |
|   |    - Publisher identity|                                             |
|   +------------------------+                                             |
|              |                                                           |
|              v                                                           |
|   +------------------------+                                             |
|   | 3. Upload to CDN/IPFS  |                                             |
|   |    - Model artifacts   |                                             |
|   |    - Signed manifest   |                                             |
|   +------------------------+                                             |
|              |                                                           |
|              v                                                           |
|   Consumer                                                               |
|       |                                                                  |
|       v                                                                  |
|   +------------------------+                                             |
|   | 4. Download & Verify   |                                             |
|   |    - Check checksums   |                                             |
|   |    - Verify signature  |                                             |
|   |    - Validate manifest |                                             |
|   +------------------------+                                             |
|              |                                                           |
|              v                                                           |
|   +------------------------+                                             |
|   | 5. Load Model          |                                             |
|   |    - If valid: proceed |                                             |
|   |    - If invalid: abort |                                             |
|   +------------------------+                                             |
|                                                                          |
+--------------------------------------------------------------------------+
```

### 10.2 Adapter Security

- **Reputation-gated sharing**: Only nodes with reputation > 0.7 can share adapters
- **Quality thresholds**: Adapters must meet minimum quality scores
- **Signature verification**: All shared adapters must be signed
- **Sandboxed training**: Browser training runs in isolated context
- **Rate limiting**: Max 10 adapter shares per hour per node

### 10.3 WASM Sandbox

- Models run in isolated WASM sandbox
- No network access from inference
- Memory limits enforced
- Execution time limits
- No filesystem access

---

## 11. Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| **Model Load Time** | < 5s (cached), < 30s (download) | For 1GB model |
| **First Token Latency** | < 100ms (WASM), < 50ms (WebGPU) | After model loaded |
| **Tokens/Second** | 10-15 (WASM), 50-100 (WebGPU) | Phi-1.5 Q4 |
| **Memory Usage** | < 1.5x model size | Including KV cache |
| **Adapter Training** | 100 samples in < 5 min | Rank-4 adapter |
| **Adapter Load Time** | < 100ms | From IndexedDB |
| **Cache Hit Rate** | > 95% | After initial load |
| **CDN Latency** | < 50ms | To nearest PoP |

---

## 12. Future Enhancements

### Phase 2 (Q2 2026)
- Vision model support (SigLIP, MobileCLIP)
- WebGPU acceleration for all models
- Federated adapter training across P2P network
- Model sharding for very large models

### Phase 3 (Q3 2026)
- Mixture-of-Experts (MoE) support
- Speculative decoding
- Cross-model adapter transfer
- On-device model distillation

### Phase 4 (Q4 2026)
- Multimodal models (text + image)
- Audio models (Whisper variants)
- Real-time streaming inference
- Hardware acceleration (WebNN)

---

## Appendix A: Quantization Details

### A.1 INT4 Quantization Algorithm

```python
def quantize_int4_gptq(weights, calibration_data):
    """
    GPTQ-style INT4 quantization with optimal rounding.

    1. Compute Hessian approximation from calibration data
    2. For each weight column:
       a. Quantize to nearest INT4 value
       b. Compute quantization error
       c. Update remaining weights to minimize error
    """
    H = compute_hessian(weights, calibration_data)

    for i in range(weights.shape[1]):
        # Optimal rounding
        q = round(weights[:, i] / scale)
        q = clip(q, -8, 7)  # INT4 range

        # Error compensation (Cholesky update)
        error = (weights[:, i] - q * scale) / H[i, i]
        weights[:, i+1:] -= error * H[i, i+1:]

    return quantized_weights, scales, zeros
```

### A.2 Calibration Dataset Format

```jsonl
{"text": "The quick brown fox jumps over the lazy dog."}
{"text": "Machine learning models can be quantized for efficiency."}
{"text": "WebAssembly enables near-native performance in browsers."}
...
```

---

## Appendix B: IPFS Content Addressing

### B.1 Model CID Computation

```javascript
// Compute CID for model file
import { CID } from 'multiformats/cid';
import { sha256 } from 'multiformats/hashes/sha2';
import * as dagPB from '@ipld/dag-pb';

async function computeModelCID(modelBytes) {
  // Chunk the model (256KB chunks)
  const chunks = chunkify(modelBytes, 262144);

  // Build Merkle DAG
  const links = [];
  for (const chunk of chunks) {
    const hash = await sha256.digest(chunk);
    const cid = CID.create(1, dagPB.code, hash);
    links.push({ Hash: cid, Tsize: chunk.length });
  }

  // Create root node
  const node = dagPB.createNode(new Uint8Array(), links);
  const rootHash = await sha256.digest(dagPB.encode(node));

  return CID.create(1, dagPB.code, rootHash);
}
```

---

## Appendix C: Signature Verification

### C.1 Ed25519 Signature Format

```typescript
interface ModelSignature {
  // Signed data
  manifest: {
    id: string;
    version: string;
    artifacts: {
      [name: string]: {
        sha256: string;
        size: number;
      };
    };
    timestamp: string;
  };

  // Signature
  signature: string;    // Base64-encoded Ed25519 signature
  publicKey: string;    // Base64-encoded public key
  keyId: string;        // Key identifier (e.g., 'ruvector-team')
}

// Verification
async function verifyModelSignature(sig: ModelSignature): Promise<boolean> {
  const publicKey = base64ToBytes(sig.publicKey);
  const signature = base64ToBytes(sig.signature);
  const message = new TextEncoder().encode(JSON.stringify(sig.manifest));

  return ed25519.verify(signature, message, publicKey);
}
```

---

*Document Version: 1.0*
*Last Updated: 2026-01-03*
*Authors: RuVector Team*