@ruvector/ruvllm-wasm

Run large language models directly in the browser using WebAssembly with optional WebGPU acceleration for faster inference.

Features

  • Browser-Native - No server required, runs entirely client-side
  • WebGPU Acceleration - 10-50x faster inference with GPU support
  • GGUF Models - Load quantized models for efficient browser inference
  • Streaming - Real-time token streaming for responsive UX
  • IndexedDB Caching - Cache models locally for instant reload
  • Privacy-First - All processing happens on-device
  • SIMD Support - Optimized WASM with SIMD instructions
  • Multi-Threading - Parallel inference with SharedArrayBuffer

Installation

npm install @ruvector/ruvllm-wasm

Quick Start

import { RuvLLMWasm, checkWebGPU } from '@ruvector/ruvllm-wasm';

// Check browser capabilities
const webgpu = await checkWebGPU();
console.log('WebGPU:', webgpu); // 'available' | 'unavailable' | 'not_supported'

// Create instance with WebGPU (if available)
const llm = await RuvLLMWasm.create({
  useWebGPU: true,
  memoryLimit: 4096, // 4GB max
});

// Load a model (with progress tracking)
await llm.loadModel('https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf', {
  onProgress: (loaded, total) => {
    console.log(`Loading: ${Math.round(loaded / total * 100)}%`);
  }
});

// Generate text
const result = await llm.generate('What is the capital of France?', {
  maxTokens: 100,
  temperature: 0.7,
});

console.log(result.text);
console.log(`${result.stats.tokensPerSecond.toFixed(1)} tokens/sec`);

Streaming Tokens

// Stream tokens as they're generated
let output = '';

await llm.generate('Tell me a story about a robot', {
  maxTokens: 200,
  stream: true,
}, (token, done) => {
  output += token; // render progressively in the UI; process.stdout is not available in the browser
  if (done) console.log(output + '\n--- Done ---');
});

Chat Interface

import { ChatMessage } from '@ruvector/ruvllm-wasm';

const messages: ChatMessage[] = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is 2 + 2?' },
];

const response = await llm.chat(messages, {
  maxTokens: 100,
  temperature: 0.5,
});

console.log(response.text); // "2 + 2 equals 4."

React Hook Example

import { useState, useEffect } from 'react';
import { RuvLLMWasm, LoadingStatus } from '@ruvector/ruvllm-wasm';

function useLLM(modelUrl: string) {
  const [llm, setLLM] = useState<RuvLLMWasm | null>(null);
  const [status, setStatus] = useState<LoadingStatus>('idle');
  const [progress, setProgress] = useState(0);

  useEffect(() => {
    let instance: RuvLLMWasm | null = null;
    let cancelled = false; // guards against setState after unmount

    async function init() {
      instance = await RuvLLMWasm.create({ useWebGPU: true });
      if (cancelled) {
        instance.unload();
        return;
      }
      setStatus('downloading');

      await instance.loadModel(modelUrl, {
        onProgress: (loaded, total) => setProgress(loaded / total),
      });

      if (cancelled) return;
      setStatus('ready');
      setLLM(instance);
    }

    init().catch(console.error);
    return () => {
      cancelled = true;
      instance?.unload();
    };
  }, [modelUrl]);

  return { llm, status, progress };
}

// Usage
function ChatApp() {
  const { llm, status, progress } = useLLM('https://example.com/model.gguf');
  const [response, setResponse] = useState('');

  if (status !== 'ready') {
    return <div>Loading: {Math.round(progress * 100)}%</div>;
  }

  const generate = async () => {
    const result = await llm!.generate('Hello!', { maxTokens: 50 });
    setResponse(result.text);
  };

  return (
    <div>
      <button onClick={generate}>Generate</button>
      <p>{response}</p>
    </div>
  );
}

Browser Requirements

Feature            Required           Benefit
WebAssembly        Yes                Core execution
WebGPU             No (recommended)   10-50x faster inference
SharedArrayBuffer  No                 Multi-threading
SIMD               No                 2-4x faster math
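
The features above can also be probed with standard web APIs alone, independent of this package. A minimal sketch; the byte array is the common tiny-module trick for SIMD detection (a module containing one v128 instruction), and the `crossOriginIsolated` check reflects that SharedArrayBuffer is only usable on cross-origin-isolated pages:

```typescript
// Plain capability probe using only standard runtime APIs.
function detectFeatures() {
  const g = globalThis as any;
  const wasm = typeof g.WebAssembly === 'object';
  // navigator.gpu is only defined in WebGPU-capable browsers.
  const webgpu = typeof g.navigator !== 'undefined' && 'gpu' in g.navigator;
  // SharedArrayBuffer requires a cross-origin-isolated page (COOP/COEP headers).
  const sharedArrayBuffer =
    typeof g.SharedArrayBuffer === 'function' && g.crossOriginIsolated !== false;
  // SIMD probe: validate a minimal module that uses a v128 instruction.
  const simd =
    wasm &&
    g.WebAssembly.validate(
      new Uint8Array([
        0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 123, 3, 2, 1, 0,
        10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11,
      ])
    );
  return { wasm, webgpu, sharedArrayBuffer, simd };
}
```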

Check Capabilities

import { getCapabilities } from '@ruvector/ruvllm-wasm';

const caps = await getCapabilities();
console.log(caps);
// {
//   webgpu: 'available',
//   sharedArrayBuffer: true,
//   simd: true,
//   crossOriginIsolated: true
// }

Enable SharedArrayBuffer

Add these headers to your server:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
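
One way to set these during development, sketched for a Vite dev server (an assumption — any server that adds these two response headers makes the page cross-origin isolated):

```typescript
// vite.config.ts — dev-server sketch for cross-origin isolation.
export default {
  server: {
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'require-corp',
    },
  },
};
```

Pages served without these headers still run, but without SharedArrayBuffer-based multi-threading.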

API Reference

RuvLLMWasm.create(options?)

Create a new instance.

const llm = await RuvLLMWasm.create({
  useWebGPU: true,      // Enable WebGPU acceleration
  threads: 4,           // CPU threads (requires SharedArrayBuffer)
  memoryLimit: 4096,    // Max memory in MB
});

loadModel(source, options?)

Load a GGUF model.

await llm.loadModel(url, {
  onProgress: (loaded, total) => { /* ... */ }
});

generate(prompt, config?, onToken?)

Generate text completion.

const result = await llm.generate('Hello', {
  maxTokens: 100,
  temperature: 0.7,
  topP: 0.9,
  topK: 40,
  repetitionPenalty: 1.1,
  stopSequences: ['\n\n'],
  stream: true,
}, (token, done) => { /* ... */ });

chat(messages, config?, onToken?)

Chat completion with message history.

const result = await llm.chat([
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hi!' },
], { maxTokens: 100 });

unload()

Free memory and unload model.

llm.unload();

Recommended Models

Small models suitable for browser inference:

Model                   Size     Use Case
TinyLlama-1.1B-Q4       ~700 MB  General chat
Phi-2-Q4                ~1.6 GB  Code, reasoning
Qwen2-0.5B-Q4           ~400 MB  Fast responses
StableLM-Zephyr-3B-Q4   ~2 GB    Quality chat
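
Choosing from the table for the device at hand can be sketched as a small helper. This is hypothetical, not part of the package; sizes come from the table above, and the headroom factor is a rough guess to leave room for activations and the KV cache:

```typescript
interface ModelOption {
  name: string;
  sizeMB: number; // approximate Q4 GGUF download size, from the table above
}

const MODELS: ModelOption[] = [
  { name: 'Qwen2-0.5B-Q4', sizeMB: 400 },
  { name: 'TinyLlama-1.1B-Q4', sizeMB: 700 },
  { name: 'Phi-2-Q4', sizeMB: 1600 },
  { name: 'StableLM-Zephyr-3B-Q4', sizeMB: 2000 },
];

// Pick the largest model that fits the memory budget with headroom to spare.
function pickModel(budgetMB: number, headroom = 1.5): ModelOption | null {
  const fitting = MODELS.filter((m) => m.sizeMB * headroom <= budgetMB);
  if (fitting.length === 0) return null;
  return fitting.reduce((a, b) => (a.sizeMB >= b.sizeMB ? a : b));
}
```

In browsers that support it, `navigator.deviceMemory` gives a coarse RAM estimate (in GB) that can feed the budget.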

Performance Tips

  1. Use WebGPU - Check support and enable for 10-50x speedup
  2. Smaller models - Q4_K_M quantization balances quality/size
  3. Cache models - IndexedDB caching avoids re-downloads
  4. Limit context - Smaller context = faster inference
  5. Stream tokens - Better UX with progressive output

License

MIT OR Apache-2.0


Part of the RuVector ecosystem - High-performance vector database with self-learning capabilities.