Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00
parent 7885bf6278 d803bfe2b1
commit cd5943df23
7854 changed files with 3522914 additions and 0 deletions
--- a/vendor/ruvector/examples/google-cloud/README.md
+++ b/vendor/ruvector/examples/google-cloud/README.md
@@ -0,0 +1,549 @@
+# RuVector Cloud Run GPU Deployment
+
+High-performance vector database benchmarks and deployment on Google Cloud Run with GPU acceleration (NVIDIA L4).
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Prerequisites](#prerequisites)
+- [Quick Start](#quick-start)
+- [Step-by-Step Tutorial](#step-by-step-tutorial)
+- [Deployment Options](#deployment-options)
+- [Benchmarking](#benchmarking)
+- [Architecture](#architecture)
+- [API Reference](#api-reference)
+- [Troubleshooting](#troubleshooting)
+- [Cost Estimation](#cost-estimation)
+- [Contributing](#contributing)
+- [License](#license)
+
+## Overview
+
+This example provides:
+
+- **GPU-Accelerated Benchmarks**: CUDA-optimized operations on the GPU, with SIMD (AVX-512, AVX2, NEON) code paths on the CPU
+- **Cloud Run Deployment**: Scalable, serverless deployment with GPU support
+- **Multiple Deployment Models**:
+  - Single-node benchmark service
+  - Attention/GNN inference service
+  - Raft consensus cluster (3+ nodes)
+  - Primary-replica replication
+
+### Supported RuVector Capabilities
+
+| Capability | Description | Cloud Run Support |
+|------------|-------------|-------------------|
+| **Core Vector Search** | HNSW indexing, k-NN search | ✅ Full GPU |
+| **Attention Mechanisms** | Multi-head attention layers | ✅ Full GPU |
+| **GNN Inference** | Graph neural network forward pass | ✅ Full GPU |
+| **Raft Consensus** | Distributed consensus protocol | ✅ Multi-service |
+| **Replication** | Primary-replica data replication | ✅ Multi-service |
+| **Quantization** | INT8/PQ compression | ✅ GPU optimized |
+
+## Prerequisites
+
+### Required Tools
+
+```bash
+# Google Cloud CLI
+curl https://sdk.cloud.google.com | bash
+gcloud init
+
+# Docker
+# Install from: https://docs.docker.com/get-docker/
+
+# Rust (for local development)
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+```
+
+### GCP Setup
+
+```bash
+# Authenticate
+gcloud auth login
+
+# Set project
+gcloud config set project YOUR_PROJECT_ID
+
+# Enable required APIs
+gcloud services enable \
+    run.googleapis.com \
+    containerregistry.googleapis.com \
+    cloudbuild.googleapis.com \
+    compute.googleapis.com
+```
+
+## Quick Start
+
+### 1. Deploy with the Helper Script
+
+```bash
+cd examples/google-cloud
+
+# Setup and deploy
+./deploy.sh setup
+./deploy.sh build Dockerfile.gpu latest
+./deploy.sh push latest
+./deploy.sh deploy latest true  # true = GPU enabled
+
+# Run benchmark
+./deploy.sh benchmark ruvector-benchmark quick
+```
+
+### 2. View Results
+
+```bash
+# Get the service URL and keep it for the requests below
+URL=$(gcloud run services describe ruvector-benchmark \
+    --region=us-central1 \
+    --format='value(status.url)')
+
+# Test endpoints
+curl "$URL/health"
+curl "$URL/info"
+curl -X POST "$URL/benchmark/quick"
+```
+
+## Step-by-Step Tutorial
+
+### Step 1: Project Setup
+
+```bash
+# Clone the repository
+git clone https://github.com/ruvnet/ruvector.git
+cd ruvector/examples/google-cloud
+
+# Set environment variables
+export GCP_PROJECT_ID="your-project-id"
+export GCP_REGION="us-central1"
+
+# Run setup
+./deploy.sh setup
+```
+
+### Step 2: Build the Docker Image
+
+**Option A: Local Build (faster iteration)**
+
+```bash
+# Build locally
+./deploy.sh build Dockerfile.gpu latest
+
+# Push to Container Registry
+./deploy.sh push latest
+```
+
+**Option B: Cloud Build (no local Docker required)**
+
+```bash
+# Build in the cloud
+./deploy.sh build-cloud Dockerfile.gpu latest
+```
+
+### Step 3: Deploy to Cloud Run
+
+**Basic Deployment (with GPU)**
+
+```bash
+./deploy.sh deploy latest true
+```
+
+**Custom Configuration**
+
+```bash
+# High-memory configuration for large vector sets
+MEMORY=16Gi CPU=8 ./deploy.sh deploy latest true
+
+# Scale settings
+MIN_INSTANCES=1 MAX_INSTANCES=20 ./deploy.sh deploy latest true
+```
+
+### Step 4: Run Benchmarks
+
+```bash
+# Quick benchmark (128d, 10k vectors)
+./deploy.sh benchmark ruvector-benchmark quick
+
+# Distance computation benchmark
+./deploy.sh benchmark ruvector-benchmark distance
+
+# HNSW index benchmark
+./deploy.sh benchmark ruvector-benchmark hnsw
+
+# Full benchmark suite
+./deploy.sh benchmark ruvector-benchmark full
+```
+
+### Step 5: View Results
+
+```bash
+# Get all results
+./deploy.sh results ruvector-benchmark
+
+# View logs
+./deploy.sh logs ruvector-benchmark
+
+# Check service status
+./deploy.sh status
+```
+
+## Deployment Options
+
+### 1. Single-Node Benchmark Service
+
+Best for: Development, testing, single-user benchmarks
+
+```bash
+./deploy.sh deploy latest true
+```
+
+### 2. Attention/GNN Service
+
+Best for: Neural network inference, embedding generation
+
+```bash
+./deploy.sh deploy-attention latest
+```
+
+**Features:**
+- 16GB memory for large models
+- 3-layer GNN with 8 attention heads
+- Optimized for batch inference
+
+### 3. Raft Consensus Cluster
+
+Best for: High availability, consistent distributed state
+
+```bash
+# Deploy 3-node cluster
+CLUSTER_SIZE=3 ./deploy.sh deploy-raft
+
+# Deploy 5-node cluster for higher fault tolerance
+CLUSTER_SIZE=5 ./deploy.sh deploy-raft
+```
+
+**Architecture:**
+```
+┌─────────────┐     ┌─────────────┐     ┌─────────────┐
+│   Node 1    │◄───►│   Node 2    │◄───►│   Node 3    │
+│  (Leader)   │     │  (Follower) │     │  (Follower) │
+└─────────────┘     └─────────────┘     └─────────────┘
+       │                  │                   │
+       └──────────────────┴───────────────────┘
+                    Raft Consensus
+```
+
+**Configuration:**
+```bash
+# Environment variables for Raft nodes
+RUVECTOR_NODE_ID=0              # Node identifier (0, 1, 2, ...)
+RUVECTOR_CLUSTER_SIZE=3         # Total cluster size
+RUVECTOR_RAFT_ELECTION_TIMEOUT=150  # Election timeout (ms)
+RUVECTOR_RAFT_HEARTBEAT_INTERVAL=50 # Heartbeat interval (ms)
+```
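+
+The 3-node and 5-node sizes above follow the usual Raft majority rule: a cluster of N nodes needs floor(N/2) + 1 votes to make progress, so it keeps working with up to floor((N-1)/2) nodes down. A minimal sketch of that arithmetic follows; the function names are illustrative and are not part of `deploy.sh`.
+
+```bash
+# Votes needed for a majority in a Raft cluster of N nodes.
+raft_quorum() {
+    echo $(( $1 / 2 + 1 ))
+}
+
+# Nodes that can fail while a majority is still reachable.
+raft_fault_tolerance() {
+    echo $(( ($1 - 1) / 2 ))
+}
+
+for n in 3 5; do
+    echo "nodes=$n quorum=$(raft_quorum "$n") tolerated_failures=$(raft_fault_tolerance "$n")"
+done
+```
+
+An even cluster size adds a node without raising the number of tolerated failures (4 nodes still tolerate only 1), which is why odd sizes are the usual choice.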
+
+### 4. Primary-Replica Replication
+
+Best for: Read scaling, geographic distribution
+
+```bash
+# Deploy with 3 replicas
+./deploy.sh deploy-replication 3
+```
+
+**Architecture:**
+```
+                    ┌─────────────┐
+          Writes───►│   Primary   │
+                    └──────┬──────┘
+                           │ Replication
+          ┌────────────────┼────────────────┐
+          ▼                ▼                ▼
+    ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
+    │  Replica 1  │  │  Replica 2  │  │  Replica 3  │
+    └─────────────┘  └─────────────┘  └─────────────┘
+          │                │                │
+          └────────────────┴────────────────┘
+                      Reads (load balanced)
+```
+
+**Configuration:**
+```bash
+# Primary node
+RUVECTOR_MODE=primary
+RUVECTOR_REPLICATION_FACTOR=3
+RUVECTOR_SYNC_MODE=async  # or "sync" for strong consistency
+
+# Replica nodes
+RUVECTOR_MODE=replica
+RUVECTOR_PRIMARY_URL=https://ruvector-primary-xxx.run.app
+```
+
+## Benchmarking
+
+### Available Benchmarks
+
+| Benchmark | Description | Dimensions | Vector Count |
+|-----------|-------------|------------|--------------|
+| `quick` | Fast sanity check | 128 | 10,000 |
+| `distance` | Distance computation | configurable | configurable |
+| `hnsw` | HNSW index search | configurable | configurable |
+| `gnn` | GNN forward pass | 256 | 10,000 nodes |
+| `cuda` | CUDA kernel perf | - | - |
+| `quantization` | INT8/PQ compression | configurable | configurable |
+
+### Running Benchmarks via API
+
+```bash
+# Quick benchmark
+curl -X POST https://YOUR-SERVICE-URL/benchmark/quick
+
+# Custom distance benchmark
+curl -X POST "https://YOUR-SERVICE-URL/benchmark/distance?dims=768&num_vectors=100000&batch_size=64"
+
+# Custom HNSW benchmark
+curl -X POST "https://YOUR-SERVICE-URL/benchmark/hnsw?dims=768&num_vectors=100000&k=10"
+
+# Full custom benchmark
+curl -X POST https://YOUR-SERVICE-URL/benchmark \
+    -H "Content-Type: application/json" \
+    -d '{
+        "dims": 768,
+        "num_vectors": 100000,
+        "num_queries": 1000,
+        "k": 10,
+        "benchmark_type": "hnsw"
+    }'
+```
+
+### Expected Performance
+
+**NVIDIA L4 GPU (Cloud Run default):**
+
+| Operation | Dimensions | Vectors | P99 Latency | QPS |
+|-----------|------------|---------|-------------|-----|
+| L2 Distance | 128 | 10k | 0.5ms | 2,000 |
+| L2 Distance | 768 | 100k | 5ms | 200 |
+| HNSW Search | 128 | 100k | 1ms | 1,000 |
+| HNSW Search | 768 | 1M | 10ms | 100 |
+| GNN Forward | 256 | 10k nodes | 15ms | 66 |
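+
+The QPS column is consistent with one request in flight at a time, where throughput is roughly 1000 divided by the latency in milliseconds (for example, 1000 / 15 ms is about 66). A small sketch of that conversion, assuming a single sequential client; `single_stream_qps` is a hypothetical helper, not part of the benchmark code.
+
+```bash
+# Approximate queries per second for one in-flight request,
+# given a per-request latency in milliseconds (truncated to an integer).
+single_stream_qps() {
+    awk -v ms="$1" 'BEGIN { printf "%d\n", int(1000 / ms) }'
+}
+
+single_stream_qps 0.5   # L2 distance, 128 dims, 10k vectors: prints 2000
+single_stream_qps 15    # GNN forward pass, 10k nodes: prints 66
+```
+
+Treat the result as a rough single-client estimate; concurrent requests can push measured throughput above it.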
+
+### SIMD Capabilities
+
+The benchmark detects the SIMD instruction sets available on the host CPU at startup and uses the widest one it finds:
+
+| Architecture | SIMD | Vector Width | Speedup |
+|--------------|------|--------------|---------|
+| x86_64 | AVX-512 | 16 floats | 8-16x |
+| x86_64 | AVX2 | 8 floats | 4-8x |
+| x86_64 | SSE4.1 | 4 floats | 2-4x |
+| ARM64 | NEON | 4 floats | 2-4x |
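+
+The "Vector Width" column is the register width divided by the 32 bits of a single-precision float. A short sketch of that calculation; `f32_lanes` is a hypothetical helper used only for illustration.
+
+```bash
+# Number of 32-bit floats that fit in a SIMD register of the given bit width.
+f32_lanes() {
+    echo $(( $1 / 32 ))
+}
+
+f32_lanes 512   # AVX-512: prints 16
+f32_lanes 256   # AVX2: prints 8
+f32_lanes 128   # SSE4.1 and NEON: prints 4
+```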
+
+## Architecture
+
+### System Components
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                        Cloud Run                                 │
+├─────────────────────────────────────────────────────────────────┤
+│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐ │
+│  │ HTTP Server │  │  Benchmark  │  │    SIMD/GPU Runtime     │ │
+│  │   (Axum)    │  │   Engine    │  │  AVX-512 │ CUDA │ NEON  │ │
+│  └──────┬──────┘  └──────┬──────┘  └────────────────┬────────┘ │
+│         │                │                          │          │
+│  ┌──────┴────────────────┴──────────────────────────┴────────┐ │
+│  │                    RuVector Core                          │ │
+│  │  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────────────┐  │ │
+│  │  │  HNSW  │  │  GNN   │  │ Quant  │  │  Attention     │  │ │
+│  │  │ Index  │  │ Layers │  │  INT8  │  │  Multi-Head    │  │ │
+│  │  └────────┘  └────────┘  └────────┘  └────────────────┘  │ │
+│  └───────────────────────────────────────────────────────────┘ │
+├─────────────────────────────────────────────────────────────────┤
+│                      NVIDIA L4 GPU                              │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+### File Structure
+
+```
+examples/google-cloud/
+├── Cargo.toml              # Rust dependencies
+├── Dockerfile.gpu          # GPU-optimized Docker image
+├── cloudrun.yaml           # Cloud Run service configs
+├── deploy.sh               # Deployment automation
+├── README.md               # This file
+└── src/
+    ├── main.rs             # CLI entry point
+    ├── benchmark.rs        # Benchmark implementations
+    ├── simd.rs             # SIMD-optimized operations
+    ├── cuda.rs             # GPU/CUDA operations
+    ├── report.rs           # Report generation
+    └── server.rs           # HTTP server for Cloud Run
+```
+
+## API Reference
+
+### Endpoints
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/` | API info and available endpoints |
+| GET | `/health` | Health check |
+| GET | `/info` | System information (GPU, SIMD, memory) |
+| POST | `/benchmark` | Run custom benchmark |
+| POST | `/benchmark/quick` | Run quick benchmark |
+| POST | `/benchmark/distance` | Run distance benchmark |
+| POST | `/benchmark/hnsw` | Run HNSW benchmark |
+| GET | `/results` | Get all benchmark results |
+| POST | `/results/clear` | Clear stored results |
+
+### Health Check Response
+
+```json
+{
+    "status": "healthy",
+    "version": "0.1.0",
+    "gpu_available": true,
+    "gpu_name": "NVIDIA L4",
+    "simd_capability": "AVX2",
+    "uptime_secs": 3600
+}
+```
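+
+A deployment script can gate on this response before starting a benchmark. The sketch below checks the `status` field with `grep`; `is_healthy` is a hypothetical helper, and it assumes the JSON shape shown above.
+
+```bash
+# Succeeds when the given health-check JSON reports "status": "healthy".
+is_healthy() {
+    printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
+}
+
+# Sample payload; in practice this would be the body returned by /health.
+response='{"status": "healthy", "gpu_available": true}'
+
+if is_healthy "$response"; then
+    echo "service is healthy"
+else
+    echo "service is not healthy" >&2
+fi
+```
+
+A pattern match keeps the check free of extra tools; a JSON parser would be the more robust choice if one is available in the image.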
+
+### Benchmark Request
+
+```json
+{
+    "dims": 768,
+    "num_vectors": 100000,
+    "num_queries": 1000,
+    "k": 10,
+    "benchmark_type": "hnsw"
+}
+```
+
+### Benchmark Response
+
+```json
+{
+    "status": "success",
+    "message": "Benchmark completed",
+    "result": {
+        "name": "hnsw_768d_100000v",
+        "operation": "hnsw_search",
+        "dimensions": 768,
+        "num_vectors": 100000,
+        "mean_time_ms": 2.5,
+        "p50_ms": 2.1,
+        "p95_ms": 3.8,
+        "p99_ms": 5.2,
+        "qps": 400.0,
+        "memory_mb": 585.9,
+        "gpu_enabled": true
+    }
+}
+```
+
+## Troubleshooting
+
+### Common Issues
+
+**1. GPU not detected**
+
+```bash
+# Check GPU availability
+gcloud run services describe ruvector-benchmark \
+    --region=us-central1 \
+    --format='yaml(spec.template.metadata.annotations)'
+
+# Ensure GPU annotations are present:
+# run.googleapis.com/gpu-type: nvidia-l4
+# run.googleapis.com/gpu-count: "1"
+```
+
+**2. Container fails to start**
+
+```bash
+# Check logs
+./deploy.sh logs ruvector-benchmark 200
+
+# Common causes:
+# - Missing CUDA libraries (use nvidia/cuda base image)
+# - Memory limit too low (increase MEMORY env var)
+# - Health check failing (check /health endpoint)
+```
+
+**3. Slow cold starts**
+
+```bash
+# Set minimum instances
+MIN_INSTANCES=1 ./deploy.sh deploy latest true
+
+# Enable startup CPU boost (already in cloudrun.yaml)
+```
+
+**4. Out of memory**
+
+```bash
+# Increase memory allocation
+MEMORY=16Gi ./deploy.sh deploy latest true
+
+# Or reduce the vector count in the benchmark request
+curl -X POST "https://YOUR-SERVICE-URL/benchmark/hnsw?dims=768&num_vectors=50000&k=10"
+```
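+
+To choose a vector count that fits, it can help to estimate the raw footprint first: `dims * num_vectors * 4` bytes for 32-bit floats, before any index overhead. A rough sketch follows; `raw_vector_mib` is a hypothetical helper, and the f32 storage assumption may not hold for quantized data.
+
+```bash
+# Raw size in MiB of num_vectors f32 vectors with the given dimensions.
+# Excludes index structures and working buffers.
+raw_vector_mib() {
+    awk -v d="$1" -v n="$2" 'BEGIN { printf "%.1f\n", d * n * 4 / (1024 * 1024) }'
+}
+
+raw_vector_mib 768 100000    # prints 293.0
+raw_vector_mib 768 1000000   # prints 2929.7
+```
+
+The sample benchmark response in the API reference reports a `memory_mb` of 585.9 for 768 dimensions and 100,000 vectors, about twice this raw figure, so leave headroom above the estimate when setting `MEMORY`.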
+
+### Performance Optimization
+
+1. **Enable CPU boost for cold starts**
+   ```yaml
+   run.googleapis.com/startup-cpu-boost: "true"
+   ```
+
+2. **Disable CPU throttling**
+   ```yaml
+   run.googleapis.com/cpu-throttling: "false"
+   ```
+
+3. **Use Gen2 execution environment**
+   ```yaml
+   run.googleapis.com/execution-environment: gen2
+   ```
+
+4. **Tune concurrency based on workload**
+   - CPU-bound: Lower concurrency (10-20)
+   - Memory-bound: Medium concurrency (50-80)
+   - I/O-bound: Higher concurrency (100+)
+
+### Cleanup
+
+```bash
+# Remove all RuVector services
+./deploy.sh cleanup
+
+# Remove specific service
+gcloud run services delete ruvector-benchmark --region=us-central1
+
+# Remove the container image (use the tag you pushed)
+gcloud container images delete gcr.io/YOUR_PROJECT_ID/ruvector-benchmark:latest
+```
+
+## Cost Estimation
+
+| Configuration | vCPU | Memory | GPU | Cost/hour |
+|---------------|------|--------|-----|-----------|
+| Basic | 2 | 4GB | None | ~$0.10 |
+| GPU Standard | 4 | 8GB | L4 | ~$0.80 |
+| GPU High-Mem | 8 | 16GB | L4 | ~$1.20 |
+| Raft Cluster (3) | 6 | 12GB | None | ~$0.30 |
+
+*Costs are approximate and vary by region. See [Cloud Run Pricing](https://cloud.google.com/run/pricing).*
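+
+Hourly rates add up for services that keep an instance warm with `MIN_INSTANCES=1`. A back-of-the-envelope conversion to a monthly figure, assuming about 730 hours per month and one always-on instance; the rates are the approximate values from the table, and `monthly_cost` is a hypothetical helper.
+
+```bash
+# Monthly cost for an hourly rate; hours default to 730 (about one month).
+monthly_cost() {
+    awk -v rate="$1" -v hours="${2:-730}" 'BEGIN { printf "%.2f\n", rate * hours }'
+}
+
+monthly_cost 0.80   # GPU Standard, always on: prints 584.00
+monthly_cost 1.20   # GPU High-Mem, always on: prints 876.00
+```
+
+Services that scale to zero are billed only while instances are running, so actual costs for bursty benchmark runs can be far lower than these always-on figures.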
+
+## Contributing
+
+1. Fork the repository
+2. Create a feature branch
+3. Make your changes
+4. Run benchmarks to verify performance
+5. Submit a pull request
+
+## License
+
+MIT License - see [LICENSE](../../LICENSE) for details.