Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'
This commit is contained in:
549
vendor/ruvector/examples/google-cloud/README.md
vendored
Normal file
549
vendor/ruvector/examples/google-cloud/README.md
vendored
Normal file
@@ -0,0 +1,549 @@
|
||||
# RuVector Cloud Run GPU Deployment
|
||||
|
||||
High-performance vector database benchmarks and deployment on Google Cloud Run with GPU acceleration (NVIDIA L4).
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Overview](#overview)
|
||||
- [Prerequisites](#prerequisites)
|
||||
- [Quick Start](#quick-start)
|
||||
- [Step-by-Step Tutorial](#step-by-step-tutorial)
|
||||
- [Deployment Options](#deployment-options)
|
||||
- [Benchmarking](#benchmarking)
|
||||
- [Architecture](#architecture)
|
||||
- [API Reference](#api-reference)
|
||||
- [Troubleshooting](#troubleshooting)
|
||||
|
||||
## Overview
|
||||
|
||||
This example provides:
|
||||
|
||||
- **GPU-Accelerated Benchmarks**: SIMD (AVX-512, AVX2, NEON) and CUDA optimized operations
|
||||
- **Cloud Run Deployment**: Scalable, serverless deployment with GPU support
|
||||
- **Multiple Deployment Models**:
|
||||
- Single-node benchmark service
|
||||
- Attention/GNN inference service
|
||||
- Raft consensus cluster (3+ nodes)
|
||||
- Primary-replica replication
|
||||
|
||||
### Supported RuVector Capabilities
|
||||
|
||||
| Capability | Description | Cloud Run Support |
|
||||
|------------|-------------|-------------------|
|
||||
| **Core Vector Search** | HNSW indexing, k-NN search | ✅ Full GPU |
|
||||
| **Attention Mechanisms** | Multi-head attention layers | ✅ Full GPU |
|
||||
| **GNN Inference** | Graph neural network forward pass | ✅ Full GPU |
|
||||
| **Raft Consensus** | Distributed consensus protocol | ✅ Multi-service |
|
||||
| **Replication** | Primary-replica data replication | ✅ Multi-service |
|
||||
| **Quantization** | INT8/PQ compression | ✅ GPU optimized |
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Required Tools
|
||||
|
||||
```bash
|
||||
# Google Cloud CLI
|
||||
curl https://sdk.cloud.google.com | bash
|
||||
gcloud init
|
||||
|
||||
# Docker
|
||||
# Install from: https://docs.docker.com/get-docker/
|
||||
|
||||
# Rust (for local development)
|
||||
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
|
||||
```
|
||||
|
||||
### GCP Setup
|
||||
|
||||
```bash
|
||||
# Authenticate
|
||||
gcloud auth login
|
||||
|
||||
# Set project
|
||||
gcloud config set project YOUR_PROJECT_ID
|
||||
|
||||
# Enable required APIs
|
||||
gcloud services enable \
|
||||
run.googleapis.com \
|
||||
containerregistry.googleapis.com \
|
||||
cloudbuild.googleapis.com \
|
||||
compute.googleapis.com
|
||||
```
|
||||
|
||||
## Quick Start
|
||||
|
||||
### 1. One-Command Deployment
|
||||
|
||||
```bash
|
||||
cd examples/google-cloud
|
||||
|
||||
# Setup and deploy
|
||||
./deploy.sh setup
|
||||
./deploy.sh build Dockerfile.gpu latest
|
||||
./deploy.sh push latest
|
||||
./deploy.sh deploy latest true # true = GPU enabled
|
||||
|
||||
# Run benchmark
|
||||
./deploy.sh benchmark ruvector-benchmark quick
|
||||
```
|
||||
|
||||
### 2. View Results
|
||||
|
||||
```bash
|
||||
# Get service URL
|
||||
gcloud run services describe ruvector-benchmark \
|
||||
--region=us-central1 \
|
||||
--format='value(status.url)'
|
||||
|
||||
# Test endpoints
|
||||
curl $URL/health
|
||||
curl $URL/info
|
||||
curl -X POST $URL/benchmark/quick
|
||||
```
|
||||
|
||||
## Step-by-Step Tutorial
|
||||
|
||||
### Step 1: Project Setup
|
||||
|
||||
```bash
|
||||
# Clone the repository
|
||||
git clone https://github.com/ruvnet/ruvector.git
|
||||
cd ruvector/examples/google-cloud
|
||||
|
||||
# Set environment variables
|
||||
export GCP_PROJECT_ID="your-project-id"
|
||||
export GCP_REGION="us-central1"
|
||||
|
||||
# Run setup
|
||||
./deploy.sh setup
|
||||
```
|
||||
|
||||
### Step 2: Build the Docker Image
|
||||
|
||||
**Option A: Local Build (faster iteration)**
|
||||
|
||||
```bash
|
||||
# Build locally
|
||||
./deploy.sh build Dockerfile.gpu latest
|
||||
|
||||
# Push to Container Registry
|
||||
./deploy.sh push latest
|
||||
```
|
||||
|
||||
**Option B: Cloud Build (no local Docker required)**
|
||||
|
||||
```bash
|
||||
# Build in the cloud
|
||||
./deploy.sh build-cloud Dockerfile.gpu latest
|
||||
```
|
||||
|
||||
### Step 3: Deploy to Cloud Run
|
||||
|
||||
**Basic Deployment (with GPU)**
|
||||
|
||||
```bash
|
||||
./deploy.sh deploy latest true
|
||||
```
|
||||
|
||||
**Custom Configuration**
|
||||
|
||||
```bash
|
||||
# High-memory configuration for large vector sets
|
||||
MEMORY=16Gi CPU=8 ./deploy.sh deploy latest true
|
||||
|
||||
# Scale settings
|
||||
MIN_INSTANCES=1 MAX_INSTANCES=20 ./deploy.sh deploy latest true
|
||||
```
|
||||
|
||||
### Step 4: Run Benchmarks
|
||||
|
||||
```bash
|
||||
# Quick benchmark (128d, 10k vectors)
|
||||
./deploy.sh benchmark ruvector-benchmark quick
|
||||
|
||||
# Distance computation benchmark
|
||||
./deploy.sh benchmark ruvector-benchmark distance
|
||||
|
||||
# HNSW index benchmark
|
||||
./deploy.sh benchmark ruvector-benchmark hnsw
|
||||
|
||||
# Full benchmark suite
|
||||
./deploy.sh benchmark ruvector-benchmark full
|
||||
```
|
||||
|
||||
### Step 5: View Results
|
||||
|
||||
```bash
|
||||
# Get all results
|
||||
./deploy.sh results ruvector-benchmark
|
||||
|
||||
# View logs
|
||||
./deploy.sh logs ruvector-benchmark
|
||||
|
||||
# Check service status
|
||||
./deploy.sh status
|
||||
```
|
||||
|
||||
## Deployment Options
|
||||
|
||||
### 1. Single-Node Benchmark Service
|
||||
|
||||
Best for: Development, testing, single-user benchmarks
|
||||
|
||||
```bash
|
||||
./deploy.sh deploy latest true
|
||||
```
|
||||
|
||||
### 2. Attention/GNN Service
|
||||
|
||||
Best for: Neural network inference, embedding generation
|
||||
|
||||
```bash
|
||||
./deploy.sh deploy-attention latest
|
||||
```
|
||||
|
||||
**Features:**
|
||||
- 16GB memory for large models
|
||||
- 3-layer GNN with 8 attention heads
|
||||
- Optimized for batch inference
|
||||
|
||||
### 3. Raft Consensus Cluster
|
||||
|
||||
Best for: High availability, consistent distributed state
|
||||
|
||||
```bash
|
||||
# Deploy 3-node cluster
|
||||
CLUSTER_SIZE=3 ./deploy.sh deploy-raft
|
||||
|
||||
# Deploy 5-node cluster for higher fault tolerance
|
||||
CLUSTER_SIZE=5 ./deploy.sh deploy-raft
|
||||
```
|
||||
|
||||
**Architecture:**
|
||||
```
|
||||
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
|
||||
│ Node 1 │◄───►│ Node 2 │◄───►│ Node 3 │
|
||||
│ (Leader) │ │ (Follower) │ │ (Follower) │
|
||||
└─────────────┘ └─────────────┘ └─────────────┘
|
||||
│ │ │
|
||||
└──────────────────┴───────────────────┘
|
||||
Raft Consensus
|
||||
```
|
||||
|
||||
**Configuration:**
|
||||
```bash
|
||||
# Environment variables for Raft nodes
|
||||
RUVECTOR_NODE_ID=0 # Node identifier (0, 1, 2, ...)
|
||||
RUVECTOR_CLUSTER_SIZE=3 # Total cluster size
|
||||
RUVECTOR_RAFT_ELECTION_TIMEOUT=150 # Election timeout (ms)
|
||||
RUVECTOR_RAFT_HEARTBEAT_INTERVAL=50 # Heartbeat interval (ms)
|
||||
```
|
||||
|
||||
### 4. Primary-Replica Replication
|
||||
|
||||
Best for: Read scaling, geographic distribution
|
||||
|
||||
```bash
|
||||
# Deploy with 3 replicas
|
||||
./deploy.sh deploy-replication 3
|
||||
```
|
||||
|
||||
**Architecture:**
|
||||
```
|
||||
┌─────────────┐
|
||||
Writes───►│ Primary │
|
||||
└──────┬──────┘
|
||||
│ Replication
|
||||
┌────────────────┼────────────────┐
|
||||
▼ ▼ ▼
|
||||
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
|
||||
│ Replica 1 │ │ Replica 2 │ │ Replica 3 │
|
||||
└─────────────┘ └─────────────┘ └─────────────┘
|
||||
│ │ │
|
||||
└────────────────┴────────────────┘
|
||||
Reads (load balanced)
|
||||
```
|
||||
|
||||
**Configuration:**
|
||||
```bash
|
||||
# Primary node
|
||||
RUVECTOR_MODE=primary
|
||||
RUVECTOR_REPLICATION_FACTOR=3
|
||||
RUVECTOR_SYNC_MODE=async # or "sync" for strong consistency
|
||||
|
||||
# Replica nodes
|
||||
RUVECTOR_MODE=replica
|
||||
RUVECTOR_PRIMARY_URL=https://ruvector-primary-xxx.run.app
|
||||
```
|
||||
|
||||
## Benchmarking
|
||||
|
||||
### Available Benchmarks
|
||||
|
||||
| Benchmark | Description | Dimensions | Vector Count |
|
||||
|-----------|-------------|------------|--------------|
|
||||
| `quick` | Fast sanity check | 128 | 10,000 |
|
||||
| `distance` | Distance computation | configurable | configurable |
|
||||
| `hnsw` | HNSW index search | configurable | configurable |
|
||||
| `gnn` | GNN forward pass | 256 | 10,000 nodes |
|
||||
| `cuda` | CUDA kernel perf | - | - |
|
||||
| `quantization` | INT8/PQ compression | configurable | configurable |
|
||||
|
||||
### Running Benchmarks via API
|
||||
|
||||
```bash
|
||||
# Quick benchmark
|
||||
curl -X POST https://YOUR-SERVICE-URL/benchmark/quick
|
||||
|
||||
# Custom distance benchmark
|
||||
curl -X POST "https://YOUR-SERVICE-URL/benchmark/distance?dims=768&num_vectors=100000&batch_size=64"
|
||||
|
||||
# Custom HNSW benchmark
|
||||
curl -X POST "https://YOUR-SERVICE-URL/benchmark/hnsw?dims=768&num_vectors=100000&k=10"
|
||||
|
||||
# Full custom benchmark
|
||||
curl -X POST https://YOUR-SERVICE-URL/benchmark \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"dims": 768,
|
||||
"num_vectors": 100000,
|
||||
"num_queries": 1000,
|
||||
"k": 10,
|
||||
"benchmark_type": "hnsw"
|
||||
}'
|
||||
```
|
||||
|
||||
### Expected Performance
|
||||
|
||||
**NVIDIA L4 GPU (Cloud Run default):**
|
||||
|
||||
| Operation | Dimensions | Vectors | P99 Latency | QPS |
|
||||
|-----------|------------|---------|-------------|-----|
|
||||
| L2 Distance | 128 | 10k | 0.5ms | 2,000 |
|
||||
| L2 Distance | 768 | 100k | 5ms | 200 |
|
||||
| HNSW Search | 128 | 100k | 1ms | 1,000 |
|
||||
| HNSW Search | 768 | 1M | 10ms | 100 |
|
||||
| GNN Forward | 256 | 10k nodes | 15ms | 66 |
|
||||
|
||||
### SIMD Capabilities
|
||||
|
||||
The benchmark automatically detects and uses:
|
||||
|
||||
| Architecture | SIMD | Vector Width | Speedup |
|
||||
|--------------|------|--------------|---------|
|
||||
| x86_64 | AVX-512 | 16 floats | 8-16x |
|
||||
| x86_64 | AVX2 | 8 floats | 4-8x |
|
||||
| x86_64 | SSE4.1 | 4 floats | 2-4x |
|
||||
| ARM64 | NEON | 4 floats | 2-4x |
|
||||
|
||||
## Architecture
|
||||
|
||||
### System Components
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Cloud Run │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
|
||||
│ │ HTTP Server │ │ Benchmark │ │ SIMD/GPU Runtime │ │
|
||||
│ │ (Axum) │ │ Engine │ │ AVX-512 │ CUDA │ NEON │ │
|
||||
│ └──────┬──────┘ └──────┬──────┘ └────────────────┬────────┘ │
|
||||
│ │ │ │ │
|
||||
│ ┌──────┴────────────────┴──────────────────────────┴────────┐ │
|
||||
│ │ RuVector Core │ │
|
||||
│ │ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────────────┐ │ │
|
||||
│ │ │ HNSW │ │ GNN │ │ Quant │ │ Attention │ │ │
|
||||
│ │ │ Index │ │ Layers │ │ INT8 │ │ Multi-Head │ │ │
|
||||
│ │ └────────┘ └────────┘ └────────┘ └────────────────┘ │ │
|
||||
│ └───────────────────────────────────────────────────────────┘ │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ NVIDIA L4 GPU │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### File Structure
|
||||
|
||||
```
|
||||
examples/google-cloud/
|
||||
├── Cargo.toml # Rust dependencies
|
||||
├── Dockerfile.gpu # GPU-optimized Docker image
|
||||
├── cloudrun.yaml # Cloud Run service configs
|
||||
├── deploy.sh # Deployment automation
|
||||
├── README.md # This file
|
||||
└── src/
|
||||
├── main.rs # CLI entry point
|
||||
├── benchmark.rs # Benchmark implementations
|
||||
├── simd.rs # SIMD-optimized operations
|
||||
├── cuda.rs # GPU/CUDA operations
|
||||
├── report.rs # Report generation
|
||||
└── server.rs # HTTP server for Cloud Run
|
||||
```
|
||||
|
||||
## API Reference
|
||||
|
||||
### Endpoints
|
||||
|
||||
| Method | Endpoint | Description |
|
||||
|--------|----------|-------------|
|
||||
| GET | `/` | API info and available endpoints |
|
||||
| GET | `/health` | Health check |
|
||||
| GET | `/info` | System information (GPU, SIMD, memory) |
|
||||
| POST | `/benchmark` | Run custom benchmark |
|
||||
| POST | `/benchmark/quick` | Run quick benchmark |
|
||||
| POST | `/benchmark/distance` | Run distance benchmark |
|
||||
| POST | `/benchmark/hnsw` | Run HNSW benchmark |
|
||||
| GET | `/results` | Get all benchmark results |
|
||||
| POST | `/results/clear` | Clear stored results |
|
||||
|
||||
### Health Check Response
|
||||
|
||||
```json
|
||||
{
|
||||
"status": "healthy",
|
||||
"version": "0.1.0",
|
||||
"gpu_available": true,
|
||||
"gpu_name": "NVIDIA L4",
|
||||
"simd_capability": "AVX2",
|
||||
"uptime_secs": 3600
|
||||
}
|
||||
```
|
||||
|
||||
### Benchmark Request
|
||||
|
||||
```json
|
||||
{
|
||||
"dims": 768,
|
||||
"num_vectors": 100000,
|
||||
"num_queries": 1000,
|
||||
"k": 10,
|
||||
"benchmark_type": "hnsw"
|
||||
}
|
||||
```
|
||||
|
||||
### Benchmark Response
|
||||
|
||||
```json
|
||||
{
|
||||
"status": "success",
|
||||
"message": "Benchmark completed",
|
||||
"result": {
|
||||
"name": "hnsw_768d_100000v",
|
||||
"operation": "hnsw_search",
|
||||
"dimensions": 768,
|
||||
"num_vectors": 100000,
|
||||
"mean_time_ms": 2.5,
|
||||
"p50_ms": 2.1,
|
||||
"p95_ms": 3.8,
|
||||
"p99_ms": 5.2,
|
||||
"qps": 400.0,
|
||||
"memory_mb": 585.9,
|
||||
"gpu_enabled": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
**1. GPU not detected**
|
||||
|
||||
```bash
|
||||
# Check GPU availability
|
||||
gcloud run services describe ruvector-benchmark \
|
||||
--region=us-central1 \
|
||||
--format='yaml(spec.template.metadata.annotations)'
|
||||
|
||||
# Ensure GPU annotations are present:
|
||||
# run.googleapis.com/gpu-type: nvidia-l4
|
||||
# run.googleapis.com/gpu-count: "1"
|
||||
```
|
||||
|
||||
**2. Container fails to start**
|
||||
|
||||
```bash
|
||||
# Check logs
|
||||
./deploy.sh logs ruvector-benchmark 200
|
||||
|
||||
# Common causes:
|
||||
# - Missing CUDA libraries (use nvidia/cuda base image)
|
||||
# - Memory limit too low (increase MEMORY env var)
|
||||
# - Health check failing (check /health endpoint)
|
||||
```
|
||||
|
||||
**3. Slow cold starts**
|
||||
|
||||
```bash
|
||||
# Set minimum instances
|
||||
MIN_INSTANCES=1 ./deploy.sh deploy latest true
|
||||
|
||||
# Enable startup CPU boost (already in cloudrun.yaml)
|
||||
```
|
||||
|
||||
**4. Out of memory**
|
||||
|
||||
```bash
|
||||
# Increase memory allocation
|
||||
MEMORY=16Gi ./deploy.sh deploy latest true
|
||||
|
||||
# Or reduce vector count in benchmark
|
||||
curl -X POST "$URL/benchmark?num_vectors=50000"
|
||||
```
|
||||
|
||||
### Performance Optimization
|
||||
|
||||
1. **Enable CPU boost for cold starts**
|
||||
```yaml
|
||||
run.googleapis.com/startup-cpu-boost: "true"
|
||||
```
|
||||
|
||||
2. **Disable CPU throttling**
|
||||
```yaml
|
||||
run.googleapis.com/cpu-throttling: "false"
|
||||
```
|
||||
|
||||
3. **Use Gen2 execution environment**
|
||||
```yaml
|
||||
run.googleapis.com/execution-environment: gen2
|
||||
```
|
||||
|
||||
4. **Tune concurrency based on workload**
|
||||
- CPU-bound: Lower concurrency (10-20)
|
||||
- Memory-bound: Medium concurrency (50-80)
|
||||
- I/O-bound: Higher concurrency (100+)
|
||||
|
||||
### Cleanup
|
||||
|
||||
```bash
|
||||
# Remove all RuVector services
|
||||
./deploy.sh cleanup
|
||||
|
||||
# Remove specific service
|
||||
gcloud run services delete ruvector-benchmark --region=us-central1
|
||||
|
||||
# Remove container images
|
||||
gcloud container images delete gcr.io/PROJECT_ID/ruvector-benchmark
|
||||
```
|
||||
|
||||
## Cost Estimation
|
||||
|
||||
| Configuration | vCPU | Memory | GPU | Cost/hour |
|
||||
|---------------|------|--------|-----|-----------|
|
||||
| Basic | 2 | 4GB | None | ~$0.10 |
|
||||
| GPU Standard | 4 | 8GB | L4 | ~$0.80 |
|
||||
| GPU High-Mem | 8 | 16GB | L4 | ~$1.20 |
|
||||
| Raft Cluster (3) | 6 | 12GB | None | ~$0.30 |
|
||||
|
||||
*Costs are approximate and vary by region. See [Cloud Run Pricing](https://cloud.google.com/run/pricing).*
|
||||
|
||||
## Contributing
|
||||
|
||||
1. Fork the repository
|
||||
2. Create a feature branch
|
||||
3. Make your changes
|
||||
4. Run benchmarks to verify performance
|
||||
5. Submit a pull request
|
||||
|
||||
## License
|
||||
|
||||
MIT License - see [LICENSE](../../LICENSE) for details.
|
||||
Reference in New Issue
Block a user