# RuVector Cloud Run GPU Deployment
High-performance vector database benchmarks and deployment on Google Cloud Run with GPU acceleration (NVIDIA L4).
## Table of Contents

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [Step-by-Step Tutorial](#step-by-step-tutorial)
- [Deployment Options](#deployment-options)
- [Benchmarking](#benchmarking)
- [Architecture](#architecture)
- [API Reference](#api-reference)
- [Troubleshooting](#troubleshooting)

## Overview
This example provides:

- **GPU-Accelerated Benchmarks**: SIMD (AVX-512, AVX2, NEON) and CUDA-optimized operations
- **Cloud Run Deployment**: Scalable, serverless deployment with GPU support
- **Multiple Deployment Models**:
  - Single-node benchmark service
  - Attention/GNN inference service
  - Raft consensus cluster (3+ nodes)
  - Primary-replica replication

### Supported RuVector Capabilities
| Capability | Description | Cloud Run Support |
|------------|-------------|-------------------|
| **Core Vector Search** | HNSW indexing, k-NN search | ✅ Full GPU |
| **Attention Mechanisms** | Multi-head attention layers | ✅ Full GPU |
| **GNN Inference** | Graph neural network forward pass | ✅ Full GPU |
| **Raft Consensus** | Distributed consensus protocol | ✅ Multi-service |
| **Replication** | Primary-replica data replication | ✅ Multi-service |
| **Quantization** | INT8/PQ compression | ✅ GPU optimized |

## Prerequisites
### Required Tools

```bash
# Google Cloud CLI
curl https://sdk.cloud.google.com | bash
gcloud init

# Docker
# Install from: https://docs.docker.com/get-docker/

# Rust (for local development)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

### GCP Setup

```bash
# Authenticate
gcloud auth login

# Set project
gcloud config set project YOUR_PROJECT_ID

# Enable required APIs
gcloud services enable \
  run.googleapis.com \
  containerregistry.googleapis.com \
  cloudbuild.googleapis.com \
  compute.googleapis.com
```

## Quick Start
### 1. One-Command Deployment

```bash
cd examples/google-cloud

# Setup and deploy
./deploy.sh setup
./deploy.sh build Dockerfile.gpu latest
./deploy.sh push latest
./deploy.sh deploy latest true   # true = GPU enabled

# Run benchmark
./deploy.sh benchmark ruvector-benchmark quick
```

### 2. View Results

```bash
# Get the service URL
URL=$(gcloud run services describe ruvector-benchmark \
  --region=us-central1 \
  --format='value(status.url)')

# Test endpoints
curl $URL/health
curl $URL/info
curl -X POST $URL/benchmark/quick
```

## Step-by-Step Tutorial
### Step 1: Project Setup

```bash
# Clone the repository
git clone https://github.com/ruvnet/ruvector.git
cd ruvector/examples/google-cloud

# Set environment variables
export GCP_PROJECT_ID="your-project-id"
export GCP_REGION="us-central1"

# Run setup
./deploy.sh setup
```

### Step 2: Build the Docker Image

**Option A: Local Build (faster iteration)**

```bash
# Build locally
./deploy.sh build Dockerfile.gpu latest

# Push to Container Registry
./deploy.sh push latest
```

**Option B: Cloud Build (no local Docker required)**

```bash
# Build in the cloud
./deploy.sh build-cloud Dockerfile.gpu latest
```

### Step 3: Deploy to Cloud Run

**Basic Deployment (with GPU)**

```bash
./deploy.sh deploy latest true
```

**Custom Configuration**

```bash
# High-memory configuration for large vector sets
MEMORY=16Gi CPU=8 ./deploy.sh deploy latest true

# Scale settings
MIN_INSTANCES=1 MAX_INSTANCES=20 ./deploy.sh deploy latest true
```

### Step 4: Run Benchmarks

```bash
# Quick benchmark (128d, 10k vectors)
./deploy.sh benchmark ruvector-benchmark quick

# Distance computation benchmark
./deploy.sh benchmark ruvector-benchmark distance

# HNSW index benchmark
./deploy.sh benchmark ruvector-benchmark hnsw

# Full benchmark suite
./deploy.sh benchmark ruvector-benchmark full
```

### Step 5: View Results

```bash
# Get all results
./deploy.sh results ruvector-benchmark

# View logs
./deploy.sh logs ruvector-benchmark

# Check service status
./deploy.sh status
```

## Deployment Options
### 1. Single-Node Benchmark Service

Best for: Development, testing, single-user benchmarks

```bash
./deploy.sh deploy latest true
```

### 2. Attention/GNN Service

Best for: Neural network inference, embedding generation

```bash
./deploy.sh deploy-attention latest
```

**Features:**

- 16GB memory for large models
- 3-layer GNN with 8 attention heads
- Optimized for batch inference

### 3. Raft Consensus Cluster

Best for: High availability, consistent distributed state

```bash
# Deploy a 3-node cluster
CLUSTER_SIZE=3 ./deploy.sh deploy-raft

# Deploy a 5-node cluster for higher fault tolerance
CLUSTER_SIZE=5 ./deploy.sh deploy-raft
```

**Architecture:**

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Node 1    │◄───►│   Node 2    │◄───►│   Node 3    │
│  (Leader)   │     │ (Follower)  │     │ (Follower)  │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       └───────────────────┴───────────────────┘
                    Raft Consensus
```

**Configuration:**

```bash
# Environment variables for Raft nodes
RUVECTOR_NODE_ID=0                    # Node identifier (0, 1, 2, ...)
RUVECTOR_CLUSTER_SIZE=3               # Total cluster size
RUVECTOR_RAFT_ELECTION_TIMEOUT=150    # Election timeout (ms)
RUVECTOR_RAFT_HEARTBEAT_INTERVAL=50   # Heartbeat interval (ms)
```

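Cluster sizing follows standard Raft majority rules: committing an entry (or electing a leader) needs a majority of votes, so a cluster of `n` nodes tolerates `floor((n-1)/2)` failures. A quick sketch of the arithmetic (illustrative helper, not part of `deploy.sh`):

```python
# Raft majority math: a cluster of n nodes needs floor(n/2) + 1 votes
# to commit an entry, so it tolerates floor((n-1)/2) node failures.
def quorum(cluster_size: int) -> int:
    return cluster_size // 2 + 1

def tolerated_failures(cluster_size: int) -> int:
    return (cluster_size - 1) // 2

for n in (3, 5):
    print(n, quorum(n), tolerated_failures(n))  # 3 2 1, then 5 3 2
```

This is also why even cluster sizes buy nothing: a 4-node cluster needs 3 votes and still tolerates only one failure, same as 3 nodes.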
### 4. Primary-Replica Replication

Best for: Read scaling, geographic distribution

```bash
# Deploy with 3 replicas
./deploy.sh deploy-replication 3
```

**Architecture:**

```
                 ┌─────────────┐
       Writes───►│   Primary   │
                 └──────┬──────┘
                        │ Replication
       ┌────────────────┼────────────────┐
       ▼                ▼                ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│  Replica 1  │  │  Replica 2  │  │  Replica 3  │
└─────────────┘  └─────────────┘  └─────────────┘
       │                │                │
       └────────────────┴────────────────┘
              Reads (load balanced)
```

**Configuration:**

```bash
# Primary node
RUVECTOR_MODE=primary
RUVECTOR_REPLICATION_FACTOR=3
RUVECTOR_SYNC_MODE=async   # or "sync" for strong consistency

# Replica nodes
RUVECTOR_MODE=replica
RUVECTOR_PRIMARY_URL=https://ruvector-primary-xxx.run.app
```

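The trade-off behind `RUVECTOR_SYNC_MODE` can be sketched as follows. This assumes the usual semantics (not RuVector's actual write path): `async` acknowledges as soon as the primary has applied the write, `sync` only after every replica confirms it.

```python
def write(value, replicas, sync_mode="async"):
    """Illustrative primary write path (assumed semantics, not the real API).
    Each replica is modeled as a callable returning True once it confirms."""
    # The primary applies the write locally (elided), then ships it out.
    if sync_mode == "async":
        # Acknowledge immediately; replicas catch up in the background.
        return {"acked": True, "replicated_to": 0}
    # sync: acknowledge only after every replica confirms the write.
    confirmed = sum(1 for replica in replicas if replica(value))
    return {"acked": confirmed == len(replicas), "replicated_to": confirmed}

ack = write(42, [lambda v: True] * 3, sync_mode="sync")
print(ack)  # {'acked': True, 'replicated_to': 3}
```

`async` gives lower write latency at the cost of a replication lag window; `sync` gives strong consistency but makes write latency track the slowest replica.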
## Benchmarking
### Available Benchmarks

| Benchmark | Description | Dimensions | Vector Count |
|-----------|-------------|------------|--------------|
| `quick` | Fast sanity check | 128 | 10,000 |
| `distance` | Distance computation | configurable | configurable |
| `hnsw` | HNSW index search | configurable | configurable |
| `gnn` | GNN forward pass | 256 | 10,000 nodes |
| `cuda` | CUDA kernel perf | - | - |
| `quantization` | INT8/PQ compression | configurable | configurable |

### Running Benchmarks via API

```bash
# Quick benchmark
curl -X POST https://YOUR-SERVICE-URL/benchmark/quick

# Custom distance benchmark
curl -X POST "https://YOUR-SERVICE-URL/benchmark/distance?dims=768&num_vectors=100000&batch_size=64"

# Custom HNSW benchmark
curl -X POST "https://YOUR-SERVICE-URL/benchmark/hnsw?dims=768&num_vectors=100000&k=10"

# Full custom benchmark
curl -X POST https://YOUR-SERVICE-URL/benchmark \
  -H "Content-Type: application/json" \
  -d '{
    "dims": 768,
    "num_vectors": 100000,
    "num_queries": 1000,
    "k": 10,
    "benchmark_type": "hnsw"
  }'
```

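The same requests can be built programmatically. A small Python sketch using only the standard library (`BASE_URL` is a placeholder for your deployed service URL; the request objects are constructed but not sent here):

```python
import json
from urllib import parse, request

BASE_URL = "https://YOUR-SERVICE-URL"  # placeholder: your Cloud Run service URL

def benchmark_url(kind: str, **params) -> str:
    """Build a benchmark endpoint URL with optional query parameters."""
    query = parse.urlencode(params)
    return f"{BASE_URL}/benchmark/{kind}" + (f"?{query}" if query else "")

def benchmark_request(body: dict) -> request.Request:
    """Build a POST /benchmark request carrying a JSON body."""
    return request.Request(
        f"{BASE_URL}/benchmark",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

print(benchmark_url("hnsw", dims=768, num_vectors=100000, k=10))
```

To actually fire a request, pass the `Request` to `urllib.request.urlopen` (or use any HTTP client you prefer).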
### Expected Performance

**NVIDIA L4 GPU (Cloud Run default):**

| Operation | Dimensions | Vectors | P99 Latency | QPS |
|-----------|------------|---------|-------------|-----|
| L2 Distance | 128 | 10k | 0.5ms | 2,000 |
| L2 Distance | 768 | 100k | 5ms | 200 |
| HNSW Search | 128 | 100k | 1ms | 1,000 |
| HNSW Search | 768 | 1M | 10ms | 100 |
| GNN Forward | 256 | 10k nodes | 15ms | 66 |

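The QPS column above is single-stream throughput at the quoted P99 latency (QPS ≈ 1000 / p99_ms, i.e. one request in flight at a time); running requests concurrently raises aggregate throughput. A quick sanity check:

```python
# Single-stream throughput: one request at a time, each taking p99 ms.
def single_stream_qps(p99_ms: float) -> int:
    return int(1000 / p99_ms)

assert single_stream_qps(0.5) == 2000   # L2 Distance, 128d
assert single_stream_qps(10) == 100     # HNSW Search, 768d/1M
assert single_stream_qps(15) == 66      # GNN Forward, 256d
```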
### SIMD Capabilities

The benchmark automatically detects and uses:

| Architecture | SIMD | Vector Width | Speedup |
|--------------|------|--------------|---------|
| x86_64 | AVX-512 | 16 floats | 8-16x |
| x86_64 | AVX2 | 8 floats | 4-8x |
| x86_64 | SSE4.1 | 4 floats | 2-4x |
| ARM64 | NEON | 4 floats | 2-4x |

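The "vector width" column is the number of floats a kernel processes per step. A plain-Python sketch of that chunked structure (illustrative only; the real speedup comes from each chunk being handled by a single SIMD instruction):

```python
WIDTH = 8  # lanes per chunk, e.g. AVX2 = 8 f32 lanes

def l2_squared_chunked(a, b):
    """Squared L2 distance computed in WIDTH-float chunks, mirroring
    how a SIMD kernel walks the vectors."""
    assert len(a) == len(b) and len(a) % WIDTH == 0
    total = 0.0
    for i in range(0, len(a), WIDTH):
        # One "SIMD" step: subtract, square, accumulate WIDTH lanes at once.
        total += sum((x - y) ** 2 for x, y in zip(a[i:i + WIDTH], b[i:i + WIDTH]))
    return total

print(l2_squared_chunked([1.0] * 8, [0.0] * 8))  # 8.0
```

Dimensions that are multiples of the lane width (128, 256, 768 all divide evenly by 8 and 16) keep such kernels on the fast path with no scalar tail loop.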
## Architecture
### System Components

```
┌─────────────────────────────────────────────────────────────────┐
│                           Cloud Run                             │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ HTTP Server │  │  Benchmark  │  │    SIMD/GPU Runtime     │  │
│  │   (Axum)    │  │   Engine    │  │  AVX-512 │ CUDA │ NEON  │  │
│  └──────┬──────┘  └──────┬──────┘  └────────────────┬────────┘  │
│         │                │                          │           │
│  ┌──────┴────────────────┴──────────────────────────┴─────────┐ │
│  │                       RuVector Core                        │ │
│  │  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────────────┐    │ │
│  │  │  HNSW  │  │  GNN   │  │ Quant  │  │   Attention    │    │ │
│  │  │ Index  │  │ Layers │  │  INT8  │  │   Multi-Head   │    │ │
│  │  └────────┘  └────────┘  └────────┘  └────────────────┘    │ │
│  └────────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│                         NVIDIA L4 GPU                           │
└─────────────────────────────────────────────────────────────────┘
```

### File Structure

```
examples/google-cloud/
├── Cargo.toml          # Rust dependencies
├── Dockerfile.gpu      # GPU-optimized Docker image
├── cloudrun.yaml       # Cloud Run service configs
├── deploy.sh           # Deployment automation
├── README.md           # This file
└── src/
    ├── main.rs         # CLI entry point
    ├── benchmark.rs    # Benchmark implementations
    ├── simd.rs         # SIMD-optimized operations
    ├── cuda.rs         # GPU/CUDA operations
    ├── report.rs       # Report generation
    └── server.rs       # HTTP server for Cloud Run
```

## API Reference
### Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/` | API info and available endpoints |
| GET | `/health` | Health check |
| GET | `/info` | System information (GPU, SIMD, memory) |
| POST | `/benchmark` | Run custom benchmark |
| POST | `/benchmark/quick` | Run quick benchmark |
| POST | `/benchmark/distance` | Run distance benchmark |
| POST | `/benchmark/hnsw` | Run HNSW benchmark |
| GET | `/results` | Get all benchmark results |
| POST | `/results/clear` | Clear stored results |

### Health Check Response

```json
{
  "status": "healthy",
  "version": "0.1.0",
  "gpu_available": true,
  "gpu_name": "NVIDIA L4",
  "simd_capability": "AVX2",
  "uptime_secs": 3600
}
```

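A client can gate GPU benchmarks on these fields. A minimal parsing sketch (the sample payload is hardcoded here rather than fetched from a live service):

```python
import json

def is_gpu_healthy(body: str) -> bool:
    """True when the /health response reports a healthy, GPU-backed instance."""
    health = json.loads(body)
    return health["status"] == "healthy" and health.get("gpu_available", False)

sample = '{"status": "healthy", "gpu_available": true, "gpu_name": "NVIDIA L4"}'
print(is_gpu_healthy(sample))  # True
```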
### Benchmark Request

```json
{
  "dims": 768,
  "num_vectors": 100000,
  "num_queries": 1000,
  "k": 10,
  "benchmark_type": "hnsw"
}
```

### Benchmark Response

```json
{
  "status": "success",
  "message": "Benchmark completed",
  "result": {
    "name": "hnsw_768d_100000v",
    "operation": "hnsw_search",
    "dimensions": 768,
    "num_vectors": 100000,
    "mean_time_ms": 2.5,
    "p50_ms": 2.1,
    "p95_ms": 3.8,
    "p99_ms": 5.2,
    "qps": 400.0,
    "memory_mb": 585.9,
    "gpu_enabled": true
  }
}
```

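The latency fields can be derived from raw per-query timings. A sketch of one plausible aggregation (the service computes its own statistics; this is only to clarify what the fields mean):

```python
def summarize(times_ms):
    """Summarize per-query latencies (ms) into the fields reported above.
    QPS here assumes the queries ran sequentially."""
    s = sorted(times_ms)
    pct = lambda p: s[min(len(s) - 1, int(p / 100 * len(s)))]
    return {
        "mean_time_ms": sum(s) / len(s),
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
        "qps": 1000 * len(s) / sum(s),  # queries per second of wall time
    }

# 95 fast queries and 5 slow ones: the tail shows up in p95/p99, not p50.
stats = summarize([2.0] * 95 + [5.0] * 5)
print(stats["p50_ms"], stats["p95_ms"])  # 2.0 5.0
```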
## Troubleshooting
### Common Issues

**1. GPU not detected**

```bash
# Check GPU availability
gcloud run services describe ruvector-benchmark \
  --region=us-central1 \
  --format='yaml(spec.template.metadata.annotations)'

# Ensure GPU annotations are present:
#   run.googleapis.com/gpu-type: nvidia-l4
#   run.googleapis.com/gpu-count: "1"
```

**2. Container fails to start**

```bash
# Check logs
./deploy.sh logs ruvector-benchmark 200

# Common causes:
# - Missing CUDA libraries (use an nvidia/cuda base image)
# - Memory limit too low (increase the MEMORY env var)
# - Health check failing (check the /health endpoint)
```

**3. Slow cold starts**

```bash
# Set minimum instances
MIN_INSTANCES=1 ./deploy.sh deploy latest true

# Enable startup CPU boost (already set in cloudrun.yaml)
```

**4. Out of memory**

```bash
# Increase memory allocation
MEMORY=16Gi ./deploy.sh deploy latest true

# Or reduce the vector count in the benchmark
curl -X POST "$URL/benchmark?num_vectors=50000"
```

### Performance Optimization

1. **Enable CPU boost for cold starts**
   ```yaml
   run.googleapis.com/startup-cpu-boost: "true"
   ```

2. **Disable CPU throttling**
   ```yaml
   run.googleapis.com/cpu-throttling: "false"
   ```

3. **Use the Gen2 execution environment**
   ```yaml
   run.googleapis.com/execution-environment: gen2
   ```

4. **Tune concurrency based on workload**
   - CPU-bound: lower concurrency (10-20)
   - Memory-bound: medium concurrency (50-80)
   - I/O-bound: higher concurrency (100+)

### Cleanup

```bash
# Remove all RuVector services
./deploy.sh cleanup

# Remove a specific service
gcloud run services delete ruvector-benchmark --region=us-central1

# Remove container images
gcloud container images delete gcr.io/PROJECT_ID/ruvector-benchmark
```

## Cost Estimation

| Configuration | vCPU | Memory | GPU | Cost/hour |
|---------------|------|--------|-----|-----------|
| Basic | 2 | 4GB | None | ~$0.10 |
| GPU Standard | 4 | 8GB | L4 | ~$0.80 |
| GPU High-Mem | 8 | 16GB | L4 | ~$1.20 |
| Raft Cluster (3) | 6 | 12GB | None | ~$0.30 |

*Costs are approximate and vary by region. See [Cloud Run Pricing](https://cloud.google.com/run/pricing).*

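For rough budgeting, multiply the hourly rate by the hours an instance stays warm (with `MIN_INSTANCES=0` you pay only while serving traffic). The rate table and helper below are illustrative, using the approximate figures above:

```python
# Approximate hourly rates from the table above (USD/hour, illustrative).
RATES = {"basic": 0.10, "gpu_standard": 0.80, "gpu_high_mem": 1.20}

def monthly_cost(config: str, hours_per_day: float, days: int = 30) -> float:
    """Estimated monthly cost if an instance is warm hours_per_day each day."""
    return round(RATES[config] * hours_per_day * days, 2)

print(monthly_cost("gpu_standard", 8))  # 0.80 * 8 * 30 = 192.0
```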
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run benchmarks to verify performance
5. Submit a pull request

## License
MIT License - see [LICENSE](../../LICENSE) for details.