# RuVector Cloud Run GPU Deployment

High-performance vector database benchmarks and deployment on Google Cloud Run with GPU acceleration (NVIDIA L4).

## Table of Contents

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [Step-by-Step Tutorial](#step-by-step-tutorial)
- [Deployment Options](#deployment-options)
- [Benchmarking](#benchmarking)
- [Architecture](#architecture)
- [API Reference](#api-reference)
- [Troubleshooting](#troubleshooting)

## Overview

This example provides:

- **GPU-Accelerated Benchmarks**: SIMD (AVX-512, AVX2, NEON) and CUDA optimized operations
- **Cloud Run Deployment**: Scalable, serverless deployment with GPU support
- **Multiple Deployment Models**:
  - Single-node benchmark service
  - Attention/GNN inference service
  - Raft consensus cluster (3+ nodes)
  - Primary-replica replication

### Supported RuVector Capabilities

| Capability | Description | Cloud Run Support |
|------------|-------------|-------------------|
| **Core Vector Search** | HNSW indexing, k-NN search | ✅ Full GPU |
| **Attention Mechanisms** | Multi-head attention layers | ✅ Full GPU |
| **GNN Inference** | Graph neural network forward pass | ✅ Full GPU |
| **Raft Consensus** | Distributed consensus protocol | ✅ Multi-service |
| **Replication** | Primary-replica data replication | ✅ Multi-service |
| **Quantization** | INT8/PQ compression | ✅ GPU optimized |

## Prerequisites

### Required Tools

```bash
# Google Cloud CLI
curl https://sdk.cloud.google.com | bash
gcloud init

# Docker
# Install from: https://docs.docker.com/get-docker/

# Rust (for local development)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

### GCP Setup

```bash
# Authenticate
gcloud auth login

# Set project
gcloud config set project YOUR_PROJECT_ID

# Enable required APIs
gcloud services enable \
    run.googleapis.com \
    containerregistry.googleapis.com \
    cloudbuild.googleapis.com \
    compute.googleapis.com
```

## Quick Start

### 1. One-Command Deployment

```bash
cd examples/google-cloud

# Setup and deploy
./deploy.sh setup
./deploy.sh build Dockerfile.gpu latest
./deploy.sh push latest
./deploy.sh deploy latest true  # true = GPU enabled

# Run benchmark
./deploy.sh benchmark ruvector-benchmark quick
```

### 2. View Results

```bash
# Get service URL
gcloud run services describe ruvector-benchmark \
    --region=us-central1 \
    --format='value(status.url)'

# Test endpoints
curl $URL/health
curl $URL/info
curl -X POST $URL/benchmark/quick
```

## Step-by-Step Tutorial

### Step 1: Project Setup

```bash
# Clone the repository
git clone https://github.com/ruvnet/ruvector.git
cd ruvector/examples/google-cloud

# Set environment variables
export GCP_PROJECT_ID="your-project-id"
export GCP_REGION="us-central1"

# Run setup
./deploy.sh setup
```

### Step 2: Build the Docker Image

**Option A: Local Build (faster iteration)**

```bash
# Build locally
./deploy.sh build Dockerfile.gpu latest

# Push to Container Registry
./deploy.sh push latest
```

**Option B: Cloud Build (no local Docker required)**

```bash
# Build in the cloud
./deploy.sh build-cloud Dockerfile.gpu latest
```

### Step 3: Deploy to Cloud Run

**Basic Deployment (with GPU)**

```bash
./deploy.sh deploy latest true
```

**Custom Configuration**

```bash
# High-memory configuration for large vector sets
MEMORY=16Gi CPU=8 ./deploy.sh deploy latest true

# Scale settings
MIN_INSTANCES=1 MAX_INSTANCES=20 ./deploy.sh deploy latest true
```

### Step 4: Run Benchmarks

```bash
# Quick benchmark (128d, 10k vectors)
./deploy.sh benchmark ruvector-benchmark quick

# Distance computation benchmark
./deploy.sh benchmark ruvector-benchmark distance

# HNSW index benchmark
./deploy.sh benchmark ruvector-benchmark hnsw

# Full benchmark suite
./deploy.sh benchmark ruvector-benchmark full
```

### Step 5: View Results

```bash
# Get all results
./deploy.sh results ruvector-benchmark

# View logs
./deploy.sh logs ruvector-benchmark

# Check service status
./deploy.sh status
```

## Deployment Options

### 1. Single-Node Benchmark Service

Best for: Development, testing, single-user benchmarks

```bash
./deploy.sh deploy latest true
```

### 2. Attention/GNN Service

Best for: Neural network inference, embedding generation

```bash
./deploy.sh deploy-attention latest
```

**Features:**
- 16GB memory for large models
- 3-layer GNN with 8 attention heads
- Optimized for batch inference

### 3. Raft Consensus Cluster

Best for: High availability, consistent distributed state

```bash
# Deploy 3-node cluster
CLUSTER_SIZE=3 ./deploy.sh deploy-raft

# Deploy 5-node cluster for higher fault tolerance
CLUSTER_SIZE=5 ./deploy.sh deploy-raft
```

**Architecture:**
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Node 1    │◄───►│   Node 2    │◄───►│   Node 3    │
│  (Leader)   │     │  (Follower) │     │  (Follower) │
└─────────────┘     └─────────────┘     └─────────────┘
       │                  │                   │
       └──────────────────┴───────────────────┘
                    Raft Consensus
```

**Configuration:**
```bash
# Environment variables for Raft nodes
RUVECTOR_NODE_ID=0              # Node identifier (0, 1, 2, ...)
RUVECTOR_CLUSTER_SIZE=3         # Total cluster size
RUVECTOR_RAFT_ELECTION_TIMEOUT=150  # Election timeout (ms)
RUVECTOR_RAFT_HEARTBEAT_INTERVAL=50 # Heartbeat interval (ms)
```

### 4. Primary-Replica Replication

Best for: Read scaling, geographic distribution

```bash
# Deploy with 3 replicas
./deploy.sh deploy-replication 3
```

**Architecture:**
```
                    ┌─────────────┐
          Writes───►│   Primary   │
                    └──────┬──────┘
                           │ Replication
          ┌────────────────┼────────────────┐
          ▼                ▼                ▼
    ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
    │  Replica 1  │  │  Replica 2  │  │  Replica 3  │
    └─────────────┘  └─────────────┘  └─────────────┘
          │                │                │
          └────────────────┴────────────────┘
                      Reads (load balanced)
```

**Configuration:**
```bash
# Primary node
RUVECTOR_MODE=primary
RUVECTOR_REPLICATION_FACTOR=3
RUVECTOR_SYNC_MODE=async  # or "sync" for strong consistency

# Replica nodes
RUVECTOR_MODE=replica
RUVECTOR_PRIMARY_URL=https://ruvector-primary-xxx.run.app
```

## Benchmarking

### Available Benchmarks

| Benchmark | Description | Dimensions | Vector Count |
|-----------|-------------|------------|--------------|
| `quick` | Fast sanity check | 128 | 10,000 |
| `distance` | Distance computation | configurable | configurable |
| `hnsw` | HNSW index search | configurable | configurable |
| `gnn` | GNN forward pass | 256 | 10,000 nodes |
| `cuda` | CUDA kernel perf | - | - |
| `quantization` | INT8/PQ compression | configurable | configurable |

### Running Benchmarks via API

```bash
# Quick benchmark
curl -X POST https://YOUR-SERVICE-URL/benchmark/quick

# Custom distance benchmark
curl -X POST "https://YOUR-SERVICE-URL/benchmark/distance?dims=768&num_vectors=100000&batch_size=64"

# Custom HNSW benchmark
curl -X POST "https://YOUR-SERVICE-URL/benchmark/hnsw?dims=768&num_vectors=100000&k=10"

# Full custom benchmark
curl -X POST https://YOUR-SERVICE-URL/benchmark \
    -H "Content-Type: application/json" \
    -d '{
        "dims": 768,
        "num_vectors": 100000,
        "num_queries": 1000,
        "k": 10,
        "benchmark_type": "hnsw"
    }'
```

### Expected Performance

**NVIDIA L4 GPU (Cloud Run default):**

| Operation | Dimensions | Vectors | P99 Latency | QPS |
|-----------|------------|---------|-------------|-----|
| L2 Distance | 128 | 10k | 0.5ms | 2,000 |
| L2 Distance | 768 | 100k | 5ms | 200 |
| HNSW Search | 128 | 100k | 1ms | 1,000 |
| HNSW Search | 768 | 1M | 10ms | 100 |
| GNN Forward | 256 | 10k nodes | 15ms | 66 |

### SIMD Capabilities

The benchmark automatically detects and uses:

| Architecture | SIMD | Vector Width | Speedup |
|--------------|------|--------------|---------|
| x86_64 | AVX-512 | 16 floats | 8-16x |
| x86_64 | AVX2 | 8 floats | 4-8x |
| x86_64 | SSE4.1 | 4 floats | 2-4x |
| ARM64 | NEON | 4 floats | 2-4x |

## Architecture

### System Components

```
┌─────────────────────────────────────────────────────────────────┐
│                        Cloud Run                                 │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐ │
│  │ HTTP Server │  │  Benchmark  │  │    SIMD/GPU Runtime     │ │
│  │   (Axum)    │  │   Engine    │  │  AVX-512 │ CUDA │ NEON  │ │
│  └──────┬──────┘  └──────┬──────┘  └────────────────┬────────┘ │
│         │                │                          │          │
│  ┌──────┴────────────────┴──────────────────────────┴────────┐ │
│  │                    RuVector Core                          │ │
│  │  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────────────┐  │ │
│  │  │  HNSW  │  │  GNN   │  │ Quant  │  │  Attention     │  │ │
│  │  │ Index  │  │ Layers │  │  INT8  │  │  Multi-Head    │  │ │
│  │  └────────┘  └────────┘  └────────┘  └────────────────┘  │ │
│  └───────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│                      NVIDIA L4 GPU                              │
└─────────────────────────────────────────────────────────────────┘
```

### File Structure

```
examples/google-cloud/
├── Cargo.toml              # Rust dependencies
├── Dockerfile.gpu          # GPU-optimized Docker image
├── cloudrun.yaml           # Cloud Run service configs
├── deploy.sh               # Deployment automation
├── README.md               # This file
└── src/
    ├── main.rs             # CLI entry point
    ├── benchmark.rs        # Benchmark implementations
    ├── simd.rs             # SIMD-optimized operations
    ├── cuda.rs             # GPU/CUDA operations
    ├── report.rs           # Report generation
    └── server.rs           # HTTP server for Cloud Run
```

## API Reference

### Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/` | API info and available endpoints |
| GET | `/health` | Health check |
| GET | `/info` | System information (GPU, SIMD, memory) |
| POST | `/benchmark` | Run custom benchmark |
| POST | `/benchmark/quick` | Run quick benchmark |
| POST | `/benchmark/distance` | Run distance benchmark |
| POST | `/benchmark/hnsw` | Run HNSW benchmark |
| GET | `/results` | Get all benchmark results |
| POST | `/results/clear` | Clear stored results |

### Health Check Response

```json
{
    "status": "healthy",
    "version": "0.1.0",
    "gpu_available": true,
    "gpu_name": "NVIDIA L4",
    "simd_capability": "AVX2",
    "uptime_secs": 3600
}
```

### Benchmark Request

```json
{
    "dims": 768,
    "num_vectors": 100000,
    "num_queries": 1000,
    "k": 10,
    "benchmark_type": "hnsw"
}
```

### Benchmark Response

```json
{
    "status": "success",
    "message": "Benchmark completed",
    "result": {
        "name": "hnsw_768d_100000v",
        "operation": "hnsw_search",
        "dimensions": 768,
        "num_vectors": 100000,
        "mean_time_ms": 2.5,
        "p50_ms": 2.1,
        "p95_ms": 3.8,
        "p99_ms": 5.2,
        "qps": 400.0,
        "memory_mb": 585.9,
        "gpu_enabled": true
    }
}
```

## Troubleshooting

### Common Issues

**1. GPU not detected**

```bash
# Check GPU availability
gcloud run services describe ruvector-benchmark \
    --region=us-central1 \
    --format='yaml(spec.template.metadata.annotations)'

# Ensure GPU annotations are present:
# run.googleapis.com/gpu-type: nvidia-l4
# run.googleapis.com/gpu-count: "1"
```

**2. Container fails to start**

```bash
# Check logs
./deploy.sh logs ruvector-benchmark 200

# Common causes:
# - Missing CUDA libraries (use nvidia/cuda base image)
# - Memory limit too low (increase MEMORY env var)
# - Health check failing (check /health endpoint)
```

**3. Slow cold starts**

```bash
# Set minimum instances
MIN_INSTANCES=1 ./deploy.sh deploy latest true

# Enable startup CPU boost (already in cloudrun.yaml)
```

**4. Out of memory**

```bash
# Increase memory allocation
MEMORY=16Gi ./deploy.sh deploy latest true

# Or reduce vector count in benchmark
curl -X POST "$URL/benchmark?num_vectors=50000"
```

### Performance Optimization

1. **Enable CPU boost for cold starts**
   ```yaml
   run.googleapis.com/startup-cpu-boost: "true"
   ```

2. **Disable CPU throttling**
   ```yaml
   run.googleapis.com/cpu-throttling: "false"
   ```

3. **Use Gen2 execution environment**
   ```yaml
   run.googleapis.com/execution-environment: gen2
   ```

4. **Tune concurrency based on workload**
   - CPU-bound: Lower concurrency (10-20)
   - Memory-bound: Medium concurrency (50-80)
   - I/O-bound: Higher concurrency (100+)

### Cleanup

```bash
# Remove all RuVector services
./deploy.sh cleanup

# Remove specific service
gcloud run services delete ruvector-benchmark --region=us-central1

# Remove container images
gcloud container images delete gcr.io/PROJECT_ID/ruvector-benchmark
```

## Cost Estimation

| Configuration | vCPU | Memory | GPU | Cost/hour |
|---------------|------|--------|-----|-----------|
| Basic | 2 | 4GB | None | ~$0.10 |
| GPU Standard | 4 | 8GB | L4 | ~$0.80 |
| GPU High-Mem | 8 | 16GB | L4 | ~$1.20 |
| Raft Cluster (3) | 6 | 12GB | None | ~$0.30 |

*Costs are approximate and vary by region. See [Cloud Run Pricing](https://cloud.google.com/run/pricing).*

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run benchmarks to verify performance
5. Submit a pull request

## License

MIT License - see [LICENSE](../../LICENSE) for details.