Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
npm/packages/ruvbot/docs/adr/ADR-013-gcp-deployment.md
@@ -0,0 +1,263 @@
# ADR-013: Google Cloud Platform Deployment Architecture

## Status
Accepted

## Date
2026-01-27

## Context

RuvBot needs a production-ready deployment option that:
1. Minimizes operational costs for low-traffic scenarios
2. Scales automatically with demand
3. Provides persistence for sessions, memory, and learning data
4. Secures API keys and credentials
5. Supports multi-tenant deployments

## Decision

Deploy RuvBot on Google Cloud Platform using serverless and managed services optimized for cost.

### Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Google Cloud Platform                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐                   │
│  │ Cloud        │    │ Cloud        │    │ Cloud        │                   │
│  │ Build        │───▶│ Registry     │───▶│ Run          │                   │
│  │ (CI/CD)      │    │ (Images)     │    │ (App)        │                   │
│  └──────────────┘    └──────────────┘    └──────┬───────┘                   │
│                                                 │                           │
│                    ┌────────────────────────────┼────────────────────────┐  │
│                    │                            │                        │  │
│             ┌──────▼──────┐    ┌────────────────▼───────────┐            │  │
│             │ Secret      │    │ Cloud SQL                  │            │  │
│             │ Manager     │    │ (PostgreSQL)               │            │  │
│             │             │    │ db-f1-micro                │            │  │
│             └─────────────┘    └────────────────────────────┘            │  │
│                                                                          │  │
│             ┌─────────────┐    ┌────────────────────────────┐            │  │
│             │ Cloud       │    │ Memorystore                │            │  │
│             │ Storage     │    │ (Redis) - Optional         │            │  │
│             │ (Files)     │    │ Basic tier                 │            │  │
│             └─────────────┘    └────────────────────────────┘            │  │
│                                                                          │  │
│                    └─────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Cost Optimization Strategy

| Service | Configuration | Monthly Cost | Notes |
|---------|---------------|--------------|-------|
| Cloud Run | 0-10 instances, 512Mi RAM | ~$0-5 | Free tier: 2M requests/month |
| Cloud SQL | db-f1-micro, 10GB SSD | ~$10-15 | Smallest instance |
| Secret Manager | 3-5 secrets | ~$0.18-0.30 | $0.06 per active secret version per month |
| Cloud Storage | Standard, lifecycle policies | ~$0.02/GB | Lifecycle rules move older objects to cheaper classes |
| Cloud Build | Free tier | ~$0 | 120 min/day free |
| **Total (low traffic)** | | **~$15-20/month** | |
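
The arithmetic behind the total can be sketched as follows. This is a rough estimate only: the per-service figures are copied from the table, while the function name, the default of 3 secrets, and the 10 GB storage volume are assumptions chosen for illustration.

```python
# Monthly cost figures in USD, taken from the cost table above.
CLOUD_RUN = (0.0, 5.0)        # low and high end of the Cloud Run estimate
CLOUD_SQL = (10.0, 15.0)      # low and high end of the db-f1-micro estimate
SECRET_PRICE = 0.06           # per secret per month
STORAGE_PRICE_PER_GB = 0.02   # Standard class, per GB per month
CLOUD_BUILD = 0.0             # assumed to stay within the free tier


def estimate_monthly_cost(secrets: int = 3, storage_gb: float = 10.0) -> tuple[float, float]:
    """Return a (low, high) monthly estimate in USD.

    `secrets` and `storage_gb` are hypothetical inputs, not values from the ADR.
    """
    fixed = secrets * SECRET_PRICE + storage_gb * STORAGE_PRICE_PER_GB + CLOUD_BUILD
    low = CLOUD_RUN[0] + CLOUD_SQL[0] + fixed
    high = CLOUD_RUN[1] + CLOUD_SQL[1] + fixed
    return round(low, 2), round(high, 2)


low, high = estimate_monthly_cost()
print(f"${low:.2f} - ${high:.2f} per month")
```

With these inputs the range is roughly $10 to $20, so the ~$15-20 total in the table corresponds to the middle and upper part of the per-service estimates.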

### Service Configuration

#### Cloud Run (Compute)

```yaml
# Serverless container configuration
resources:
  cpu: "1"
  memory: "512Mi"
scaling:
  minInstances: 0        # Scale to zero when idle
  maxInstances: 10       # Limit for cost control
  concurrency: 80        # Requests per instance
features:
  cpuIdle: true          # Reduce CPU when idle (cost savings)
  startupCpuBoost: true  # Faster cold starts
  timeout: 300s          # 5 minutes for long operations
```
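
The scaling limits above put a ceiling on how many requests can be in flight at once. The sketch below works through that bound, assuming concurrency is the only scaling signal; the real Cloud Run autoscaler also reacts to CPU utilization, so treat the result as an upper bound rather than a prediction. The function names are hypothetical.

```python
import math

MAX_INSTANCES = 10  # scaling.maxInstances from the configuration above
CONCURRENCY = 80    # scaling.concurrency from the configuration above


def instances_needed(concurrent_requests: int) -> int:
    """Instances required to serve the given number of in-flight requests."""
    if concurrent_requests <= 0:
        return 0  # minInstances: 0 allows the service to scale to zero
    return min(MAX_INSTANCES, math.ceil(concurrent_requests / CONCURRENCY))


def max_in_flight() -> int:
    """Largest number of simultaneous requests the service accepts."""
    return MAX_INSTANCES * CONCURRENCY


print(instances_needed(200), max_in_flight())
```

Requests beyond `max_in_flight()` would have to wait for capacity or be rejected, which is the cost-control trade-off that `maxInstances: 10` makes explicit.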

#### Cloud SQL (Database)

```hcl
# Cost-optimized PostgreSQL
tier             = "db-f1-micro"  # 0.6GB RAM, shared CPU
disk_size        = 10             # 10 GB, minimum SSD size
availability     = "ZONAL"        # Single zone (cheaper)
backup_retention = 7              # 7 days

# Extensions enabled:
#   uuid-ossp  - UUID generation
#   pgcrypto   - Cryptographic functions
#   pg_trgm    - Text search (trigram similarity)
```
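
A small instance also has a small connection budget, and every Cloud Run instance opens its own pool. The sketch below shows one way to size the per-instance pool. The connection limit of 25 and the 3 reserved connections are assumptions, not values from this ADR; check the actual limit with `SHOW max_connections;` on the instance.

```python
def pool_size_per_instance(max_connections: int, max_instances: int, reserved: int = 3) -> int:
    """Connections each app instance may hold without exhausting the database.

    `reserved` keeps a few connections free for migrations and admin access.
    """
    usable = max_connections - reserved
    if usable < max_instances:
        raise ValueError("not enough connections for one per instance")
    return usable // max_instances


# Assumed limit of 25 connections, with maxInstances = 10 from the Cloud Run config.
print(pool_size_per_instance(max_connections=25, max_instances=10))
```

Under these assumptions each instance gets only 2 connections, which is one reason to revisit the database tier before raising `maxInstances`.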

#### Secret Manager

Securely stores:
- `anthropic-api-key` - Anthropic API credentials
- `openrouter-api-key` - OpenRouter API credentials
- `database-url` - PostgreSQL connection string
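
Cloud Run can expose Secret Manager secrets to the container as environment variables, so the application can fail fast when one is absent. A minimal sketch of such a check follows. `ANTHROPIC_API_KEY` appears in the deployment instructions; the other two variable names and the function itself are assumptions, and whether all three secrets are mandatory is not stated in this ADR.

```python
import os

# Secret name -> environment variable the container would read.
# OPENROUTER_API_KEY and DATABASE_URL are assumed names.
SECRET_ENV_VARS = {
    "anthropic-api-key": "ANTHROPIC_API_KEY",
    "openrouter-api-key": "OPENROUTER_API_KEY",
    "database-url": "DATABASE_URL",
}


def missing_secrets(env) -> list[str]:
    """Return the secret names whose environment variable is unset or empty."""
    return [name for name, var in SECRET_ENV_VARS.items() if not env.get(var)]


print(missing_secrets(os.environ))
```

Running the check at startup surfaces a misconfigured secret binding in the startup logs instead of at the first API call.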

#### Cloud Storage

```hcl
# Automatic cost optimization
lifecycle_rules = [
  { age = 30, action = "SetStorageClass", class = "NEARLINE" },
  { age = 90, action = "SetStorageClass", class = "COLDLINE" }
]
```
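
The effect of the two lifecycle rules can be sketched as a function of object age in days. This models only the intended end state; Cloud Storage applies lifecycle actions asynchronously, so an object may remain in its previous class for some time after it crosses a threshold. The function name is hypothetical.

```python
def storage_class_for_age(age_days: int) -> str:
    """Storage class an object should eventually reach under the rules above."""
    if age_days >= 90:
        return "COLDLINE"
    if age_days >= 30:
        return "NEARLINE"
    return "STANDARD"


for age in (7, 45, 120):
    print(age, storage_class_for_age(age))
```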

### Deployment Options

#### Option 1: Quick Deploy (gcloud CLI)

```bash
# Set environment variables
export ANTHROPIC_API_KEY="sk-ant-..."
export PROJECT_ID="my-project"

# Run deployment script
./deploy/gcp/deploy.sh --project-id "$PROJECT_ID"
```

#### Option 2: Infrastructure as Code (Terraform)

```bash
cd deploy/gcp/terraform

terraform init
# Save the plan so that apply reuses the variables supplied here
terraform plan -var="project_id=my-project" -var="anthropic_api_key=sk-ant-..." -out=tfplan
terraform apply tfplan
```

#### Option 3: CI/CD (Cloud Build)

```yaml
# Trigger on push to main branch
trigger:
  branch: main
  included_files:
    - "npm/packages/ruvbot/**"

# cloudbuild.yaml handles build and deploy
```

### Multi-Tenant Configuration

The configuration below creates a separate Cloud SQL database per tenant and enables PostgreSQL Row-Level Security on the `sessions` table:

```hcl
# Separate Cloud SQL databases
resource "google_sql_database" "tenant" {
  for_each = var.tenants
  name     = "ruvbot_${each.key}"
  instance = google_sql_database_instance.ruvbot.name
}
```

```sql
-- Row-Level Security in PostgreSQL
ALTER TABLE sessions ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON sessions
  USING (tenant_id = current_setting('app.tenant_id')::uuid);
```
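
The `tenant_isolation` policy only works if the application sets `app.tenant_id` before it queries. One way to do that is sketched below, assuming a driver that uses `%s` placeholders (psycopg style). Both helper functions are hypothetical; `set_config` is the standard PostgreSQL function, and its third argument limits the setting to the current transaction.

```python
import uuid


def tenant_database_name(tenant_key: str) -> str:
    """Mirror the Terraform naming scheme ruvbot_${each.key}."""
    return f"ruvbot_{tenant_key}"


def tenant_scope_statement(tenant_id: str) -> tuple[str, tuple[str]]:
    """SQL and parameters that bind app.tenant_id for the current transaction."""
    # uuid.UUID raises ValueError on malformed input, so a bad tenant ID
    # never reaches the database.
    canonical = str(uuid.UUID(tenant_id))
    return "SELECT set_config('app.tenant_id', %s, true)", (canonical,)


sql, params = tenant_scope_statement("123e4567-e89b-12d3-a456-426614174000")
print(sql, params)
```

The statement would be executed first inside each request's transaction. Note that PostgreSQL does not apply row-level security to the table owner or to superusers unless `FORCE ROW LEVEL SECURITY` is set, so the application should connect as a separate, less privileged role.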

### Scaling Considerations

| Traffic Level | Cloud Run Instances | Cloud SQL | Estimated Cost |
|---------------|---------------------|-----------|----------------|
| Low (<1K req/day) | 0-1 | db-f1-micro | ~$15/month |
| Medium (<10K req/day) | 1-3 | db-g1-small | ~$40/month |
| High (<100K req/day) | 3-10 | db-custom | ~$150/month |
| Enterprise | 10-100 | Regional HA | ~$500+/month |
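
The table can be read as a lookup from daily request volume to a tier. The sketch below encodes the thresholds exactly as listed; the function name is hypothetical, and treating anything at or above 100K requests per day as Enterprise is an interpretation of the table rather than something the ADR states.

```python
# (upper bound in requests/day, tier, Cloud SQL configuration, estimated cost)
TIERS = [
    (1_000, "Low", "db-f1-micro", "~$15/month"),
    (10_000, "Medium", "db-g1-small", "~$40/month"),
    (100_000, "High", "db-custom", "~$150/month"),
]
ENTERPRISE = ("Enterprise", "Regional HA", "~$500+/month")


def tier_for(requests_per_day: int) -> tuple[str, str, str]:
    """Return (tier, Cloud SQL configuration, estimated cost) for a daily volume."""
    for upper_bound, name, sql_tier, cost in TIERS:
        if requests_per_day < upper_bound:
            return name, sql_tier, cost
    return ENTERPRISE


print(tier_for(5_000))
```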

### Security Configuration

```hcl
# Service account with minimal permissions
roles = [
  "roles/secretmanager.secretAccessor",
  "roles/cloudsql.client",
  "roles/storage.objectAdmin",
  "roles/logging.logWriter",
  "roles/monitoring.metricWriter",
]

# Network security
ip_configuration {
  ipv4_enabled    = false  # Production: use private IP
  private_network = google_compute_network.vpc.id
}
```

### Health Monitoring

```yaml
# Cloud Run health checks
startup_probe:
  http_get:
    path: /health
    port: 8080
  initial_delay_seconds: 5
  timeout_seconds: 3
  period_seconds: 10

liveness_probe:
  http_get:
    path: /health
    port: 8080
  timeout_seconds: 3
  period_seconds: 30
```
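
Both probes expect the container to answer `GET /health` on port 8080. The ADR does not show RuvBot's handler, so the sketch below is a stand-in written with the Python standard library. The JSON body is an assumption, since an HTTP probe only inspects the status code and counts any response from 200 to 399 as healthy.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_response(path: str) -> tuple[int, bytes]:
    """Status code and body for a request path; only /health is served."""
    if path == "/health":
        return 200, json.dumps({"status": "ok"}).encode()
    return 404, b""


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Block and serve health checks on the port the probes target."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the listener. The handler should stay cheap and avoid a database round trip on every probe, so that a slow database does not cause the liveness probe to restart otherwise healthy containers.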

### File Structure

```
deploy/
├── gcp/
│   ├── cloudbuild.yaml      # CI/CD pipeline
│   ├── deploy.sh            # Quick deployment script
│   └── terraform/
│       └── main.tf          # Infrastructure as code
├── init-db.sql              # Database schema
├── Dockerfile               # Container image
└── docker-compose.yml       # Local development
```

## Consequences

### Positive
- **Cost-effective**: ~$15-20/month for low traffic
- **Serverless**: Scales to zero when not in use
- **Managed services**: No infrastructure maintenance
- **Security**: Secret Manager, IAM, VPC support
- **Observability**: Built-in logging and monitoring

### Negative
- **Cold starts**: The first request after an idle period takes ~2-3 seconds
- **Vendor lock-in**: Relies on GCP-specific services
- **Complexity**: Multiple services to configure

### Trade-offs
- **Cloud SQL vs Firestore**: SQL chosen for complex queries and Row-Level Security
- **Cloud Run vs GKE**: Cloud Run chosen for simplicity and lower cost
- **db-f1-micro vs larger**: Lower cost at the expense of performance

## Alternatives Considered

| Option | Pros | Cons | Estimated Cost |
|--------|------|------|----------------|
| GKE + Postgres | Full control, predictable | Complex, expensive | ~$100+/month |
| App Engine | Simple deployment | Less flexible | ~$30/month |
| Firebase + Functions | Easy scaling | No SQL, vendor lock-in | ~$20/month |
| **Cloud Run + SQL** | **Balanced** | **Some complexity** | **~$15-20/month** |

## References

- [Cloud Run Pricing](https://cloud.google.com/run/pricing)
- [Cloud SQL Pricing](https://cloud.google.com/sql/pricing)
- [Terraform GCP Provider](https://registry.terraform.io/providers/hashicorp/google/latest/docs)
- [Cloud Build CI/CD](https://cloud.google.com/build/docs)