Files
wifi-densepose/npm/packages/ruvbot/docs/adr/ADR-002-multi-tenancy-design.md
ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00

23 KiB

ADR-002: Multi-tenancy Design

Status: Accepted Date: 2026-01-27 Decision Makers: RuVector Architecture Team Technical Area: Security, Data Architecture


Context and Problem Statement

RuvBot must serve multiple organizations (tenants) and users within each organization while maintaining strict data isolation. A breach of tenant boundaries would:

  1. Violate privacy and compliance requirements (GDPR, SOC2, HIPAA)
  2. Expose sensitive business information
  3. Destroy trust in the platform
  4. Create legal liability

The multi-tenancy design must address:

  • Data Isolation: No cross-tenant data access
  • Authentication: Identity verification at multiple levels
  • Authorization: Fine-grained permission control
  • Resource Limits: Fair usage and cost allocation
  • Audit Trails: Complete visibility into access patterns

Decision Drivers

Security Requirements

Requirement Criticality Description
Zero cross-tenant leakage Critical No tenant can access another tenant's data
Row-level security Critical Database enforces isolation, not just application
Token-based auth High Stateless, revocable authentication
RBAC + ABAC High Role and attribute-based access control
Audit logging High All data access logged with tenant context

Operational Requirements

Requirement Target Description
Tenant provisioning < 30s New tenant setup time
User provisioning < 5s New user creation time
Quota enforcement Real-time Immediate limit enforcement
Data export < 1h for 1GB GDPR data portability
Data deletion < 24h GDPR right to erasure

Decision Outcome

Adopt Hierarchical Multi-tenancy with RLS and JWT Claims

We implement a three-level hierarchy with PostgreSQL Row-Level Security (RLS) as the primary isolation mechanism.

+---------------------------+
|        ORGANIZATION       |  Billing entity, security boundary
|---------------------------|
| id: UUID                  |
| name: string              |
| plan: Plan                |
| settings: OrgSettings     |
| quotas: ResourceQuotas    |
+-------------+-------------+
              |
              | 1:N
              v
+---------------------------+
|         WORKSPACE         |  Project/team boundary
|---------------------------|
| id: UUID                  |
| orgId: UUID (FK)          |
| name: string              |
| settings: WorkspaceSettings|
+-------------+-------------+
              |
              | 1:N
              v
+---------------------------+
|           USER            |  Individual identity
|---------------------------|
| id: UUID                  |
| workspaceId: UUID (FK)    |
| email: string             |
| roles: Role[]             |
| preferences: Preferences  |
+---------------------------+

Tenant Isolation Layers

Layer 1: Network Isolation

Internet
    |
    v
+---+---+
|  WAF  |  Rate limiting, DDoS protection
+---+---+
    |
    v
+---+---+
| LB/TLS|  TLS termination, tenant routing
+---+---+
    |
    +--------+--------+--------+
    |        |        |        |
+---v---+ +---v---+ +---v---+ +---v---+
| Org A | | Org B | | Org C | | Org D |  Virtual host routing
+-------+ +-------+ +-------+ +-------+

Layer 2: Authentication & Authorization

// JWT token structure with tenant claims
interface RuvBotToken {
  // Standard claims
  sub: string;          // User ID
  iat: number;          // Issued at
  exp: number;          // Expiration

  // Tenant claims (always present)
  org_id: string;       // Organization ID
  workspace_id: string; // Workspace ID

  // Permission claims
  roles: Role[];        // User roles
  permissions: string[];// Explicit permissions

  // Resource claims
  quotas: {
    sessions: number;
    messages_per_day: number;
    memory_mb: number;
  };
}

// Role hierarchy
enum Role {
  ORG_OWNER = 'org:owner',
  ORG_ADMIN = 'org:admin',
  WORKSPACE_ADMIN = 'workspace:admin',
  MEMBER = 'member',
  VIEWER = 'viewer',
  API_KEY = 'api_key',
}

// Permission matrix
const PERMISSIONS: Record<Role, string[]> = {
  'org:owner': ['*'],
  'org:admin': ['org:read', 'org:write', 'workspace:*', 'user:*', 'billing:read'],
  'workspace:admin': ['workspace:read', 'workspace:write', 'user:read', 'user:invite'],
  'member': ['session:*', 'memory:read', 'memory:write', 'skill:execute'],
  'viewer': ['session:read', 'memory:read'],
  'api_key': ['session:create', 'session:read'],
};

Layer 3: Database Row-Level Security

-- Enable RLS on all tenant-scoped tables
ALTER TABLE conversations ENABLE ROW LEVEL SECURITY;
ALTER TABLE memories ENABLE ROW LEVEL SECURITY;
ALTER TABLE sessions ENABLE ROW LEVEL SECURITY;
ALTER TABLE skills ENABLE ROW LEVEL SECURITY;
ALTER TABLE trajectories ENABLE ROW LEVEL SECURITY;

-- Create tenant context function
CREATE OR REPLACE FUNCTION current_tenant_id()
RETURNS UUID AS $$
BEGIN
  RETURN current_setting('app.current_org_id', true)::UUID;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

CREATE OR REPLACE FUNCTION current_workspace_id()
RETURNS UUID AS $$
BEGIN
  RETURN current_setting('app.current_workspace_id', true)::UUID;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

-- RLS policies for conversations
CREATE POLICY conversations_isolation ON conversations
  FOR ALL
  USING (org_id = current_tenant_id())
  WITH CHECK (org_id = current_tenant_id());

-- RLS policies for memories (workspace-level)
CREATE POLICY memories_isolation ON memories
  FOR ALL
  USING (
    org_id = current_tenant_id()
    AND workspace_id = current_workspace_id()
  );

-- Read-only policy for cross-workspace memory sharing
CREATE POLICY memories_shared_read ON memories
  FOR SELECT
  USING (
    org_id = current_tenant_id()
    AND is_shared = true
  );

Layer 4: Vector Store Isolation

// Namespace isolation in RuVector
interface VectorNamespace {
  // Namespace format: {org_id}/{workspace_id}/{collection}
  // Example: "550e8400-e29b/.../episodic"

  encode(orgId: string, workspaceId: string, collection: string): string;
  decode(namespace: string): { orgId: string; workspaceId: string; collection: string };
  validate(namespace: string, token: RuvBotToken): boolean;
}

// Vector store with tenant isolation
class TenantIsolatedVectorStore {
  constructor(
    private store: RuVectorAdapter,
    private tenantContext: TenantContext
  ) {}

  async search(query: Float32Array, options: SearchOptions): Promise<SearchResult[]> {
    const namespace = this.getNamespace(options.collection);

    // Validate namespace matches token claims
    if (!this.validateNamespace(namespace)) {
      throw new TenantIsolationError('Namespace mismatch');
    }

    return this.store.search(query, { ...options, namespace });
  }

  private getNamespace(collection: string): string {
    return `${this.tenantContext.orgId}/${this.tenantContext.workspaceId}/${collection}`;
  }

  private validateNamespace(namespace: string): boolean {
    const { orgId, workspaceId } = VectorNamespace.decode(namespace);
    return (
      orgId === this.tenantContext.orgId &&
      workspaceId === this.tenantContext.workspaceId
    );
  }
}

Data Partitioning Strategy

PostgreSQL Partitioning

-- Partition conversations by org_id for isolation and performance
CREATE TABLE conversations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  org_id UUID NOT NULL,
  workspace_id UUID NOT NULL,
  session_id UUID NOT NULL,
  user_id UUID NOT NULL,
  content TEXT NOT NULL,
  role VARCHAR(20) NOT NULL,
  embedding_id UUID,
  metadata JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
) PARTITION BY LIST (org_id);

-- Create partition per organization
CREATE OR REPLACE FUNCTION create_org_partition(org_id UUID)
RETURNS void AS $$
DECLARE
  partition_name TEXT;
BEGIN
  partition_name := 'conversations_' || replace(org_id::text, '-', '_');
  EXECUTE format(
    'CREATE TABLE IF NOT EXISTS %I PARTITION OF conversations FOR VALUES IN (%L)',
    partition_name,
    org_id
  );
END;
$$ LANGUAGE plpgsql;

-- Indexes per partition
CREATE INDEX CONCURRENTLY conversations_session_idx
  ON conversations (session_id, created_at DESC);
CREATE INDEX CONCURRENTLY conversations_user_idx
  ON conversations (user_id, created_at DESC);
CREATE INDEX CONCURRENTLY conversations_embedding_idx
  ON conversations (embedding_id) WHERE embedding_id IS NOT NULL;

Vector Store Partitioning

// HNSW index per tenant for isolation and independent scaling
interface TenantVectorIndex {
  orgId: string;
  workspaceId: string;
  collection: 'episodic' | 'semantic' | 'skills';

  // Index configuration (can vary per tenant plan)
  config: {
    dimensions: number;     // 384 for MiniLM, 1536 for larger models
    m: number;              // HNSW connections (16-32)
    efConstruction: number; // Build quality (100-200)
    efSearch: number;       // Query quality (50-100)
  };

  // Usage metrics
  metrics: {
    vectorCount: number;
    memoryUsageMB: number;
    avgSearchLatencyMs: number;
    lastOptimized: Date;
  };
}

// Index lifecycle management
class TenantIndexManager {
  async provisionTenant(orgId: string): Promise<void> {
    // Create default workspaces indices
    await this.createIndex(orgId, 'default', 'episodic');
    await this.createIndex(orgId, 'default', 'semantic');
    await this.createIndex(orgId, 'default', 'skills');
  }

  async deleteTenant(orgId: string): Promise<void> {
    // Delete all indices for org (GDPR deletion)
    const indices = await this.listIndices(orgId);
    await Promise.all(indices.map(idx => this.deleteIndex(idx.id)));

    // Log deletion for audit
    await this.auditLog.record({
      action: 'tenant_deletion',
      orgId,
      indexCount: indices.length,
      timestamp: new Date(),
    });
  }

  async optimizeIndex(indexId: string): Promise<OptimizationResult> {
    // Background optimization with tenant resource limits
    const index = await this.getIndex(indexId);
    const quota = await this.getQuota(index.orgId);

    if (index.metrics.memoryUsageMB > quota.maxVectorMemoryMB) {
      // Apply quantization to reduce memory
      return this.compressIndex(indexId, 'product_quantization');
    }

    return this.rebalanceIndex(indexId);
  }
}

Authentication Flows

OAuth2/OIDC Flow

+--------+                               +--------+
|  User  |                               |  IdP   |
+---+----+                               +---+----+
    |                                        |
    |  1. Login request                      |
    +--------------------------------------->|
    |                                        |
    |  2. Redirect to IdP                    |
    |<---------------------------------------+
    |                                        |
    |  3. Authenticate + consent             |
    +--------------------------------------->|
    |                                        |
    |  4. Auth code redirect                 |
    |<---------------------------------------+
    |                                        |
    |                   +--------+           |
    |  5. Auth code     | RuvBot |           |
    +------------------>|  Auth  |           |
    |                   +---+----+           |
    |                       |                |
    |  6. Exchange code     |                |
    |                       +--------------->|
    |                       |                |
    |  7. ID + Access token |                |
    |                       |<---------------+
    |                       |                |
    |  8. Create session,   |
    |     issue RuvBot JWT  |
    |<----------------------+
    |                       |
    |  9. Authenticated     |
    +<----------------------+

API Key Authentication

// API key structure
interface APIKey {
  id: string;
  keyHash: string;            // SHA-256 hash of actual key
  prefix: string;             // First 8 chars for identification
  orgId: string;
  workspaceId: string;
  name: string;
  permissions: string[];
  rateLimit: RateLimitConfig;
  expiresAt: Date | null;
  lastUsedAt: Date | null;
  createdBy: string;
  createdAt: Date;
}

// API key validation middleware
async function validateAPIKey(req: Request): Promise<TenantContext> {
  const authHeader = req.headers.authorization;
  if (!authHeader?.startsWith('Bearer ')) {
    throw new AuthenticationError('Missing authorization header');
  }

  const key = authHeader.slice(7);
  const prefix = key.slice(0, 8);
  const keyHash = crypto.createHash('sha256').update(key).digest('hex');

  // Lookup by prefix, then verify hash (timing-safe)
  const apiKey = await db.apiKeys.findByPrefix(prefix);
  if (!apiKey || !crypto.timingSafeEqual(
    Buffer.from(apiKey.keyHash),
    Buffer.from(keyHash)
  )) {
    throw new AuthenticationError('Invalid API key');
  }

  // Check expiration
  if (apiKey.expiresAt && apiKey.expiresAt < new Date()) {
    throw new AuthenticationError('API key expired');
  }

  // Update last used (async, don't block)
  db.apiKeys.updateLastUsed(apiKey.id).catch(console.error);

  return {
    orgId: apiKey.orgId,
    workspaceId: apiKey.workspaceId,
    userId: apiKey.createdBy,
    roles: [Role.API_KEY],
    permissions: apiKey.permissions,
  };
}

Resource Quotas and Rate Limiting

Quota Configuration

// Plan-based quota tiers
interface ResourceQuotas {
  // Session limits
  maxConcurrentSessions: number;
  maxSessionDurationMinutes: number;
  maxTurnsPerSession: number;

  // Memory limits
  maxMemoriesPerWorkspace: number;
  maxVectorStorageMB: number;
  maxEmbeddingsPerDay: number;

  // Compute limits
  maxLLMTokensPerDay: number;
  maxSkillExecutionsPerDay: number;
  maxBackgroundJobsPerHour: number;

  // Rate limits
  requestsPerMinute: number;
  requestsPerHour: number;
  burstLimit: number;
}

const PLAN_QUOTAS: Record<Plan, ResourceQuotas> = {
  free: {
    maxConcurrentSessions: 2,
    maxSessionDurationMinutes: 30,
    maxTurnsPerSession: 50,
    maxMemoriesPerWorkspace: 1000,
    maxVectorStorageMB: 50,
    maxEmbeddingsPerDay: 500,
    maxLLMTokensPerDay: 10000,
    maxSkillExecutionsPerDay: 100,
    maxBackgroundJobsPerHour: 10,
    requestsPerMinute: 20,
    requestsPerHour: 500,
    burstLimit: 5,
  },
  pro: {
    maxConcurrentSessions: 10,
    maxSessionDurationMinutes: 120,
    maxTurnsPerSession: 500,
    maxMemoriesPerWorkspace: 50000,
    maxVectorStorageMB: 1000,
    maxEmbeddingsPerDay: 10000,
    maxLLMTokensPerDay: 500000,
    maxSkillExecutionsPerDay: 5000,
    maxBackgroundJobsPerHour: 200,
    requestsPerMinute: 100,
    requestsPerHour: 5000,
    burstLimit: 20,
  },
  enterprise: {
    maxConcurrentSessions: -1, // Unlimited
    maxSessionDurationMinutes: -1,
    maxTurnsPerSession: -1,
    maxMemoriesPerWorkspace: -1,
    maxVectorStorageMB: -1,
    maxEmbeddingsPerDay: -1,
    maxLLMTokensPerDay: -1,
    maxSkillExecutionsPerDay: -1,
    maxBackgroundJobsPerHour: -1,
    requestsPerMinute: 500,
    requestsPerHour: 20000,
    burstLimit: 50,
  },
};

Rate Limiter Implementation

// Token bucket rate limiter with Redis backend
class TenantRateLimiter {
  constructor(private redis: Redis) {}

  async checkLimit(
    tenantId: string,
    action: string,
    config: RateLimitConfig
  ): Promise<RateLimitResult> {
    const key = `ratelimit:${tenantId}:${action}`;
    const now = Date.now();
    const windowMs = config.windowMs || 60000;

    // Lua script for atomic rate limit check
    const result = await this.redis.eval(`
      local key = KEYS[1]
      local now = tonumber(ARGV[1])
      local window = tonumber(ARGV[2])
      local limit = tonumber(ARGV[3])
      local burst = tonumber(ARGV[4])

      -- Remove expired entries
      redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

      -- Count current requests
      local count = redis.call('ZCARD', key)

      -- Check burst limit (recent 1s)
      local burstCount = redis.call('ZCOUNT', key, now - 1000, now)

      if burstCount >= burst then
        return {0, limit - count, burst - burstCount, now + 1000}
      end

      if count >= limit then
        local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
        local retryAfter = oldest[2] + window - now
        return {0, 0, burst - burstCount, retryAfter}
      end

      -- Add current request
      redis.call('ZADD', key, now, now .. ':' .. math.random())
      redis.call('PEXPIRE', key, window)

      return {1, limit - count - 1, burst - burstCount - 1, 0}
    `, 1, key, now, windowMs, config.limit, config.burstLimit);

    const [allowed, remaining, burstRemaining, retryAfter] = result as number[];

    return {
      allowed: allowed === 1,
      remaining,
      burstRemaining,
      retryAfter: retryAfter > 0 ? Math.ceil(retryAfter / 1000) : 0,
      limit: config.limit,
    };
  }
}

Audit Logging

// Comprehensive audit trail
interface AuditEvent {
  id: string;
  timestamp: Date;

  // Tenant context
  orgId: string;
  workspaceId: string;
  userId: string;

  // Event details
  action: AuditAction;
  resource: AuditResource;
  resourceId: string;

  // Request context
  requestId: string;
  ipAddress: string;
  userAgent: string;

  // Change tracking
  before?: Record<string, unknown>;
  after?: Record<string, unknown>;

  // Outcome
  status: 'success' | 'failure' | 'denied';
  errorCode?: string;
  errorMessage?: string;
}

type AuditAction =
  | 'create' | 'read' | 'update' | 'delete'
  | 'login' | 'logout' | 'token_refresh'
  | 'export' | 'import'
  | 'share' | 'unshare'
  | 'invite' | 'remove'
  | 'skill_execute' | 'memory_recall'
  | 'quota_exceeded' | 'rate_limited';

type AuditResource =
  | 'user' | 'session' | 'conversation'
  | 'memory' | 'skill' | 'agent'
  | 'workspace' | 'organization'
  | 'api_key' | 'webhook';

// Audit logger with async persistence
class AuditLogger {
  private buffer: AuditEvent[] = [];
  private flushInterval: NodeJS.Timeout;

  constructor(
    private storage: AuditStorage,
    private config: { batchSize: number; flushMs: number }
  ) {
    this.flushInterval = setInterval(() => this.flush(), config.flushMs);
  }

  async log(event: Omit<AuditEvent, 'id' | 'timestamp'>): Promise<void> {
    const fullEvent: AuditEvent = {
      ...event,
      id: crypto.randomUUID(),
      timestamp: new Date(),
    };

    this.buffer.push(fullEvent);

    if (this.buffer.length >= this.config.batchSize) {
      await this.flush();
    }
  }

  private async flush(): Promise<void> {
    if (this.buffer.length === 0) return;

    const events = this.buffer.splice(0, this.buffer.length);
    await this.storage.batchInsert(events);
  }

  async query(filter: AuditFilter): Promise<AuditEvent[]> {
    // Ensure tenant isolation in queries
    if (!filter.orgId) {
      throw new Error('orgId required for audit queries');
    }
    return this.storage.query(filter);
  }
}

GDPR Compliance

Data Export

// Personal data export for GDPR Article 15
class DataExporter {
  async exportUserData(
    orgId: string,
    userId: string
  ): Promise<DataExportResult> {
    const export = {
      metadata: {
        userId,
        orgId,
        exportedAt: new Date(),
        format: 'json',
        version: '1.0',
      },
      data: {} as Record<string, unknown>,
    };

    // Collect all user data across contexts
    const [
      profile,
      sessions,
      conversations,
      memories,
      preferences,
      auditLogs,
    ] = await Promise.all([
      this.exportProfile(userId),
      this.exportSessions(userId),
      this.exportConversations(userId),
      this.exportMemories(userId),
      this.exportPreferences(userId),
      this.exportAuditLogs(userId),
    ]);

    export.data = {
      profile,
      sessions,
      conversations,
      memories,
      preferences,
      auditLogs,
    };

    // Generate downloadable archive
    const archivePath = await this.createArchive(export);

    // Log export for audit
    await this.auditLogger.log({
      orgId,
      workspaceId: '*',
      userId,
      action: 'export',
      resource: 'user',
      resourceId: userId,
      status: 'success',
    });

    return {
      downloadUrl: await this.generateSignedUrl(archivePath),
      expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000), // 24h
      sizeBytes: await this.getFileSize(archivePath),
    };
  }
}

Data Deletion

// Right to erasure (GDPR Article 17)
class DataDeleter {
  async deleteUserData(
    orgId: string,
    userId: string,
    options: DeletionOptions = {}
  ): Promise<DeletionResult> {
    const jobId = crypto.randomUUID();

    // Start deletion job (may take time for large datasets)
    await this.jobQueue.enqueue('data-deletion', {
      jobId,
      orgId,
      userId,
      options,
    });

    return {
      jobId,
      status: 'pending',
      estimatedCompletionTime: await this.estimateCompletionTime(userId),
    };
  }

  async executeDeletion(job: DeletionJob): Promise<void> {
    const { orgId, userId, options } = job.data;

    // Order matters: delete dependent data first
    const steps = [
      { name: 'sessions', fn: () => this.deleteSessions(userId) },
      { name: 'conversations', fn: () => this.deleteConversations(userId) },
      { name: 'memories', fn: () => this.deleteMemories(userId, options.preserveShared) },
      { name: 'embeddings', fn: () => this.deleteEmbeddings(userId) },
      { name: 'trajectories', fn: () => this.deleteTrajectories(userId) },
      { name: 'preferences', fn: () => this.deletePreferences(userId) },
      { name: 'audit_logs', fn: () => this.anonymizeAuditLogs(userId) }, // Anonymize, not delete
      { name: 'profile', fn: () => this.deleteProfile(userId) },
    ];

    for (const step of steps) {
      try {
        const result = await step.fn();
        await this.updateProgress(job.id, step.name, 'completed', result);
      } catch (error) {
        await this.updateProgress(job.id, step.name, 'failed', error);
        throw error; // Fail job, require manual intervention
      }
    }

    // Final audit entry (anonymized user reference)
    await this.auditLogger.log({
      orgId,
      workspaceId: '*',
      userId: 'DELETED_USER',
      action: 'delete',
      resource: 'user',
      resourceId: userId.slice(0, 8) + '...',
      status: 'success',
    });
  }
}

Consequences

Benefits

  1. Strong Isolation: RLS + namespace isolation prevents cross-tenant access
  2. Compliance Ready: GDPR, SOC2, HIPAA requirements addressed
  3. Scalable Quotas: Per-tenant resource limits enable fair usage
  4. Audit Trail: Complete visibility for security and compliance
  5. Flexible Auth: OAuth2 + API keys support various use cases

Risks and Mitigations

Risk Probability Impact Mitigation
RLS bypass via SQL injection Low Critical Parameterized queries, ORM only
Token theft Medium High Short expiry, refresh rotation
Quota gaming (multiple accounts) Medium Medium Device fingerprinting, email verification
Audit log tampering Low High Append-only storage, checksums

  • ADR-001: Architecture Overview
  • ADR-003: Persistence Layer (RLS implementation details)

Revision History

Version Date Author Changes
1.0 2026-01-27 RuVector Architecture Team Initial version