# ADR-002: Multi-tenancy Design **Status:** Accepted **Date:** 2026-01-27 **Decision Makers:** RuVector Architecture Team **Technical Area:** Security, Data Architecture --- ## Context and Problem Statement RuvBot must serve multiple organizations (tenants) and users within each organization while maintaining strict data isolation. A breach of tenant boundaries would: 1. Violate privacy and compliance requirements (GDPR, SOC2, HIPAA) 2. Expose sensitive business information 3. Destroy trust in the platform 4. Create legal liability The multi-tenancy design must address: - **Data Isolation**: No cross-tenant data access - **Authentication**: Identity verification at multiple levels - **Authorization**: Fine-grained permission control - **Resource Limits**: Fair usage and cost allocation - **Audit Trails**: Complete visibility into access patterns --- ## Decision Drivers ### Security Requirements | Requirement | Criticality | Description | |-------------|-------------|-------------| | Zero cross-tenant leakage | Critical | No tenant can access another tenant's data | | Row-level security | Critical | Database enforces isolation, not just application | | Token-based auth | High | Stateless, revocable authentication | | RBAC + ABAC | High | Role and attribute-based access control | | Audit logging | High | All data access logged with tenant context | ### Operational Requirements | Requirement | Target | Description | |-------------|--------|-------------| | Tenant provisioning | < 30s | New tenant setup time | | User provisioning | < 5s | New user creation time | | Quota enforcement | Real-time | Immediate limit enforcement | | Data export | < 1h for 1GB | GDPR data portability | | Data deletion | < 24h | GDPR right to erasure | --- ## Decision Outcome ### Adopt Hierarchical Multi-tenancy with RLS and JWT Claims We implement a three-level hierarchy with PostgreSQL Row-Level Security (RLS) as the primary isolation mechanism. ``` +---------------------------+ | ORGANIZATION | Billing entity, security boundary |---------------------------| | id: UUID | | name: string | | plan: Plan | | settings: OrgSettings | | quotas: ResourceQuotas | +-------------+-------------+ | | 1:N v +---------------------------+ | WORKSPACE | Project/team boundary |---------------------------| | id: UUID | | orgId: UUID (FK) | | name: string | | settings: WorkspaceSettings| +-------------+-------------+ | | 1:N v +---------------------------+ | USER | Individual identity |---------------------------| | id: UUID | | workspaceId: UUID (FK) | | email: string | | roles: Role[] | | preferences: Preferences | +---------------------------+ ``` --- ## Tenant Isolation Layers ### Layer 1: Network Isolation ``` Internet | v +---+---+ | WAF | Rate limiting, DDoS protection +---+---+ | v +---+---+ | LB/TLS| TLS termination, tenant routing +---+---+ | +--------+--------+--------+ | | | | +---v---+ +---v---+ +---v---+ +---v---+ | Org A | | Org B | | Org C | | Org D | Virtual host routing +-------+ +-------+ +-------+ +-------+ ``` ### Layer 2: Authentication & Authorization ```typescript // JWT token structure with tenant claims interface RuvBotToken { // Standard claims sub: string; // User ID iat: number; // Issued at exp: number; // Expiration // Tenant claims (always present) org_id: string; // Organization ID workspace_id: string; // Workspace ID // Permission claims roles: Role[]; // User roles permissions: string[];// Explicit permissions // Resource claims quotas: { sessions: number; messages_per_day: number; memory_mb: number; }; } // Role hierarchy enum Role { ORG_OWNER = 'org:owner', ORG_ADMIN = 'org:admin', WORKSPACE_ADMIN = 'workspace:admin', MEMBER = 'member', VIEWER = 'viewer', API_KEY = 'api_key', } // Permission matrix const PERMISSIONS: Record = { 'org:owner': ['*'], 'org:admin': ['org:read', 'org:write', 'workspace:*', 'user:*', 'billing:read'], 'workspace:admin': ['workspace:read', 'workspace:write', 'user:read', 'user:invite'], 'member': ['session:*', 'memory:read', 'memory:write', 'skill:execute'], 'viewer': ['session:read', 'memory:read'], 'api_key': ['session:create', 'session:read'], }; ``` ### Layer 3: Database Row-Level Security ```sql -- Enable RLS on all tenant-scoped tables ALTER TABLE conversations ENABLE ROW LEVEL SECURITY; ALTER TABLE memories ENABLE ROW LEVEL SECURITY; ALTER TABLE sessions ENABLE ROW LEVEL SECURITY; ALTER TABLE skills ENABLE ROW LEVEL SECURITY; ALTER TABLE trajectories ENABLE ROW LEVEL SECURITY; -- Create tenant context function CREATE OR REPLACE FUNCTION current_tenant_id() RETURNS UUID AS $$ BEGIN RETURN current_setting('app.current_org_id', true)::UUID; END; $$ LANGUAGE plpgsql SECURITY DEFINER; CREATE OR REPLACE FUNCTION current_workspace_id() RETURNS UUID AS $$ BEGIN RETURN current_setting('app.current_workspace_id', true)::UUID; END; $$ LANGUAGE plpgsql SECURITY DEFINER; -- RLS policies for conversations CREATE POLICY conversations_isolation ON conversations FOR ALL USING (org_id = current_tenant_id()) WITH CHECK (org_id = current_tenant_id()); -- RLS policies for memories (workspace-level) CREATE POLICY memories_isolation ON memories FOR ALL USING ( org_id = current_tenant_id() AND workspace_id = current_workspace_id() ); -- Read-only policy for cross-workspace memory sharing CREATE POLICY memories_shared_read ON memories FOR SELECT USING ( org_id = current_tenant_id() AND is_shared = true ); ``` ### Layer 4: Vector Store Isolation ```typescript // Namespace isolation in RuVector interface VectorNamespace { // Namespace format: {org_id}/{workspace_id}/{collection} // Example: "550e8400-e29b/.../episodic" encode(orgId: string, workspaceId: string, collection: string): string; decode(namespace: string): { orgId: string; workspaceId: string; collection: string }; validate(namespace: string, token: RuvBotToken): boolean; } // Vector store with tenant isolation class TenantIsolatedVectorStore { constructor( private store: RuVectorAdapter, private tenantContext: TenantContext ) {} async search(query: Float32Array, options: SearchOptions): Promise { const namespace = this.getNamespace(options.collection); // Validate namespace matches token claims if (!this.validateNamespace(namespace)) { throw new TenantIsolationError('Namespace mismatch'); } return this.store.search(query, { ...options, namespace }); } private getNamespace(collection: string): string { return `${this.tenantContext.orgId}/${this.tenantContext.workspaceId}/${collection}`; } private validateNamespace(namespace: string): boolean { const { orgId, workspaceId } = VectorNamespace.decode(namespace); return ( orgId === this.tenantContext.orgId && workspaceId === this.tenantContext.workspaceId ); } } ``` --- ## Data Partitioning Strategy ### PostgreSQL Partitioning ```sql -- Partition conversations by org_id for isolation and performance CREATE TABLE conversations ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), org_id UUID NOT NULL, workspace_id UUID NOT NULL, session_id UUID NOT NULL, user_id UUID NOT NULL, content TEXT NOT NULL, role VARCHAR(20) NOT NULL, embedding_id UUID, metadata JSONB, created_at TIMESTAMPTZ DEFAULT NOW() ) PARTITION BY LIST (org_id); -- Create partition per organization CREATE OR REPLACE FUNCTION create_org_partition(org_id UUID) RETURNS void AS $$ DECLARE partition_name TEXT; BEGIN partition_name := 'conversations_' || replace(org_id::text, '-', '_'); EXECUTE format( 'CREATE TABLE IF NOT EXISTS %I PARTITION OF conversations FOR VALUES IN (%L)', partition_name, org_id ); END; $$ LANGUAGE plpgsql; -- Indexes per partition CREATE INDEX CONCURRENTLY conversations_session_idx ON conversations (session_id, created_at DESC); CREATE INDEX CONCURRENTLY conversations_user_idx ON conversations (user_id, created_at DESC); CREATE INDEX CONCURRENTLY conversations_embedding_idx ON conversations (embedding_id) WHERE embedding_id IS NOT NULL; ``` ### Vector Store Partitioning ```typescript // HNSW index per tenant for isolation and independent scaling interface TenantVectorIndex { orgId: string; workspaceId: string; collection: 'episodic' | 'semantic' | 'skills'; // Index configuration (can vary per tenant plan) config: { dimensions: number; // 384 for MiniLM, 1536 for larger models m: number; // HNSW connections (16-32) efConstruction: number; // Build quality (100-200) efSearch: number; // Query quality (50-100) }; // Usage metrics metrics: { vectorCount: number; memoryUsageMB: number; avgSearchLatencyMs: number; lastOptimized: Date; }; } // Index lifecycle management class TenantIndexManager { async provisionTenant(orgId: string): Promise { // Create default workspaces indices await this.createIndex(orgId, 'default', 'episodic'); await this.createIndex(orgId, 'default', 'semantic'); await this.createIndex(orgId, 'default', 'skills'); } async deleteTenant(orgId: string): Promise { // Delete all indices for org (GDPR deletion) const indices = await this.listIndices(orgId); await Promise.all(indices.map(idx => this.deleteIndex(idx.id))); // Log deletion for audit await this.auditLog.record({ action: 'tenant_deletion', orgId, indexCount: indices.length, timestamp: new Date(), }); } async optimizeIndex(indexId: string): Promise { // Background optimization with tenant resource limits const index = await this.getIndex(indexId); const quota = await this.getQuota(index.orgId); if (index.metrics.memoryUsageMB > quota.maxVectorMemoryMB) { // Apply quantization to reduce memory return this.compressIndex(indexId, 'product_quantization'); } return this.rebalanceIndex(indexId); } } ``` --- ## Authentication Flows ### OAuth2/OIDC Flow ``` +--------+ +--------+ | User | | IdP | +---+----+ +---+----+ | | | 1. Login request | +--------------------------------------->| | | | 2. Redirect to IdP | |<---------------------------------------+ | | | 3. Authenticate + consent | +--------------------------------------->| | | | 4. Auth code redirect | |<---------------------------------------+ | | | +--------+ | | 5. Auth code | RuvBot | | +------------------>| Auth | | | +---+----+ | | | | | 6. Exchange code | | | +--------------->| | | | | 7. ID + Access token | | | |<---------------+ | | | | 8. Create session, | | issue RuvBot JWT | |<----------------------+ | | | 9. Authenticated | +<----------------------+ ``` ### API Key Authentication ```typescript // API key structure interface APIKey { id: string; keyHash: string; // SHA-256 hash of actual key prefix: string; // First 8 chars for identification orgId: string; workspaceId: string; name: string; permissions: string[]; rateLimit: RateLimitConfig; expiresAt: Date | null; lastUsedAt: Date | null; createdBy: string; createdAt: Date; } // API key validation middleware async function validateAPIKey(req: Request): Promise { const authHeader = req.headers.authorization; if (!authHeader?.startsWith('Bearer ')) { throw new AuthenticationError('Missing authorization header'); } const key = authHeader.slice(7); const prefix = key.slice(0, 8); const keyHash = crypto.createHash('sha256').update(key).digest('hex'); // Lookup by prefix, then verify hash (timing-safe) const apiKey = await db.apiKeys.findByPrefix(prefix); if (!apiKey || !crypto.timingSafeEqual( Buffer.from(apiKey.keyHash), Buffer.from(keyHash) )) { throw new AuthenticationError('Invalid API key'); } // Check expiration if (apiKey.expiresAt && apiKey.expiresAt < new Date()) { throw new AuthenticationError('API key expired'); } // Update last used (async, don't block) db.apiKeys.updateLastUsed(apiKey.id).catch(console.error); return { orgId: apiKey.orgId, workspaceId: apiKey.workspaceId, userId: apiKey.createdBy, roles: [Role.API_KEY], permissions: apiKey.permissions, }; } ``` --- ## Resource Quotas and Rate Limiting ### Quota Configuration ```typescript // Plan-based quota tiers interface ResourceQuotas { // Session limits maxConcurrentSessions: number; maxSessionDurationMinutes: number; maxTurnsPerSession: number; // Memory limits maxMemoriesPerWorkspace: number; maxVectorStorageMB: number; maxEmbeddingsPerDay: number; // Compute limits maxLLMTokensPerDay: number; maxSkillExecutionsPerDay: number; maxBackgroundJobsPerHour: number; // Rate limits requestsPerMinute: number; requestsPerHour: number; burstLimit: number; } const PLAN_QUOTAS: Record = { free: { maxConcurrentSessions: 2, maxSessionDurationMinutes: 30, maxTurnsPerSession: 50, maxMemoriesPerWorkspace: 1000, maxVectorStorageMB: 50, maxEmbeddingsPerDay: 500, maxLLMTokensPerDay: 10000, maxSkillExecutionsPerDay: 100, maxBackgroundJobsPerHour: 10, requestsPerMinute: 20, requestsPerHour: 500, burstLimit: 5, }, pro: { maxConcurrentSessions: 10, maxSessionDurationMinutes: 120, maxTurnsPerSession: 500, maxMemoriesPerWorkspace: 50000, maxVectorStorageMB: 1000, maxEmbeddingsPerDay: 10000, maxLLMTokensPerDay: 500000, maxSkillExecutionsPerDay: 5000, maxBackgroundJobsPerHour: 200, requestsPerMinute: 100, requestsPerHour: 5000, burstLimit: 20, }, enterprise: { maxConcurrentSessions: -1, // Unlimited maxSessionDurationMinutes: -1, maxTurnsPerSession: -1, maxMemoriesPerWorkspace: -1, maxVectorStorageMB: -1, maxEmbeddingsPerDay: -1, maxLLMTokensPerDay: -1, maxSkillExecutionsPerDay: -1, maxBackgroundJobsPerHour: -1, requestsPerMinute: 500, requestsPerHour: 20000, burstLimit: 50, }, }; ``` ### Rate Limiter Implementation ```typescript // Token bucket rate limiter with Redis backend class TenantRateLimiter { constructor(private redis: Redis) {} async checkLimit( tenantId: string, action: string, config: RateLimitConfig ): Promise { const key = `ratelimit:${tenantId}:${action}`; const now = Date.now(); const windowMs = config.windowMs || 60000; // Lua script for atomic rate limit check const result = await this.redis.eval(` local key = KEYS[1] local now = tonumber(ARGV[1]) local window = tonumber(ARGV[2]) local limit = tonumber(ARGV[3]) local burst = tonumber(ARGV[4]) -- Remove expired entries redis.call('ZREMRANGEBYSCORE', key, 0, now - window) -- Count current requests local count = redis.call('ZCARD', key) -- Check burst limit (recent 1s) local burstCount = redis.call('ZCOUNT', key, now - 1000, now) if burstCount >= burst then return {0, limit - count, burst - burstCount, now + 1000} end if count >= limit then local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES') local retryAfter = oldest[2] + window - now return {0, 0, burst - burstCount, retryAfter} end -- Add current request redis.call('ZADD', key, now, now .. ':' .. math.random()) redis.call('PEXPIRE', key, window) return {1, limit - count - 1, burst - burstCount - 1, 0} `, 1, key, now, windowMs, config.limit, config.burstLimit); const [allowed, remaining, burstRemaining, retryAfter] = result as number[]; return { allowed: allowed === 1, remaining, burstRemaining, retryAfter: retryAfter > 0 ? Math.ceil(retryAfter / 1000) : 0, limit: config.limit, }; } } ``` --- ## Audit Logging ```typescript // Comprehensive audit trail interface AuditEvent { id: string; timestamp: Date; // Tenant context orgId: string; workspaceId: string; userId: string; // Event details action: AuditAction; resource: AuditResource; resourceId: string; // Request context requestId: string; ipAddress: string; userAgent: string; // Change tracking before?: Record; after?: Record; // Outcome status: 'success' | 'failure' | 'denied'; errorCode?: string; errorMessage?: string; } type AuditAction = | 'create' | 'read' | 'update' | 'delete' | 'login' | 'logout' | 'token_refresh' | 'export' | 'import' | 'share' | 'unshare' | 'invite' | 'remove' | 'skill_execute' | 'memory_recall' | 'quota_exceeded' | 'rate_limited'; type AuditResource = | 'user' | 'session' | 'conversation' | 'memory' | 'skill' | 'agent' | 'workspace' | 'organization' | 'api_key' | 'webhook'; // Audit logger with async persistence class AuditLogger { private buffer: AuditEvent[] = []; private flushInterval: NodeJS.Timeout; constructor( private storage: AuditStorage, private config: { batchSize: number; flushMs: number } ) { this.flushInterval = setInterval(() => this.flush(), config.flushMs); } async log(event: Omit): Promise { const fullEvent: AuditEvent = { ...event, id: crypto.randomUUID(), timestamp: new Date(), }; this.buffer.push(fullEvent); if (this.buffer.length >= this.config.batchSize) { await this.flush(); } } private async flush(): Promise { if (this.buffer.length === 0) return; const events = this.buffer.splice(0, this.buffer.length); await this.storage.batchInsert(events); } async query(filter: AuditFilter): Promise { // Ensure tenant isolation in queries if (!filter.orgId) { throw new Error('orgId required for audit queries'); } return this.storage.query(filter); } } ``` --- ## GDPR Compliance ### Data Export ```typescript // Personal data export for GDPR Article 15 class DataExporter { async exportUserData( orgId: string, userId: string ): Promise { const export = { metadata: { userId, orgId, exportedAt: new Date(), format: 'json', version: '1.0', }, data: {} as Record, }; // Collect all user data across contexts const [ profile, sessions, conversations, memories, preferences, auditLogs, ] = await Promise.all([ this.exportProfile(userId), this.exportSessions(userId), this.exportConversations(userId), this.exportMemories(userId), this.exportPreferences(userId), this.exportAuditLogs(userId), ]); export.data = { profile, sessions, conversations, memories, preferences, auditLogs, }; // Generate downloadable archive const archivePath = await this.createArchive(export); // Log export for audit await this.auditLogger.log({ orgId, workspaceId: '*', userId, action: 'export', resource: 'user', resourceId: userId, status: 'success', }); return { downloadUrl: await this.generateSignedUrl(archivePath), expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000), // 24h sizeBytes: await this.getFileSize(archivePath), }; } } ``` ### Data Deletion ```typescript // Right to erasure (GDPR Article 17) class DataDeleter { async deleteUserData( orgId: string, userId: string, options: DeletionOptions = {} ): Promise { const jobId = crypto.randomUUID(); // Start deletion job (may take time for large datasets) await this.jobQueue.enqueue('data-deletion', { jobId, orgId, userId, options, }); return { jobId, status: 'pending', estimatedCompletionTime: await this.estimateCompletionTime(userId), }; } async executeDeletion(job: DeletionJob): Promise { const { orgId, userId, options } = job.data; // Order matters: delete dependent data first const steps = [ { name: 'sessions', fn: () => this.deleteSessions(userId) }, { name: 'conversations', fn: () => this.deleteConversations(userId) }, { name: 'memories', fn: () => this.deleteMemories(userId, options.preserveShared) }, { name: 'embeddings', fn: () => this.deleteEmbeddings(userId) }, { name: 'trajectories', fn: () => this.deleteTrajectories(userId) }, { name: 'preferences', fn: () => this.deletePreferences(userId) }, { name: 'audit_logs', fn: () => this.anonymizeAuditLogs(userId) }, // Anonymize, not delete { name: 'profile', fn: () => this.deleteProfile(userId) }, ]; for (const step of steps) { try { const result = await step.fn(); await this.updateProgress(job.id, step.name, 'completed', result); } catch (error) { await this.updateProgress(job.id, step.name, 'failed', error); throw error; // Fail job, require manual intervention } } // Final audit entry (anonymized user reference) await this.auditLogger.log({ orgId, workspaceId: '*', userId: 'DELETED_USER', action: 'delete', resource: 'user', resourceId: userId.slice(0, 8) + '...', status: 'success', }); } } ``` --- ## Consequences ### Benefits 1. **Strong Isolation**: RLS + namespace isolation prevents cross-tenant access 2. **Compliance Ready**: GDPR, SOC2, HIPAA requirements addressed 3. **Scalable Quotas**: Per-tenant resource limits enable fair usage 4. **Audit Trail**: Complete visibility for security and compliance 5. **Flexible Auth**: OAuth2 + API keys support various use cases ### Risks and Mitigations | Risk | Probability | Impact | Mitigation | |------|-------------|--------|------------| | RLS bypass via SQL injection | Low | Critical | Parameterized queries, ORM only | | Token theft | Medium | High | Short expiry, refresh rotation | | Quota gaming (multiple accounts) | Medium | Medium | Device fingerprinting, email verification | | Audit log tampering | Low | High | Append-only storage, checksums | --- ## Related Decisions - **ADR-001**: Architecture Overview - **ADR-003**: Persistence Layer (RLS implementation details) --- ## Revision History | Version | Date | Author | Changes | |---------|------|--------|---------| | 1.0 | 2026-01-27 | RuVector Architecture Team | Initial version |